Overview

This page gives a brief overview of on-line databases, with a short description of the kind of information each one contains.

The Language Typology Resource Center (LTRC) lists eight on-line databases and a few others. It is a European collaboration that not only collects data but also aims to stimulate the exchange of knowledge and experience within the linguistic community and the development of standards for databases. On the LTRC website you can find typological databases, corpora and the tools needed for building them. For every database there is either a direct link to the database or a query form. We will discuss some of these databases here, plus some that are not listed on the LTRC site.

The Surrey Database of Agreement from the Surrey Morphology Group includes information from 15 languages. The following quotation, included in their Theoretical Assumptions, gives a good first impression of what agreement is about:

"The term agreement commonly refers to some systematic covariance between a semantic or formal property of one element and a formal property of another. For example, adjectives may take some formal indication of the number and gender of the noun they modify." (Steele 1978: 610)

The site also provides documentation on how to make queries (including short descriptions of the most important linguistic terms) and how to interpret the resulting information.
The Surrey Morphology Group has built four other databases that can be browsed on-line: two on syncretism (one covering all types, one restricted to person syncretism), the Suppletion Database and the Deponency Database.

Most of the Surrey databases have the same sort of interface. The first two are of course closely related; both deal with syncretism, although the second focuses on person syncretism. Syncretism occurs when two or more functions are represented by the same morphological form. For example, Dutch lopen ('walk') is the inflectional form for the 1st, 2nd and 3rd person plural. Besides person syncretism, you can search the 'all type' database for syncretism in number, case, gender, tense, mood, voice, aspect and definiteness. Syntax, semantics and word class can be considered to play a role as well.
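The definition of syncretism above can be illustrated with a minimal sketch. It is not how the Surrey databases are implemented; the paradigm table and the function name find_syncretisms are assumptions made for this example. Only the plural cells of lopen come from the text; the singular forms are added to complete the toy paradigm.

```python
from collections import defaultdict

# Toy present-tense paradigm of Dutch lopen ('walk').
# Keys are (person, number) cells; values are the inflected forms.
# The singular forms are added for illustration only.
PARADIGM = {
    ("1", "sg"): "loop",
    ("2", "sg"): "loopt",
    ("3", "sg"): "loopt",
    ("1", "pl"): "lopen",
    ("2", "pl"): "lopen",
    ("3", "pl"): "lopen",
}


def find_syncretisms(paradigm):
    """Return every form that fills more than one cell of the paradigm."""
    cells_by_form = defaultdict(list)
    for cell, form in paradigm.items():
        cells_by_form[form].append(cell)
    # A form is syncretic when it represents two or more functions.
    return {form: cells for form, cells in cells_by_form.items() if len(cells) > 1}


if __name__ == "__main__":
    for form, cells in find_syncretisms(PARADIGM).items():
        print(form, cells)
```

Grouping cells by form, rather than comparing cells pairwise, keeps the check linear in the size of the paradigm and extends directly to other features such as case or gender.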
If you visit the homepage of the Suppletion Database, you will immediately encounter a clear definition of 'suppletion', so there is no need to repeat it here. The Deponency Database categorizes "mismatches between the apparent morphosyntactic value of a morphological form and its actual value in a given syntactic context."

Ethnologue (SIL) covers more than 6,000 languages. In the Introduction, a short historical description shows the start and development of the catalogue. The catalogue itself contains information about languages, dialects, language families, language regions and populations, including statistics, as well as geographical, ecological and religious background information and more. The information can be found either by clicking a region on the map or by browsing the database.

The Universals Archive of the University of Konstanz collects the universal 'rules' of language. These are extracted from the linguistic literature and often have the form: "If language L has property X, it must also have property Y." For each universal, the archive lists where it can be found in the literature, together with explanatory comments and counterexamples, if there are any. The laws do not have names, as they often do in physics and mathematics, but numbers; so far 2034 of them are listed in the archive. A brief guide may be of help when browsing, searching and interpreting the data.
Wherever there are laws and regularities, there is rule breaking and exceptional behaviour. Das Grammatische Raritätenkabinett was created to store the rarities of human language. The database has almost the same structure as the Universals Archive, but it also records in which language(s) a rarity has been found and which universal, if any, is violated. The 144 listed phenomena are classified as rarum (rare), rarissimum (very rare) or singularium (unique).
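The implicational form quoted above ("if X, then Y") and the notion of a counterexample can be sketched in a few lines. This is a hypothetical illustration, not the data model of either archive: the language names L1 to L3, the property labels X and Y, and both function names are invented for the example.

```python
def violates(language, antecedent, consequent):
    """True if the language has the antecedent property but lacks the consequent."""
    props = language["properties"]
    return antecedent in props and consequent not in props


def counterexamples(languages, antecedent, consequent):
    """Names of all languages that violate 'if antecedent, then consequent'."""
    return [lang["name"] for lang in languages
            if violates(lang, antecedent, consequent)]


# Invented sample data: each language is described by a set of properties.
LANGUAGES = [
    {"name": "L1", "properties": {"X", "Y"}},  # satisfies the universal
    {"name": "L2", "properties": {"Y"}},       # antecedent absent: no violation
    {"name": "L3", "properties": {"X"}},       # has X but lacks Y: counterexample
]

if __name__ == "__main__":
    print(counterexamples(LANGUAGES, "X", "Y"))
```

Note that a language lacking property X never counts as a counterexample, whatever its value for Y; only languages that have X without Y can violate the universal.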

The Anaphora typology database is still under construction but already worth a look. It contains data about the binding properties of reflexives and reciprocals, so far for only three languages (Korean, Peranakan Javanese and Sakha (Yakut)). You can search by language, local coreference strategy or example sentence.

The Graz Database on Reduplication provides data for languages all over the world. Reduplication is a process where one or more syllables are repeated in order to make a longer word. For example, a child learning French may say "jo-jo" which stands for "joli" ('pretty').
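As a rough sketch of the simplest case described above, the following function recognizes full reduplication in a hyphenated form such as "jo-jo". It is an assumption-laden toy: the function name reduplicated_base is invented here, syllable boundaries are taken from the written hyphens, and partial reduplication is not handled. It does not reflect how the Graz database encodes its data.

```python
def reduplicated_base(word, separator="-"):
    """Return the repeated unit if the word consists of identical copies, else None.

    Assumes the copies are written out and joined by the separator.
    """
    parts = word.split(separator)
    if len(parts) >= 2 and parts[0] and all(part == parts[0] for part in parts):
        return parts[0]
    return None


if __name__ == "__main__":
    for form in ["jo-jo", "joli"]:
        print(form, reduplicated_base(form))
```

In the example from the text, the repeated unit "jo" is also the first syllable of the target word "joli", which could be checked with "joli".startswith("jo").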

Database management

Most of the databases we have seen so far have different interfaces and different ways to search or browse the data; four of the databases from the Surrey Morphology Group are an exception. Each newly created database is designed for a specific purpose and for specific collections of linguistic data. The result is a hotchpotch of databases: for each one you have to go through the documentation and spend some time familiarizing yourself with the kinds of query you can run and with the resulting output. The Typological Database System (TDS) is an initiative of the Netherlands Graduate School of Linguistics (LOT) that aims to make a variety of resources accessible via a single interface. At the moment the user can query data from six databases, and six more are planned for inclusion. Depending on the selected topic, data are available for between 140 and 410 languages. Unfortunately the system is still under development (the project runs at least until December 2007) and does not work smoothly yet, so the user should be patient and persistent.
