Morphological Parsing

In exercise 1 you analyzed the morphological structure of a (probably) unfamiliar language by looking at example sentences and their translations. The same results can be obtained with a software tool called a morphological analyzer or morphological parser. Just as there are parsers that analyze the structure of a sentence, there are parsers that analyze the structure of words. Morphological parsers are not as widespread as syntactic parsers, but one of the oldest still in use today is the PCKIMMO morphological analyzer, which is based on the work of Kimmo Koskenniemi. Koskenniemi worked out the theory underlying PCKIMMO, namely two-level morphology, a form of finite-state technology. Explaining two-level morphology in detail goes beyond the scope of this exercise, but those with some knowledge of finite-state automata and transducers could turn to, for instance, this online article by Karttunen and should be able to understand the general idea. At Utrecht University, linguistics students could take the bachelor course "Taal- en Spraaktechnologie" to get the required background knowledge.

It will not surprise you that morphological parsers do not work by magic. PCKIMMO requires both a lexicon file and a rules file for each language it needs to analyze. A word that is not in the lexicon will not be analyzed. If a spelling rule is missing from the rules file, the analyzer will make errors. Writing lexicon and rules files in the two-level format takes some work, but it could be done by students as part of a course on finite-state technology. This has been done before, as you can see here.
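
To make the role of the two files a little more concrete, here is a minimal sketch in Python. It is only a toy: it does not use the PCKIMMO file formats or the two-level algorithm, and the English lexicon entries, the single spelling rule, and all function names are hypothetical choices made for illustration. It simply tries every root and suffix combination and checks whether the resulting spelling matches the input word.

```python
# Toy illustration only: this is NOT the PCKIMMO file format or algorithm.
# The lexicon entries and the spelling rule below are hypothetical examples.

# "Lexicon": roots with their word category, and suffixes per category.
ROOTS = {"cat": "N", "fox": "N", "walk": "V"}
SUFFIXES = {
    "N": {"": "SG", "s": "PL"},
    "V": {"": "BASE", "ed": "PAST"},
}


def epenthesis(root, suffix):
    """Spelling rule: insert 'e' between a root ending in s, x or z and the suffix 's'."""
    if suffix == "s" and root[-1] in "sxz":
        return root + "e" + suffix
    return root + suffix


def plain_concat(root, suffix):
    """No spelling rules at all: just glue root and suffix together."""
    return root + suffix


def analyze(word, spell=epenthesis):
    """Return every (root, category, suffix, function) whose spelled form equals word."""
    analyses = []
    for root, category in ROOTS.items():
        for suffix, function in SUFFIXES[category].items():
            if spell(root, suffix) == word:
                analyses.append((root, category, suffix, function))
    return analyses


if __name__ == "__main__":
    print(analyze("foxes"))                      # analyzed with the spelling rule
    print(analyze("dogs"))                       # 'dog' is not in the lexicon
    print(analyze("foxes", spell=plain_concat))  # spelling rule missing
    print(analyze("foxs", spell=plain_concat))   # misspelling wrongly accepted
```

In this sketch "dogs" gets no analysis because "dog" is missing from the toy lexicon, and leaving out the spelling rule makes the analyzer reject "foxes" while accepting the misspelling "foxs". These are the same two kinds of failure described above.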

For now it will be enough just to use the morphological parser so you understand what it can do.

a.) Below is a short list of Russian words. Feed them to the PCKIMMO parser by typing them into the input field below. Then, just as in the Esperanto exercise, fill in the form to check your answers. You can ignore the feature settings in this case.

Parser form fields: Language File, Show Features?, Show Tree?, Mode, Input

zakatu: Word Category, Root, Affix #1, Type of affix #1, Function of affix #1, Case, Number
prudy: Word Category, Root, Affix #1, Type of affix #1, Function of affix #1, Case, Number
viseli: Word Category, Root, Affix #1, Type of affix #1, Function of affix #1, Affix #2, Type of affix #2, Function of affix #2, Number, Tense
porogy: Word Category, Root, Affix #1, Type of affix #1, Function of affix #1, Case, Number
lezu: Word Category, Root, Affix #1, Type of affix #1, Function of affix #1, Person, Number, Tense

b.) Think about which method you prefer: analyzing the language yourself or using a software tool like PCKIMMO. What is an advantage of using language technology such as this? What could be a disadvantage?

The Russian files used in this exercise are quite small and cover only a small part of the Russian language. If you want to experiment some more with PCKIMMO, you can take a look at the KIMMO playground page, where you can enter English words and also use the tree feature of the analyzer.