Fixing a grammar

Below is an editable context-free grammar for a small fragment of English.


S -> NP VP
NP -> Det N
NP -> N
VP -> V NP
VP -> V PP
VP -> V
PP -> P NP
N -> 'hamsters'
N -> 'wheels'
N -> 'people'
N -> 'ruins'
N -> 'building'
N -> 'children'
N -> 'ice cream'
N -> 'professor'
N -> 'article'
N -> 'men'
N -> 'musketeers'
N -> 'man'
N -> 'king'
N -> 'linguistics'
V -> 'run'
V -> 'like'
V -> 'fell'
V -> 'writes'
V -> 'became'
V -> 'hate'
P -> 'in'
Det -> 'all'
Det -> 'many'
Det -> 'some'
Det -> 'every'
Det -> 'each'
Det -> 'an'
Det -> 'three'
Det -> 'one'
Det -> 'the'

Next, consider the following sentences.

  1. All hamsters run in wheels.
  2. Many people like linguistics.
  3. Every building fell in ruins.
  4. Some children hate ice cream.
  5. Each professor writes an article.
  6. Three men became musketeers.
  7. One man became the king.

a.) Parse one (or more) of these sentences to convince yourself they are adequately covered by the grammar. You can feed a sentence to the parser by typing it in the field below.
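If you want to check coverage outside the online parser, the grammar above can also be tried out in a few lines of Python. The sketch below is a minimal top-down (backtracking) recognizer written only for this illustration; it is not the Earley parser used on this page, and the names `ends`, `seq_ends` and `accepts` are hypothetical. It assumes each sentence is lowercased, stripped of its final period and split on whitespace, and it lets a lexical entry such as 'ice cream' span several words. Its lexicon includes N -> 'linguistics', which sentence 2 requires.

```python
# Phrase-structure rules of the grammar (rule numbers and arrows dropped).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["N"]],
    "VP": [["V", "NP"], ["V", "PP"], ["V"]],
    "PP": [["P", "NP"]],
}

# Lexical rules; 'linguistics' is included because sentence 2 needs it.
LEXICON = {
    "N": ["hamsters", "wheels", "people", "ruins", "building", "children",
          "ice cream", "professor", "article", "men", "musketeers", "man",
          "king", "linguistics"],
    "V": ["run", "like", "fell", "writes", "became", "hate"],
    "P": ["in"],
    "Det": ["all", "many", "some", "every", "each", "an", "three", "one", "the"],
}


def ends(symbol, words, i):
    """Yield every position j such that `symbol` can derive words[i:j]."""
    if symbol in LEXICON:
        for entry in LEXICON[symbol]:
            tokens = entry.split()  # 'ice cream' spans two words
            if words[i:i + len(tokens)] == tokens:
                yield i + len(tokens)
        return
    for rhs in GRAMMAR[symbol]:
        yield from seq_ends(rhs, words, i)


def seq_ends(symbols, words, i):
    """Yield every end position for the symbol sequence starting at i."""
    if not symbols:
        yield i
        return
    for j in ends(symbols[0], words, i):
        yield from seq_ends(symbols[1:], words, j)


def accepts(sentence):
    """True if the whole sentence can be derived from S."""
    words = sentence.lower().rstrip(".").split()
    return any(j == len(words) for j in ends("S", words, 0))


for s in ["All hamsters run in wheels.", "Many people like linguistics.",
          "Every building fell in ruins.", "Some children hate ice cream.",
          "Each professor writes an article.", "Three men became musketeers.",
          "One man became the king."]:
    print(accepts(s), s)
```

Note that this recognizer also accepts * Many people like. and * Every hamsters like musketeers., because the plain grammar has no means of ruling them out; that is the issue the following questions explore.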


b.) Try parsing the ungrammatical sentence

  1. * Many people like.

Can you think of a way to have our grammar reject sentences with this kind of ungrammaticality?

Now take a look at these three sentences.

  1. * Every hamsters like musketeers.
  2. * Many man hate linguistics.
  3. * All children writes an article.

c.) Make sure you understand why these three sentences are ungrammatical. In what way do they share the same problem, and in what way are the first two different from the third?

d.) Now try to parse each of the three sentences by typing them in the field of question a.) above.

As you can see, our grammar has a problem: it accepts sentences it should reject. Somehow we need to restrict it in an elegant manner. One approach that usually works for restricting the strings a context-free grammar generates, but that is not considered elegant at all, is adding more rules to the grammar. Solving your problems by adding rules easily results in grammars that are twice as large, or worse. Apart from being inelegant, doubling the size of your grammar means that a parser has to take twice as many rules into account, which can substantially increase parsing time. So let's try a different way of dealing with our problem.
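To make the size argument concrete, here is a small Python sketch of what the rule-adding approach could look like if every category involved in number agreement were split into a singular and a plural version. Which categories agree with which (marked `?n` and `?m`) is an assumption made for this illustration, as are the names `ANNOTATED_RULES` and `expand`; only the seven phrase rules are counted.

```python
from itertools import product

NUMBERS = ["sg", "pl"]  # assumed: number is the only split we make

# Phrase rules of the original grammar. Each symbol is (category, variable):
# categories sharing a variable within one rule must get the same number;
# None means the category is not split at all.
ANNOTATED_RULES = [
    (("S", None), [("NP", "?n"), ("VP", "?n")]),
    (("NP", "?n"), [("Det", "?n"), ("N", "?n")]),
    (("NP", "?n"), [("N", "?n")]),
    (("VP", "?n"), [("V", "?n"), ("NP", "?m")]),
    (("VP", "?n"), [("V", "?n"), ("PP", None)]),
    (("VP", "?n"), [("V", "?n")]),
    (("PP", None), [("P", None), ("NP", "?m")]),
]


def expand(rule):
    """Spell out one annotated rule as plain CFG rules, one per assignment."""
    lhs, rhs = rule
    variables = sorted({v for _, v in [lhs, *rhs] if v is not None})
    plain = []
    for values in product(NUMBERS, repeat=len(variables)):
        binding = dict(zip(variables, values))

        def name(symbol):
            category, var = symbol
            return category if var is None else f"{category}_{binding[var]}"

        plain.append((name(lhs), [name(s) for s in rhs]))
    return plain


plain_rules = [r for rule in ANNOTATED_RULES for r in expand(rule)]
print(len(ANNOTATED_RULES), "phrase rules become", len(plain_rules))
for lhs, rhs in plain_rules:
    print(lhs, "->", " ".join(rhs))
```

Under these assumptions the seven phrase rules turn into sixteen, and the lexicon grows as well, because words that fit both numbers (for example 'the' or 'fell') would need two entries.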

A context-free grammar may be enhanced with so-called features. With features, a lexical item in the grammar can carry extra information, which can be passed up the parse tree by a process known as unification. Since this is a place for practising your skills, we will not go into theoretical details here and will just demonstrate how this works.

You can add an attribute with its value to some category X by appending [ATTRIBUTE=value] to it, giving X[ATTRIBUTE=value].
For example, the following grammar (Z being the top node), which allows both the strings 'blub blib' and 'blub blob':

Z -> X Y
Y -> 'blib'
Y -> 'blob'
X -> 'blub'

can be enhanced with features in the following way to allow only 'blub blob':

Z[ATT=?x] -> X[ATT=?x] Y[ATT=?x]
Y[ATT=q] -> 'blib'
Y[ATT=p] -> 'blob'
X[ATT=p] -> 'blub'

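Here ?x is a variable that can in principle hold any value, but within the rule for Z all three occurrences of ?x must receive the same value. Since 'blub' fixes the value of X to p, Y must also have ATT=p: 'blob' satisfies this, while 'blib' (ATT=q) does not, so 'blub blib' is rejected. In this very small grammar, the attribute value shared by X and Y is passed on to Z through the unification process.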
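The unification step for this toy grammar can be sketched in Python. This is a minimal illustration under simplifying assumptions: features are flat attribute-value pairs (no nesting), variables are strings starting with `?`, and the only phrase rule is the one for Z. The helper names `unify`, `resolve` and `parse` are hypothetical, and this is not how the online parser or any particular toolkit implements unification.

```python
# Lexicon of the feature grammar: word -> (category, features).
LEXICON = {
    "blib": ("Y", {"ATT": "q"}),
    "blob": ("Y", {"ATT": "p"}),
    "blub": ("X", {"ATT": "p"}),
}

# Z[ATT=?x] -> X[ATT=?x] Y[ATT=?x]
RULE = (("Z", {"ATT": "?x"}), [("X", {"ATT": "?x"}), ("Y", {"ATT": "?x"})])


def is_var(value):
    return isinstance(value, str) and value.startswith("?")


def resolve(value, bindings):
    """Follow variable bindings until reaching a constant or an unbound variable."""
    while is_var(value) and value in bindings:
        value = bindings[value]
    return value


def unify(feats1, feats2, bindings):
    """Unify two flat feature dicts; return extended bindings, or None on a clash."""
    for attr in feats1.keys() & feats2.keys():
        a, b = resolve(feats1[attr], bindings), resolve(feats2[attr], bindings)
        if a == b:
            continue
        if is_var(a):
            bindings = {**bindings, a: b}
        elif is_var(b):
            bindings = {**bindings, b: a}
        else:
            return None  # two different constants, e.g. p vs q
    return bindings


def parse(sentence):
    """Return Z's features if the rule licenses the sentence, else None."""
    (_, lhs_feats), rhs = RULE
    words = sentence.split()
    if len(words) != len(rhs):
        return None
    bindings = {}
    for (category, feats), word in zip(rhs, words):
        word_category, word_feats = LEXICON.get(word, (None, {}))
        if word_category != category:
            return None
        bindings = unify(feats, word_feats, bindings)
        if bindings is None:
            return None
    # Pass the (now bound) values up to the mother node Z.
    return {attr: resolve(value, bindings) for attr, value in lhs_feats.items()}


print(parse("blub blob"))  # {'ATT': 'p'}
print(parse("blub blib"))  # None
```

Copying the bound value of ?x back to Z at the end corresponds to the "passing up" described above, and the clash between two constants (p against q) is what makes 'blub blib' fail.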

e.) Fix the problem of our three ungrammatical sentences by turning our grammar into one that makes use of features. When you are done, don't forget to test your sentences with the parser. You may also check whether our original sentences are still parsable with the new feature grammar.

Finally, take a look at these additional sentences:

  1. Three/all/many/some of the men like hamsters.
  2. * Three/all/many/some of the men likes hamsters.
  3. * Three/all/many/some of the man like(s) hamsters.
  4. Each of the men likes hamsters.
  5. * Each of the men like hamsters.
  6. * Each of the man like(s) hamsters.
  7. * Every of the man/men like(s) hamsters.

f.) Once more, adjust the grammar so that it handles these sentences in the desired way. Do this by using features, and treat the "of the x" parts as genitive-case NPs instead of PPs. Hint: you will need two attribute-value pairs, one for number and one for case.