Statistical techniques for natural language analysis

Lecture



Example

• The dog ate.


Problem

• Salespeople sold the dog biscuits.


The principle of choosing a part of speech


Accuracy

• Naive baseline (always pick each word's most frequent tag) - 90%

• Modern taggers - 97%

• Humans - 98%

Hidden Markov Models


Another approach (transformational tagging)

• Start by applying the naive algorithm.

• There is a set of rules of the form:

  • Change a word's tag from X to Y if the previous word's tag is Z.

• Apply these rules a number of times.

• Runs faster than HMM tagging.

• HMM training vs. TT (transformational tagging) training (no starting base)
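The steps above can be sketched in a few lines of Python. This is only an illustration: it assumes the naive first pass assigns each word its most frequent tag, and the toy corpus, the `default` tag for unknown words, and the single rule are all invented for the example rather than taken from the lecture.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def baseline_tag(words, lexicon, default="noun"):
    """Naive first pass; unknown words get a default tag (an assumption)."""
    return [lexicon.get(word.lower(), default) for word in words]

def apply_rules(tags, rules, passes=2):
    """Apply each (X, Y, Z) rule: change tag X to Y if the previous tag is Z."""
    tags = list(tags)
    for _ in range(passes):
        for from_tag, to_tag, prev_tag in rules:
            for i in range(1, len(tags)):
                if tags[i] == from_tag and tags[i - 1] == prev_tag:
                    tags[i] = to_tag
    return tags

# Toy training data, invented for illustration.
corpus = [
    [("the", "det"), ("can", "noun"), ("fell", "verb")],
    [("we", "pron"), ("can", "modal"), ("run", "verb")],
    [("they", "pron"), ("can", "modal"), ("swim", "verb")],
]
lexicon = train_baseline(corpus)
rules = [("modal", "noun", "det")]  # hypothetical rule: modal -> noun after det

words = ["the", "can", "fell"]
initial = baseline_tag(words, lexicon)  # "can" is tagged modal by frequency
final = apply_rules(initial, rules)     # the rule repairs it to noun
```

In a real transformational tagger the rules are learned from a tagged corpus; here they are simply written by hand to show how one is applied.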

Treebank

• Build a tree for each sentence using the existing grammar rules.

• Example:

(s (np (det the) (noun stranger))
   (vp (verb ate)
       (np (det the) (noun donut)
           (pp (prep with) (np (det a) (noun fork))))))
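Bracketed trees like this one are easy to load into a program. The following is a minimal sketch of a reader that turns the notation into nested Python lists; the function names `parse_tree` and `leaves` are made up for this example, and real treebank files contain extra annotations that it does not handle.

```python
def parse_tree(text):
    """Parse "(label child ...)" notation into nested lists [label, child, ...]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        if pos + 1 >= len(tokens) or tokens[pos] != "(":
            raise ValueError("expected '(' followed by a label")
        node = [tokens[pos + 1]]  # the node label, e.g. "np"
        pos += 2
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(read_node())  # nested constituent
            else:
                node.append(tokens[pos])  # a word
                pos += 1
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets: missing ')'")
        pos += 1  # consume ")"
        return node

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("unexpected text after the tree")
    return tree

def leaves(tree):
    """Return the words of the tree from left to right."""
    words = []
    for child in tree[1:]:
        if isinstance(child, list):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words

tree = parse_tree(
    "(s (np (det the) (noun stranger))"
    "   (vp (verb ate)"
    "       (np (det the) (noun donut)"
    "           (pp (prep with) (np (det a) (noun fork))))))"
)
sentence = " ".join(leaves(tree))
```

The reader raises `ValueError` when a closing bracket is missing, which is a common problem in hand-copied trees.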

Own Statistical Parser

• Evaluation

  • Ready-made examples are available in the Penn Treebank.

  • Compare the parser's output with them.

• Find the rules to apply

• Assign probabilities to the rules

• Find the most likely parse
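One common way to compare a parser's output with a treebank tree is to count how many labeled constituents the two trees share. The sketch below assumes that measure (labeled precision and recall), which the slides do not name explicitly; trees are written as nested tuples, and the choice of which parse counts as the reference is hypothetical.

```python
def constituents(tree, start=0):
    """Return (set of (label, start, end) spans, end position) for a tree.

    A tree is a tuple (label, child, ...); a child is a word or a subtree.
    """
    label, children = tree[0], tree[1:]
    spans = set()
    pos = start
    for child in children:
        if isinstance(child, str):
            pos += 1  # a word occupies one position
        else:
            child_spans, pos = constituents(child, pos)
            spans |= child_spans
    spans.add((label, start, pos))
    return spans, pos

def precision_recall(predicted, gold):
    """Share of predicted spans that are correct, and of gold spans found."""
    pred_spans, _ = constituents(predicted)
    gold_spans, _ = constituents(gold)
    matched = len(pred_spans & gold_spans)
    return matched / len(pred_spans), matched / len(gold_spans)

# Hypothetical reference tree: "the dog biscuits" is a single np.
gold = ("s", ("np", ("noun", "Salespeople")),
        ("vp", ("verb", "sold"),
         ("np", ("det", "the"), ("noun", "dog"), ("noun", "biscuits"))))

# Hypothetical parser output: "the dog" and "biscuits" are separate nps.
predicted = ("s", ("np", ("noun", "Salespeople")),
             ("vp", ("verb", "sold"),
              ("np", ("det", "the"), ("noun", "dog")),
              ("np", ("noun", "biscuits"))))

precision, recall = precision_recall(predicted, gold)
```

This version also counts part-of-speech nodes as constituents; evaluations often exclude them, so treat the numbers as illustrative only.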

PCFG (probabilistic context-free grammars)

• s → np vp            (1.0)

• vp → verb np         (0.8)

• vp → verb np np      (0.2)

• np → det noun        (0.5)

• np → noun            (0.3)

• np → det noun noun   (0.15)

• np → np np           (0.05)

The probability of a parse tree is the product of the probabilities of the rules used to build it.
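As a worked example, the rule probabilities listed above can be multiplied out for the two readings of "Salespeople sold the dog biscuits". This is a small sketch: the start symbol is written `s`, trees are nested tuples, and word probabilities are ignored (treated as 1), which is a simplification made here.

```python
import math

# Rule probabilities copied from the grammar above.
RULE_PROBS = {
    ("s", ("np", "vp")): 1.0,
    ("vp", ("verb", "np")): 0.8,
    ("vp", ("verb", "np", "np")): 0.2,
    ("np", ("det", "noun")): 0.5,
    ("np", ("noun",)): 0.3,
    ("np", ("det", "noun", "noun")): 0.15,
    ("np", ("np", "np")): 0.05,
}
PRETERMINALS = {"det", "noun", "verb"}

def tree_probability(tree):
    """Multiply the probabilities of all rules used in the tree."""
    label, children = tree[0], tree[1:]
    if label in PRETERMINALS:
        return 1.0  # simplification: words contribute no probability
    rhs = tuple(child[0] for child in children)
    prob = RULE_PROBS[(label, rhs)]
    for child in children:
        prob *= tree_probability(child)
    return prob

# Reading 1: the biscuits are dog biscuits.
one_object = ("s", ("np", ("noun", "Salespeople")),
              ("vp", ("verb", "sold"),
               ("np", ("det", "the"), ("noun", "dog"), ("noun", "biscuits"))))

# Reading 2: biscuits were sold to the dog.
two_objects = ("s", ("np", ("noun", "Salespeople")),
               ("vp", ("verb", "sold"),
                ("np", ("det", "the"), ("noun", "dog")),
                ("np", ("noun", "biscuits"))))

p_one = tree_probability(one_object)   # 1.0 * 0.3 * 0.8 * 0.15, about 0.036
p_two = tree_probability(two_objects)  # 1.0 * 0.3 * 0.2 * 0.5 * 0.3, about 0.009
best = max([one_object, two_objects], key=tree_probability)
```

Under this grammar the "dog biscuits" reading is about four times more probable, so a parser that picks the most likely tree would choose it.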


Build your own PCFG. Simple option.

• Take the ready-made Penn Treebank.

• Read all the trees from it.

• Go through each tree.

• Add every new rule.

• P(rule) = number of occurrences of the rule divided by the total number of occurrences of rules with the same left-hand side.
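The counting procedure can be sketched as follows. It assumes that "total" means the total count of rules sharing the same left-hand side (the usual relative-frequency estimate), and it uses two hand-written toy trees in place of the Penn Treebank; rules that rewrite a tag as a word are skipped for brevity.

```python
from collections import Counter

def rules_in(tree):
    """Yield (lhs, rhs) for every phrase-level node of a nested-tuple tree."""
    label, children = tree[0], tree[1:]
    if all(isinstance(child, str) for child in children):
        return  # pre-terminal such as ("noun", "dog"): no phrase rule here
    yield label, tuple(child[0] for child in children)
    for child in children:
        yield from rules_in(child)

def estimate_pcfg(trees):
    """Relative-frequency estimate: count(rule) / count(rules with same lhs)."""
    rule_counts = Counter()
    lhs_counts = Counter()
    for tree in trees:
        for lhs, rhs in rules_in(tree):
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}

# Toy "treebank", invented for illustration.
treebank = [
    ("s", ("np", ("noun", "Salespeople")),
     ("vp", ("verb", "sold"),
      ("np", ("det", "the"), ("noun", "dog"), ("noun", "biscuits")))),
    ("s", ("np", ("det", "The"), ("noun", "dog")),
     ("vp", ("verb", "ate"),
      ("np", ("det", "the"), ("noun", "donut")))),
]
pcfg = estimate_pcfg(treebank)
np_total = sum(p for (lhs, _), p in pcfg.items() if lhs == "np")
```

With this normalization the probabilities of all rules for one left-hand side sum to 1, as they do in the example grammar.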

Two state-of-the-art statistical parsers. Markov grammars

• They solve the problem of very rare rules.

• Idea: instead of storing rules, estimate the probability that, for example,

• np = prep + ...
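One possible reading of this idea is sketched below: the probability of an expansion is built up child by child, each child conditioned on the parent label and the previous child. The first-order conditioning, the `START` and `STOP` markers, and the toy counts are all assumptions made for this illustration; the parsers discussed in the lecture condition on more information.

```python
from collections import Counter

START, STOP = "<start>", "<stop>"  # hypothetical boundary markers

def train_markov_grammar(expansions):
    """Count (parent, previous child, next child) transitions.

    expansions: iterable of (lhs, rhs) pairs observed in a treebank.
    """
    pair_counts = Counter()
    context_counts = Counter()
    for lhs, rhs in expansions:
        prev = START
        for child in list(rhs) + [STOP]:
            pair_counts[(lhs, prev, child)] += 1
            context_counts[(lhs, prev)] += 1
            prev = child
    return pair_counts, context_counts

def expansion_probability(lhs, rhs, model):
    """Probability of lhs -> rhs as a product of child-to-child transitions."""
    pair_counts, context_counts = model
    prob = 1.0
    prev = START
    for child in list(rhs) + [STOP]:
        seen = context_counts[(lhs, prev)]
        if seen == 0:
            return 0.0  # this context never occurred in training
        prob *= pair_counts[(lhs, prev, child)] / seen
        prev = child
    return prob

# Toy observations, invented for illustration.
observed = [
    ("np", ("det", "noun")),
    ("np", ("det", "noun")),
    ("np", ("det", "noun", "noun")),
    ("np", ("noun",)),
]
model = train_markov_grammar(observed)

p_seen = expansion_probability("np", ("det", "noun"), model)
# This expansion never occurs in `observed`, yet it gets a nonzero probability.
p_unseen = expansion_probability("np", ("det", "noun", "noun", "noun"), model)
```

Because the model stores transitions rather than whole rules, a long expansion that was never seen in training still receives a small positive probability.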

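Lexicalized parsing

$$p(s, \pi) = \prod_{c} p(h(c) \mid m(c), t(c)) \cdot p(r(c) \mid h(c))$$

Here s is the sentence, \pi is its parse tree, and the product runs over the constituents c of the tree.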

• Assign to each node of the tree a word (its head) that characterizes it.

• p(r | h) is the probability that rule r is applied at a node with head h.

• p(h | m, t) is the probability that a node has head h, given that its parent's head is m and the node's own tag is t.
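The product over constituents can be computed as in the sketch below, which works in log space to avoid underflow on long sentences. Everything concrete here is hypothetical: the dictionary layout, the use of `None` as the parent head of the root, and all probability values, which were invented for a toy parse of "The dog ate".

```python
import math

def parse_log_probability(constituents, p_head, p_rule):
    """Sum log p(h | m, t) + log p(r | h) over all constituents c."""
    total = 0.0
    for c in constituents:
        total += math.log(p_head[(c["h"], c["m"], c["t"])])
        total += math.log(p_rule[(c["r"], c["h"])])
    return total

# One entry per constituent: head h, parent head m, tag t, rule r.
constituents = [
    {"h": "ate", "m": None, "t": "s", "r": "s -> np vp"},
    {"h": "dog", "m": "ate", "t": "np", "r": "np -> det noun"},
    {"h": "ate", "m": "ate", "t": "vp", "r": "vp -> verb"},
]

# Invented probabilities, for illustration only.
p_head = {
    ("ate", None, "s"): 0.01,
    ("dog", "ate", "np"): 0.05,
    ("ate", "ate", "vp"): 0.5,
}
p_rule = {
    ("s -> np vp", "ate"): 0.9,
    ("np -> det noun", "dog"): 0.6,
    ("vp -> verb", "ate"): 0.3,
}

log_p = parse_log_probability(constituents, p_head, p_rule)
p = math.exp(log_p)  # equals the plain product of the six factors
```

A parser would compute this score for each candidate tree and keep the one with the highest value.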

Lexicalized parsing

• Example:

(S (NP The (ADJP most troublesome) report)
   (VP may
       (VP be
           (NP (NP the August merchandise trade deficit)
               (ADJP due (ADVP out) (NP tomorrow))))))

• p(h | m, t) = p(be | may, vp)

• p(r | h) = p(posvp → aux np | be)

Lexicalized parsing

• “the August merchandise trade deficit”

• rule = np → det propernoun noun noun noun

Conditioning events     p(“August”)      p(rule)
Nothing                 2.7 × 10^-4      3.8 × 10^-5
Part of speech          2.8 × 10^-3      9.4 × 10^-5
h(c) = “deficit”        1.9 × 10^-1      6.3 × 10^-3

