Morphological analysis of the text

Lecture



Morphological and syntactic analysis of texts as a finite state machine implemented by a semantic neural network having a synchronized linear tree structure

This lecture considers a method for extracting knowledge from a natural-language text and constructing a natural-language response using a semantic network represented as a semantic tree.

Abstract (brief overview)

For a computer system, understanding the meaning of a natural-language text is a task of analysis. The text is analyzed in several successive stages: morphological analysis, syntactic analysis, and semantic analysis. The task of syntactic analysis is to determine all the syntactic features of the words that semantic analysis requires. To solve the problems of morphological and syntactic analysis, as well as the analysis of inflection, we apply a semantic neural network whose properties are similar to those of the formal McCulloch-Pitts neural network.

In the subnetwork that extracts meaning from the text, each neuron denotes an elementary concept belonging to the processing stage of its sublayer. An elementary concept is any natural-language unit with a complete meaning, such as a symbol, syllable, word, phrase, sentence, paragraph, or the whole text. Different processing stages correspond to different levels of aggregation of elementary concepts, for example symbol, syllable, word, and phrase.

The semantic neural network that performs morphological and syntactic analysis is structured as a synchronized linear tree. A linear tree consists of sublayers, and each synchronized sublayer corresponds to one wavefront of processing. The neurons of the first sublayer correspond to the first letter of the word, those of the second sublayer to the second letter, and so on, so the total number of sublayers equals the maximum number of letters in a word. Because each neuron also depends on the preceding sublayer, a neuron in the first sublayer recognizes the first letter, a neuron in the second sublayer recognizes the first two letters, a neuron in the third sublayer recognizes the first three letters, and so on.
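The layout of sublayers can be illustrated with a minimal sketch. It assumes that each neuron can be identified with the prefix it recognizes; the function name `build_sublayers` and the sample English words are hypothetical and are not taken from the lecture.

```python
def build_sublayers(words):
    """Return a list whose item k-1 holds the prefixes of length k.

    Each prefix stands in for one neuron of sublayer k (an assumption
    made for illustration only).
    """
    max_len = max((len(w) for w in words), default=0)
    sublayers = [set() for _ in range(max_len)]
    for word in words:
        for k in range(1, len(word) + 1):
            sublayers[k - 1].add(word[:k])
    return sublayers


# Hypothetical vocabulary: the longest word has four letters,
# so the tree has four sublayers.
layers = build_sublayers(["cat", "cats", "car"])
for depth, prefixes in enumerate(layers, start=1):
    print(depth, sorted(prefixes))
```

Words that share a beginning share neurons in the early sublayers, which is what makes the structure a tree.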

Each neuron has one input connection from the neuron in the previous sublayer that corresponds to the previous letter of the word, and one input connection from the neuron in the receptor layer that corresponds to the current letter. Each neuron can have output connections to an unlimited number of neurons in the next processing sublayer. Classification functions are implemented by aggregating sublayers of non-synchronized neurons. These aggregating sublayers, whose neurons perform disjunction, are placed between the sublayers of synchronized neurons, which perform conjunction. The result is a multilayer structure in which each wavefront sublayer is followed by an aggregating sublayer.
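The conjunction and disjunction roles can be sketched with simple threshold units in the McCulloch-Pitts style. This is an illustration under stated assumptions, not the lecture's implementation: the helper names, the always-on start signal for the first sublayer, and the sample words are all hypothetical.

```python
def mp_neuron(inputs, threshold):
    """Threshold unit: fires (1) when at least `threshold` inputs are active."""
    return 1 if sum(inputs) >= threshold else 0


def conjunction(prev_active, letter_active):
    """Synchronized neuron: needs both the previous neuron and the receptor."""
    return mp_neuron([prev_active, letter_active], threshold=2)


def disjunction(inputs):
    """Aggregating neuron: fires when any one of its inputs fires."""
    return mp_neuron(inputs, threshold=1)


def recognize(word, text):
    """Pass the wavefront along the chain of neurons that spells `word`."""
    if len(text) != len(word):
        return 0
    active = 1  # assumption: an always-on start signal feeds the first sublayer
    for expected, seen in zip(word, text):
        receptor = 1 if seen == expected else 0  # receptor for the current letter
        active = conjunction(active, receptor)
    return active


def is_form_of_cat(text):
    """Aggregating neuron over two hypothetical word forms."""
    return disjunction([recognize(w, text) for w in ("cat", "cats")])


print(is_form_of_cat("cats"), is_form_of_cat("car"))
```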

The network contains a limited number of neurons, each with a finite number of states and connections, so the meaning extraction layer, built as a synchronized linear tree, can be regarded as a finite state machine. A transition from one state to another occurs when the value of the next character of the input sequence is applied to the extraction layer. Let one dictionary entry be a group of neurons, that is, one neural sub-automaton in the meaning extraction layer. If a word form is ambiguous, the synchronized linear tree excites all dictionary entries and word forms that correspond to its individual meanings. Let the total number of substates of a dictionary entry equal the number of word forms of that entry, and let each substate of such a sub-automaton be one excited neuron.
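The finite-state view can be sketched as a transition table driven one character at a time. The sketch assumes that states are identified with recognized prefixes; the lexicon, its English entries, and the function names are hypothetical examples rather than part of the lecture.

```python
# Hypothetical lexicon: spelling -> list of (dictionary entry, word form).
LEXICON = {
    "saw": [("see", "past tense"), ("saw", "noun, singular")],
    "saws": [("saw", "noun, plural")],
    "see": [("see", "base form")],
}


def build_transitions(lexicon):
    """Map (state, character) -> next state, where a state is a prefix."""
    transitions = {}
    for word in lexicon:
        for k in range(len(word)):
            transitions[(word[:k], word[k])] = word[:k + 1]
    return transitions


def analyze(text, lexicon, transitions):
    """Feed characters one by one; return every reading of the final state."""
    state = ""  # initial state: nothing recognized yet
    for ch in text:
        state = transitions.get((state, ch))
        if state is None:  # no transition: the sequence is not in the lexicon
            return []
    return lexicon.get(state, [])


TRANSITIONS = build_transitions(LEXICON)
print(analyze("saw", LEXICON, TRANSITIONS))
```

For the ambiguous spelling "saw", both readings are returned together, which mirrors the simultaneous excitation of all matching entries and word forms.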

Accordingly, when two different neurons of one sub-automaton are excited simultaneously, we say that the sub-automaton is in two different substates at once. Each dictionary entry has a main neuron, which is excited whenever a word belonging to that entry is recognized. Each word form corresponds to a separate neuron, which is excited when that word form is recognized.
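A dictionary entry with its main neuron and word-form neurons might be modeled as follows. This is a minimal sketch: the class `EntrySubAutomaton`, its methods, and the English verb used as data are hypothetical, and the main neuron is assumed to behave as a disjunction over the word-form neurons.

```python
class EntrySubAutomaton:
    """One dictionary entry: a main neuron plus one neuron per word form."""

    def __init__(self, entry, forms):
        self.entry = entry
        self.forms = dict(forms)  # word-form label -> spelling
        self.excited = set()      # labels of currently excited word-form neurons

    def present(self, text):
        """Excite every word-form neuron whose spelling matches `text`."""
        self.excited = {label for label, spelling in self.forms.items()
                        if spelling == text}
        return self.excited

    def main_neuron(self):
        """Assumed to fire when any word form of the entry is recognized."""
        return bool(self.excited)


put = EntrySubAutomaton(
    "put",
    {"base": "put", "past": "put", "third person singular": "puts"},
)
put.present("put")
print(sorted(put.excited), put.main_neuron())
```

Because the spelling "put" matches two word forms of the same entry, two word-form neurons are excited together and the sub-automaton is in two substates at once.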

The meaning extraction layer also contains neurons that do not belong to any individual dictionary entry. These neurons correspond to features of word forms that are common to many dictionary entries, such as gender, case, number, and tense. The set of excited neurons of a sub-automaton corresponds to the set of features of the word form that the sub-automaton has recognized. The task of classification, that is, determining the dictionary entry and word form for a given symbol sequence, reduces to passing an excitation wave through the meaning extraction layer and exciting the sub-automaton of the corresponding dictionary entry. The task of inflection reduces to changing the state of such a sub-automaton from the initial state (the word form from which inflection starts) to the final state (the word form into which the original word form is to be converted).
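Inflection as a change of state can be sketched by describing each word form through a set of shared features and then swapping one feature for another. The table `PARADIGMS`, the feature names, and the functions `recognize` and `inflect` are hypothetical and only illustrate the idea.

```python
# Hypothetical paradigms: entry -> {feature set -> spelling}.
# Feature names such as "plural" are shared across entries.
PARADIGMS = {
    "cat": {
        frozenset({"noun", "singular"}): "cat",
        frozenset({"noun", "plural"}): "cats",
    },
    "mouse": {
        frozenset({"noun", "singular"}): "mouse",
        frozenset({"noun", "plural"}): "mice",
    },
}


def recognize(text):
    """Classification: find every (entry, features) pair spelled `text`."""
    return [(entry, features)
            for entry, forms in PARADIGMS.items()
            for features, spelling in forms.items()
            if spelling == text]


def inflect(text, drop, add):
    """Move each recognized state to the state with `drop` replaced by `add`."""
    results = []
    for entry, features in recognize(text):
        target = (features - {drop}) | {add}
        spelling = PARADIGMS[entry].get(target)
        if spelling is not None:
            results.append(spelling)
    return results


print(inflect("mice", drop="plural", add="singular"))
```

Recognition selects the initial state, the feature swap selects the final state within the same entry, and the spelling of that final state is the inflected word form.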
