10: Automated programming: from table to program

Lecture



Analysis of the state of affairs

The construction of tables completes the specification stage of our program. Tables 9.1 and 9.2 are another formalized representation of figures 9.6 and 9.7. As always, different formalisms differ in practical orientation. In some cases, the graph can be automatically converted into a prototype of the program (try to do this yourself with the UML specification), but the resulting programs always require manual refinement. The same table representation can be considered as a mini programming language. It is possible to build a translator or interpreter that has already been used in private table types for a long time (binding to specific subject areas and applications has prevented their widespread use, since professional system programmers have looked down on such simple tasks).

Let us turn to the semantics of calculations in the language of tables. The context of a program in the C ++ language (or in another language of the traditional type) is considered. In this context, the value of the variable Entry is defined (the set of its values ​​coincides with the set of state names), then the following actions are performed:

  1. Select an entry in the table that corresponds to the Entry value (current state).
  2. If the Condition is absent, then go to step 4.
  3. Calculate together all the conditions for triggering transitions (cells of the second column for this input):
    • If the results include True values, then select one of the lines whose Condition is true and go to step 4 for the selected line 1 ;
    • If all the values ​​of the Conditions are false and there is a failure line, then select this line and go to step 4 for it;
    • If all the values ​​of the Conditions are false and there is no failure , then complete the calculations.
  4. Execute Action.
  5. As the value of the variable Entry, set the name of the successor state (from the fourth column of the table). Go to step 1.

For the example of the specified semantics of calculations considered now it is not enough. In many cases, it is useful to expand the expressive capabilities of tables by adding actions that are performed at the beginning and at the end of processing the current state. This is achieved, for example, with the help of special service words:

  • start - an indication of the actions that are performed before the main actions and checks of the transition conditions (here such an action is reading the next character from the file);
  • finish — specifying actions that are performed after basic actions and checks, but before the transition.

You can also define local data for states (in the example, such data is defined only for the initial state), but this must be consistent with the rules for localizing the names of the programming language and the general style in which the program is written. Note that the local data of all states of the state machine must be placed in the general context, and assigning them to specific states is a visibility constraint, similar to that used in modular programming (if you have already programmed in Object Pascal Delphi, Java or Oberon, then with modularity). This follows pragmatically from the assumption that the operation of a theoretical finite-state machine does not require the involvement of memory 2 .

We now take care of how to program the required actions when it is not possible to use the table language directly. With tables representing a conversion table, you can do the following.

  1. Transmit manually.
  2. Give the preprocessor to become a normal program.
  3. Use the interpreter.

Consider these three possibilities in sequence.

Manual translation of conversion tables

In solutions 1 and 2, the following question arises: what to translate the transition table ? The options are many, below are just a few of them.

Option 1. It is possible to consider St1 and St2 as functions that implement all the actions of the states. When using the right style and tools, this approach gives an excellent result. Consider, in particular, how our task can be implemented (more precisely, the automaton of table 9.1) in Refal language.

  ENTRY Go {= <Open 'r' 1 'input.txt'> <Open 'w' 2 'output.txt'> <Init <Get 1 >>};
 Letters {= 'abcdefghijklmnopqrstuvwxyz';};
 Init {=;
     e.1 = <St1 e.1>;};
 St1 {
     s.1 e.3, <Letters>: eA s.1 eB = <St2 e.3 (s.1)>;
     = <Init <Get 1 >>;
     s.1 e.3 = <St1 e.3>;
     };
 St2 {(e.2) = <Outstr e.2> <Init <Get 1 >>;
     s.1 e.3 (e.2), <Letters>: eA s.1 eB = <St2 e.3 (e.2 s.1)>;
     s.1 e.3 (e.2) = <Outstr e.2> <St1 e.3>;
     };
 * St3 is not needed
 Outstr {
     e.2, <Lenw e.2>: {s.1 e.2 = <Putout 2 e.2 "-" <Symb s.1 >>;};
     };
 *
 * The second program, a little further from the immediate automaton model
 *
 $ ENTRY Go {= <Open 'r' 1 'input.txt'> <Open 'w' 2 'output.txt'> <Init <Get 1 >>};
 Letters {= 'abcdefghijklmnopqrstuvwxyz';};
 Init {=;
     e.1 = <Parse e.1 ()>;};
 Parse {
     s.1 e.3 (e.2), <Letters>: eA s.1 eB = <Parse e.3 (e.2 s.1)>;
     (e.2) = <Outstr e.2> <Init <Get 1 >>;
     s.1 e.3 (e.2) = <Outstr e.2> <Parse e.3 ()>;
     };
 Outstr {=;
     e.2, <Lenw e.2>: {s.1 e.2 = <Putout 2 e.2 "-" <Symb s.1 >>;};
     }; 
Listing 10.2.1. Word lengths: refal

These programs are so short and natural that they practically require no comments. The only new feature used here is, in principle, superfluous, but it makes the program more beautiful. <Letters> constructions: eA s.1 eB and e.2, <Lenw e.2>: are, respectively, a nested condition check on an expression generated by a <Letters> call, and a call to an anonymous function defined further on the expression that appears after the comma . The standard function <Lenw e.2> adds the first character to the expression its length (the number of terms in it). When checking and calculating the auxiliary function, the variables that have already received the values ​​do not change.

Thus, the transition control structure is in good agreement with the control structure in the concretization version of the sentence programming 3 . Note again that, although some descriptions of functions appear recursive, there is no recursion, since the original call is completed before the calls generated by it are activated. Each step of specification begins with the verification of conditions, so it is natural to compare the actions with transitions.

In this case, it was possible to achieve such a beautiful result because the actions also fully corresponded to the area where the use of the Refal language is most advantageous. But even in such a simple case, after minimal manual revision, the program has become even better.

Formally, the functional representation of states is possible both in the language of the traditional type, and in LISP and Prolog. But in this case we lose in the expressiveness and naturalness of the program, as well as on one more important criterion. In all these cases, what looks like a recursion really is, and recursion in this case only hinders.

Option 2 . Consider St1, St2, St3 as values ​​of some enumerated type State.

  // The implementation of the machine with table.  9.1
 #include <stdlib.h>
 #include <stdio.h>
 #include <time.h>

 char symbol;
 int cnt;

 enum States {St1, St2, St3} State;
 void main (void)
 {fstream a, b;
     a.open ("input.txt", ios :: in);
     b.open ("output.txt", ios :: out);
     State = St1;
     while (true)
     {symbol = a.get ();
         switch (State)
         {case St1: if ('a' <= symbol && symbol <= 'z')
             {b << symbol;
                 cnt = 1;
                 State = St2;
         }
         else if (symbol! = '\ n') {State = St1;}
         else if (symbol == '\ n') State = St3;
         break;
     case St2: if ('a' <= symbol && symbol <= 'z')
             {b << symbol;
             cnt ++;  // State = St2;
             }
         else if (symbol! = '\ n')
             {b << "-" << cnt << endl;
                 State = St1;
             }
     else if (symbol == '\ n')
         {b << "-" << cnt << endl;
             State = St3;
         };
         break;
     case St3: if ('a' <= symbol && symbol <= 'z')
             {b << symbol;
                 cnt = 1;
                 State = St2;
         }
         else if (symbol! = '\ n') {State = St1;}
         else if (symbol == '\ n')
             {a.close ();  b.close ();  return;};
     }}
 } 
Listing 10.2.2. Word lengths: explicit thread processing loop

Program 10.2.2 is good only if we confine ourselves to automaton models in a rather narrow sense. The program exactly corresponds to the tabular representation of the automaton, almost all duplication of actions associated with such a presentation remained. There is some dissatisfaction with the fact that an extra action has appeared: the choice of the executed fragment by the value of the State variable. If we confine ourselves to managing means of the structured programming methodology, then this is the best way out. Attempts to apply cycles within fragments labeled as case St1: and case St2 :, only lead to a decrease in visibility compared with the table view.

There is one more question that needs to be solved with this approach - the choice of the type of result of the function-states and the method of returning the result. The natural result of such a function is a new state into which the automaton falls upon the transition. It can be represented, for example, by a global variable (such as State in program 10.2.2, which is assigned a new value when the state function is executed). But it is preferable not to touch State in each of the functions, but to provide the task of the actual state change to their general context. The most correct implementation of states with this approach are functions that produce as a result a value of a type that lists all states.

The following program 10.2.3 almost literally implements this idea. Unlike program 10.2.2, the states type contains four values: St1, St2, St3 and Exit. The last of them does not correspond to any function due to the triviality of transitions and actions in this state.

  #include <stdlib.h>
 #include <stdio.h>
 #include <time.h>

 char symbol;
 int cnt;
 enum States {St1, St2, St3, Exit} State;

 inline States f_St1 ()
 {
     if ('a' <= symbol && symbol <= 'z')
         printf ("% c", symbol);
         cnt = 1;
         symbol = getchar ();  return St2;
     }
     else if (symbol! = '\ n') {
             symbol = getchar ();  return
         }
     else {symbol = getchar ();  return
 }

 inline States f_St2 ()
 {
     if ('a' <= symbol && symbol <= 'z')
         printf ("% c", symbol);
         cnt ++;
         symbol = getchar ();  return St2;
     }
     else if (symbol! = '\ n') {
         printf ("-% i \ n", cnt);
         symbol = getchar ();  return St1;
     }
     else {
         printf ("-% i \ n", cnt);
         symbol = getchar ();  return St3;
     }
 }

 inline States f_St3 ()
 {
     if ('a' <= symbol && symbol <= 'z') {
         printf ("% c", symbol);
         cnt = 1;
         symbol = getchar ();  return St2;
     }
     else if (symbol! = '\ n') {
             symbol = getchar ();  return St1;
         }
     else return Exit;
 }

 void main (void)
 {
     symbol = getchar ();
     State = St1;
     for (;;)
         {switch (State) {
             case St1: State = f_St1 ();
                 break;
             case St2: State = f_St2 ();
                 break;
             case St3: State = f_St3 ();
                 break;
             default: return;
         }
     }
 } 
Listing 10.2.3. Word lengths: use procedures to describe states.

It should be noted that the program is vivid, strictly corresponds to the table of the machine. We can say that the graph of the automaton determines the state-distributed flow processing scheme. In fact, it is much better suited for the computational model in states rather than transitions. Moreover, this program, with a small increase in the number of states, quickly loses visibility if programmed on transitions, since the following states are hidden inside the corresponding procedures, but will save it if programmed in states.

Thus, we came to the conclusion that in the framework of the standard programming methodology, all solutions have some drawbacks. It is necessary to look for another, and here it is worth remembering the proverb (actually reflecting one of the techniques of creative thinking): "The new is well forgotten old." Let's look for a solution among what is rejected by standard techniques.

After analyzing the graph of the automaton or table, you can see that in this example, along with the stream processing cycle, there is another cycle: the transfer of control between states. This is the reason why in the two previous versions the assignment of the State variable to the value and the selection of the executed fragment by this value appeared.

Feature sequences of actions

State = <value>;

switch (State)

each time at the end of actions associated with states (more precisely, implementations of actions in programs), in that at every point of the program where this assignment occurs, you can accurately specify the result of the dynamically following switching operator, and this information does not depend on the processed flow and can be determined statically before computing.

Is it possible to use this explicitly for organizing the management of the state machine? It is possible, this demonstrates the fourth solution to the problem under discussion.

Option 4 . The transition matrix and vector functions corresponding to the states 4

  program funcgoto;
 {$ APPTYPE CONSOLE} {$ T +}
 uses SysUtils;

 type P = procedure;
 type Pp = ^ P;
 const maxstate = 7;
 const maxcond = 3;
 type states = 1 .. maxstate;
 conds = 1 .. maxcond;
 type table = array [states, conds] of states;
 type act = array [states] of P;
 const gotos: table = ((2,2,2), (3,2,4), (3,5,6), (3,2,7), (3,2,4), (3,2 , 7), (1,1,1));
 var Symbol: char;
 var Cnt: integer;
 var Inf, Outf: text;
 var state: states;

 procedure start;
 begin
     Cnt: = 0;
     AssignFile (Inf, 'input.txt');
     Reset (Inf);
     AssignFile (Outf, 'output.txt');
     Rewrite (outf);
 end;

 procedure Finish;
 begin
     Closefile (Inf);
     Closefile (Outf);
     Abort;
 end;

 procedure St1;
 begin
     {No actions}
 end;

 procedure St2;
 begin
     write (outf, Symbol);
     Inc (Cnt);
 end;

 procedure St4;
 begin
     writeln (outf, '-', Cnt);
     Cnt: = 0;
 end;

 const actions: act = (Start, St1, St2, St1, St4, St4, Finish);

 begin
     state: = 1;
     while true do begin
         actions [state];
         if (state <> 1) and (state <> 7) then begin read (inf, Symbol);
         if Ord (Symbol) = 10 then read (inf, Symbol) end;
         if Symbol in ['a' .. 'z'] then state: = gotos [state, 1];
         if not (Symbol in ['a' .. 'z']) and (Ord (Symbol) <> 13)
             then state: = gotos [state, 2];
         if Ord (Symbol) = 13 then state: = gotos [state, 3];
     end;
 end. 
Listing 10.2.4. Word lengths: array of functions and transitions

This solution is not bad, but it is only suitable for state calculations. In fact, it sharply diverges from the canons of structured programming, where switching between functions within the array of functions was not even supposed. It is somewhat weighed down by the details associated with the development of special states of input flow input. But repeated actions are not duplicated.

Consider the last option.

Option 5 . Use static information about branching calculations.

  #include <stdlib.h>
 #include <stdio.h>
 #include <time.h>

 char symbol;
 int cnt;
 void main (void)
 {
     symbol = getchar ();
 St1: if ('a' <= symbol && symbol <= 'z') {
                 printf ("% c", symbol);
                 cnt = 1;
                 symbol = getchar ();  goto St2;
             }
             else if (symbol! = '\ n') {
                 symbol = getchar ();  goto St1;
         }
         else / * (symbol == '\ n') * / {symbol = getchar ();  goto St3;};
 St2: if ('a' <= symbol && symbol <= 'z') {
                 printf ("% c", symbol);
                 cnt ++;
                 symbol = getchar ();  goto St2;
             }
             else if (symbol! = '\ n') {
                 printf ("-% i \ n", cnt);
                 symbol = getchar ();  goto St1;
             }
             else {
                 printf ("-% i \ n", cnt);
                 symbol = getchar ();  goto St3;
             };
 St3: if ('a' <= symbol && symbol <= 'z') {
                 printf ("% c", symbol);
                 cnt = 1;
                 symbol = getchar ();  goto St2;
             }
             else if (symbol! = '\ n') {
                 symbol = getchar ();  goto St1;
             }
             else / * (symbol == '\ n') * / return;
 } 
Listing 10.2.5. Word lengths: states - labels in the program.

In this variant, the need for a calculated transition (the result of embedding static information in the program text) disappears, and, as a result, descriptions of the States type and the State variable of this type become redundant. Unconditional jump operators appear in the program, because of this, the program management structure is completely at odds with the canons of structured programming, which makes it possible for dogmatically thinking programmers and theorists to criticize this version of the program. But in this case, deviation from the canons of structured programming is fully justified, since due to the special arrangement of text fragments, the whole program turned out to be similar to the state machine table, and the structure of the control gear copies the state machine graph. Moreover, such a view is neutral with respect to the Moore and Mile models. Thus, only after a complete departure from the canons of structure, the program became adequate to its specification.

Automated conversion table conversion

If you automate the work with a tabular view, then first of all you need to strictly define the structure of the data representing the transition table. The way the table is set depends on the strategy for further processing. In particular, if a textual representation of a table is manually constructed, which is subsequently converted to an executable form, then, for example, such a representation is acceptable.

  _1 char symbol;  // Reading implicit character stream
            int cnt;
                .  .  .  // Initialize _4 St1 _5
 _1 St1 _2 'a' <= symbol && symbol <= 'z' _3 printf ("% c", symbol);
            cnt = 1;  _4 St2 _5
 _1 _2 // symbol <'a' && 'z' <symbol && symbol! = '\ N'
                                         _3
            // Since you only need to type the words,
            // no action _4 St1 _5
 _1 _2 failure _3
            // symbol == '\ n' no need to read _4 St3 _5
 _1 St2 _2 'a' <= symbol && symbol <= 'z' _3 printf ("% c", symbol);
                                            cnt ++;  _4 St2 _5
 _1 _2 // (symbol <'a' || 'z' <symbol) && * / symbol! = '\ N'
                                         _3 printf ("-% i \ n", Cnt);  _4 St1 _5
 _1 _2 failure _3 printf ("-% i \ n", Cnt);  _4 St3 _5
 _1 St3 _2 'a' <= symbol && symbol <= 'z' _3 printf ("% c", symbol);
                                            cnt = 1;  _4 St2 _5
 _1 _2 // symbol <'a' && 'z' <symbol && symbol! = '\ N'
                                         _3
            // Since you only need to type the words,
            // no action _4 St1 _5
 _1 _2 failure _3
            // symbol == '\ n' no need to read _4 exit_5
 _1 exit_2 // condition is true, but without implicit reading of the stream
                                         _3 return 0;  _4 _5 
Listing 10.3.1. The program in the form of marked up text.

Here <_i>, i = 1,. . . , 5 denote table positions. These combinations of characters, not found anywhere in the normal program, can be easily recognized by the preprocessor. The sequences of symbols placed between them are distributed along the corresponding fields of the desired structure, which, depending on the chosen strategy, is interpreted or translated. With the help of these combinations, the text is marked up, which makes it possible to recognize and interpret its fragments. It is worth paying attention to the fact that due to the special mutual arrangement of symbols in this text, the table of the automaton represented by it is quite observable. If there is no more suitable representation, then this structure can be recommended for input.

However, the immediate interpretation of the universal text-based markup is rather difficult. Preferably, the units of interpretation would be the table fields themselves. In general, this is contradicted by the presence of fields in the table, the values ​​of which are fragments of executable code in a programming language (such a record is convenient for a person, since one does not have to resort to any additional agreements to fill in the table). In fact, a contradiction arises only for the action field, since the sequences of characters between <_2> and <_3> have clearly expressed semantics: this is a test of the condition. If we always consider conditions as a test of what the input symbol is equal to, then it is quite understandable, special designations are easily defined, recognized and interpreted: enumeration of values, their ranges, etc. The interpretation of these designations does not cause complications, and therefore similar coding techniques are used quite often.

If we talk about action fields, then their representation for most programming languages ​​is poorly designed. This circumstance prevents the use of automaton programming: the coding of actions complicates the processing. But if subroutines can be defined in the language as data, then the description of the structure of the table, which will be coordinated with further translation or interpretation, is possible, in particular, in functional programming languages ​​(LISP and ML family) and in Prolog. But functional and sentence programming well interpret only special cases of tables.

Let type T_StructTableLine describe the structure of a single row of a table, and let the entire table be represented as an array of Table of such structures. Пусть далее iSt - переменная, используемая для индексирования массива Table, некоторые из ее значений представляют состояния конечного автомата (но не все, а только те, к которым определен переход!). Наконец, пусть

int handler ( int );

-обработчик строки (ее интерпретатор или транслятор), <понимающий> структуру T_StructTableLine, специфицируемый как функция, при выполнении которой должно быть определено очередное значение iSt. Значение iSt, не указывающее ни на какую строку таблицы, служит сигналом для завершения обработки (при желании можно заняться кодированием того, как завершилась обработка с помощью задания различных таких значений).

Тогда схема программы, которая оперирует с таблицей с помощью функции handler, может быть задана следующим циклом:

 iSt = 0;
 do {
    iSt = handler ( iSt );
        }
while ( OUTSIDE ( iSt ) ); 

Понятно, что выбор массива в качестве представления автомата непринципиален. Допустимо, а иногда и предпочтительнее, работать со списком строк. Также непринципиален способ вычисления предиката OUTSIDE, проверяющего условие окончания цикла, хотя он и зависит от выбранного представления автомата. Возможность не рассматривать явно имена состояний, напротив, принципиальна: тот факт, что строки таблицы сгруппированы вокруг (поименованных) состояний, ничего не значит ни для обработки, ни для интерпретации, поскольку у конечного автомата следующее состояние всегда определяется явно.

Логика функции handler, интерпретирующей таблицу, проста. Для правильной таблицы, не содержащей состояний, в которых возможны неопределенные переходы, она строится следующим образом:

  1. Извлечь значение Table[iSt] ;
  2. Intensify the calculation of the condition of the processed table row;
  3. If the condition is true, then
    • activate the action routine;
    • read the next character;
    • terminate the handler with the value retrieved from the next state field;
  4. If the condition is false, then
    • iSt ++;
    • Go to 1;

It remains to provide details for the final states, but this is already easy to do.

Thus, you need to learn how to describe the structure of the row table. In C ++ / С #, this problem can be solved quite simply:

 struct T_StructTableLine
 {                            
    // field for state name is redundant
    int (* condition) (); // condition field
    void (* action) (); // action field
    int ref;                
    // поле перехода: индекс строки таблицы,
    // которая будет обрабатываться следующей,
    // или признак завершения процесса
 } 

Сама таблица описывается следующим образом:

T_StructTableLine Table[]

or

T_StructTableLine Table [Размер таблицы]

если значения строк задаются вне этого описания.

Однако сразу же видно ограничивающее соглашение, которое пришлось принять в связи с особенностями языка реализации: все условия и все действия оформлены как функции, тогда как задача, по своей сути, этого не требует. Более того, отделенное от таблицы описание функций, предписываемое языком С++/C#, резко снижает выразительность представления:

int Cond_St1_1() - условие в Table[1]

and

void Act_St1_1() - действие в Table[1] (строка 1 состояния St1 );

int Cond_St1_2() - условие в Table[2]

and

void Act_St1_2() - действие в Table[2] (строка 2 состояния St1 );

etc.

Еще один неприятный момент данного представления - невозможность единообразно с действиями представить их общий контекст (нулевая строка таблицы), а поскольку инициализацию естественно рассматривать как задание предварительных действий, готовящих собственно обработку, то ее предпочтительнее соединять с контекстом, а не с функциями, реализующими проверки и действия.

Последний недостаток неустраним ни в каком языке, если проверки и действия трактовать как процедуры, которые работают в общем контексте, задаваемом в заголовке таблицы. Так что лучше сразу отказаться от описания этого контекста в рамках таблицы (задание инициализации в ней возможно). Что касается первого недостатка, то он носит почти синтаксический характер, и если бы можно было задавать тела процедур как значения полей структур, то наглядность представления можно было бы сохранить. В языковом оформлении, похожем на С/С++/C#, это выглядело бы как следующее присваивание значения Table в качестве инициализации.

 Table[]= {
    {{return true;}, /* инициализация */, 1},
/*St1*/{{return 'a'<=symbol &&
    symbol <= 'z';}, {printf ("%c", symbol);
        cnt = 1;}, 4},
    {{return symbol != '\n';}, 
    /* Так как нужно печатать только слова,
       действия не заполняются */,
        1},
    {{return true}, 
    /* Переход не требует чтения, 
	   symbol == '\n' не нужно читать */, 0},
/*St2*/{{return 'a'<=symbol &&
    symbol <= 'z';}, {printf ("%c", symbol);
        cnt++;}, 4},
    {{return symbol!='\n';},
        {printf ("-%i\n",cnt);},
    1},
    {{return true }, 
	    {printf ("- %i\n",cnt);}, 0}
 }; 
Листинг 10.3.2.

Но для С/С++/C#, а также для других языков, не ориентированных на работу с таблицами, говорить о естественности представления не приходится. По-видимому, наиболее разумно строить автоматический преобразователь (типа препроцессора, но не путать этот термин с С!), который составляет текст программы по таблице. В этом случае снимается много проблем:

  • для программиста сохраняется наглядность;
  • при исполнении достигается алгоритмичность;
  • легко разносить фрагменты таблицы по разным участкам программы;
  • можно жертвовать наглядностью результирующего текста, поскольку с ним никто, кроме транслятора, не работает (пример: вставка новых строк в таблицу - проблема для человека, но не для препроцессора);
  • по той же причине можно жертвовать даже структурностью результирующего текста (в рассматриваемом примере, в частности, можно отказаться от представления условий и действий процедурами и функциями).

Это решение часто является приемлемым, однако и у него есть свои недостатки. Прежде всего, поиск ошибок, допущенных в текстах условий и действий, затруднен, возникает проблема двойной интерпретации (приходится понимать как исходное представление, так и внутреннее). Так что ручной перевод остается важным методом преобразования таблиц.

Проанализировав исходную таблицу, легко заметить, что используется фактически два варианта функций-условий и три - функций-действий. Из-за нестандартности работы с инициализацией (нулевая строка таблицы) удобно пополнить этот набор еще двумя функциями: <условием> и <действием>, выполняемыми при переходе к этой строке, которая интерпретируется как завершение работы автомата (таким образом, предикат OUTSIDE (iSt) можно представить как равенство аргумента нулю).

Теперь почти все готово для составления программы 10.3.3, реализующей интерпретатор таблицы переходов. Осталось решить вопрос о неявном для таблицы, но с необходимостью явном для программы способе чтения потока символов. Этот вопрос имеет три аспекта:

  • Когда читать? Поскольку прочитанный символ используется после его распознавания, естественно "не терять" его, пока выполняется действие. Следовательно, читать очередной символ целесообразно после отработки действия.
  • Как поступать в начале процесса: должен ли быть прочитан символ перед первым вызовом функции handler? Понятно, что чтение символов является частью функциональности этой функции. Если в ней реализовать чтение до вызовов условия и действия, то это противоречит предыдущему соглашению, а противоположное решение делает нестандартным начало процесса. В предлагаемой программе принято решение об особой обработке нулевой строки. Поскольку, являясь инициализацией, она действительно особая, это соответствует сделанному соглашению.
  • What to do at the end of the process? Reading the final character ('\ n') should stop possible attempts at subsequent access to reading. Therefore, it is necessary to take care of the final transitions. In the proposed program, a decision was made on the final transition to the zero line, which, according to the previous one, is non-standard.

Note that these agreements are not absolute, and the questions asked need to be resolved whenever the transition from the table view to the program is implemented . In the previous paragraph, you already saw how in almost every implementation the reading method changed slightly.

Taking into account the comments made, the program that solves the problem of counting the lengths of lines using the table interpretation method can be written as follows.

  #include <stdio.h>
 char symbol;  // variable to read character stream
 int cnt;  // variable for counting word lengths

 int c0 (), c1 (), c2 ();  // condition functions
 void a0 (), a1 (), a2 (), a3 ();  // action functions
 int handler (int i);  // table row handler
 struct T_StructTableLine
 {// the field for the state name is redundant
     int (* condition) ();  // condition field
     void (* action) ();  // action field
     int ref;  // jump field: the index of the table row
                                // which will be processed next
                                // or a sign of completion of the process
 }
     T_StructTableLine
     Table [] = {{c0, a0,1}, // the table is initialized statically,
     {c1, a1,2}, // alternative solution - special
     {c2, a0,1}, // function for setting initial values.
     {c0, a0,3}, // It is more flexible, but less
     {c1, a2,2}, // effectively.
     {c2, a3,1}, // For clarity, see comment
     {c0, a3,3};  // in the text.
     {c1, a2,2},
     {c2, a3,1},
     {c0, a3,0}};
 void main ()
 {
     int iSt = 0;
     do {
         iSt = handler (iSt);
         }
     while (iSt);
 }

 int handler (int i)
 {
     if (Table [i] .condition ()) {
         Table [i] .action ();
         if (Table [i] .ref) {symbol = getchar ();
         return Table [i] .ref;  }
     } else return ++ i;
 }

 // Descriptions of the functions used:
 int c0 () {return symbol == '\ n';}
 int c1 () {return 'a' <= symbol && symbol <= 'z';}
 int c2 () {return 'a'> symbol || symbol> 'z';}

 void a0 () {}
 void a1 () {printf ("% c", symbol); cnt = 1;}
 void a2 () {printf ("% c", symbol); cnt ++;}
 void a3 () {printf ("-% i \ n", cnt);} 
Listing 10.3.3. Word lengths: finite state machine interpreter.

Discussion of the decision

As already noted, the fragments describing the conditions and actions in the transition table and implemented as procedural inserts, from the point of view of the preprocessor (not a preprocessor of the programming system, but a special converter that generates the representation of the table for interpretation), are unrecognizable data that it simply skips without changes for compiler processing. The interpreter, on the contrary, is unable to execute the procedures by itself, and therefore treats references to them (in the corresponding fields) as data. Thus, the conditions and actions in the table are dual: they are both data and program fragments. The automatic table converter, without understanding the language of such dual data, tries, nevertheless, to combine them with ordinary data (in the case under consideration these are indexes of transition lines) in one structure.

At first glance, it seems that the situation would be significantly simplified in a language in which it is possible to perceive the data as executable fragments. An example of this kind is the command <Calculate string>. This command would allow the interpreter to understand data programs without leaving them unrecognized. Such a command is sometimes found in scripting languages ​​that are oriented towards interpretation. And in the LISP language, in which the data structures and programs simply coincide, this duality does not arise. But from experience it is well known that such a general solution is fraught with equally common and omnipresent troubles, and one must look for a private version of it that is adequate to this task.

The method of solving problems using the construction of tables and state and transition graphs is often convenient for processing data streams. In the books [32], [33], and especially in the problem book [2] and on the website http://is.ifmo.ru, there are a number of tasks of this kind. Decisions should be evaluated quantitatively and qualitatively: how many states and actions are required, how best it turns out in this case (in states or on transitions). It may be necessary to use tools that go beyond the methods described. This is permissible, one should only be aware of the distinction between using a finite state machine and using another suitable method, and also clearly separate the possibilities within the program relating to different programming styles.

Let's sum up

A table and a graph are two theoretically equivalent (but practically different) representations. Each of them has its advantages and disadvantages (information is poorer on the graph, but visibility can be better). One of these specifications often makes the other unnecessary.

Tables and graphs are much better than the program itself in terms of modifiability and transformations.

Thus, the tabular and graphical representations of the automaton are specifications No. 1, which need to be thought of first of all when the automaton program is compiled.

The main disadvantage of tabular and graph methods for representing automata is that they have to be further converted into a program. And in the future there is a temptation to modify the program directly, and the specification ceases to be adequate to the program.

In this connection, the question arises of how to better support the methods of the table and graph representation of automaton programs.


Comments


To leave a comment
If you have any suggestion, idea, thanks or comment, feel free to write. We really value feedback and are glad to hear your opinion.
To reply

Programming Styles and Techniques

Terms: Programming Styles and Techniques