1.4. Text encoding

Lecture



Text is represented in binary by assigning a specific integer to each character of the alphabet. Eight binary digits are enough to encode 2^8 = 256 different characters, which suffices for all the characters of the English and Russian alphabets.
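As a minimal sketch of this idea, the Python snippet below turns a character into an integer and then into eight binary digits, and back again. The helper names `to_bits` and `from_bits` are illustrative and do not come from the lecture. Python's built-in `ord` returns Unicode code numbers, which here simply stand in for "a specific integer per character".

```python
BITS = 8
CODE_COUNT = 2 ** BITS  # number of distinct values that fit in eight bits


def to_bits(char: str, width: int = BITS) -> str:
    """Return the integer code of one character as a zero-padded binary string."""
    code = ord(char)
    if code >= 2 ** width:
        raise ValueError(f"{char!r} does not fit in {width} bits")
    return format(code, f"0{width}b")


def from_bits(bits: str) -> str:
    """Turn a binary string back into the character it encodes."""
    return chr(int(bits, 2))


print(CODE_COUNT)             # 256
print(to_bits("A"))           # 01000001
print(from_bits("01000001"))  # A
```

A character whose code does not fit in the chosen width is rejected rather than silently truncated, which mirrors the limit of 256 codes described above.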

In the early years of computing, the difficulties in encoding text came from the lack of the necessary encoding standards. Today, on the contrary, the difficulties come from the multitude of coexisting and often mutually incompatible standards.

For English, which has become the de facto language of international communication, these difficulties have been resolved. The American National Standards Institute (ANSI) developed and introduced the American Standard Code for Information Interchange (ASCII).

For encoding the Russian alphabet, several variants of encodings were developed:

1) Windows-1251: introduced by Microsoft; because this company's operating systems (OS) and other software products are so widespread in the Russian Federation, the encoding has become widely used;

2) KOI-8 (Kod Obmena Informatsiey, 8-bit, that is, "8-bit Information Exchange Code"): another popular encoding of the Russian alphabet, common in computer networks in the Russian Federation and in the Russian sector of the Internet;

3) ISO (International Organization for Standardization) encoding, ISO 8859-5: an international standard for encoding the characters of the Russian language. In practice, this encoding is rarely used.
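To illustrate why these coexisting encodings conflict, the sketch below encodes the same Cyrillic letter with all three tables using Python's standard codecs. The codec names `cp1251`, `koi8_r`, and `iso8859_5` are Python's names for Windows-1251, one common KOI-8 variant (KOI8-R), and ISO 8859-5; the choice of KOI8-R is an assumption, since the lecture does not say which KOI-8 variant it means.

```python
ENCODINGS = ["cp1251", "koi8_r", "iso8859_5"]


def encode_all(text: str) -> dict:
    """Encode the same text with each 8-bit Cyrillic encoding."""
    return {name: text.encode(name) for name in ENCODINGS}


# The Cyrillic capital letter "А" gets a different byte in every table.
encoded = encode_all("А")
for name, data in encoded.items():
    print(name, data.hex())

# Bytes written with one table and read with another come out garbled.
original = "Привет"
garbled = original.encode("cp1251").decode("koi8_r")
print(garbled == original)  # False
```

The same byte therefore means different letters depending on which table the reader assumes, which is the practical cost of having several standards in use at once.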

The limited set of codes (256) creates difficulties for the developers of a unified system for encoding text. As a consequence, it was proposed to encode characters not with 8-bit binary numbers but with numbers of greater bit width, which expands the range of possible code values. The 16-bit character encoding system is called universal: UNICODE. Sixteen bits provide unique codes for 65,536 characters, which is quite enough to accommodate the characters of most languages in one table. (Unicode has since been extended beyond 16 bits, but the 16-bit design described here is how it began.)
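The arithmetic behind the 16-bit table can be sketched as follows. The helper `pack_16bit` is a hypothetical name used only for illustration; it stores a character's Unicode code number in exactly two bytes (big-endian byte order is an arbitrary choice here) and fails for any code that needs more than 16 bits.

```python
import struct

CODE_COUNT_16 = 2 ** 16  # number of distinct 16-bit codes


def pack_16bit(char: str) -> bytes:
    """Store one character's code number in two bytes, big-endian."""
    # struct.pack raises struct.error if the code exceeds 65535.
    return struct.pack(">H", ord(char))


print(CODE_COUNT_16)  # 65536

# Latin and Cyrillic letters live in the same table under different numbers.
for ch in ("A", "Я"):
    print(hex(ord(ch)), pack_16bit(ch).hex())
```

Unlike the 8-bit tables, no switch of encoding is needed to mix English and Russian text, because each letter has its own number in the one shared table.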

Despite the simplicity of this approach, the practical transition to this encoding system could not be carried out for a long time because computer hardware lacked the resources: in UNICODE, every text document automatically becomes twice as large as in an 8-bit encoding. By the late 1990s, hardware had reached the required level, and the gradual migration of documents and software to UNICODE began.
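The doubling in size can be checked with a short sketch, assuming Windows-1251 as the 8-bit encoding and UTF-16 as the 16-bit form of UNICODE. The function name `encoded_sizes` and the sample text are illustrative.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of the text in an 8-bit and a 16-bit encoding."""
    return {
        "cp1251": len(text.encode("cp1251")),
        # "utf-16-le" is used instead of "utf-16" so that no byte order
        # mark is prepended and only the characters themselves are counted.
        "utf-16-le": len(text.encode("utf-16-le")),
    }


sample = "Кодирование текста"
sizes = encoded_sizes(sample)
print(sizes)
print(sizes["utf-16-le"] / sizes["cp1251"])  # 2.0
```

Every character in the sample takes one byte in Windows-1251 and two bytes in UTF-16, so the encoded text is exactly twice as long.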

