6.6. Problems of working in the Internet with Cyrillic texts

Lecture



Different encodings were used for Cyrillic texts under DOS and Windows. DOS used code page 866, while Windows used code page 1251. Both extend ASCII but assign the Cyrillic letters to different byte values. As a result, texts prepared in a text editor running under DOS could not be read directly in Windows and required recoding, and texts prepared in Windows editors appeared as gibberish when read in the DOS encoding. To eliminate this problem, transcoders were created; these were built into some text editors and converted text from the DOS encoding to the Windows encoding and back.
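As a rough illustration of what such a transcoder does, the following Python sketch converts bytes between the two code pages by going through Unicode. The function name `transcode` and the sample word are assumptions made for this example; the codec names `cp866` and `cp1251` are the ones provided by Python's standard library.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one code page to another via Unicode."""
    return data.decode(source).encode(target)


text = "Привет"                       # sample word, chosen for illustration
dos_bytes = text.encode("cp866")      # how a DOS editor would store it
win_bytes = transcode(dos_bytes, "cp866", "cp1251")

# Reading DOS bytes as if they were Windows bytes yields unreadable text.
garbled = dos_bytes.decode("cp1251")

print(win_bytes.decode("cp1251"))     # readable again after transcoding
print(garbled)                        # gibberish
```

The same function works in the opposite direction: `transcode(win_bytes, "cp1251", "cp866")` gives back the original DOS bytes.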

On the Internet the problem became worse, because Cyrillic characters were encoded in a third way, using the KOI8 code table. KOI8 was traditionally used on computers running the UNIX operating system. Initially, Internet servers were built exclusively on UNIX, so Russian-language texts were encoded only in KOI8. This is why Russian-language text on the Internet appeared as gibberish when displayed in an encoding different from the one in which it was created. On the WWW, this problem can be resolved with on-screen buttons that redisplay the page in a different encoding.
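The effect of such encoding buttons can be sketched in Python: the same bytes are decoded under each candidate encoding, and the reader picks the rendering that is legible. This is only an illustration. The function name `redisplay`, the candidate list, and the sample phrase are assumptions, and `koi8_r` (the Russian variant of KOI8) is assumed here because the text does not say which variant is meant.

```python
CANDIDATES = ("koi8_r", "cp1251", "cp866")  # assumed set of "buttons"


def redisplay(raw: bytes, encodings=CANDIDATES) -> dict:
    """Decode the same bytes under each candidate encoding."""
    # errors="replace" keeps the sketch from failing on bytes that a
    # code page leaves undefined.
    return {name: raw.decode(name, errors="replace") for name in encodings}


page = "Русский текст".encode("koi8_r")  # bytes as a UNIX server would send them
views = redisplay(page)

for name, rendering in views.items():
    print(f"{name}: {rendering}")
```

Only the `koi8_r` rendering is readable; the other two show the gibberish a user would see before pressing the right button.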

Difficulties with Cyrillic texts also arise when the texts are saved for further offline processing.

WWW pages can be saved in two ways:

1) saving the page in the same HTML format in which it was presented on the Internet. In this case the file can be viewed and edited, first, by the same software used to view it on the Internet and, second, by other specialized editors designed for the HTML format;

2) saving the document as a plain text file. In this case the text is saved without formatting elements. The document is stored in ASCII codes if it was created using code page 866 or 1251 (in DOS or Windows). Such a document can be read and edited in both DOS and Windows, but when it is converted on loading into Word, “Text Only” rather than “DOS Text” must be specified as the conversion method.
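The second way of saving can be sketched in Python with the standard `html.parser` module: the markup is discarded, only the text is kept, and the result is encoded in a chosen code page. The class name `TextExtractor`, the function `html_to_plain_text`, and the sample page are assumptions made for this example. It is a minimal sketch, not a full converter; for instance, it would also keep the contents of script or style elements.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text between tags, dropping the formatting."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_plain_text(markup: str) -> str:
    extractor = TextExtractor()
    extractor.feed(markup)
    extractor.close()
    # Collapse runs of whitespace left behind by the removed markup.
    return " ".join("".join(extractor.parts).split())


page = "<h1>Заголовок</h1>\n<p>Текст <b>страницы</b>.</p>"
plain = html_to_plain_text(page)
saved = plain.encode("cp1251")  # bytes as they would be written for Windows

print(plain)
```

Replacing `"cp1251"` with `"cp866"` would produce the same text encoded for DOS.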

Protocols are used for the following purposes:

1) implementing the specified host addressing system in the global network;

2) organizing reliable information transfer;

3) transforming and presenting information in accordance with the way it is organized.

The main protocol used on the Internet is TCP/IP, which combines a transmission protocol (TCP) and a host identification protocol (IP). In practice, when a user accesses a provider through a modem over a dial-up telephone line, TCP/IP runs over one of two link protocols: SLIP or the more modern PPP.
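The division of roles between IP and TCP can be illustrated with a small Python sketch that stays on the local machine. The IP address `127.0.0.1` identifies the host, and TCP carries the bytes reliably between the two ends. The function names `echo_all` and `round_trip` and the KOI8-R sample message are assumptions made for this example; it requires Python 3.8 or later.

```python
import socket
import threading


def echo_all(server: socket.socket) -> None:
    """Accept one connection and send back everything received."""
    conn, _addr = server.accept()
    with conn:
        while chunk := conn.recv(1024):
            conn.sendall(chunk)


def round_trip(payload: bytes) -> bytes:
    """Send payload over TCP to a local echo server and return the reply."""
    # Port 0 asks the operating system for any free port.
    with socket.create_server(("127.0.0.1", 0)) as server:
        host, port = server.getsockname()
        worker = threading.Thread(target=echo_all, args=(server,))
        worker.start()
        with socket.create_connection((host, port)) as client:
            client.sendall(payload)
            client.shutdown(socket.SHUT_WR)  # signal that sending is finished
            chunks = []
            while chunk := client.recv(1024):
                chunks.append(chunk)
        worker.join()
    return b"".join(chunks)


message = "Привет".encode("koi8_r")
reply = round_trip(message)
print(reply.decode("koi8_r"))
```

TCP delivers the bytes unchanged and knows nothing about their encoding, which is why the receiving side still has to decode them with the code table the sender used.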

A user who needs only e-mail, and not the full range of Internet services, can work using the UUCP protocol. This is somewhat cheaper, but the user's capabilities are reduced.

Some information services use their own protocols in addition to the network-wide protocols.

