Strings in JavaScript

  1. Creating strings
    1. Special characters
    2. Escaping special characters
  2. Methods and properties
    1. The length property
    2. Character access
    3. Modifying strings
    4. Changing case
    5. Substring search
    6. Finding all occurrences
    7. Extracting a substring: substr, substring, slice
    8. Negative arguments
  3. Unicode encoding
  4. String comparison
  5. Summary

In JavaScript, all text data is a string. Unlike a number of other languages, there is no separate “character” type.

The internal format of strings, regardless of page encoding, is Unicode.

Creating strings

Strings are created using double or single quotes:

var text = "my string";

var anotherText = 'another string';

var str = "012345";

In JavaScript, there is no difference between double quotes and single quotes.

Special characters

Strings may contain special characters. The most frequently used of them is the line break.

It is written as \n, for example:

alert('Hello\nWorld'); // prints "World" on a new line

There are also less common special characters. Here is the list:

Special characters
Symbol    Description
\b        Backspace
\f        Form feed
\n        New line
\r        Carriage return
\t        Tab
\uNNNN    Unicode character with the hexadecimal code NNNN. For example, \u00A9 is the Unicode representation of the copyright symbol ©
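
As a quick check of the table above, the sketch below inspects a few escape sequences. The helper name describeEscapes is made up for this illustration; the only point is that each escape sequence produces a single character in the resulting string.

```javascript
// Hypothetical helper for illustration: collects facts about a few escape sequences.
function describeEscapes() {
  return {
    tabLength: "a\tb".length,         // 3: "a", one tab character, "b"
    newlineCode: "\n".charCodeAt(0),  // 10, the code of the line feed character
    copyright: "\u00A9"               // the character ©
  };
}

var info = describeEscapes();
console.log(info.tabLength, info.newlineCode, info.copyright);
```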

Escaping special characters

If the string is in single quotes, any single quotes inside it must be escaped, that is, preceded by a backslash, like this:

var str = 'I\'m a JavaScript programmer';

In a double-quoted string, inner double quotes are escaped:

var str = "I'm a JavaScript \"programmer\" ";
alert(str);

Escaping only tells the JavaScript parser how to read the string literal. In memory, the string contains the character itself, without the '\'. You can see this by running the example above.

The backslash '\' is the escape character, so a literal backslash is itself always escaped, that is, written as \\:

var str = ' character \\ ';

alert(str); // character \

You can escape any character. If it is not a special one, the backslash is simply ignored:

alert("\a"); // a
// identical to alert("a");
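
One way to convince yourself that the backslash is not stored is to look at the string lengths. This is only a small sketch; the variable names are chosen for the example.

```javascript
// The backslash belongs to the notation of the literal, not to the stored string.
var single = 'I\'m';   // stored as three characters: I ' m
var backslash = "\\";  // stored as one character: a single backslash

console.log(single.length);     // 3
console.log(backslash.length);  // 1
console.log(single === "I'm");  // true
```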

Methods and properties

Here we look at the methods and properties of strings, some of which we already met in the chapter Methods and Properties.

The length property

One of the most common operations on a string is getting its length:

var str = "My\n"; // 3 characters; the third is a line break

alert(str.length); // 3

Character access

To get a character, use the charAt(position) call. The first character is at position 0:

var str = "jQuery";
alert(str.charAt(0)); // "j"

There is no separate “character” type in JavaScript, so charAt returns a string consisting of the selected character.

In modern browsers (IE8 and later), you can also use square brackets to access a character:

var str = "I am a modern browser!";
alert(str[0]); // "I", IE8+

The difference between this notation and charAt is that when there is no character at the given position, charAt returns an empty string, while square brackets return undefined:

alert("".charAt(0)); // empty string
alert(""[0]); // undefined, IE8+

Methods are called with parentheses, properties are not

Note that str.length is a property of the string, while str.charAt(pos) is a method, i.e. a function.

A method call always needs parentheses; a property is read without them.
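
A small sketch that uses both the length property and the charAt method: it walks over a string and collects its characters into an array. The function name toChars is an assumption made for this example.

```javascript
// Hypothetical helper: splits a string into an array of one-character strings.
function toChars(str) {
  var chars = [];
  for (var i = 0; i < str.length; i++) { // length is a property: no parentheses
    chars.push(str.charAt(i));           // charAt is a method: called with parentheses
  }
  return chars;
}

console.log(toChars("jQuery")); // ['j', 'Q', 'u', 'e', 'r', 'y']
```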

Modifying strings

Strings in JavaScript cannot be changed. You can read a character, but you cannot replace it. Once a string is created, it stays that way forever.

To get around this, a new string is created and assigned to the variable in place of the old one:

var str = "string";

str = str.charAt(3) + str.charAt(4) + str.charAt(5);

alert(str); // ing
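
The same idea can be wrapped in a helper that "replaces" one character by building a new string. This is a minimal sketch; replaceAt is a hypothetical name, not a built-in method, and it relies on the slice method described later in this article.

```javascript
// Hypothetical helper: returns a copy of str with the character at index replaced by ch.
function replaceAt(str, index, ch) {
  if (index < 0 || index >= str.length) return str; // nothing to replace
  return str.slice(0, index) + ch + str.slice(index + 1);
}

var original = "string";
var changed = replaceAt(original, 0, "S");

console.log(changed);   // "String"
console.log(original);  // "string": the original value is untouched
```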

Changing case

The toLowerCase() and toUpperCase() methods convert a string to lower or upper case:

alert("Interface".toUpperCase()); // INTERFACE

The example below takes the first character and converts it to lowercase:

alert("Interface".charAt(0).toLowerCase()); // 'i'

Importance: 5

Write the function ucFirst(str), which returns the string str with its first character capitalized, for example:

ucFirst("vasya") == "Vasya";
ucFirst("") == ""; // no errors on an empty string

P.S. JavaScript has no built-in method for this. Create the function using toUpperCase() and charAt().

Solution

We cannot simply replace the first character, since JavaScript strings are immutable.

The only way is to build a new string based on the existing one, but with a capitalized first character:

function ucFirst(str) {
   var newStr = str.charAt(0).toUpperCase();

   for (var i = 1; i < str.length; i++) {
     newStr += str.charAt(i);
   }

   return newStr;
}

alert(ucFirst("vasya"));

P.S. Other solutions are possible using the str.slice and str.replace methods.
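
One possible variant based on str.slice is sketched below. It is not taken from the original task page; it only assumes that slice(1) returns everything after the first character.

```javascript
// Variant of ucFirst built on slice instead of a loop.
function ucFirst(str) {
  // For an empty string, charAt(0) is "" and slice(1) is "", so the result is "".
  return str.charAt(0).toUpperCase() + str.slice(1);
}

console.log(ucFirst("vasya")); // "Vasya"
console.log(ucFirst(""));      // ""
```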


Substring search

To search for a substring, there is the indexOf(substring[, startPosition]) method.

It returns the position at which the substring is found, or -1 if nothing is found. For example:

var str = "Widget with id";

alert(str.indexOf("Widget")); // 0, because "Widget" is found right at the start of str
alert(str.indexOf("id")); // 1, because "id" is found starting at position 1
alert(str.indexOf("Lalala")); // -1, the substring is not found

The optional second argument lets you start the search from a given position. For example, "id" first appears at position 1. To find its next occurrence, run the search from position 2:

var str = "Widget with id";

alert(str.indexOf("id", 2)); // 12, the search starts from position 2

There is also a similar lastIndexOf method, which searches from the end of the string rather than from the beginning.
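
For comparison, here is a short sketch of lastIndexOf on the same string. Its optional second argument is the position from which the search goes backwards; the variable names are chosen for this example.

```javascript
var str = "Widget with id";

var last = str.lastIndexOf("id");         // 12: the last occurrence
var before = str.lastIndexOf("id", 11);   // 1: the last occurrence starting at or before position 11
var missing = str.lastIndexOf("Lalala");  // -1: not found

console.log(last, before, missing);
```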

The bitwise NOT operator '~' is sometimes used to make indexOf checks shorter.

The point is that, for 32-bit integers, ~n is equivalent to the expression -(n+1), for example:

alert(~2); // -(2+1) = -3
alert(~1); // -(1+1) = -2
alert(~0); // -(0+1) = -1
alert(~-1); // -(-1+1) = 0

As you can see, ~n is zero only when n == -1.

That is, the check if ( ~str.indexOf(...) ) means that the indexOf result is different from -1, i.e. there is a match.

Like this:

var str = "Widget";

if (~str.indexOf("get")) {
   alert('match found!');
}

In general, using language features in a non-obvious way is not recommended, since it hurts the readability of the code.

In this case, however, it is acceptable. Just remember: '~' reads as “not minus one”, and "if ~str.indexOf" reads as "if found".
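
If the tilde form feels too cryptic, the same check can be spelled out explicitly. The sketch below wraps it in a helper; the name contains is an assumption made for this example, not a built-in used by the article.

```javascript
// Hypothetical helper: true if str contains substring (case-sensitive).
function contains(str, substring) {
  return str.indexOf(substring) !== -1;
}

console.log(contains("Widget", "get")); // true
console.log(contains("Widget", "Get")); // false, the search is case-sensitive

// The tilde form gives the same answer once converted to a boolean.
console.log(Boolean(~"Widget".indexOf("get")) === contains("Widget", "get")); // true
```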

Importance: 5

Write the function checkSpam(str), which returns true if the string str contains 'viagra' or 'XXX'.

The function must be case-insensitive:

checkSpam('buy ViAgRA now') == true
checkSpam('free xxxxx') == true
checkSpam("innocent rabbit") == false

Solution

The indexOf method is case-sensitive. That is, it will not find 'XXX' in the string 'xXx'.

To perform the check, we convert both the string str and the search terms to lowercase:

function checkSpam(str) {
   str = str.toLowerCase();

   return str.indexOf('viagra') >= 0 || str.indexOf('xxx') >= 0;
}

Complete solution: tutorial / intro / checkSpam.html.


Finding all occurrences

To find all occurrences of a substring, run indexOf in a loop. As soon as we get a position, we start the next search from the position after it.

An example of such a loop:

var str = "Ослик Иа-Иа посмотрел на виадук"; // the string to search in
var target = "Иа"; // the search target

var pos = 0;
while (true) {
   var foundPos = str.indexOf(target, pos);
   if (foundPos == -1) break;

   alert(foundPos); // found at this position
   pos = foundPos + 1; // continue the search from the next position
}

This loop starts the search from position 0. After finding the substring at position foundPos, it continues the search from position pos = foundPos + 1, and so on until nothing more is found.

The same algorithm can be written more concisely:

var str = "Ослик Иа-Иа посмотрел на виадук"; // the string to search in
var target = "Иа"; // the search target

var pos = -1;
while ((pos = str.indexOf(target, pos + 1)) != -1) {
   alert(pos);
}
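
The loop above can also be packaged as a function that returns the positions instead of showing them. This is a sketch under one added assumption: an empty target is rejected up front, because indexOf matches an empty string at every position and the loop would never finish. The name findAll is hypothetical.

```javascript
// Hypothetical helper: returns every position at which target occurs in str.
function findAll(str, target) {
  var positions = [];
  if (target === "") return positions; // an empty target would keep the loop running forever

  var pos = -1;
  while ((pos = str.indexOf(target, pos + 1)) !== -1) {
    positions.push(pos);
  }
  return positions;
}

console.log(findAll("Ослик Иа-Иа посмотрел на виадук", "Иа")); // [6, 9]
console.log(findAll("aaa", "aa"));                              // [0, 1], overlapping matches are reported
```

Because the next search starts at pos + 1 rather than after the whole match, overlapping occurrences are included in the result.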

Extracting a substring: substr, substring, slice

JavaScript has as many as three methods for extracting a substring, with a few differences between them.

substring(start [, end])
The substring(start, end) method returns the substring from position start up to, but not including, position end.

var str = "stringify";
alert(str.substring(0,1)); // "s", characters from position 0 up to, but not including, 1

If the end argument is omitted, the substring runs to the end of the string:

var str = "stringify";
alert(str.substring(2)); // ringify, characters from position 2 to the end

substr(start [, length])
The first argument has the same meaning as in substring, while the second is not the end position but the number of characters.

var str = "stringify";
str = str.substr(2,4); // ring, 4 characters starting from position 2
alert(str);

If the second argument is omitted, “to the end of the string” is implied.

slice(start [, end])
Returns the part of the string from position start up to, but not including, position end. The parameters have the same meaning as in substring.
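
To see the three methods side by side, the sketch below extracts the same fragment "ring" with each of them. The variable names are chosen for this example.

```javascript
var str = "stringify";

var viaSubstring = str.substring(2, 6); // start 2, end 6 (not included)
var viaSubstr = str.substr(2, 4);       // start 2, length 4
var viaSlice = str.slice(2, 6);         // start 2, end 6 (not included)

console.log(viaSubstring, viaSubstr, viaSlice); // ring ring ring
```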

Negative arguments

The difference between substring and slice lies in how they handle negative and out-of-range arguments:

substring(start, end)
Negative arguments are treated as zero. Values that are too large are truncated to the length of the string:

alert("testme".substring(-2)); // "testme", -2 becomes 0

In addition, if start > end, the arguments are swapped, i.e. the part of the string between start and end is returned:

alert("testme".substring(4, -1)); // "test"
// -1 becomes 0 -> we get substring(4, 0)
// 4 > 0, so the arguments are swapped -> substring(0, 4) = "test"

slice
Negative values are counted from the end of the string:

alert("testme".slice(-2)); // "me", from the 2nd position from the end

alert("testme".slice(1, -1)); // "estm", from position 1 up to the first position from the end

This is much more convenient than the strange logic of substring.

A negative first argument is supported by substr in all browsers except IE8 and earlier.

Conclusions

The most convenient method is slice(start, end).

Alternatively, you can use substr(start, length), keeping in mind that IE8 and earlier do not support a negative start.
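
A typical use of negative arguments is taking the tail of a string. The sketch below assumes a hypothetical helper lastChars; note the guard for n = 0, since slice(-0) is the same as slice(0) and would return the whole string.

```javascript
// Hypothetical helper: returns the last n characters of str.
function lastChars(str, n) {
  if (n <= 0) return ""; // slice(-0) equals slice(0), which would return the whole string
  return str.slice(-n);
}

console.log(lastChars("testme", 2));  // "me"
console.log(lastChars("testme", 10)); // "testme", a too-large n is clamped to the start
console.log(lastChars("testme", 0));  // ""
```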

Importance: 5

Create a function truncate(str, maxlength) that checks the length of the string str and, if it exceeds maxlength, replaces the end of str with '…' so that its length becomes equal to maxlength.

The function should return the string, truncated if necessary.

For example:

truncate("What I'd like to tell on this topic is:", 20) = "What I'd like to te…"
truncate("Hi everyone!", 20) = "Hi everyone!"

This function has practical uses. It is used to shorten message subjects that are too long.

Solution

Since the final length of the string must be maxlength, we need to cut the string a little shorter to leave room for the three dots.

function truncate(str, maxlength) {
   if (str.length > maxlength) {
     return str.slice(0, maxlength - 3) + '...';
     // the resulting length equals maxlength
   }

   return str;
}

alert(truncate("What I'd like to tell on this topic is:", 20));
alert(truncate("Hi everyone!", 20));

An even better option is to replace the three dots with the single “ellipsis” character … (&hellip; in HTML); then only one character needs to be cut:

function truncate(str, maxlength) {
   if (str.length > maxlength) {
     return str.slice(0, maxlength - 1) + '…';
   }

   return str;
}

alert(truncate("What I'd like to tell on this topic is:", 20));
alert(truncate("Hi everyone!", 20));

This code could be written even more concisely:

function truncate(str, maxlength) {
   return (str.length > maxlength) ?
     str.slice(0, maxlength - 1) + '…' : str;
}

alert(truncate("What I'd like to tell on this topic is:", 20));
alert(truncate("Hi everyone!", 20));


Unicode encoding

If you are familiar with string comparison in other languages, here is a little riddle for you. Not just one, in fact, but two.

As we know, characters are compared in alphabetical order: 'А' < 'Б' < 'В' < ... < 'Я'.

But there are a few oddities.

  1. Why is the lowercase letter 'а' greater than the uppercase letter 'Я'?
    alert( 'а' > 'Я' ); // true
  2. The letter 'ё' sits in the alphabet between е and ж: абвгде ё жз... So why is 'ё' greater than 'я'?
    alert( 'ё' > 'я' ); // true

To sort this out, let's turn to the internal representation of strings in JavaScript.

All strings are internally encoded in Unicode.

It doesn't matter what encoding the page is written in, whether windows-1251 or utf-8. Inside the JavaScript interpreter, all strings are converted to a single Unicode form. Each character has its own code.

There is a method for getting a character by its code:

String.fromCharCode(code)
Returns the character with the code code:
alert( String.fromCharCode(1072) ); // 'а'

...and a method for getting the numeric code of a character:

str.charCodeAt(pos)
Returns the code of the character at position pos. Positions are counted from zero.
alert( "абрикос".charCodeAt(0) ); // 1072, the code of 'а'
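
The two methods are inverses of each other, which the following sketch illustrates by converting a string to a list of codes and back. The helper names charCodes and fromCodes are assumptions made for this example.

```javascript
// Hypothetical helper: returns the codes of all characters in str.
function charCodes(str) {
  var codes = [];
  for (var i = 0; i < str.length; i++) {
    codes.push(str.charCodeAt(i));
  }
  return codes;
}

// Hypothetical helper: builds a string from an array of codes.
function fromCodes(codes) {
  // String.fromCharCode accepts any number of codes as separate arguments.
  return String.fromCharCode.apply(null, codes);
}

console.log(charCodes("абв"));               // [1072, 1073, 1074]
console.log(fromCodes([1072, 1073, 1074]));  // "абв"
```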

Now back to the examples above. Why do the comparisons 'ё' > 'я' and 'а' > 'Я' give such strange results?

The point is that characters are compared not alphabetically but by code. The character with the larger code is the greater one. Unicode contains many different characters. Only a small part of them are Cyrillic letters; for details, see Cyrillic in Unicode.

Let's output the range of Unicode characters with codes from 1034 to 1113:

var str = '';
for (var i = 1034; i <= 1113; i++) {
   str += String.fromCharCode(i);
}
alert(str);

Result:

ЊЋЌЍЎЏАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюяѐёђѓєѕіїјљ

Two important things can be seen from this range:

  1. Lowercase letters come after uppercase letters, so they are always greater.
    In particular, 'а' (code 1072) > 'Я' (code 1071).
    The same is true of the English alphabet, where 'a' > 'Z'.
  2. Some letters, such as ё, lie outside the main alphabet.
    In particular, the lowercase letter ё has a code greater than that of я, so 'ё' (code 1105) > 'я' (code 1103).
    By the way, the uppercase letter Ё comes before А in Unicode, so 'Ё' (code 1025) < 'А' (code 1040). Surprisingly, there is a letter less than А.

Unicode in HTML

By the way, if we know a character's Unicode code, we can add it to HTML using a “numeric character reference”.

To do this, write &#, then the code, and finish with a semicolon ';'. For example, the character 'а' as a numeric reference is &#1072;.

To give the code in hexadecimal notation, start with &#x instead.

Unicode contains many fun and useful characters, for example the scissors symbol ✂ (&#x2702;), the fractions ½ (&#xBD;) and ¾ (&#xBE;), and others. They can be conveniently used in a design instead of images.
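
Such references can be generated from a character with charCodeAt. The sketch below uses two hypothetical helper names and assumes a character whose code fits into a single charCodeAt value, which holds for all the characters mentioned in this article.

```javascript
// Hypothetical helper: decimal numeric character reference for the first character of ch.
function toDecimalReference(ch) {
  return "&#" + ch.charCodeAt(0) + ";";
}

// Hypothetical helper: hexadecimal numeric character reference for the first character of ch.
function toHexReference(ch) {
  return "&#x" + ch.charCodeAt(0).toString(16).toUpperCase() + ";";
}

console.log(toDecimalReference("а")); // "&#1072;" (Cyrillic а)
console.log(toHexReference("✂"));     // "&#x2702;"
console.log(toHexReference("½"));     // "&#xBD;"
```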

String comparison

Strings are compared lexicographically, in “telephone directory” order.

Strings s1 and s2 are compared according to the following algorithm:

  1. The first characters are compared: a = s1.charAt(0) and b = s2.charAt(0). If they are equal, go to the next step; otherwise return true or false depending on the result of their comparison.
  2. The second characters are compared, then the third, and so on. If one string runs out of characters first, it is the smaller one. If both run out at the same time, the strings are equal.
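
The two steps above can be sketched as a function. This is an illustration of the idea, not the text of the specification; the name compareStrings and the -1/0/1 return convention are assumptions made for this example.

```javascript
// Returns -1 if s1 < s2, 1 if s1 > s2, and 0 if the strings are equal.
function compareStrings(s1, s2) {
  var shorter = Math.min(s1.length, s2.length);

  for (var i = 0; i < shorter; i++) {
    var a = s1.charCodeAt(i);
    var b = s2.charCodeAt(i);
    if (a !== b) return a < b ? -1 : 1; // the first differing character decides
  }

  if (s1.length === s2.length) return 0;  // both strings ended at the same time
  return s1.length < s2.length ? -1 : 1;  // the string that ended first is smaller
}

console.log(compareStrings("Z", "A"));       // 1
console.log(compareStrings("Вася", "Ваня")); // 1
console.log(compareStrings("aa", "a"));      // 1
console.log(compareStrings("a", "a"));       // 0
```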

The language specification defines this algorithm in more detail, but its meaning corresponds exactly to the order in which names are listed in a telephone directory.

"Z" > "A" // true
"Вася" > "Ваня" // true, because с > н
"aa" > "a" // true, because the beginnings match but the first string has more characters

Numbers given as strings are compared as strings.

Numbers sometimes reach the script as strings, for example as the result of prompt. In that case, comparing them gives an incorrect result:

alert("2" > "14"); // true, because these are strings, and "2" > "1" holds for the first characters

If at least one operand is a number, the other is converted to a number:

alert(2 > "14"); // false
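
To avoid relying on implicit conversion, both values can be converted explicitly before comparing. A minimal sketch, with a hypothetical helper name:

```javascript
// Hypothetical helper: compares two inputs as numbers, whatever their original type.
// If either input is not numeric, Number() gives NaN and the result is false.
function isGreaterNumeric(a, b) {
  return Number(a) > Number(b);
}

console.log("2" > "14");                   // true: compared as strings
console.log(isGreaterNumeric("2", "14"));  // false: compared as numbers
console.log(isGreaterNumeric("14", "2"));  // true
```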

Summary

  • Strings in JavaScript are internally encoded in Unicode. When writing a string, you can use special characters such as \n and insert Unicode characters by code.
  • We covered the length property and the methods charAt, toLowerCase/toUpperCase, substring/substr/slice (slice is preferred).
  • Strings are compared character by character. So if numbers arrive as strings, they may be compared incorrectly; convert them to the number type first.
  • When comparing strings, keep in mind that letters are compared by their codes. Therefore an uppercase letter is less than a lowercase one, and the letter ё lies outside the main alphabet altogether.

Creation

a = 'my string'
b = new String(object) // this syntax is obsolete and not used
c = String(object)
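
The three forms above do not produce the same kind of value, which typeof makes visible. The sketch below uses sample values chosen for this example.

```javascript
var a = 'my string';              // string literal: a primitive string
var b = new String('my string');  // String object created with new
var c = String(12345);            // String called as a function: converts to a primitive string

console.log(typeof a); // "string"
console.log(typeof b); // "object"
console.log(typeof c); // "string"
console.log(b.valueOf() === a); // true: the object wraps the same text
```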

Arguments

string - Optional. Any group of Unicode characters.


Description, examples

Strings are, as a rule, created implicitly using string literals.

// either kind of quotes works
var str = "string literal"

In string literals, you can use escape sequences to represent special characters that cannot be directly used in strings, such as a newline character or Unicode characters. When the script is compiled, each escape sequence in the string literal is converted to the characters it represents.

You can specify a Unicode character explicitly through its code.

var str = "\u1234"

Strings written with quotes (called "primitive" strings) differ slightly from String objects created with the new operator. For example, the data type (typeof) of an object created with new is 'object', not 'string'. Such an object can also be assigned additional properties and methods directly. In all other respects the two behave alike, because the interpreter automatically converts primitive strings to objects when needed.

"12345" .length // 5

Character access

Characters are accessed using the String#charAt method.

'cat'.charAt(1); // returns "a"

A string can also be indexed like an array. This notation was absent from early editions of ECMA-262 and was standardized in ECMAScript 5:

var str = 'cat'
str[1] // "a"

Unlike in languages such as C or PHP, a string cannot be changed once it has been created: its characters can only be read, not modified.

To change a string variable, assign it a modified string:

str = "string"
str = str.charAt(3) + str.charAt(4) + str.charAt(5) // "ing"

String comparison

Strings are compared using the usual comparison operators < and >.


Methods

split
charCodeAt
String.fromCharCode
charAt
concat
lastIndexOf
search
match
toLowerCase
toUpperCase
toLocaleLowerCase
toLocaleUpperCase
toString
valueOf
substring
slice
indexOf
substr
replace
created: 2014-10-07
updated: 2021-03-13