index types Oracle, MySQL, PostgreSQL, MS SQL

Lecture



What types of indexes are supported in the most popular DBMS?

B-tree


The B-Tree index family is the most commonly used type of index, organized as a balanced tree, of ordered keys. They are supported by almost all DBMS, both relational and non-relational, and for almost all types of data.

  index types Oracle, MySQL, PostgreSQL, MS SQL

Since most probably know them well (or can read about them, for example, here), the only thing that should perhaps be noted here is that this type of index is optimal for a set with a good distribution of values ​​and high power (cardinality -number of unique values).

Spatial indices


Currently, all DBMS data has spatial data types and functions for working with them, for Oracle it is a set of types and functions in the MDSYS schema, for PostgreSQL it is point, line, lseg, polygon, box, path, polygon, circle, in MySQL - geometry, point, linestring, polygon, multipoint, multilinestring, multipolygon, geometrycollection, MS SQL - Point, MultiPoint, LineString, MultiLineString, Polygon, MultiPolygon, GeometryCollection.
In the scheme of work of spatial queries, usually there are two stages or two stages of filtering. DBMS with weak spatial support work out only the first stage (coarse filtering, MySQL). As a rule, an approximate, approximate representation of objects is used at this stage. The most common type of approximation is the Minimum Bounding Rectangle (MBR) [100].
For spatial data types, there are special indexing methods based on R-trees (R-Tree index) and grids (Grid-based Spatial index).

Spatial grid

Spatial grid (spatial grid) index is a tree-like structure similar to a B-tree, but is used to provide access to spatial (Spatial) data, that is, to index multi-dimensional information, such as geographic data with two-dimensional coordinates (latitude and longitude). ). In this structure, the nodes of the tree are the cells of space. For example, for a two-dimensional space: first, the entire parent area will be divided into a grid of strictly defined resolution, then each grid cell in which the number of objects exceeds the maximum set of objects in the cell will be divided into the next level subgrid. This process will continue until the maximum nesting is reached (if set), or until everything is divided up to cells that do not exceed the maximum objects.
  index types Oracle, MySQL, PostgreSQL, MS SQL

In the case of three-dimensional or multidimensional space, these will be rectangular parallelepipeds (cuboids) or parallelotopes.
  index types Oracle, MySQL, PostgreSQL, MS SQL

Quadtree

Quadtree is a subspecies of the Grid-based Spatial index, in which there are always 4 descendants in the parent cell and the grid resolution varies depending on the nature or complexity of the data.

  index types Oracle, MySQL, PostgreSQL, MS SQL

R-tree

R-Tree (Regions Tree) is also a tree-like data structure similar to the Spatial Grid, proposed in 1984 by Antonin Guttman. This data structure also divides the space into many hierarchically nested cells, but which, unlike the Spatial Grid, do not have to completely cover the parent cell and can intersect.
Various algorithms can be applied to splitting overflowed vertices, which causes the division of R-trees into subtypes: with quadratic and linear complexity (Guttman, of course, described with exponential complexity — Exhaustive Search, but naturally he is not used anywhere).
The quadratic subtype is divided into two rectangles with a minimum area covering all objects. Linear - in the division by maximum distance.

  index types Oracle, MySQL, PostgreSQL, MS SQL

HASH


Hash indices were proposed by Arthur Fuller, and they do not preserve the values ​​themselves, but their hashes, thereby reducing the size (and, accordingly, their processing speed) of indices from large fields. Thus, when querying using HASH indexes, the hash from the desired value will not be compared with the field hashes, but with the hashes of the fields.
Due to the non-linearity of hash functions, this index cannot be sorted by value, which leads to the impossibility of using more / less and “is null” in comparisons. In addition, since hashes are not unique, collision resolution methods are used for matching hashes.

Bitmap


Bitmap index - the bit index method consists in creating separate bitmaps (sequence 0 and 1) for each possible value of the column, where each bit corresponds to a row with an indexed value, and its value equal to 1 means that the entry corresponding to the bit position contains the indexed value for this column or property.

Empid Floor
one Male
2 Female
3 Female
four Male
five Female


Bit cards
Value Start the end Bit mask
Male Address of the first line Last line address 10010
Female Address of the first line Last line address 01101

The main advantage of bit indices is that on large sets with low power and good clustering by their values, the index will be less than B * -tree. (Read more here or here)

Reverse index


Reverse index is also a B-tree index but with a reversed key, used mainly for monotonously increasing values ​​(for example, an auto-incrementing identifier) ​​in OLTP systems in order to remove competition for the last leaf block of the index, since due to the reversal of the value, two adjacent index entries fall into different index blocks. It cannot be used for range search.
Example:
Field in the table (bin) Reverse index key (bin)
00000001 10,000,000
... ...
00001001 10010000
00001010 01010000
00001011 11010000
As you can see, the value in the index changes much more than the value in the table itself, and therefore in the b-tree structure, they will fall into different blocks.

Inverted index


An inverted index is a full-text index that stores for each key token a sorted list of addresses of table entries that contain the given key.
one Mom washed the frame
2 Dad washed the frame
3 Dad was washing a car
four Mom polished car

In simplified form, it will look like this:
Mama 1.4
Soap one
Rama 1.2
Dad 2.3
Polished four
Car 3.4


Partial index


Partial index is an index built on the part of the table that satisfies a specific condition of the index itself. This index was created to reduce the size of the index.

Function-based index


The very flexible type of index is a functional index, that is, an index whose keys store the result of user-defined functions. Functional indexes are often built for fields whose values ​​are pre-processed before comparison in a SQL command. For example, when comparing string data case-insensitive, the UPPER function is often used. Creating a functional index with the UPPER function improves the efficiency of such comparisons.
In addition, a functional index can help to implement any other missing index type of this DBMS (except, perhaps, a bit index, for example, Hash for Oracle)

Summary Table of Index Types


Mysql PostgreSQL MS SQL Oracle
B-Tree index there is there is there is there is
Spatial indexes supported R-Tree with quadratic splitting Rtree_GiST (linear split is used) 4-level Grid-based spatial index (separate for geographic and geodetic data) R-Tree with quadratic partitioning; Quadtree
Hash index Only in tables of type Memory there is Not Not
Bitmap index Not there is Not there is
Reverse index Not Not Not there is
Inverted index there is there is there is there is
Partial index Not there is there is Not
Function based index Not there is there is there is

It is worth mentioning that in PostgreSQL GiST allows you to create an index based on R-Tree for any of your own data types. To do this, you need to implement all 7 functions of the R-Tree mechanism.

Comments


To leave a comment
If you have any suggestion, idea, thanks or comment, feel free to write. We really value feedback and are glad to hear your opinion.
To reply

Databases, knowledge and data warehousing. Big data, DBMS and SQL and noSQL

Terms: Databases, knowledge and data warehousing. Big data, DBMS and SQL and noSQL