A few words about database design.

Lecture



Data, database, relational database model. Basic concepts.

We give this definition. This is information about the state of the properties of an object. That is, we have a carrot object. (Why exactly carrots? Well, I just have it now on the table in a plate. By the way, it strengthens the eyesight!)
She has properties: store, color, juiciness, length, freshness, well, there and something else. Every property has a value.
Suppose we have a lot of different carrots. All of them have their own set of properties. We can write about this in a notebook in the form of a poetic text. But working with it will not be convenient. It is more convenient to work with the data when they are presented in the table. The names of the columns are the names of the properties, each row is a specific carrot, and in cells the values ​​of the corresponding property of a specific carrot.

Let's stammer something.

Table:

Score Colour Juiciness Freshness room Store Address
Crossroads Redhead 70% 15 days one Moscow
Messenger Blue 2% 5 days 2 Pattaya
Crossroads Red 75% 10 days 3 Moscow

Well, and so on. Pretty comfortable. But there are certain disadvantages here. Do not criticize yet, I will be engaged in just this now, and we will do everything right. We can store data not only about carrots. We can, for example, about shops suppliers of carrots, as we did.

What are the disadvantages here. Descriptions of different types of objects are stored in one table: carrots and shops. It is necessary to create a table for each type of objects. In our case, these will be the "Shops" and "Carrots" tables.

The shops:

Title Address room
Crossroads Moscow one
Messenger Pattaya 2

Carrots:

Colour Juiciness Freshness room Store number
Redhead 70% 15 days one one
Blue 2% 5 days 2 2
Red 75% 10 days 3 one

The advantage of this approach is to minimize unnecessary duplication of large amounts of information. Here, instead of twice writing about a carrot from a Moscow crossroads, we wrote 2 times the store with number 1, and that, in turn, is described once in another table. Agree, now our database takes up less space and is more convenient to use. And the list of stores is now separated from the carrots and you don’t have to look for them there. By the way, I forgot to say. Database - a set of information organized in any way. We have these tables, and, as you can see, they are related: the store number in the carrot information indicates the corresponding row of the table on stores (we'll talk about the connection a little later).

So here. If the database is organized in the form of two-dimensional tables interconnected, then this model is called relational .


The relationship between the tables.

Tables are linked by fields called keys. Keys can be primary and external. In our example, in the "Carrots" table, the number field is the primary key of the table. The same is true in "Stores". The primary key uniquely identifies the entry in the table, i.e. does not contain duplicate values. In the table "Carrots" there is a field "store number". This is a foreign key - a field that stores the values ​​of the primary key of the associated table.

There are the following types of relationships (relationships) between tables:
* One-to-many. The most common form of communication. One record of the table corresponds to any number (both 0 and 1, and even more) of the records of another table. As in our example. One store delivers different carrots.

* Many-to-many. In fact, it is clear from the title what a connection is, that is why unnecessary words are not needed here. Well, for example, now let us have a table not “Carrots”, but a broader concept “Products” and the same table “Shops”. The same product can be sold in several stores, while at the same time one store can sell many products. In a good way, such a relationship should be avoided, because it contradicts both the definition of normal forms and the integrity of connections (we will consider both of these concepts later). It is better to make two connections one-to-many.

For example:
- Many-to-many communication: Shops (name, flag), Products (name, flag). Each product is sold by the store with which it has the same flag. As you can see, and to organize such a connection, so that it was also clear what was happening, is already problematic.
- We replace with one-to-many connections. Shops (name, flag), Products (name, flag), Relationships (shop_ flag, product_ flag). Now in the first two tables - only unique values. Than this all is good, you learn, when the speech about integrity and normal forms goes. For now, based on your assumptions. we will return to this example later and check them out.

*One to one. There is no big sense in such links. As a rule, you can make one from tables that are so related. But when there are too many columns in the table, you can use it.


Integrity of data and connections.

The integrity of the data - their correctness, completeness and consistency. With connections as well.
What breaks integrity? For example, with a one-to-many relationship in a subordinate table there may be any number of records corresponding to the main one. But in the main there should always be only one, no more and no less relevant entries. So, if there are records in the subordinate table, the primary key value of which is not found in the foreign key of the main table, the integrity is violated here (we look at the definition of integrity and understand why). Such violations can be controlled - if they occurred, it is enough to analyze the key values. By means of the SQL language, this is done with one small SELECT * FROM sub_table t2 WHERE NOT EXISTS query (SELECT * FROM primary_table t1 WHERE t1.foriegn_key = t2.primary_key). From the field name everything is clear. The query displays the orphan records of the subordinate table. About SQL language later.

It is just as easy to control the appearance of such things, and we'll talk about this in some detail. Think about whether it is convenient for you to maintain the integrity of many-to-many communication. Relations in the database are reduced to normal forms, if the data organized in this way does not contain redundancy and there are no threats to their integrity. In the example with a carrot, we have already performed normalization - the process of reduction to normal forms.

What are the normal forms? In practice, four are usually distinguished:
* The first normal form is such a form that only one value is stored in the table cells, and not a list of values. Those. in relational databases, the first normformat is always reached.

* Second n. implies the absence of such columns, which depend only on the part of the composite primary key and besides this, the table is already given to the 1st nf. A good example is given in Wikipedia:
Let the Head and Position together form the primary key in the following table:

Chief Position Salary Computer availability
Grishin Storekeeper 20,000 Not
Vasiliev Programmer 40,000 there is
Vasiliev Storekeeper 25,000 Not

Each employee sets the salary to the employee himself, but its boundaries depend on the position. The presence of a computer employee depends only on the position, that is, the dependence on the primary key is not complete.

As a result of casting to 2NF, we get two tables:

Chief Position Salary
Grishin Storekeeper 20,000
Vasiliev Programmer 40,000
Vasiliev Storekeeper 25,000

Here, the primary key, as in the source table, is a composite, but the only Salary attribute that does not belong to it now depends on the whole key, that is, it is complete.

Position Computer availability
Storekeeper Not
Programmer there is

* In the 3rd n. the table is, if it is in the second n. and no columns are dependent on other columns that are not in the primary key.

* In the 4th n. a table is located (it is clear that it is also found in all previous nfs), if not one line of the table by its existence obliges the existence of another line in the same table.
You will come across a table later that is not in the 4th n. and should not be in it.


Database Objects

In addition to the tables in which information is stored, the DBMS provides a number of other objects that help to ensure and maintain data integrity, convenient data access, and other things.
Among these objects are Views, Stored Procedures, Triggers, Indices. Other objects already in different DBMS may or may not be present.

Views (view) - a dynamic table containing a selection of several or more other tables. Changes to the data in the tables that are presented in the sample are immediately reflected in the view.

Stored Procedures (Stored Procedures) - a sequence of SQL-instructions, including those containing instructions on procedural extensions of the SQL language, which allow to organize such control structures as loops, branches.
They can be used both to perform frequently encountered queries (the stored procedure is faster, because it is compiled), and to perform some calculations and manipulations with these DBMS tools.

Triggers are stored procedures that are automatically called when an event occurs. They can be included before and after deleting, adding or updating information.
For example, before deleting, you can check if there will be left as a result of this orphaned entries in the child table and, if so, prohibit the deletion. Or, after deleting, you can automatically delete the associated records in another table. Before inserting and updating, it is also possible to carry out any checks that the new information will not damage the integrity of the data. After inserting and updating, you can perform any actions based on the inserted set.

Indexes (index) is an object due to which database performance is increased many times. In a lot. Now we understand how and why.
The data in the tables are stored in random order and in large quantities. When data is organized in such a way, the search can only be performed by sequential search. And if the index is not attached to the column, this is exactly what the search is for.
The index can be composed of one or more columns. Sometimes from expressions using columns (sum there, upper case, concatenation). The data from which the index is composed is stored in a tree form - this structure allows using very fast search algorithms. Each index entry points to a row in a table.
Indexes can be unique when no index element is repeated and non-unique. Using unique indexes, you can control duplicate values. Non-unique ones are needed just to speed up the search or in table links. The primary and foreign keys of the table should always be indexed.


A little bit about CASE-tools

I will not talk about them in detail, I will not impose anything, I’ll just say what I need (more precisely, why I use them) and give a few examples.

Case tools are used at design stages. Not only DB. In designing software, business plans, everything where you can decompose a task, highlight some kind of functionality.

You can take a pencil, a piece of paper, and draw everything there, and then start to implement it. Here is the first inconvenience. The thought came better, it is necessary to erase, redraw. There is no space left on a piece of paper to place this or that block, well, or something else there. When everything is finally ready, you need to implement it using the means of the selected language. First, you have to work with your hands. Secondly, all sorts of trouble will begin to arise: something did not take into account, something is difficult to be realized by the chosen language, some other nastiness.

Here, using case-tools, you can, besides getting rid of paperwork problems with a pencil, after completing the design, overtake it to the environment in which you plan to work further. If it was a database layout, create a database on this layout in your DBMS. Of course, it is necessary to modify, but still. In addition, in most cases, analysis errors will be easier for you to detect already at this stage.
I will give a couple of examples: ER-Win, PB-Win, Rational Rose.
That's all, actually.


What to consider when designing and where to start.

As always, one should begin with the statement of the problem and analysis. Not that everything is so formal, but you should have a clear idea of ​​what you need. And the more truly this idea is in the early stages, the less it will have to be changed later.

Analyzed, something you did, now normalization. To the maximum possible normal form. And at the same time the naming of tables and columns. as in programming variable names, and here. By name it should be clear that where.
At these two stages, you can apply to a piece of paper with a pencil, and CASE-tools.

If the table produces a composite primary key or if the primary key is a column with string values, create a unique index based on them, and create another column for the primary key. Numerical. It is easier to organize connections and sample and cascade operations will be faster.

Create indexes for the fields that are planned for frequent searches.

Maintain data integrity with triggers and use stored procedures to modify and access data.


Comments


To leave a comment
If you have any suggestion, idea, thanks or comment, feel free to write. We really value feedback and are glad to hear your opinion.
To reply

Databases, knowledge and data warehousing. Big data, DBMS and SQL and noSQL

Terms: Databases, knowledge and data warehousing. Big data, DBMS and SQL and noSQL