6. Normal forms of relationships. Database Design, Types of Normal Forms

Lecture



Stages of database development

The purpose of the development of any database is the storage and use of information about any subject area. To achieve this goal, the following tools are available:

  1. The relational data model is a convenient way to present data in a subject area.
  2. The SQL language is a universal way to manipulate such data.

However, it is obvious that for the same subject domain, relational relationships can be designed in many different ways. For example, you can design multiple relationships with a large number of attributes, or vice versa, spread all attributes across a large number of small relationships. How to determine on what grounds you need to put the attributes in certain relationships?

This chapter discusses the ways of "good" or "correct" design of relational relations. We will first discuss what "good" or "correct" data models mean. Then the concepts of the first, second and third normal forms of relations (1NF, 2NF, 3NF) will be introduced and it is shown that the relations in the third normal form are “good”.

When developing a database, several modeling levels are usually distinguished, with the help of which the transition from the subject area to a specific database implementation by means of a specific DBMS occurs. The following levels can be distinguished:

  • The subject area itself
  • Domain Model
  • Logical data model
  • Physical data model
  • Actually database and applications

The subject area is a part of the real world, the data about which we want to be reflected in the database. For example, you can choose an enterprise accounting department, personnel department, bank, shop, etc. as a subject area. The subject area is infinite and contains both essential concepts and data, as well as little or no meaningful data. So, if you choose to take stock of goods in the warehouse as the subject area, then the concepts of "consignment note" and "invoice" are essential concepts, and the fact that an employee who accepts invoices has two children is not important for accounting of goods. However, from the point of view of the personnel department, data on the presence of children are essential. Thus, the importance of data depends on the choice of subject area.

Domain Model The domain model is our knowledge of the domain. Knowledge can be either in the form of informal knowledge in the brain of an expert, or expressed formally by any means. Such means may include textual descriptions of the subject area, sets of job descriptions, rules for conducting business in a company, etc. Experience shows that the text way of presenting a domain model is extremely inefficient. Much more informative and useful in the development of databases are descriptions of the subject area, made using specialized graphic notations. There are a large number of techniques for describing the subject area. Among the most well-known, we can call the SADT structural analysis method and IDEF0 based on it, Gein-Sarson data flow diagrams, the UML object-oriented analysis methodology, etc. The domain model describes more likely the processes that take place in the data domain and the data used by these processes. The success of further application development depends on how correctly the subject area is modeled.

Logical data model At the next lower level is the logical data model of the subject area. The logical model describes the concepts of the subject area, their relationship, as well as restrictions on the data imposed by the subject area. Examples of concepts - "employee", "department", "project", "salary". Examples of interrelationships between concepts - “an employee is listed in exactly one department”, “an employee can carry out several projects”, “several employees can work on one project”. Examples of restrictions - "employee age not less than 16 and not more than 60 years."

The logical data model is the initial prototype of the future database. The logical model is built in terms of information units, but without reference to a specific DBMS . Moreover, the logical data model does not necessarily have to be expressed by the means of the relational data model. The main means of developing a logical data model at the moment are various variants of ER-diagrams ( Entity-Relationship , entity-relationship diagrams ). The same ER model can be converted into a relational data model as well as a data model for hierarchical and networked DBMSs, or into a post-relational data model. However, since If we consider relational DBMS, then we can assume that the logical data model for us is formulated in terms of the relational data model.

Decisions made at the previous level, when developing a domain model, define some bounds within which a logical data model can be developed, and within these bounds, various decisions can be made. For example, the inventory domain model contains the concepts of “warehouse”, “invoice”, “product”. When developing an appropriate relational model, these terms must be used, but there are many different ways of implementation - you can create one relation in which the “warehouse”, “consignment note”, “goods” will be present, and you can create three separate relationships one for each concept.

When developing a logical data model, the following questions arise: are relationships well designed? Do they correctly reflect the domain model and, therefore, the domain itself?

Physical data model . At a lower level is the physical data model. The physical data model describes the data by means of a specific DBMS. We will assume that the physical data model is implemented by means of a relational DBMS, although, as mentioned above, this is not necessary. Relations developed at the stage of forming a logical data model are converted into tables, attributes become table columns, unique indexes are created for key attributes, domains are converted to data types accepted in a specific DBMS.

Constraints in the logical data model are implemented by various DBMS tools, for example, using indexes, declarative integrity constraints, triggers, and stored procedures. At the same time, again, decisions made at the level of logical modeling define some boundaries within which a physical data model can be developed. Similarly, various decisions can be made within these limits. For example, the relationships contained in the logical data model should be converted into tables, but for each table you can additionally declare different indexes that increase the speed of access to the data. Much depends on the specific DBMS.

When developing a physical data model, the following questions arise: are the tables well designed? Are the indexes correct? How much software code in the form of triggers and stored procedures need to be developed to maintain data integrity?

Actually database and applications . And finally, as a result of the previous steps, the database itself appears. The database is implemented on a specific software and hardware basis, and the choice of this basis allows you to significantly increase the speed of work with the database. For example, you can choose different types of computers, change the number of processors, RAM, disk subsystems, etc. Setting up the DBMS within the selected software and hardware platform is also very important.

But again, the decisions made at the previous level - the level of physical design, define the boundaries within which you can make decisions on the choice of software and hardware platform and DBMS settings.

Thus, it is clear that decisions made at each stage of modeling and database development will affect further steps. Therefore, making the right decisions at the early stages of modeling plays a special role.

Criteria for assessing the quality of a logical data model

The purpose of this chapter is to describe some principles for building good logical data models . Good in the sense that decisions made during the logical design process would lead to good physical models and ultimately good database performance.

In order to assess the quality of decisions made at the level of a logical data model, it is necessary to formulate some quality criteria in terms of a physical model and a specific implementation and see how the various decisions made during the logical modeling process affect the quality of the physical model and the speed of the database.

Of course, there may be a lot of such criteria and their choice is sufficiently arbitrary. We will consider some of these criteria, which are certainly important in terms of obtaining a high-quality database:

  • Adequacy of the domain database
  • Ease of database development and maintenance
  • Speed ​​of data update operations (insert, update, delete tuples)
  • Speed ​​of performing data fetching operations

Adequacy of the domain database

The database should adequately reflect the subject area. This means that the following conditions must be met:

  1. The state of the database at each point in time must correspond to the state of the subject area.
  2. Changing the state of the subject area should lead to a corresponding change in the state of the database.
  3. The domain constraints reflected in the domain model should be reflected and taken into account in a database in some way.

Ease of database development and maintenance

Virtually any database, with the exception of the completely elementary, contains a certain amount of program code in the form of triggers and stored procedures.

Stored procedures are procedures and functions that are stored directly in the database in compiled form and that can be run by users or applications that work with the database. Stored procedures are usually written either in a special procedural extension of the SQL language (for example, PL / SQL for ORACLE or Transact-SQL for MS SQL Server), or in some universal programming language, for example, C ++, with the inclusion of SQL statements in the code in accordance with special rules for such inclusion. The main purpose of stored procedures is the implementation of business processes in the subject area.

Triggers are stored procedures associated with some events that occur during database operation. These events are inserts, updates and deletes rows of tables. If a certain trigger is defined in the database, it will automatically start whenever an event occurs with which this trigger is associated. Very important is that the user can not bypass the trigger. A trigger is triggered regardless of which user or method triggered the event that triggered the trigger. Thus, the main purpose of triggers is to automatically maintain the integrity of the database. Triggers can be either quite simple, for example, supporting referential integrity, or quite complex, implementing any complex domain restrictions or complex actions that must occur when certain events occur. For example, a trigger may be associated with the operation of inserting a new product into an invoice; it performs the following actions — checks whether there is a necessary quantity of goods, adds a product to the invoice if there is a product, and reduces data on the availability of goods in a warehouse; missing goods and immediately sends the order by e-mail to the supplier.

It is obvious that the more program code in the form of triggers and stored procedures a database contains, the more difficult is its development and further maintenance.

Speed ​​of data update operations (insert, update, delete)

At the level of logical modeling, we define relational relationships and attributes of these relationships. At this level, we cannot define any physical storage structures (indices, hashing, etc.). The only thing we can manage is the distribution of attributes across different relationships. You can describe few relationships with a large number of attributes, or many relationships, each of which contains few attributes. Thus, it is necessary to try to answer the question - does the number of relations and the number of attributes in relations affect the speed of data update operations. Such a question, of course, is not quite correct, because The speed with which database operations are performed depends strongly on the physical implementation of the database. Nevertheless, we will try to qualitatively evaluate this effect with the same approaches to physical modeling .

The main operations that change the state of the database are the operations of inserting, updating and deleting records. In databases that require constant changes (warehouse accounting, ticket sales, etc.), performance is determined by the speed at which a large number of small insert, update, and delete operations are performed.

Consider inserting a record into a table. The record is inserted into one of the free pages of memory allocated for this table. The DBMS permanently stores information about the presence and location of free pages. If no indexes are created for the table, then the insert operation is performed at virtually the same speed regardless of the size of the table and the number of attributes in the table. If there are indexes in the table, then when performing a record insert operation, the indexes must be rebuilt. Thus, the speed of the insert operation decreases with an increase in the number of indexes on the table and depends little on the number of rows in the table.

Consider updating and deleting records from a table. Before you update or delete a record, you need to find it. If the table is not indexed, then the only way to search is to successively scan the table to find the desired record. In this case, the speed of update and delete operations increases significantly with an increase in the number of records in the table and does not depend on the number of attributes. But in fact, non-indexed tables are almost never used. For each table, one or more indexes are usually declared corresponding to potential keys. Using these indexes, the record search is performed very quickly and practically does not depend on the number of rows and attributes in the table (although, of course, there is some dependence). If multiple indexes are declared for a table, then during the update and delete operations, these indexes must be rebuilt, which takes additional time. Thus, the speed of the update and delete operations also decreases with an increase in the number of indexes on the table and depends little on the number of rows in the table.

It can be assumed that the more attributes a table has, the more indexes will be declared for it. This dependence, of course, is not direct, but with the same approaches to physical modeling, this is usually the case. Thus, it can be assumed that the more attributes have relationships developed in the course of logical modeling, the slower the data updating operations will be performed , at the expense of time spent on rebuilding a larger number of indices.

Additional considerations in favor of the above thesis on the slowing down of data updating operations (the effect of logging, the length of rows of tables) are given in the work of A.Prokhorov [27].

Speed ​​of data fetching operations

One of the purposes of the database is to provide information to users. Information is retrieved from a relational database using a SQL - SELECT statement. One of the most expensive operations when executing a SELECT statement is the operation of joining tables. Thus, the more interrelated relationships were created during logical modeling, the more likely it is that when the queries are executed these relationships will connect, and, consequently, the slower the queries will be executed. Thus, an increase in the number of relations leads to a slower execution of data retrieval operations, especially if the requests are not known in advance.

Basic example

Consider as a subject area some organization that performs some projects. We describe the domain model with the following informal text:

  1. Employees of the organization carry out projects.
  2. Projects consist of several tasks.
  3. Each employee can participate in one or several projects, or temporarily not participate in any projects.
  4. Several employees may work on each project, or the project may be temporarily suspended, then no employee is working on it.
  5. On each task in the project is exactly one employee.
  6. Each employee is listed in one department.
  7. Each employee has a telephone located in the employee’s department.

In the course of additional clarification of what data should be taken into account, it turned out the following:

  1. Each employee must keep a personnel number and last name. Personnel number is unique for each employee.
  2. Each department has a unique number.
  3. Each project has a number and name. Project number is unique.
  4. Each work from the project has a number unique within the project. Works in different projects may have the same numbers.

1NF (First Normal Form)

The concept of the first normal form has already been discussed in Chapter 2. The first normal form ( 1NF ) is a common relationship. According to our definition of relationships, any relationship is automatically already in 1NF. Let us briefly recall the properties of relations (these will be the properties of 1NF):

  • With respect to no identical tuples.
  • Tuples are not ordered.
  • Attributes are not ordered and are distinguished by name.
  • All attribute values ​​are atomic.

6. Normal forms of relationships.  Database Design, Types of Normal Forms

6. Normal forms of relationships.  Database Design, Types of Normal Forms

6. Normal forms of relationships.  Database Design, Types of Normal Forms

In the course of logical modeling, in the first step, it was proposed to store data in one respect, having the following attributes:

СОТРУДНИКИ_ОТДЕЛЫ_ПРОЕКТЫ ( Н_СОТР , ФАМ , Н_ОТД , ТЕЛ , Н_ПРО , ПРОЕКТ , Н_ЗАДАН )

Where

Н_СОТР - табельный номер сотрудника

ФАМ - фамилия сотрудника

Н_ОТД - номер отдела, в котором числится сотрудник

ТЕЛ - телефон сотрудника

Н_ПРО - номер проекта, над которым работает сотрудник

ПРОЕКТ - наименование проекта, над которым работает сотрудник

Н_ЗАДАН - номер задания, над которым работает сотрудник

Because каждый сотрудник в каждом проекте выполняет ровно одно задание, то в качестве потенциального ключа отношения необходимо взять пару атрибутов { Н_СОТР , Н_ПРО }.

В текущий момент состояние предметной области отражается следующими фактами:

  • Сотрудник Иванов, работающий в 1 отделе, выполняет в первом проекте "Космос" задание 1 и во втором проекте "Климат" задание 1.
  • Сотрудник Петров, работающий в 1 отделе, выполняет в первом проекте "Космос" задание 2.
  • Сотрудник Сидоров, работающий во 2 отделе, выполняет в первом проекте "Космос" задание 3 и во втором проекте "Климат" задание 2.

Это состояние отражается в таблице (курсивом выделены ключевые атрибуты):

Н_СОТР ФАМ Н_ОТД ТЕЛ Н_ПРО PROJECT Н_ЗАДАН
one Ivanov one 11-22-33 one Space one
one Ivanov one 11-22-33 2 Climate one
2 Петров one 11-22-33 one Space 2
3 Сидоров 2 33-22-11 one Space 3
3 Сидоров 2 33-22-11 2 Climate 2

Таблица 1 Отношение СОТРУДНИКИ_ОТДЕЛЫ_ПРОЕКТЫ

Аномалии обновления

Даже одного взгляда на таблицу отношения СОТРУДНИКИ_ОТДЕЛЫ_ПРОЕКТЫ достаточно, чтобы увидеть, что данные хранятся в ней с большой избыточностью . Во многих строках повторяются фамилии сотрудников, номера телефонов, наименования проектов. Кроме того, в данном отношении хранятся вместе независимые друг от друга данные - и данные о сотрудниках, и об отделах, и о проектах, и о работах по проектам. Пока никаких действий с отношением не производится, это не страшно. Но как только состояние предметной области изменяется, то, при попытках соответствующим образом изменить состояние базы данных, возникает большое количество проблем.

Исторически эти проблемы получили название аномалии обновления . Попытки дать строгое понятие аномалии в базе данных не являются вполне удовлетворительными [51, 7]. В данных работах аномалии определены как противоречие между моделью предметной области и физической моделью данных, поддерживаемых средствами конкретной СУБД. "Аномалии возникают в том случае, когда наши знания о предметной области оказываются, по каким-то причинам, невыразимыми в схеме БД или входящими в противоречие с ней" [7]. Мы придерживаемся другой точки зрения, заключающейся в том, что аномалий в смысле определений упомянутых авторов нет, а есть либо неадекватность модели данных предметной области, либо некоторые дополнительные трудности в реализации ограничений предметной области средствами СУБД. Более глубокое обсуждение проблемы строгого определения понятия аномалий выходит за пределы данной работы.

Таким образом, мы будем придерживаться интуитивного понятия аномалии как неадекватности модели данных предметной области, (что говорит на самом деле о том, что логическая модель данных попросту неверна!) или как необходимости дополнительных усилий для реализации всех ограничений определенных в предметной области (дополнительный программный код в виде триггеров или хранимых процедур).

Because аномалии проявляют себя при выполнении операций, изменяющих состояние базы данных, то различают следующие виды аномалий:

  • Аномалии вставки (INSERT)
  • Аномалии обновления (UPDATE)
  • Аномалии удаления (DELETE)

В отношении СОТРУДНИКИ_ОТДЕЛЫ_ПРОЕКТЫ можно привести примеры следующих аномалий:

Аномалии вставки (INSERT)

В отношение СОТРУДНИКИ_ОТДЕЛЫ_ПРОЕКТЫ нельзя вставить данные о сотруднике, который пока не участвует ни в одном проекте. Действительно, если, например, во втором отделе появляется новый сотрудник, скажем, Пушников, и он пока не участвует ни в одном проекте, то мы должны вставить в отношение кортеж (4, Пушников, 2, 33-22-11, null, null, null). Это сделать невозможно, т.к. атрибут Н_ПРО (номер проекта) входит в состав потенциального ключа, и, следовательно, не может содержать null-значений.

Точно также нельзя вставить данные о проекте, над которым пока не работает ни один сотрудник.

Причина аномалии - хранение в одном отношении разнородной информации (и о сотрудниках, и о проектах, и о работах по проекту).

The conclusion is that the logical data model is inadequate to the domain model. A database based on this model will not work correctly.

Update Anomaly (UPDATE)

Фамилии сотрудников, наименования проектов, номера телефонов повторяются во многих кортежах отношения. Поэтому если сотрудник меняет фамилию, или проект меняет наименование, или меняется номер телефона, то такие изменения необходимо одновременно выполнить во всех местах, где эта фамилия, наименование или номер телефона встречаются, иначе отношение станет некорректным (например, один и тот же проект в разных кортежах будет называться по-разному). Таким образом, обновление базы данных одним действием реализовать невозможно. Для поддержания отношения в целостном состоянии необходимо написать триггер, который при обновлении одной записи корректно исправлял бы данные и в других местах.

Причина аномалии - избыточность данных, также порожденная тем, что в одном отношении хранится разнородная информация.

Conclusion - increases the complexity of database development. A database based on such a model will work correctly only if there is additional program code in the form of triggers.

Deletion Anomalies (DELETE)

When deleting some data, other information may be lost. For example, if you close the project "Cosmos" and delete all the lines in which it is found, then all data about the employee Petrov will be lost. If you remove the employee Sidorov, you will lose information that in the department number 2 is the telephone 33-22-11. If the project is temporarily suspended, then deleting the data on the work on this project will also delete the data on the project itself (project name). Moreover, if there was an employee who worked only on this project, then the data about that employee would be lost.

Причина аномалии - хранение в одном отношении разнородной информации (и о сотрудниках, и о проектах, и о работах по проекту).

Вывод - логическая модель данных неадекватна модели предметной области. База данных, основанная на такой модели, будет работать неправильно.

Функциональные зависимости

Отношение СОТРУДНИКИ_ОТДЕЛЫ_ПРОЕКТЫ находится в 1НФ, при этом, как было показано выше, логическая модель данных не адекватна модели предметной области. Таким образом, первой нормальной формы недостаточно для правильного моделирования данных.

Определение функциональной зависимости

Для устранения указанных аномалий (а на самом деле для правильного проектирования модели данных !) применяется метод нормализации отношений. Нормализация основана на понятии функциональной зависимости атрибутов отношения.

Определение 1 . Let be 6. Normal forms of relationships.  Database Design, Types of Normal Forms - отношение. Множество атрибутов 6. Normal forms of relationships.  Database Design, Types of Normal Forms функционально зависимо от множества атрибутов 6. Normal forms of relationships.  Database Design, Types of Normal Forms ( 6. Normal forms of relationships.  Database Design, Types of Normal Forms функционально определяет 6. Normal forms of relationships.  Database Design, Types of Normal Forms ) тогда и только тогда, когда для любого состояния отношения 6. Normal forms of relationships.  Database Design, Types of Normal Forms для любых кортежей 6. Normal forms of relationships.  Database Design, Types of Normal Forms из того, что 6. Normal forms of relationships.  Database Design, Types of Normal Forms следует что 6. Normal forms of relationships.  Database Design, Types of Normal Forms (т.е. во всех кортежах, имеющих одинаковые значения атрибутов 6. Normal forms of relationships.  Database Design, Types of Normal Forms , значения атрибутов 6. Normal forms of relationships.  Database Design, Types of Normal Forms также совпадают в любом состоянии отношения 6. Normal forms of relationships.  Database Design, Types of Normal Forms ). Символически функциональная зависимость записывается

6. Normal forms of relationships.  Database Design, Types of Normal Forms .

The set of attributes 6. Normal forms of relationships.  Database Design, Types of Normal Formsis called the determinant of functional dependency , and the set of attributes 6. Normal forms of relationships.  Database Design, Types of Normal Formsis called the dependent part .

Comment.If attributes 6. Normal forms of relationships.  Database Design, Types of Normal Formsconstitute a potential relationship key 6. Normal forms of relationships.  Database Design, Types of Normal Forms, then any relationship attribute is 6. Normal forms of relationships.  Database Design, Types of Normal Formsfunctionally dependent on6. Normal forms of relationships.  Database Design, Types of Normal Forms .

Example 1 . In relation to the EMPLOYEES_DESIGNS_Projects , the following examples of functional dependencies can be given:

Attribute dependence on relationship key:

{ N_SOPR , N_PRO } 6. Normal forms of relationships.  Database Design, Types of Normal Forms FAM

{ N_SOTR , N_PRO } 6. Normal forms of relationships.  Database Design, Types of Normal Forms N_OTD

{ H_SOTR , N_PRO } 6. Normal forms of relationships.  Database Design, Types of Normal Forms TEL

{ Н_СОТР , Н_ПРО } 6. Normal forms of relationships.  Database Design, Types of Normal Forms ПРОЕКТ

{ Н_СОТР , Н_ПРО } 6. Normal forms of relationships.  Database Design, Types of Normal Forms Н_ЗАДАН

Зависимость атрибутов, характеризующих сотрудника от табельного номера сотрудника:

Н_СОТР 6. Normal forms of relationships.  Database Design, Types of Normal Forms ФАМ

Н_СОТР 6. Normal forms of relationships.  Database Design, Types of Normal Forms Н_ОТД

Н_СОТР 6. Normal forms of relationships.  Database Design, Types of Normal Forms ТЕЛ

Зависимость наименования проекта от номера проекта:

Н_ПРО 6. Normal forms of relationships.  Database Design, Types of Normal Forms ПРОЕКТ

Зависимость номера телефона от номера отдела:

Н_ОТД 6. Normal forms of relationships.  Database Design, Types of Normal Forms ТЕЛ

Comment. Приведенные функциональные зависимости не выведены из внешнего вида отношения, приведенного в таблице 1. Эти зависимости отражают взаимосвязи, обнаруженные между объектами предметной области и являются дополнительными ограничениями, определяемыми предметной областью. Таким образом, функциональная зависимость - семантическое понятие . Она возникает, когда по значениям одних данных в предметной области можно определить значения других данных. Например, зная табельный номер сотрудника, можно определить его фамилию, по номеру отдела можно определить телефона. Функциональная зависимость задает дополнительные ограничения на данные, которые могут храниться в отношениях. Для корректности базы данных (адекватности предметной области) необходимо при выполнении операций модификации базы данных проверять все ограничения, определенные функциональными зависимостями.

Функциональные зависимости отношений и математическое понятие функциональной зависимости

Функциональная зависимость атрибутов отношения напоминает понятие функциональной зависимости в математике. Но это не одно и то же . Для сравнения напомним математическое понятие функциональной зависимости:

Определение 2 . Функциональная зависимость ( функция ) - это тройка объектов 6. Normal forms of relationships.  Database Design, Types of Normal Forms where

6. Normal forms of relationships.  Database Design, Types of Normal Forms - множество ( область определения ),

6. Normal forms of relationships.  Database Design, Types of Normal Forms - множество ( множество значений ),

6. Normal forms of relationships.  Database Design, Types of Normal Forms - правило, согласно которому каждому элементу 6. Normal forms of relationships.  Database Design, Types of Normal Forms ставится в соответствие один и только один элемент 6. Normal forms of relationships.  Database Design, Types of Normal Forms ( правило функциональной зависимости ).

Функциональная зависимость обычно обозначается как 6. Normal forms of relationships.  Database Design, Types of Normal Forms or 6. Normal forms of relationships.  Database Design, Types of Normal Forms .

Comment. Rule 6. Normal forms of relationships.  Database Design, Types of Normal Forms может быть задано любым способом - в виде формулы (чаще всего), при помощи таблицы значений, при помощи графика, текстовым описанием и т.д.

Функциональная зависимость атрибутов отношения тоже напоминает это определение. Действительно:

  • В качестве области определения выступает домен, на котором определен атрибут 6. Normal forms of relationships.  Database Design, Types of Normal Forms (или декартово произведение доменов, если 6. Normal forms of relationships.  Database Design, Types of Normal Forms является множеством атрибутов)
  • В качестве множества значений выступает домен, на котором определен атрибут 6. Normal forms of relationships.  Database Design, Types of Normal Forms (или декартово произведение доменов)
  • Rule 6. Normal forms of relationships.  Database Design, Types of Normal Forms реализуется следующим алгоритмом - 1) по данному значению атрибута 6. Normal forms of relationships.  Database Design, Types of Normal Forms найти любой кортеж отношения, содержащий это значение, 2) значение атрибута 6. Normal forms of relationships.  Database Design, Types of Normal Forms в этом кортеже и будет значением функциональной зависимости, соответствующим данному 6. Normal forms of relationships.  Database Design, Types of Normal Forms . Определение функциональной зависимости в отношении гарантирует, что найденное значение 6. Normal forms of relationships.  Database Design, Types of Normal Forms не зависит от выбора кортежа , поэтому правило 6. Normal forms of relationships.  Database Design, Types of Normal Forms

определено корректно.

Отличие от математического понятия отношения состоит в том, что, если рассматривать математическое понятие функции, то для фиксированного значения 6. Normal forms of relationships.  Database Design, Types of Normal Forms соответствующее значение функции 6. Normal forms of relationships.  Database Design, Types of Normal Forms всегда одно и то же . Например, если задана функция 6. Normal forms of relationships.  Database Design, Types of Normal Forms , то для значения 6. Normal forms of relationships.  Database Design, Types of Normal Forms соответствующее значение 6. Normal forms of relationships.  Database Design, Types of Normal Forms всегда будет равно 4. В противоположность этому в отношениях значение зависимого атрибута может принимать различные значения в различных состояниях базы данных. Например, атрибут ФАМ функционально зависит от атрибута Н_СОТР . Предположим, что сейчас сотрудник с табельным номером 1 имеет фамилию Иванов, т.е. при значении детерминанта равного 1, значение зависимого аргумента равно "Иванов". Но сотрудник может сменить фамилию, например на "Сидоров". Теперь при том же значении детерминанта, равного 1, значение зависимого аргумента равно "Сидоров".

Таким образом, понятие функциональной зависимости атрибутов нельзя считать полностью эквивалентным математическому понятию функциональной зависимости, т.к. значение этой зависимости различны при разных состояниях отношения, и, самое главное, эти значения могут меняться непредсказуемо .

Функциональная зависимость атрибутов утверждает лишь то, что для каждого конкретного состояния базы данных по значению одного атрибута (детерминанта) можно однозначно определить значение другого атрибута (зависимой части). Но конкретные значение зависимой части могут быть различны в различных состояниях базы данных.

2НФ (Вторая Нормальная Форма)

Определение 3 . Отношение 6. Normal forms of relationships.  Database Design, Types of Normal Forms находится во второй нормальной форме ( 2НФ ) тогда и только тогда, когда отношение находится в 1НФ и нет неключевых атрибутов, зависящих от части сложного ключа . ( Неключевой атрибут - это атрибут, не входящий в состав никакого потенциального ключа).

Comment. Если потенциальный ключ отношения является простым, то отношение автоматически находится в 2НФ.

Отношение СОТРУДНИКИ_ОТДЕЛЫ_ПРОЕКТЫ не находится в 2НФ, т.к. есть атрибуты, зависящие от части сложного ключа:

Зависимость атрибутов, характеризующих сотрудника от табельного номера сотрудника является зависимостью от части сложного ключа:

Н_СОТР 6. Normal forms of relationships.  Database Design, Types of Normal Forms ФАМ

Н_СОТР 6. Normal forms of relationships.  Database Design, Types of Normal Forms Н_ОТД

Н_СОТР 6. Normal forms of relationships.  Database Design, Types of Normal Forms ТЕЛ

Зависимость наименования проекта от номера проекта является зависимостью от части сложного ключа:

Н_ПРО 6. Normal forms of relationships.  Database Design, Types of Normal Forms ПРОЕКТ

Для того, чтобы устранить зависимость атрибутов от части сложного ключа, нужно произвести декомпозицию отношения на несколько отношений. При этом те атрибуты, которые зависят от части сложного ключа, выносятся в отдельное отношение.

Отношение СОТРУДНИКИ_ОТДЕЛЫ_ПРОЕКТЫ декомпозируем на три отношения - СОТРУДНИКИ_ОТДЕЛЫ , ПРОЕКТЫ , ЗАДАНИЯ .

Отношение СОТРУДНИКИ_ОТДЕЛЫ ( Н_СОТР , ФАМ , Н_ОТД , ТЕЛ ):

Functional dependencies:

Зависимость атрибутов, характеризующих сотрудника от табельного номера сотрудника:

Н_СОТР 6. Normal forms of relationships.  Database Design, Types of Normal Forms ФАМ

Н_СОТР 6. Normal forms of relationships.  Database Design, Types of Normal Forms Н_ОТД

Н_СОТР 6. Normal forms of relationships.  Database Design, Types of Normal Forms ТЕЛ

Зависимость номера телефона от номера отдела:

Н_ОТД 6. Normal forms of relationships.  Database Design, Types of Normal Forms ТЕЛ

H_SOTR FAM N_OTD TEL
one Ivanov one 11-22-33
2 Petrov one 11-22-33
3 Sidorov 2 33-22-11

Table 2 Relationship EMPLOYEES_DEPARTMENTS

Attitude PROJECTS ( N_PRO , PROJECT ):

Functional dependencies:

N_PRO 6. Normal forms of relationships.  Database Design, Types of Normal Forms PROJECT

N_PRO PROJECT
one Space
2 Climate

Table 3 Attitudes PROJECTS

RELATIONSHIP TASKS ( N_SOPR , N_PRO , N_ZADAN ):

Functional dependencies:

{ Н_СОТР , Н_ПРО } 6. Normal forms of relationships.  Database Design, Types of Normal Forms N_ZADAN

H_SOTR N_PRO N_ZADAN
one one one
one 2 one
2 one 2
3 one 3
3 2 2

Table 4 Relationships TASKS

Analysis of decomposed relationships

Relations obtained as a result of decomposition are in 2NF. Indeed, the EMPLOYEES and PROJECTS relationships have simple keys, therefore they are automatically located in 2NF, the JOBS relation has a complex key, but the only non-key attribute N_ZADAN functionally depends on the entire key { Н_СОТР , Н_ПРО } .

Part of update anomalies resolved. So, data on employees and projects are now stored in different relationships, therefore, when employees who are not participating in any project appear, tuples are added to the EMPLOYEES_DEPARTMENTS relationship. Similarly, when a project appears that no employee is working on, a tuple is simply inserted into the PROJECTS relationship.

The names of employees and project names are now stored without redundancy. If the employee changes the name or the project changes the name, the update will be made in one place.

If the project is temporarily terminated, but the project itself is required to be preserved, then for this project the corresponding tuples are deleted for the TASK , and the data on the project itself and

продолжение следует...

Продолжение:


Часть 1 6. Normal forms of relationships. Database Design, Types of Normal Forms
Часть 2 - 6. Normal forms of relationships. Database Design, Types of


Comments


To leave a comment
If you have any suggestion, idea, thanks or comment, feel free to write. We really value feedback and are glad to hear your opinion.
To reply

Databases, knowledge and data warehousing. Big data, DBMS and SQL and noSQL

Terms: Databases, knowledge and data warehousing. Big data, DBMS and SQL and noSQL