Multi-user database processing systems

Lecture

1. EVOLUTION OF DATA PROCESSING CONCEPTS

Data processing has evolved over time. The following stages can be distinguished in the development of data processing concepts:

· Database processing on mainframes using a DBMS;
· Database processing using remote data processing systems;
· Processing of local databases on a PC using desktop DBMS;
· Use of file-sharing systems (working with a centralized database using network versions of desktop DBMSs);
· Use of client / server systems;
· Use of distributed database processing systems.

2. REMOTE PROCESSING SYSTEMS
The classic architecture for processing multiuser databases is remote processing.
Users process data in batch mode, and interactive access is provided through terminals that have no computing resources of their own. Communication management programs, application programs, the DBMS and the OS all run on a single central computer. Since all processing is done by that one computer, the user interface of remote processing systems is usually quite simple. The remote processing scheme is shown in Figure 1.
Users (Fig. 1 shows n users) work at terminals that transmit data and transaction messages to the central computer (the remote processing computer). Data management functions are assigned to the operating system. The part of the OS responsible for managing communications receives the messages and data and passes them to the appropriate application programs. The programs call the DBMS, and the DBMS performs database operations using the part of the OS responsible for data processing. When a transaction is complete, the communication management subsystem returns the results to the users at their terminals. Because the user interface is simple and mostly text-oriented, all output formatting commands are generated by the central processor and transmitted over the communication line. Systems like the one described are called remote processing systems because inputs and outputs are connected through a distant central computer that processes the data.
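The message flow described above can be modelled in a few lines. This is a toy sketch, not a description of any real system: the class `CentralComputer`, the transaction codes `BAL` and `DEP`, and the account data are all hypothetical, and a Python dictionary stands in for the data that the DBMS manages through the OS. The point is that routing, application logic, data access and output formatting all run on the one central machine, and the terminal receives only finished text.

```python
from dataclasses import dataclass, field


@dataclass
class CentralComputer:
    """Toy remote-processing host: communications, applications and DBMS all run here."""
    database: dict = field(default_factory=dict)  # stands in for data managed by the DBMS/OS

    # DBMS layer: the only code that touches stored data.
    def dbms_read(self, key):
        return self.database.get(key, 0)

    def dbms_write(self, key, value):
        self.database[key] = value

    # Application programs, one per (hypothetical) transaction type.
    def app_balance(self, account):
        return f"BALANCE {account}: {self.dbms_read(account)}"

    def app_deposit(self, account, amount):
        self.dbms_write(account, self.dbms_read(account) + amount)
        return f"OK {account}: {self.dbms_read(account)}"

    # Communications subsystem: routes a terminal message to an application
    # and formats the reply centrally before sending it back as plain text.
    def handle_message(self, terminal_id, message):
        kind, *args = message.split()
        if kind == "BAL":
            text = self.app_balance(args[0])
        elif kind == "DEP":
            text = self.app_deposit(args[0], int(args[1]))
        else:
            text = "ERROR unknown transaction"
        return f"[{terminal_id}] {text}"
```

For example, `CentralComputer({"A1": 100}).handle_message("T1", "DEP A1 50")` returns `"[T1] OK A1: 150"`.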
The advantages of this kind of processing are the shared use of resources and equipment and centralized data storage; the disadvantage is the lack of a personalized working environment (all software is stored centrally and used collectively). Historically, remote processing systems were the most common form of multi-user database system. But as PCs appeared in offices and grew powerful enough to act as data servers, new multi-user data processing architectures emerged.
3. FILE-SHARING SYSTEMS
3.1. File / server architecture and the role of desktop DBMS in it
Given a computer network, centralized databases hosted on a single computer, the network server, can be stored and used in multi-user mode. In this case each user accesses, from his or her own PC, a centralized database shared by all users. There are various concepts of network data processing.
Consider first the file-sharing architecture, which appeared before the client / server architecture and is in many respects rather simple.
Almost all file-sharing systems use local area networks. This architecture is characterized by collective access to a common database on a server that acts as a file server. The file server holds the files needed by the applications and by the DBMS itself, and it supports the part of the network version of the DBMS that manages the data in the database. However, user applications and the network DBMS itself reside and run on separate workstations and access the file server as needed.
Consider the organization of the file / server architecture using a desktop DBMS.
Network versions of desktop DBMSs differ from local versions in having special mechanisms that let many users work jointly with the shared data resources of a centralized database. The DBMS on each workstation sends the file server requests for all the data it needs, which are stored on the file server's disk. The data are sent to the user's computer in full, regardless of how much of them is actually needed to fulfill the request. As a result, a local copy of the database is created on the user's computer (and refreshed from the real database on the server from time to time). The user's DBMS then executes the request. The scheme of working with a desktop DBMS in multi-user mode is shown in Fig. 2.
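A minimal sketch of this behaviour, under stated assumptions: the file name `orders.dat`, its JSON layout and both functions are hypothetical, and the in-memory dictionary `SERVER_FILES` stands in for the file server's disk. The workstation has to fetch the whole file even though only the rows of one customer are needed, which is exactly the traffic problem discussed in the next subsection.

```python
import json

# Hypothetical shared table, stored as a plain file on the file server.
SERVER_FILES = {
    "orders.dat": json.dumps(
        [{"id": i, "customer": f"C{i % 50}", "amount": i * 10} for i in range(1, 1001)]
    ).encode("utf-8")
}


def file_server_read(filename):
    """The file server can only hand out whole files; it cannot run queries."""
    return SERVER_FILES[filename]


def desktop_dbms_query(filename, customer):
    """The workstation's DBMS fetches the entire file, then filters it locally."""
    raw = file_server_read(filename)   # the whole file crosses the network
    rows = json.loads(raw)             # local copy of the table
    result = [row for row in rows if row["customer"] == customer]
    return result, len(raw)            # rows wanted, bytes actually transferred
```

Here `desktop_dbms_query("orders.dat", "C7")` returns 20 matching rows, while the byte count it reports is the size of the entire 1000-row file.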
3.2. Disadvantages of the file / server architecture
The file server architecture has the following main disadvantages.
1. Since the file server cannot process SQL queries, large amounts of data are transferred over the local network when files are shared (full copies of the database are moved from the server to the client's computer). With this architecture, traffic [1] on the local network is quite heavy.
2. As the amount of stored data and the number of users grow, the performance of desktop DBMSs decreases. Because of these problems, file-sharing systems are rarely used to process large amounts of data.
3. With this architecture, the entire burden of executing database queries and managing database integrity rests with the user's DBMS.
4. Each workstation must run the network version of the desktop DBMS itself, which requires a large amount of RAM on the user's computer.
5. Multiple users can access the same files at once, which complicates integrity management and recovery of the database on the server.
3.3. Advantages and disadvantages of desktop DBMS
Advantages of desktop DBMS:
- · They are easy to learn and use;
- · Have a friendly user interface;
- · Are designed for PCs and aimed at the widest range of non-professional users;
- · Provide good speed when working with small databases.
Disadvantages of desktop DBMS:
- · As the volume of stored data and the number of users grow, their performance decreases and data processing failures may occur;
- · Integrity control is performed inside the user application, which can lead to data integrity violations;
- · They work very inefficiently over a computer network.
More than a dozen desktop DBMSs are known. The most popular, by number of copies sold, are dBASE, Visual dBASE, Paradox, Microsoft FoxPro, Visual FoxPro and Access. A more detailed description of these DBMSs is given in [1].
[1] Traffic is the amount of information transmitted over a network for a certain period of time.
4. CLIENT / SERVER SYSTEMS
4.1. Clients, servers
The client / server architecture provides the most efficient way of working with a centralized database. Unlike a remote processing system, which has only one computer, a client / server system consists of many computers connected to a network. Computers called clients process applications. Computers called servers handle the database.
Client computers can be of different types, from mainframes to microcomputers, but as a rule client functions are performed by PCs. A server can be any type of computer, but for economic reasons server functions are also most often performed by PCs, only more powerful ones.
4.2. Client applications, database servers
The database is hosted on a network server on which a powerful server DBMS, the database server, is installed. A database server is a software component that stores large volumes of information, processes it, and presents it to users over the network.
On the client computer, the client application issues a query to the database. The server DBMS interprets the query, executes it, generates the result and sends it over the network to the client computer. The client application interprets the result as necessary and presents it to the user. The client application can also send a request to update the database, and the server DBMS will make the necessary changes. The client / server architecture is shown in Fig. 3.
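The request/response cycle can be sketched as follows. This is an illustration under assumptions, not a real network protocol: `DatabaseServer` and `client_query` are hypothetical names, the JSON message format is invented for the example, and Python's built-in `sqlite3` (an embedded engine, not a server DBMS) merely plays the part of the server. What matters is that the SQL text goes to the server, is executed there, and only the result comes back.

```python
import json
import sqlite3


class DatabaseServer:
    """Stand-in for a server DBMS. sqlite3 is embedded, so it only plays the
    server role here; a real server would be reached over the network."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)"
        )
        self.conn.executemany(
            "INSERT INTO orders VALUES (?, ?, ?)",
            [(i, f"C{i % 50}", i * 10) for i in range(1, 1001)],
        )
        self.conn.commit()

    def handle_request(self, request: bytes) -> bytes:
        """Interpret and execute the request, then send back only the result."""
        msg = json.loads(request)
        cur = self.conn.execute(msg["sql"], msg["params"])
        if cur.description is None:  # data-modifying statement: no result set
            self.conn.commit()
            return json.dumps({"rowcount": cur.rowcount}).encode()
        return json.dumps({"rows": cur.fetchall()}).encode()


def client_query(server, sql, params=()):
    """Client side: ship the SQL text, receive the (small) reply."""
    request = json.dumps({"sql": sql, "params": list(params)}).encode()
    reply = server.handle_request(request)  # stands in for a network round trip
    return json.loads(reply), len(reply)
```

Querying the orders of customer `C7` with `ORDER BY id` returns 20 rows, starting with `[7, 70]`, in a reply of a few hundred bytes; an `UPDATE` returns only a row count.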
In the client / server architecture, the functions of the client application and of the database server are separated.
The functions of the client application are divided into the following groups:
- · Data input / output (presentation logic) is a part of the client application code that determines what the user sees on the screen when working with the application;
- · Business logic is a part of the client application code that defines the algorithm for solving specific tasks of the application;
- · Data processing within the application (database logic) is the part of the client application code that binds server data to the application. This binding uses SQL, a non-procedural query language with which data are selected and modified in server DBMSs.
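To make the three groups concrete, here is a hypothetical split of one small client feature into database logic, business logic and presentation logic. The table, the 5% discount rule and every function name are assumptions made for illustration; `sqlite3` stands in for the server database.

```python
import sqlite3


def demo_connection():
    """Hypothetical sample data standing in for the server database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "C1", 600), (2, "C1", 700), (3, "C2", 100)],
    )
    return conn


# Database logic: binds server data to the application through SQL.
def fetch_amounts(conn, customer):
    rows = conn.execute(
        "SELECT amount FROM orders WHERE customer = ? ORDER BY id", (customer,)
    ).fetchall()
    return [amount for (amount,) in rows]


# Business logic: an application rule (hypothetical 5% discount above 1000).
def amount_due(amounts):
    total = sum(amounts)
    return total * 0.95 if total > 1000 else total


# Presentation logic: decides what the user sees on screen.
def render(customer, due):
    return f"Customer {customer}: amount due {due:.2f}"


def show_customer(conn, customer):
    return render(customer, amount_due(fetch_amounts(conn, customer)))
```

With the sample data, `show_customer(demo_connection(), "C1")` returns `"Customer C1: amount due 1235.00"`.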
The database server generally performs a whole range of data management actions. The main ones are the following:
- · Execution of user requests to select and modify data and metadata, received from client applications running on PCs in the local network;
- · Storage and backup of data;
- · Enforcement of referential integrity according to the rules defined in the database;
- · Authorized access to data based on verification of the user's rights and privileges;
- · Transaction logging (maintaining the transaction log).
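Referential integrity and value rules declared in the database are enforced by the engine itself, whichever client sends the statement. Below is a minimal sketch using SQLite through Python's `sqlite3` module (an assumption: server DBMSs declare the same kind of `REFERENCES` and `CHECK` constraints in their own SQL dialects); the tables are hypothetical. Note that SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3


def make_server_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        """CREATE TABLE orders (
               id INTEGER PRIMARY KEY,
               customer_id INTEGER NOT NULL REFERENCES customers(id),
               amount INTEGER NOT NULL CHECK (amount > 0))"""
    )
    return conn


def try_execute(conn, sql, params=()):
    """Run one statement in its own transaction; report whether the engine accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"
```

An order for a non-existent customer, a non-positive amount, and deleting a customer who still has orders are all rejected with `sqlite3.IntegrityError`, and only the valid order remains in the table.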
4.3. The concept of stored procedures and triggers
In the modern client / server model, business logic is divided between the client and the server. On the server, business logic is implemented as stored procedures: special program modules that are stored in the database and managed directly by the DBMS.
A stored procedure is a special procedure executed by the database server. Stored procedures are written in a procedural language that depends on the specific DBMS. For MS SQL Server, stored procedures are written in Transact-SQL, an extension of standard SQL, so a stored procedure there is a sequence of Transact-SQL statements stored in the database. Stored procedures are precompiled, so they run more efficiently than ordinary queries. They are executed directly on the server.
There are two types of stored procedures: system and user-defined. System stored procedures retrieve information from system tables and perform various maintenance operations; they are especially useful for database administration. User-defined stored procedures are created by developers or database administrators. The usefulness of stored procedures stems primarily from their high execution speed compared with ordinary Transact-SQL queries, and the greatest benefit is achieved when the same operations are performed repeatedly. User-defined stored procedures can be used to solve virtually any problem. A user can be granted the right to execute a stored procedure even without access rights to the objects the procedure accesses.
A stored procedure is called explicitly, that is, directly from the client application working with the database. Stored procedures can be used to retrieve or modify data at any time. They can take arguments when invoked and return values in the form of result sets.
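The permission rule above (a user may execute a procedure without rights on the tables it reads) can be modelled as follows. This is a hypothetical Python model, not Transact-SQL: SQLite has no stored procedures, so a Python function registered in `ToyServer.procedures` stands in for a procedure body, and the grant dictionaries are simplified stand-ins for a server's privilege system.

```python
import sqlite3


class ToyServer:
    """Hypothetical model of server-side procedures and privileges."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE salaries (emp TEXT, dept TEXT, salary INTEGER)")
        self.conn.executemany(
            "INSERT INTO salaries VALUES (?, ?, ?)",
            [("ann", "it", 5000), ("bob", "it", 4000), ("eve", "hr", 3000)],
        )
        self.procedures = {}      # procedure name -> callable standing in for its body
        self.table_grants = {}    # user -> tables the user may query directly
        self.execute_grants = {}  # user -> procedures the user may execute

    def create_procedure(self, name, body):
        self.procedures[name] = body

    def select(self, user, table, sql, params=()):
        """Direct query: requires a grant on the table itself."""
        if table not in self.table_grants.get(user, set()):
            raise PermissionError(f"{user} may not read {table}")
        return self.conn.execute(sql, params).fetchall()

    def exec_procedure(self, user, name, *args):
        """Explicit call by name: requires only the execute grant.
        The body runs on the server with the server's own access to the tables."""
        if name not in self.execute_grants.get(user, set()):
            raise PermissionError(f"{user} may not execute {name}")
        return self.procedures[name](self.conn, *args)


def dept_average(conn, dept):
    """Procedure body: returns a one-row result set."""
    row = conn.execute("SELECT AVG(salary) FROM salaries WHERE dept = ?", (dept,)).fetchone()
    return [("avg_salary", row[0])]
```

After `server.create_procedure("dept_average", dept_average)` and `server.execute_grants["clerk"] = {"dept_average"}`, the call `server.exec_procedure("clerk", "dept_average", "it")` returns `[("avg_salary", 4500.0)]`, while `server.select("clerk", "salaries", ...)` raises `PermissionError`: the clerk sees the aggregate but never the individual salaries.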
Database logic is implemented with triggers. A trigger is a special kind of stored procedure that is executed automatically on every attempt to change data. A trigger is always associated with a specific table and fires when the event it is bound to (for example, inserting, deleting or updating a record) occurs while that table is being modified. Each table can have any number of triggers of any type. A trigger fired after an insert, update or delete operation can evaluate business rules or perform certain actions. When a table that has triggers is dropped, its triggers are dropped as well.
Triggers ensure data integrity by preventing unauthorized or incorrect changes. Triggers take no parameters and return no values. They are executed implicitly, that is, a trigger fires only when someone tries to change the data. Triggers can be nested (in MS SQL Server, for example, up to 32 levels), meaning that executing one trigger can fire another. A trigger is part of a transaction, so if the trigger fails, the entire transaction is rolled back. Conversely, if some part of the transaction fails, the trigger's effects are rolled back too.
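Both roles of a trigger, enforcing an integrity rule and performing an action implicitly, can be shown with SQLite triggers via Python's `sqlite3` module. This is a sketch under assumptions: the `accounts` and `audit` tables and the `transfer` function are hypothetical, and SQLite trigger syntax (`RAISE(ABORT, ...)`) differs from Transact-SQL. In SQLite, `RAISE(ABORT)` undoes only the current statement; the whole transfer is undone here because `with conn:` rolls back the transaction when the error propagates.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
        CREATE TABLE audit (account_id INTEGER, old_balance INTEGER, new_balance INTEGER);

        -- Integrity rule enforced in the database: no negative balances.
        CREATE TRIGGER no_overdraft BEFORE UPDATE OF balance ON accounts
        WHEN NEW.balance < 0
        BEGIN
            SELECT RAISE(ABORT, 'balance cannot be negative');
        END;

        -- Action fired implicitly after every successful balance update.
        CREATE TRIGGER log_update AFTER UPDATE OF balance ON accounts
        BEGIN
            INSERT INTO audit VALUES (OLD.id, OLD.balance, NEW.balance);
        END;

        INSERT INTO accounts VALUES (1, 100), (2, 50);
    """)
    return conn


def transfer(conn, src, dst, amount):
    """Both updates form one transaction; if a trigger fails, neither survives."""
    try:
        with conn:  # commit on success, roll back the whole transaction on error
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False
```

A transfer of 30 from account 1 to account 2 succeeds and writes two audit rows. A subsequent transfer of 500 from account 2 fails in the `no_overdraft` trigger, and the rollback also undoes the first update and its audit row, so the balances stay at 70 and 80.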
4.4. Benefits of client / server architecture
With client / server processing, network traffic is reduced, as only query results are transmitted through the network.
The load of file operations falls mainly on the server, which is more powerful than the client computers and can therefore serve requests faster. As a result, client applications need less RAM.
Since servers can store large amounts of data, considerable disk space is freed on client computers for other applications.
Data consistency and database security increase significantly, since data integrity rules are defined in the server DBMS and are the same for all applications that use the database.
It is possible to store business rules (for example, referential integrity rules or restrictions on data values) on the server, thus avoiding duplication of code in various client applications using a common database.
4.5. Characteristics of database servers
Modern server DBMS:
- · Exist in several versions for different platforms, as a rule for various commercial versions of UNIX (Solaris, HP-UX). Many vendors also release versions of their database servers for Windows NT Workstation and Windows 95/98, as well as for Linux;
- · In most cases come with convenient administrative utilities;
- · Carry out backup and archiving of data and transaction logs;
- · Support multiple replication scenarios;
- · Allow parallel data processing on multiprocessor systems: several processors can access the same database at once, which ensures high transaction processing speed;
- · Support data warehousing and OLAP. A data warehouse is a collection of data obtained directly or indirectly from an organization's information systems, which hold current business information, and from some external sources;
- · Perform distributed queries and transactions;
- · Make it possible to use various data design tools - universal or focused on a specific DBMS;
- · Have client development tools and report generators;
- · Support the publication of databases on the Internet;
- · Have broad capabilities for managing user privileges and access rights to various database objects.
Modern database servers include Oracle 9 (Oracle), MS SQL Server 2000 (Microsoft), Informix (Informix), Sybase (Sybase) and DB2 (IBM). A brief overview of server DBMSs is given in the manual [2].

4.6. Database Access Mechanisms
All server DBMSs have a client part through which applications access the database. The client application is not connected to the DBMS directly: additional software modules are placed between them that allow the client application to access databases created with different DBMSs. Such modules are called data access mechanisms.
There are two main ways for client applications to access data: through an application programming interface and through a universal programming interface.
An application programming interface (API) is a set of functions called from a client application. It works only with the DBMS of a particular vendor, so if that DBMS is replaced, a significant part of the client application code has to be rewritten. The API differs from one DBMS to another.
A universal data access mechanism lets the same interface be used to access different types of DBMS. It is usually implemented as special add-on modules called drivers.
The most widely used universal interface for accessing data in a particular database is Microsoft's Open Database Connectivity (ODBC). In ODBC, an application interacts directly with the driver manager by sending it ODBC calls. The driver manager dynamically loads the required ODBC driver, through which it accesses the database server. The ODBC driver handles all ODBC function calls and "translates" them into the language of the data source. The DBMS stores and returns data in response to requests from the ODBC driver.
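The driver-manager pattern can be sketched in Python. This does not reproduce the real ODBC API, which is a C call-level interface; `DriverManager`, the two driver classes and `translate_first_rows` are hypothetical. The example shows the two ideas from the text: a driver is loaded by the manager only when first requested, and each driver translates one uniform call into the dialect of its data source (`TOP n` in Transact-SQL, `LIMIT n` in, for example, SQLite).

```python
class TSQLDriver:
    """Hypothetical driver for a data source that speaks Transact-SQL."""

    def translate_first_rows(self, table, n):
        return f"SELECT TOP {n} * FROM {table}"


class LimitDriver:
    """Hypothetical driver for a data source that uses LIMIT (e.g. SQLite)."""

    def translate_first_rows(self, table, n):
        return f"SELECT * FROM {table} LIMIT {n}"


class DriverManager:
    """Keeps a registry of drivers and loads each one on first use."""

    def __init__(self):
        self._factories = {}  # driver name -> factory, not yet loaded
        self._loaded = {}     # driver name -> loaded driver instance

    def register(self, name, factory):
        self._factories[name] = factory

    def load(self, name):
        if name not in self._loaded:
            self._loaded[name] = self._factories[name]()  # "dynamic loading"
        return self._loaded[name]


def first_rows_sql(manager, driver_name, table, n):
    """Application code: written once against the uniform call."""
    return manager.load(driver_name).translate_first_rows(table, n)
```

The application function `first_rows_sql` stays the same whichever driver name is passed; only the registered driver changes.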
Defining an ODBC data source is done with the tools of the operating system that manages the computer. In Windows, Control Panel has an item called ODBC Data Sources (32-bit), which opens the ODBC Data Source Administrator. It can be used to define:
- · a user DSN: a data source available only to the current user on the current computer;
- · a file DSN: a data source that can be shared by different users who have the same ODBC drivers installed;
- · a system DSN: a data source available to all users and services of the current computer.
5. DISTRIBUTED DATABASE PROCESSING SYSTEMS
5.1. Concept and architecture of a distributed database
A distributed database (DDB) is a set of logically interrelated shared data and their descriptions, physically distributed across several computers (nodes) of a computer network.
Each table in a DDB can be divided into a number of parts called fragments. Fragments can be horizontal, vertical or mixed. Horizontal fragments are subsets of rows, and vertical fragments are subsets of columns. Fragments are allocated to one or more nodes.
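Horizontal and vertical fragmentation, and the reconstruction of the original table from its fragments, can be sketched with plain Python lists of dictionaries. The `EMPLOYEES` relation and the function names are hypothetical. This sketch keeps the key `id` in every vertical fragment so that the table can be rebuilt by a join.

```python
# Hypothetical global relation employees(id, name, branch, salary).
EMPLOYEES = [
    {"id": 1, "name": "Ann", "branch": "Moscow", "salary": 5000},
    {"id": 2, "name": "Bob", "branch": "Kazan", "salary": 4000},
    {"id": 3, "name": "Eve", "branch": "Moscow", "salary": 4500},
    {"id": 4, "name": "Max", "branch": "Omsk", "salary": 3500},
]


def horizontal_fragment(rows, predicate):
    """Subset of rows (a selection)."""
    return [dict(row) for row in rows if predicate(row)]


def vertical_fragment(rows, columns):
    """Subset of columns (a projection); the key "id" is always kept."""
    keep = ["id"] + [c for c in columns if c != "id"]
    return [{c: row[c] for c in keep} for row in rows]


def union_fragments(*fragments):
    """Rebuild a relation from horizontal fragments."""
    return sorted((row for frag in fragments for row in frag), key=lambda r: r["id"])


def join_fragments(left, right):
    """Rebuild a relation from two vertical fragments by joining on "id"."""
    right_by_id = {row["id"]: row for row in right}
    return [{**row, **right_by_id[row["id"]]} for row in left]
```

A mixed fragment combines the two operations, for example `vertical_fragment(horizontal_fragment(EMPLOYEES, lambda r: r["branch"] == "Moscow"), ["salary"])`, which yields the ids and salaries of the Moscow employees.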
To improve data availability and system performance, individual fragments can be replicated, that is, an up-to-date copy of a fragment can be maintained at several different nodes. Replicas are a set of distinct physical copies of a database object that, according to rules defined in the database, are kept synchronized with a "master copy".
There are several alternative strategies for allocating data in the system: partitioned (fragmented) allocation, allocation with full replication, and allocation with selective replication.
Partitioned (fragmented) allocation. The database is split into disjoint fragments, each allocated to one node of the system. Without replication the cost of storing data is minimal, but the reliability and availability of data in the system are also low. A failure at any node causes loss of access only to the part of the data stored at that node.
Allocation with full replication. This strategy places a complete copy of the entire database at every node of the system. Consequently, data reliability and availability, as well as system performance, are at their maximum. However, the cost of storing data and the cost of transferring data are also the highest.
Allocation with selective replication. This strategy combines fragmentation, replication and centralization. Some data sets are split into fragments, while others are replicated; all remaining data are stored centrally. The aim is to combine the advantages of the other models while avoiding their drawbacks. Because of its flexibility, this strategy is the one used most often.
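The trade-off between the three strategies can be illustrated with a toy model: three fragments, three nodes, and an allocation that maps each fragment to the nodes holding a copy. Everything here is hypothetical, including the particular selective allocation (F1 and F3 replicated, F2 kept at a single node to stand in for centrally stored data). The number of stored copies serves as a crude proxy for storage cost, and the fragments still reachable after one node fails serve as a proxy for availability.

```python
NODES = ["N1", "N2", "N3"]
FRAGMENTS = ["F1", "F2", "F3"]

# fragment -> set of nodes holding a copy (illustrative allocations).
ALLOCATIONS = {
    "partitioned": {"F1": {"N1"}, "F2": {"N2"}, "F3": {"N3"}},
    "full_replication": {f: set(NODES) for f in FRAGMENTS},
    # F1 and F3 are replicated; F2 is kept at one node only.
    "selective": {"F1": {"N1", "N2"}, "F2": {"N2"}, "F3": {"N1", "N3"}},
}


def stored_copies(allocation):
    """Storage cost proxy: total number of fragment copies."""
    return sum(len(nodes) for nodes in allocation.values())


def available_fragments(allocation, failed_node):
    """Fragments that still have a copy on some working node."""
    return sorted(f for f, nodes in allocation.items() if nodes - {failed_node})
```

If node N1 fails, the partitioned allocation loses F1 while the full and the selective allocations keep all three fragments; the copy counts are 3, 9 and 5 respectively. A failure of N2 shows the limit of the selective scheme: F2, which is not replicated, becomes unavailable.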
5.2. Distributed DBMS
Work with a DDB is supported by distributed DBMSs. A distributed DBMS (DDBMS) is a software system that manages a distributed database and makes the distribution of information "transparent" to the end user. It follows from this definition that the end user must be completely unaware that the distributed database consists of several fragments, which may reside on several computers in the network, and that several users may access it concurrently. The purpose of transparency is to make the distributed system look from the outside exactly like a centralized one. Distributing data in this way makes it possible, for example, to store at a network node the data used most often at that node. This makes working with those data easier and faster while still allowing access to the rest of the database, although reaching the remaining data takes some time for transfer over the network.
The main task of a DDBMS is to provide the means to integrate local databases located at various nodes of a computer network so that a user working at any node can access all of them as a single database. In other words, to client applications the DDB appears not as a set of databases but as a single whole. Each fragment of the database is stored on one or more computers, which are connected by communication lines and each of which runs its own DBMS. Users interact with the DDB through applications. Applications can be classified into those that do not need access to data at other nodes (local applications) and those that do (global applications). A DDBMS must have at least one global application; accordingly, every DDBMS has the following features:
- · a set of logically related shared data;
- · the stored data are split into a number of fragments;
- · data replication may be organized between fragments;
- · fragments and their replicas are distributed across different nodes;
- · the nodes are connected by network links;
- · the data at each node are managed by a local DBMS.
The DBMS at each node can support the autonomous operation of local applications.

5.3. Homogeneous and heterogeneous distributed databases
DDBs can be classified as homogeneous or heterogeneous.
A homogeneous DDB is managed by a single type of DBMS. A heterogeneous DDB is managed by different types of DBMSs that use different data models: relational, network, hierarchical or object-oriented.
Homogeneous DDBs are much easier to design and maintain. In addition, this approach allows a DDB to grow in stages by adding new nodes to the existing DDB one after another. Heterogeneous DDBs usually arise when independent nodes, each managed by its own DBMS, are integrated into a newly created DDB.

5.4. C. J. Date's twelve rules for DDBs and DDBMSs
C. J. Date formulated 12 rules (1987) for a typical DDB. They rest on the principle that a DDB must appear to the user exactly like a familiar centralized database.
1. Local autonomy. In this context, autonomy means the following:
- · local data are owned and maintained locally;
- · all local processes remain purely local;
- · all processes at a given node are controlled only by that node.
2. No reliance on a central site. There must be no node without which the system cannot function; that is, no particular service (transaction management, query optimization, etc.) may be assigned to a specially designated central node.
3. Continuous operation. Ideally, the system should never need a planned shutdown.
4. Location independence. The user must be able to access the database from any node and to access any data regardless of where they are physically stored.
5. Fragmentation independence. The user must be able to access data regardless of how they are fragmented.
6. Replication independence. The user must not need to know whether data are replicated; that is, the user has no means of directly accessing a particular copy of a data item and need not take care of updating existing copies.
7. Distributed query processing. The system must support processing of queries that refer to data located at more than one node.
8. Distributed transaction processing. The system must support the execution of transactions that involve data at several nodes.
9. Hardware independence. The system must be able to run on hardware with different computing platforms.
10. Network independence. The system must be able to run on networks with different architectures.
11. Operating system independence. The system must be able to run under different operating systems.
12. DBMS independence. The system must be able to work with local DBMSs of different types.

5.5. Distributed query processing
In a distributed environment, the system should not show any loss of performance caused by its distributed architecture, for example by slow network links. The DDBMS must find the most efficient query execution strategies. In a centralized system, the query processor evaluates each data access request, and its execution is an ordered sequence of operations on the database; in a distributed environment, the distributed query processor maps a data access request onto an ordered sequence of operations on local databases. Additional complexity arises from the need to take into account fragmentation, replication and the particular data allocation scheme. The distributed query processor must determine:
- · which fragment to access;
- · which copy of the fragment to use if its data are replicated;
- · which location to use.
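The three decisions listed above can be sketched as a lookup in a global catalog. This is a toy under assumptions, not a real query optimizer: the catalog, the fragment and node names, and the cost function (0 for the local node, 10 for any remote one) are hypothetical, and ties are broken by node name.

```python
# Hypothetical global catalog: horizontal fragments of "employees" by branch,
# with the nodes that hold a copy of each fragment.
CATALOG = {
    "emp_kazan": {"branches": {"Kazan"}, "copies": ["N3"]},
    "emp_moscow": {"branches": {"Moscow"}, "copies": ["N1", "N2"]},
    "emp_other": {"branches": {"Omsk", "Tomsk"}, "copies": ["N2", "N3"]},
}


def transfer_cost(data_node, client_node):
    """Hypothetical cost: reading locally is free, any remote read costs 10."""
    return 0 if data_node == client_node else 10


def plan_query(branches, client_node):
    """Return (fragment, node) pairs for a query restricted to `branches`."""
    plan = []
    for name in sorted(CATALOG):
        info = CATALOG[name]
        if not info["branches"] & branches:
            continue  # which fragment: this one cannot contain matching rows
        # which copy and which location: the cheapest node holding a copy
        best = min(info["copies"], key=lambda node: (transfer_cost(node, client_node), node))
        plan.append((name, best))
    return plan
```

For a query on the Moscow and Omsk branches issued at node N3, `plan_query({"Moscow", "Omsk"}, "N3")` returns `[("emp_moscow", "N1"), ("emp_other", "N3")]`: the Kazan fragment is skipped, and the local copy of `emp_other` is preferred.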
Distributed query execution is now supported by almost all server DBMSs (at least when all the servers involved in a transaction come from the same vendor). For this purpose the two-phase commit mechanism is used: in the first phase, the servers involved in the transaction signal that they are ready to complete it, and in the second phase the changes are actually committed to the database.
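A minimal sketch of the two-phase commit just described, assuming in-memory participants: `Participant` and `two_phase_commit` are hypothetical names, and the sketch ignores the durable logging, timeouts and coordinator failures that a real implementation must handle.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """A server taking part in a distributed transaction (hypothetical)."""
    name: str
    can_commit: bool = True  # whether this server will vote "ready"
    committed: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)

    def prepare(self, changes):
        """Phase 1: stage the changes and vote; nothing is committed yet."""
        if not self.can_commit:
            return False
        self.pending = dict(changes)
        return True

    def commit(self):
        """Phase 2: make the staged changes permanent."""
        self.committed.update(self.pending)
        self.pending = {}

    def rollback(self):
        """Discard staged changes."""
        self.pending = {}


def two_phase_commit(work):
    """work is a list of (participant, changes); returns True if committed everywhere."""
    # Phase 1: every participant is asked to prepare and vote.
    votes = [participant.prepare(changes) for participant, changes in work]
    if all(votes):
        # Phase 2: all voted "ready", so everyone commits.
        for participant, _ in work:
            participant.commit()
        return True
    # At least one "no" vote: everyone rolls back.
    for participant, _ in work:
        participant.rollback()
    return False
```

If every participant votes yes, all of them commit; if any one votes no, every participant discards its staged changes, so no server ends up with only part of the transaction.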

5.6. Advantages and disadvantages of DDBMSs
Systems with distributed databases have additional advantages over traditional centralized database systems.
Advantages of DDBMSs:
1. Reflection of the organization's structure.
2. Shareability and local autonomy.
3. Improved data availability.
4. Improved reliability.
5. Improved performance.
6. Economic benefits.
7. System modularity.
Disadvantages of DDBMSs:
1. Increased complexity. DDBMSs are more complex software systems than centralized DBMSs because of the distributed nature of the data they use and because of data replication.
2. Increased cost. Greater complexity also means higher costs of acquiring and maintaining a DDBMS.
3. Security problems. In centralized systems, access to data is easy to control. In distributed systems, however, it is necessary not only to control access to data replicated at several different nodes but also to protect the network connections themselves.
4. More complex data integrity control. In a DDBMS, the increased cost of data transmission and processing can hamper effective protection against data integrity violations.
5. Lack of standards. There are no standards for communication channels and data access protocols, and there are no tools or methodologies to help users convert centralized systems into distributed ones.
6. Lack of experience. Experience of operating distributed systems in industry has not yet accumulated to a level comparable with that of centralized systems.
7. More complex database development. Besides the usual difficulties of designing centralized databases, developing distributed databases requires decisions on data fragmentation, on the allocation of fragments to individual nodes, and on organizing data replication.
8. The complexity of management and the resulting potential risk of loss of data integrity.

5.7. Overview of distributed DBMSs
Currently, relational distributed DBMSs are the most developed in both theory and practice. The best-studied DDBMSs include:
- · SDD-1, created in the late 1970s and early 1980s in the research department of Computer Corporation of America;
- · System R*, a distributed version of System R created by IBM in the early 1980s;
- · Distributed INGRES, a distributed version of INGRES created in the early 1980s at the University of California, Berkeley.
Currently, most commercial relational server database systems provide various kinds of support for distributed databases. The functions of a distributed DBMS are implemented most fully in:
- · INGRES/STAR, developed by the Ingres Division of The ASK Group Inc.;
- · ORACLE 7 by ORACLE Corp.;
- · the distributed system module of IBM DB2.
The following systems come closest to implementing the functions of a distributed DBMS:
- · Informix OnLine by Informix Software;
- · Sybase System 10 by Sybase Inc.
