1.3 Classification Theories

Lecture



Classification is a means of ordering knowledge. In object-oriented analysis, recognizing the common properties of objects helps us find key abstractions and mechanisms, which in turn leads to a simpler system architecture. Unfortunately, no rigorous method of classification has been developed yet, and there is no rule for identifying classes and objects. There is no such thing as a "perfect class structure" or a "correct choice of objects". As in many engineering disciplines, the choice of classes is a compromise.

Fortunately, other sciences have accumulated extensive experience in classification, and methods of object-oriented analysis have been developed on that basis. Each such method offers its own rules (heuristics) for identifying classes and objects. They are the subject of this lesson.

Identifying classes and objects is one of the most difficult tasks in object-oriented design. Experience shows that this work usually involves elements of both discovery and invention. Through discovery, we recognize the key concepts and mechanisms that form the vocabulary of the problem domain. Through invention, we construct generalized concepts, as well as new mechanisms that define how objects interact. Discovery and invention are therefore integral parts of successful classification. The purpose of classification is to find the common properties of objects: by classifying, we group together objects that share the same structure or the same behavior.

Sound classification is undoubtedly a part of every science. Not surprisingly, classification affects many aspects of object-oriented design. It helps us determine inheritance and aggregation hierarchies. By finding the common forms of interaction among objects, we introduce mechanisms that become the foundation of our implementation.

That sound classification is a hard problem is hardly news. Since similar difficulties arise in object-oriented design, consider some examples of classification in biology.

Until the 18th century, the prevailing idea was that living organisms could be classified by their degree of complexity. The measure of complexity was subjective, so it is not surprising that humans ended up at the top of the list. In the middle of the 18th century, the Swedish botanist Carl Linnaeus proposed a more detailed taxonomy for classifying organisms, built on the concepts of genus and species. A century later, Darwin proposed the theory that natural selection is the mechanism of evolution and that present-day animal species are the product of the evolution of ancient organisms. Darwin's theory relied on a sound classification of species.

Modern classification of living things groups organisms that share a common genetic history, that is, organisms with similar DNA are placed in the same group. DNA-based classification is useful for distinguishing organisms that look alike but are genetically very different. According to current views, dolphins are closer to cows than to trout.

Biology may seem to you a mature, fully formed science with well-defined criteria for classifying organisms. It is not. The biologist May said: "To date, we do not know even the order of magnitude of the number of plant and animal species inhabiting our planet: fewer than 2 million species have been classified, while estimates of the total number of species range from 5 to 50 million." Moreover, different criteria for classifying the same animals lead to different results. Martin says that "it all depends on what you want to get. If you want the classification to reflect how closely species are related, you will get one answer; if you want it to reflect the level of adaptation, the answer will be different." We can conclude that even in rigorous scientific disciplines, the methods and criteria of classification depend heavily on its purpose.

The conclusion is simple. As Descartes put it: "Discovering an order is no easy task, but once it has been found, it is not at all difficult to understand." The best software designs look simple, but experience shows that a simple architecture is very hard to achieve.

None of this is meant to excuse drawn-out software projects, although many managers and users do feel that it takes centuries to finish the work that was started. The point is simply that sound classification is intellectual work, and the best way to do it is through a sequential, iterative process. At first, problems are solved ad hoc, case by case. As experience accumulates, some solutions prove more successful than others, and a kind of folklore emerges that is passed from person to person. The successful solutions are then studied more systematically: they are programmed and analyzed. This makes it possible to develop models, implement them, and build a theory that generalizes the solution found. That in turn raises practice to a higher level and lets us take on even harder problems.

The iterative approach to classification leaves its mark on how hierarchies of classes and objects are built when developing complex software. In practice, some class structure is usually taken as a starting point and then gradually refined. Only late in development, once some experience has been gained in using that structure, can we critically evaluate the quality of the resulting classification. Based on that experience, we can create a new subclass from existing classes (derivation), split a large class into several smaller ones (factorization), or merge several existing classes into one (composition). We may also discover common properties that went unnoticed earlier and define new classes from them (abstraction).
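
As an illustration of such refinements, here is a minimal Python sketch. All class names (`Sensor`, `TemperatureSensor`, `CalibratedTemperatureSensor`, `ReadingFormatter`) are hypothetical and invented for this example. It shows a base class that captures newly noticed common properties, a new subclass built from an existing class, and a responsibility split out into its own small class. Merging several classes into one is not shown.

```python
from abc import ABC, abstractmethod


class Sensor(ABC):
    """Abstraction: common properties (a name, a reading) noticed late
    and pulled up into a new class."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def read(self) -> float:
        """Return the current reading."""


class TemperatureSensor(Sensor):
    """An existing class, now placed under the new abstraction."""

    def __init__(self, name: str, celsius: float) -> None:
        super().__init__(name)
        self.celsius = celsius

    def read(self) -> float:
        return self.celsius


class CalibratedTemperatureSensor(TemperatureSensor):
    """A new subclass created from an existing class."""

    def __init__(self, name: str, celsius: float, offset: float) -> None:
        super().__init__(name, celsius)
        self.offset = offset

    def read(self) -> float:
        # Reuse the parent's reading and apply the calibration offset.
        return super().read() + self.offset


class ReadingFormatter:
    """Factorization: formatting split out of the sensor classes
    into a small class of its own."""

    def format(self, sensor: Sensor) -> str:
        return f"{sensor.name}: {sensor.read():.1f}"


sensor = CalibratedTemperatureSensor("boiler", 20.0, 0.5)
report = ReadingFormatter().format(sensor)
```

None of these classes needs to exist in the first version of a design. Each is the kind of change that becomes visible only after the initial structure has been used for a while.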

Why is classification so difficult? There are two reasons. First, there is no “perfect” classification, although some classifications are naturally better than others. Second, sound classification requires a fair amount of creative insight. It resembles the riddle: "Why is a laser beam like a goldfish? ... Because neither of them can whistle." You have to be a very creative thinker to find common ground between such unrelated things.

Since the time of Plato, the problem of classification has occupied the minds of countless philosophers, linguists, and mathematicians. It is therefore wise to study the experience they gained and apply it to object-oriented design. Historically, only three approaches are known:

  • classical categorization
  • conceptual clustering
  • prototype theory

1.3.1 Classical categorization

In the classical approach, "all things with a given property or set of properties form a category. Moreover, the presence of these properties is a necessary and sufficient condition that defines the category." For example, single people form a category: every person is either single or married, and this attribute is sufficient to decide which category an individual belongs to. Tall people, on the other hand, do not form a category unless we explicitly specify a criterion that clearly separates tall people from short ones.

Thus, the classical approach uses related properties as the criterion of similarity between objects. In particular, objects can be divided into disjoint sets according to the presence or absence of some property. Minsky suggested that "the best sets of properties are those whose members interact little with one another. This explains the universal fondness for criteria such as size, color, shape, and material. Because these criteria do not overlap, we can say of an object that it is big, gray, round, and wooden." Generally speaking, the properties need not be measurable; observable behavior can serve as well. The fact that birds fly and fish do not lets us distinguish an eagle from a trout.
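
The classical approach maps naturally onto predicates: a property either holds or it does not, and it splits a collection into two disjoint sets. The sketch below is only an illustration. The `Person` type, the function names, and the 185 cm threshold are assumptions made for the example and do not come from any particular method.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Person:
    name: str
    married: bool
    height_cm: float


def is_single(person: Person) -> bool:
    """A crisp property: it alone decides category membership."""
    return not person.married


# "Tall" becomes a category only once an explicit criterion is fixed.
# The threshold is an arbitrary assumption for this example.
TALL_THRESHOLD_CM = 185.0


def is_tall(person: Person, threshold_cm: float = TALL_THRESHOLD_CM) -> bool:
    return person.height_cm >= threshold_cm


def partition(
    people: List[Person], has_property: Callable[[Person], bool]
) -> Tuple[List[Person], List[Person]]:
    """Split people into two disjoint lists: with and without the property."""
    members = [p for p in people if has_property(p)]
    others = [p for p in people if not has_property(p)]
    return members, others


people = [Person("Ann", False, 190.0), Person("Bob", True, 170.0)]
single, married = partition(people, is_single)
```

Changing `threshold_cm` changes who counts as tall, which is exactly why height alone does not define a category until a criterion is agreed on.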

Which properties should be taken into account? That depends on the situation. For example, the color of a car matters to a program that tracks the output of an automobile plant, but is of no interest to a program that controls a traffic light. This is why we say there is no absolute criterion of classification: the same class structure may suit one task and not another. No classification scheme can be said to reflect the structure and order of nature better than the others. Nature is indifferent to our attempts to understand it. Some classifications are indeed more important than others, but only in relation to our interests, not because they reflect reality more faithfully or more completely.
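
The car example can be sketched as two independent models of the same real-world object. Both classes and all their fields are hypothetical and chosen only to show that each task keeps the properties it needs and ignores the rest.

```python
from dataclasses import dataclass


@dataclass
class ProducedCar:
    """A car as seen by a plant's production records (hypothetical)."""
    serial_number: str
    model: str
    color: str  # relevant here: it is part of what was produced


@dataclass
class ApproachingVehicle:
    """A car as seen by a traffic-light controller (hypothetical)."""
    lane: int
    distance_m: float
    speed_mps: float  # color is deliberately absent

    def seconds_to_stop_line(self) -> float:
        """Estimated time until the vehicle reaches the stop line."""
        if self.speed_mps <= 0:
            return float("inf")  # a stationary vehicle never arrives
        return self.distance_m / self.speed_mps


record = ProducedCar("SN-0001", "hatchback", "red")
vehicle = ApproachingVehicle(lane=1, distance_m=100.0, speed_mps=20.0)
eta = vehicle.seconds_to_stop_line()
```

Neither model is more correct than the other. Each is adequate only with respect to the task it serves.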

Modern Western thinking is steeped in classical categorization, but, as the example of tall and short people shows, this approach does not always work. Kosko notes that "natural categories are not sharply delimited from one another. Most birds fly, but not all. A chair can be wooden, metal, or plastic, and the number of its legs depends entirely on the designer's whim. It is practically impossible to list the defining properties of a natural category without exceptions." These are fundamental defects of classical categorization, which more modern approaches try to correct. We turn to them now.

1.3.2 Conceptual clustering

This is a more modern variation of the classical approach. It arose from attempts to represent knowledge formally. In this approach, conceptual descriptions of classes (clusters of objects) are formed first, and entities are then classified according to those descriptions. Take, for example, the concept of a "love song". It is a concept rather than a feature or property, since the degree to which a song is about love can hardly be measured. But if we can say that a song is more about love than about anything else, we place it in this category.

Conceptual clustering is related to the theory of fuzzy (multi-valued) sets, in which an object can belong to several categories at once, each to a different degree. Conceptual clustering makes an absolute classification judgment by choosing the best match.
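
A minimal sketch of this idea follows. It assumes that each concept has already been given a degree of membership between 0 and 1 for the song in question. The numbers are invented by hand for illustration, and the function name `best_fit` is hypothetical.

```python
from typing import Mapping


def best_fit(memberships: Mapping[str, float]) -> str:
    """Return the concept with the highest degree of membership."""
    if not memberships:
        raise ValueError("at least one concept is required")
    return max(memberships, key=lambda concept: memberships[concept])


# Hypothetical, hand-assigned degrees of membership for one song.
# The song belongs to several concepts at once, to different degrees.
song_memberships = {"love song": 0.7, "lullaby": 0.4, "protest song": 0.1}

# The absolute judgment: the song goes into the best-matching category.
category = best_fit(song_memberships)
```

The degrees express partial membership in several categories, while the final call to `best_fit` turns them into a single classification decision.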

1.3.3 Prototype theory

Classical categorization and conceptual clustering are quite expressive and well suited to designing complex software systems. Still, there are situations in which they do not work. Consider a more modern method of classification, prototype theory, whose origins can be found in the work of Rosch and her colleagues on the psychology of perception.

Some abstractions have neither clear properties nor clear definitions. The problem can be explained as follows: there are categories (games, for example) that do not fit the classical mold, since no feature is common to all games ... For this reason, they are united by what is called family resemblance ... The category of games has no clear boundary. It can be extended to include new kinds of games, provided that they resemble games already known. This is why the approach is called prototype theory: a class is defined by a prototypical object, and a new object can be assigned to the class if it bears a substantial resemblance to the prototype.

Another example: we regard a soft pouf, a barber's chair, and a folding chair as chairs not because they satisfy a fixed set of defining properties, but because they bear sufficient family resemblance to the prototype ... No common set of properties that fits both a pouf and a barber's chair is required; both are chairs because each, in its own way, resembles the prototypical chair. Properties determined by interacting with an object (interaction properties) are central to the definition of family resemblance.

The notion of interaction properties is central to prototype theory. In conceptual clustering, we group objects according to distinct concepts. In prototype theory, we classify objects by the degree of their similarity to a concrete prototype.
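
Classification by similarity to a prototype can be sketched as follows. The example makes several assumptions: interaction properties are written as plain strings, similarity is measured by set overlap (the Jaccard index, chosen here only for simplicity), and the 0.4 threshold is arbitrary. Prototype theory itself does not prescribe any of these choices.

```python
from typing import FrozenSet

# Hypothetical interaction properties of a prototypical chair.
CHAIR_PROTOTYPE: FrozenSet[str] = frozenset(
    {"can sit on", "supports back", "has legs", "movable by one person"}
)


def similarity(candidate: FrozenSet[str], prototype: FrozenSet[str]) -> float:
    """Jaccard index: shared properties divided by all properties."""
    union = candidate | prototype
    if not union:
        return 1.0  # two empty descriptions are trivially identical
    return len(candidate & prototype) / len(union)


def resembles(
    candidate: FrozenSet[str],
    prototype: FrozenSet[str],
    threshold: float = 0.4,  # arbitrary cut-off, an assumption
) -> bool:
    """Assign the candidate to the class if it is similar enough."""
    return similarity(candidate, prototype) >= threshold


pouf = frozenset({"can sit on", "movable by one person"})
folding_chair = frozenset(
    {"can sit on", "supports back", "has legs", "movable by one person", "folds"}
)
table = frozenset({"has legs", "holds objects"})

pouf_is_chair = resembles(pouf, CHAIR_PROTOTYPE)
folding_chair_is_chair = resembles(folding_chair, CHAIR_PROTOTYPE)
table_is_chair = resembles(table, CHAIR_PROTOTYPE)
```

The pouf and the folding chair are each compared with the prototype rather than with each other, so they do not have to share a fixed set of properties. The table has legs too, but it resembles the prototype too little to be classified as a chair.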


