Deconvolutional Neural Network (DNN)

Using classical neural networks for image recognition is usually complicated by the high dimensionality of the input vector, the large number of neurons in the intermediate layers and, as a result, the heavy computational cost of training and evaluating the network. Convolutional neural networks suffer from these drawbacks to a much lesser degree.

The convolutional neural network (CNN) is a special architecture of artificial neural networks proposed by Yann LeCun and aimed at efficient image recognition; it is one of the deep learning technologies. The architecture is modeled on the visual cortex, in which two kinds of cells were discovered: so-called simple cells, which respond to straight lines at different angles, and complex cells, whose response is tied to the activation of a particular set of simple cells. The core idea of convolutional neural networks is thus the alternation of convolutional layers and subsampling layers. [6]

Figure 1. The architecture of the convolutional neural network

The key to understanding convolutional neural networks is the concept of shared weights: some of the neurons in a given layer use the same weights. Neurons that share weights are grouped into feature maps, and each neuron of a feature map is connected to a small region of the previous layer. When the network is evaluated, each neuron therefore performs a convolution over its region of the previous layer (the set of neurons connected to it). Layers built this way are called convolutional layers. Besides convolutional layers, a convolutional neural network may contain subsampling layers (which reduce the dimensionality of the feature maps) and fully connected layers (the output layer is, as a rule, fully connected). The three types of layers can alternate in arbitrary order, which makes it possible to build feature maps from feature maps; in practice this means the network can recognize complex feature hierarchies [3].
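
As a rough illustration of weight sharing, the sketch below slides one small filter over a toy image in plain Python. The image, the 2x2 filter and the function name conv2d_valid are made up for this example; like most deep learning code, it computes a cross-correlation (the filter is not flipped), with no padding and a stride of 1.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Every output neuron reuses the same kernel weights,
            # applied to its own kh x kw patch of the previous layer.
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Toy 4x4 input "image" (hypothetical values).
image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
# Hypothetical 2x2 filter that responds to left-to-right intensity changes.
kernel = [
    [1, -1],
    [1, -1],
]
feature_map = conv2d_valid(image, kernel)  # 3x3 feature map
```

Every entry of feature_map is produced by the same four weights, which is why a convolutional layer needs far fewer parameters than a fully connected one of the same size.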

What exactly affects the quality of pattern recognition when training convolutional neural networks? Puzzling over this question, we came across an article by Matthew Zeiler. He developed the concept and technology of Deconvolutional Neural Networks (DNN) for understanding and analyzing how convolutional neural networks work. Zeiler's article proposes Deconvolutional Neural Networks as a technique that builds hierarchical representations of an image (Figure 2) from the filters and parameters obtained during CNN training. These representations can be used for low-level signal processing tasks such as noise reduction, and they can also provide low-level features for object recognition. Each level of the hierarchy can form more complex features from the features of the levels below it.

Figure 2. Image representations

The main difference between a CNN and a DNN is that a CNN passes the input signal through several layers of convolution and subsampling. A DNN, by contrast, seeks to generate the input signal as a sum of convolutions of feature maps with the learned filters (Fig. 3). Solving this problem draws on a wide range of tools from pattern recognition theory, for example deblurring algorithms. Zeiler's work is an attempt to link object recognition in images with low-level tasks and with data processing and filtering algorithms.
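
The idea of generating a signal as a sum of convolutions can be sketched on a one-dimensional toy example. The feature maps, filters and function names below are hypothetical and chosen only for illustration; this is a minimal sketch of the generative direction, not Zeiler's implementation.

```python
def conv1d_full(signal, kernel):
    """Full 1D convolution: output length is len(signal) + len(kernel) - 1."""
    out = [0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def reconstruct(feature_maps, filters):
    """Sum the convolutions of each feature map with its own filter."""
    parts = [conv1d_full(z, f) for z, f in zip(feature_maps, filters)]
    return [sum(values) for values in zip(*parts)]


# Two sparse feature maps and their filters (hypothetical values).
feature_maps = [[0, 1, 0, 0], [0, 0, 0, 2]]
filters = [[1, 2, 1], [1, 0, -1]]
signal = reconstruct(feature_maps, filters)
```

Each non-zero entry of a feature map places a scaled copy of its filter into the output, so a few active entries are enough to describe the whole signal.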

Understanding how a convolutional network operates requires interpreting the behavior of the feature maps in its intermediate layers. To study a convolutional neural network, a DNN is attached to each of its layers, as shown in Figure 3, providing a continuous path from the network's outputs back to the input pixels of the image. First the input image is passed through the CNN and the feature maps of all layers are computed. Then, to examine a given activation in the CNN, all other activations in that layer are set to zero and the resulting feature maps are fed as input to the attached deconvnet layer. Next we successively (I) unpool, (II) rectify and (III) filter to reconstruct the activity in the layer below that gave rise to the chosen activation. This is repeated until the input pixels of the image are reached.

Figure 3. Studying a convolutional neural network with a DNN

Unpooling: in a convolutional neural network the pooling operation is not invertible, but an approximate inverse can be obtained by recording the location of the maximum within each pooling region. Pooling here means replacing each region of a feature map with a single value, the maximum of that region. In a DNN, the unpooling operation uses these recorded locations to place the values from the layer above at the appropriate positions in the layer currently being processed (see Figure 2).
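
Recording where each maximum came from, and using those positions to undo the operation approximately, can be sketched as follows. The sketch assumes non-overlapping 2x2 max pooling on a map whose sides are divisible by the pool size; the function names are illustrative and are not part of the toolbox.

```python
def max_pool_with_switches(fmap, size=2):
    """Non-overlapping max pooling that also records where each maximum was."""
    pooled, switches = [], []
    for i in range(0, len(fmap), size):
        pooled_row, switch_row = [], []
        for j in range(0, len(fmap[0]), size):
            # Scan the size x size region and remember the arg-max position.
            best = (i, j)
            for di in range(size):
                for dj in range(size):
                    if fmap[i + di][j + dj] > fmap[best[0]][best[1]]:
                        best = (i + di, j + dj)
            pooled_row.append(fmap[best[0]][best[1]])
            switch_row.append(best)
        pooled.append(pooled_row)
        switches.append(switch_row)
    return pooled, switches


def unpool(pooled, switches, height, width):
    """Approximate inverse: put each value back at its recorded position."""
    restored = [[0] * width for _ in range(height)]
    for pooled_row, switch_row in zip(pooled, switches):
        for value, (r, c) in zip(pooled_row, switch_row):
            restored[r][c] = value
    return restored


# Toy 4x4 feature map (hypothetical values).
fmap = [
    [1, 5, 2, 0],
    [3, 4, 1, 7],
    [0, 2, 9, 6],
    [8, 1, 3, 4],
]
pooled, switches = max_pool_with_switches(fmap)
restored = unpool(pooled, switches, 4, 4)
```

Everything except the maximum of each region is lost, which is why the inverse is only approximate: restored keeps the four maxima in their original places and is zero elsewhere.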

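Rectification: a convolutional neural network uses the non-linear function relu(x) = max(x, 0), where x is the input, which ensures that the resulting feature maps are never negative. To obtain valid feature reconstructions at each layer, the DNN passes the reconstructed signal through the same function.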
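
Applied element-wise to a feature map, the function looks like this in plain Python (the values and the helper name rectify are made up for the example):

```python
def relu(x):
    """relu(x) = max(x, 0)"""
    return max(x, 0)


def rectify(fmap):
    """Apply relu to every element of a 2D feature map."""
    return [[relu(v) for v in row] for row in fmap]


fmap = [[-1.5, 2.0], [0.0, -3.0]]  # hypothetical values
rectified = rectify(fmap)
```

Negative entries are clipped to zero and non-negative entries pass through unchanged.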

Filtering: a convolutional neural network uses the filters learned during training to convolve the feature maps from the previous layer. To approximately invert this, the deconvnet applies transposed versions of the same filters. Projecting down from the higher layers uses the maxima locations recorded by the pooling layers of the CNN on the way up. Since these locations are specific to the given input image, the reconstruction obtained from a single activation resembles a small piece of the original image, with structures (Fig. 4) weighted according to their contribution to the feature map. Since the model is trained discriminatively, these structures implicitly show which parts of the input image (or of two different images) are discriminative for the learned features [4]. The resulting structures also make it possible to conclude which low-level features of the image are key to its classification.
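
For a one-dimensional signal, applying the transposed version of a filter amounts to flipping the filter and sliding it over the zero-padded activity of the layer above. The sketch below uses hypothetical values and function names; the last line checks the defining property of a transposed operator, namely that both inner products agree.

```python
def correlate_valid(x, k):
    """Forward pass: slide filter k over x without padding."""
    n = len(x) - len(k) + 1
    return [sum(x[i + j] * k[j] for j in range(len(k))) for i in range(n)]


def correlate_transposed(y, k):
    """Projection back down: flipped filter over the zero-padded input."""
    flipped = k[::-1]
    pad = [0] * (len(k) - 1)
    return correlate_valid(pad + list(y) + pad, flipped)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


k = [1, 2, -1]        # hypothetical learned filter
x = [3, 0, 1, 4, 2]   # signal in the layer below
y = [1, -2, 5]        # activity in the layer above

forward = correlate_valid(x, k)        # maps the lower layer up
backward = correlate_transposed(y, k)  # maps the upper layer down

# Transposed operators satisfy <forward(x), y> == <x, backward(y)>.
same = dot(forward, y) == dot(x, backward)
```

Note that backward has the same length as x, so the projection lands back in the space of the layer below.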

Fig. 4. Image structures.

Although in theory the global minimum can always be found, in practice this is difficult, because the elements of the feature maps are coupled to one another through the filters. One element of a map can affect other elements far away from it, so the minimization can take a very long time to converge.

Benefits of using DNN:

1) conceptually simple training scheme. A DNN works by applying unpooling, rectification and filtering to the image, together with the feature maps obtained during CNN training;
2) applying a DNN to the source images yields a large set of filters that capture the entire image structure using primitive representations. The filters thus apply to the whole image rather than to each small patch of it. This is a major advantage, because it gives a more complete understanding of the processes that take place during CNN training;
3) the representations (Fig. 2) can be obtained without tuning special parameters or adding extra modules such as unpooling, rectification and filtering; they arise in the course of CNN training;
4) the DNN approach is based on a global minimum search and on the filters obtained from CNN training, and is designed to minimize the ill-conditioned costs that arise in the convolutional approach.

The article under review also reports the results of Matthew Zeiler's experiments. The network he entered in the ImageNet 2013 competition showed the best result on the image classification task, with an error of only 14.8%. The task was to classify objects into 1000 categories. The training set consisted of 1.2 million images and the test set of 150 thousand images. For each test image, the recognition algorithm had to output 5 class labels in descending order of confidence. The error took into account whether any of these labels matched the label of the object actually present in the image, which is known for each image. Using 5 labels is meant to avoid penalizing the algorithm when it recognizes objects of other classes that may be implicitly present in the image [1].
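
The error measure can be sketched in a few lines. The sketch assumes the usual top-5 rule, under which an image counts as correctly classified if its true label appears anywhere among the five labels returned for it; the labels and the function name are hypothetical.

```python
def top5_error(predictions, true_labels):
    """Fraction of images whose true label is not among the 5 guesses."""
    misses = 0
    for guesses, truth in zip(predictions, true_labels):
        # Only the first five labels are considered, in any position.
        if truth not in guesses[:5]:
            misses += 1
    return misses / len(true_labels)


# Five labels per image, most confident first (hypothetical data).
predictions = [
    ["cat", "dog", "fox", "wolf", "lynx"],
    ["car", "bus", "van", "tram", "bike"],
    ["oak", "elm", "pine", "fir", "ash"],
    ["cup", "mug", "bowl", "jar", "pot"],
]
true_labels = ["fox", "truck", "oak", "pot"]
error = top5_error(predictions, true_labels)
```

Here only the second image is a miss, so one image out of four counts as an error.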

The results of Deconvolutional Neural Networks are shown in Figure 5.

Figure 5. DNN results

Zeiler plans to develop DNN technology further in the following areas:

1) improving image classification in DNN. DNNs were introduced in order to understand the learning process of convolutional networks. Using the parameters obtained while training the convolutional neural network, in addition to the high-level features, a mechanism can be provided to improve classification in DNN. Further work concerns classifying the original images, so a template-based method can be said to apply: images will be classified according to the class to which the object belongs;
2) scaling DNN. The inference methods used in DNN are inherently slow, since many iterations are required. Numerical approximation methods based on a feed-forward pass should also be investigated. They would require only the features and pooling parameters for the current batch of images, which would allow DNN to scale to large data sets;
3) improving convolutional models for detecting several objects in an image. Convolutional networks have been used for classification for many years and have recently been applied to very large data sets. However, the algorithms used in the convolutional approach need further development in order to detect several objects in an image at once. For a CNN to detect several objects at once, a large training set is required, and the number of parameters to train also grows significantly. [4]

After studying his article, we decided to carry out our own study of DNN. Matthew Zeiler has developed the Deconvolutional Network Toolbox for Matlab. We immediately ran into a problem: installing this toolbox is a non-trivial task. After a successful installation, we decided to share the experience with the Habr community.

So, let's proceed to the installation. The Deconvolutional Network Toolbox was installed on a computer with the following configuration:
Windows 7 x64
Matlab R2014a

Let's start with the software that needs to be installed first:

1) Windows SDK. During installation, uncheck the following items:
Visual C++ Compilers
Microsoft Visual C++ 2010
If the VS 2010 redistributable (x64 or x86) is already installed on the computer, it has to be removed first.
Finish installing the Windows SDK, then install the patch.
2) After that, download and install VS 2010.
3) The toolbox also requires the icc compiler; in our case it was the Intel C++ Composer XE 2011 compiler.
4) In Matlab, run the command
  mbuild -setup 
and select SDK 7.1.
Do the same with
  mex -setup

If the compiler was successfully installed, then you can now start building the toolbox.

The software preparation is complete. Now on to compilation.

1) Download the toolbox from
www.matthewzeiler.com/software/DeconvNetToolbox/DeconvNetToolbox.zip and unpack it.
2) In Matlab, go to the directory where the unpacked toolbox is located and run the file “setupDeconvNetToolbox.m”.
3) Go to the PoolingToolbox folder and open the file compilemex.m.
This file needs a number of changes, since it was written for Linux.
In MEXOPTS_PATH, set the paths to Matlab, to the libraries in the compiler folder for the 64-bit system, and to the header files of the compiler and of Visual Studio 2010.
4) A few more changes are needed; they look like this:
exec_string = strcat({'mex '},MEXOPTS_PATH,{' '},{'-liomp5mt max_pool.cpp'});
eval(exec_string{1});

The same has to be done for the rest of the files being compiled.
5) We also make changes to mexopts.sh.
Here, too, the paths to the 64-bit and 32-bit compilers have to be set.
6) Now go to the IPP Convolution Toolbox directory.
7) Go to the MEX folder and open the file compilemex.m; here the same changes as in step 4 are needed, plus the paths to ipp_lib and ipp_include.
Matlab will complain about missing libraries; they need to be placed in C:\Program Files (x86)\Intel\ComposerXE-2011\ipp\lib\intel64\
8) As before, make the following changes:
exec_string = strcat({'mex '},MEXOPTS_PATH,{' '},{IPP_INCLUDE_PATH},{' '},{IPP_LIB64_PATH},{' '},{IPP_LIB_PATH},{' '},{'-liomp5mt -lippiemerged -lippimerged -lippcore -lippsemerged -lippsmerged -lippi ipp_conv2.cpp'});
eval(exec_string{1});

Run the file; if everything worked correctly, continue.
9) Go to the GUI folder.
10) Open the file gui.m and set the path to the folder with the unpacked toolbox; in my case it looks like this:
START_DATASET_DIRECTORY = 'C:/My_projects/DeconvNetToolbox/DeconvNetToolbox';
START_RESULTS_DIRECTORY = 'C:/My_projects/DeconvNetToolbox/DeconvNetToolbox';

11) Run the file gui.m. Does it work? Close it and move on.
12) Now, in the folder where the toolbox was unpacked, create a Results folder and inside it a temp folder. Then run gui.m from the Results folder: a graphical interface appears in which the parameters are set. In the lower right corner set Save Results to 1 and click Save. As a result, the file gui_has_set_the_params.mat is generated in the GUI directory with the parameters of the demonstration DNN model proposed by Matthew Zeiler.
13) Now you can start training the resulting model by calling the script trainAll.m.
If everything was done correctly, the results will appear in the Results folder once training has finished.

Now you can start doing research! A separate article will be devoted to the results.
