Neural Network Tips

Lecture



Over their more than half a century of history, artificial neural networks have gone through periods of rapid take-off and intense public attention, followed by periods of skepticism and indifference. In the good times, scientists and engineers believe that a universal technology has finally been found, one that can replace humans in any cognitive task. New neural network models spring up like mushrooms after rain, and their authors, professional mathematicians, argue heatedly about how biologically plausible their models are. Professional biologists watch these debates from the sidelines, occasionally losing patience and exclaiming "There is no such thing in real nature!", but to little effect, because neural network mathematicians tend to listen to biologists only when the biologists' facts agree with their own theories. Over time, however, a pool of tasks accumulates on which neural networks perform frankly poorly, and enthusiasm cools.

Nowadays, neural networks are once again at the zenith of fame, thanks to the invention of unsupervised pre-training based on Restricted Boltzmann Machines (RBM), which makes it possible to train deep neural networks (i.e., very large ones, with tens of thousands of neurons), and to the success of deep networks in practical problems of speech [1] and image [2] recognition. For example, speech recognition in Android is implemented with deep neural networks. How long this will last and how well deep networks will live up to expectations is unknown.
Meanwhile, in parallel with all the scientific disputes, currents, and trends, a community of neural network users stands out clearly: software engineers and practitioners who are interested in the applied side of neural networks, their ability to learn from collected data and to solve recognition problems. Well-studied, relatively small models such as the multilayer perceptron (MLP) and the radial basis function network (RBF) handle many practical classification and forecasting problems very well. These neural networks have been described many times; I would recommend the following books, in order of my personal liking for them: Osovsky [3], Bishop [4], Haykin [5]. There are also good courses on Coursera and similar resources.

However, the general approach to using neural networks in practice is fundamentally different from the usual deterministic developer mindset of "I programmed it, it works, therefore it always works." By their nature, neural networks are probabilistic models, and they require a completely different approach. Unfortunately, many programmers new to machine learning in general, and to neural networks in particular, make systematic errors when working with them, get frustrated, and give up. The idea of writing this piece for Habr arose after talking with such frustrated users of neural networks: excellent, experienced, confident programmers.

Here is my list of rules and typical mistakes in the use of neural networks.

1. If it is possible not to use neural networks, do not use them.
Neural networks make it possible to solve a problem when no algorithm can be devised by looking through the data by eye, even many (or very many) times. For example, when the data are plentiful, non-linear, noisy, and/or high-dimensional.

2. The complexity of the neural network must match the complexity of the task.
Modern personal computers (for example, Core i5, 8 GB RAM) can train neural networks in a comfortable amount of time on samples of tens of thousands of examples with input dimensionality up to about a hundred. Larger samples are a task for the deep neural networks mentioned above, which are trained on multi-processor graphics cards. These models are very interesting but are outside the focus of this article.

3. Training data should be representative.
The training sample must fully and comprehensively represent the phenomenon being modeled and include the various situations that may occur. Having a lot of data is good, but that by itself does not always help. There is a well-known joke in narrow circles: a geologist comes to a pattern-recognition specialist, puts a piece of mineral in front of him, and asks him to build a system for recognizing that substance. "Are there any more examples of data?" asks the specialist. "Of course!" the geologist replies, takes out a pickaxe, and splits the piece of mineral into several more pieces. As you can see, such an operation is of no use: the enlarged sample carries no new information.

4. Shuffle the sample.
Once the input and output data vectors have been collected, and provided the measurements are independent of each other, put the vectors in random order. This is critical for correctly splitting the sample into Train / Test / Validation and for all "sample-by-sample" training methods.
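A minimal sketch in Python / NumPy (the X and y arrays here are made-up placeholders for your own input and output vectors):

```python
import numpy as np

# Placeholder data: replace with your own input vectors X and targets y.
X = np.random.rand(1000, 20)            # 1000 examples, 20 input features
y = np.random.randint(0, 2, size=1000)

# One shared random permutation, applied to inputs and outputs together,
# so each input vector keeps its matching target.
rng = np.random.default_rng(seed=0)
perm = rng.permutation(len(X))
X, y = X[perm], y[perm]
```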

5. Normalize and center the data.
For multilayer perceptrons, and for many other models, the input values must lie within [-1; 1]. Before feeding the data to the neural network, subtract the mean and divide all values by the maximum absolute value.
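A minimal sketch of this preprocessing in NumPy (scaling here is done by the maximum absolute value of the centered data, one reasonable reading of the rule; the statistics are kept so that new data can be transformed the same way):

```python
import numpy as np

X = np.random.rand(1000, 20) * 100.0    # placeholder raw inputs

# Center each feature, then scale by its maximum absolute value
# so every column ends up inside [-1, 1].
mean = X.mean(axis=0)
scale = np.abs(X - mean).max(axis=0)
X_norm = (X - mean) / scale

# Important: at prediction time, transform new data with the SAME
# mean and scale, not with statistics computed from the new data.
```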

6. Divide the sample into Train, Test and Validation.
A basic newbie mistake is to drive the error on the training set to a minimum, hellishly overfitting the network in the process, and then to expect the same good quality on new real data. This is especially easy to do when there is little data (or it all comes "from one piece"). The result can be very disappointing: the network fits the training sample as closely as possible and loses its performance on real data. To keep the generalization ability of your model under control, split all the data into three samples in a 70 : 20 : 10 ratio. Train on Train, periodically checking model quality on Test. For the final unbiased assessment, use Validation.
The cross-validation technique, in which Train and Test are repeatedly re-formed at random from the same data, can be deceptive and give a false impression of good system quality, for example when the data come from different sources and that matters. Use a proper Validation set!
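A minimal sketch of a 70 : 20 : 10 split on an already shuffled sample (X and y are placeholders; see rule 4 for the shuffling):

```python
import numpy as np

X = np.random.rand(1000, 20)            # placeholder inputs, already shuffled
y = np.random.randint(0, 2, size=1000)  # placeholder targets

n = len(X)
n_train = int(0.7 * n)
n_test = int(0.2 * n)

X_train, y_train = X[:n_train], y[:n_train]
X_test,  y_test  = X[n_train:n_train + n_test], y[n_train:n_train + n_test]
X_val,   y_val   = X[n_train + n_test:], y[n_train + n_test:]  # final unbiased check
```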

7. Apply regularization.
Regularization is a technique that prevents a neural network from overfitting during training, even when there is little data. If your neural network library has a checkbox with this word, be sure to tick it. A sign of an overfitted network is large weight values, on the order of hundreds or thousands; such a network will not work properly on new data it has not seen before.
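As one illustration, scikit-learn's multilayer perceptron exposes L2 regularization through its alpha parameter (the toy data here are placeholders, and the exact parameter name and default differ between libraries):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.random.rand(500, 10)             # placeholder data
y = np.random.randint(0, 2, size=500)

# alpha is the L2 penalty: it pushes the weights toward small values
# and plays the role of the "regularization" checkbox mentioned above.
model = MLPClassifier(hidden_layer_sizes=(20,), alpha=1e-3, max_iter=500)
model.fit(X, y)

# Sanity check: with regularization the trained weights should stay small.
print(max(np.abs(w).max() for w in model.coefs_))
```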

8. Do not keep training the neural network on-line.
The idea of continually training the neural network on newly arriving data is correct in itself; this is exactly what happens in real biological systems. We learn every day and rarely go mad. Nevertheless, for ordinary artificial neural networks at the current stage of technical development this practice is risky: the network may overfit or adapt to the most recently received data and lose its generalization ability. For a system to be usable in practice, the neural network needs to: 1) be trained, 2) have its quality checked on the test and validation samples, 3) have a successful network variant selected and its weights recorded, and 4) be used in practice with those weights kept fixed during use.
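A minimal sketch of that four-step workflow with scikit-learn and joblib (the file name and toy data are made up for illustration; the quality checks of step 2 are omitted):

```python
import numpy as np
import joblib
from sklearn.neural_network import MLPClassifier

X = np.random.rand(500, 10)             # placeholder training data
y = np.random.randint(0, 2, size=500)

# 1) train, 2) check quality on the test/validation samples (omitted here),
# 3) record the weights of the chosen network variant...
model = MLPClassifier(hidden_layer_sizes=(20,), max_iter=500).fit(X, y)
joblib.dump(model, "mlp_frozen.joblib")

# 4) ...and in production only load and predict; never call fit() again.
deployed = joblib.load("mlp_frozen.joblib")
print(deployed.predict(np.random.rand(5, 10)))
```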

9. Use new learning algorithms: Levenberg-Marquardt, BFGS, Conjugate Gradients, etc.
I am deeply convinced that implementing training by error backpropagation is the sacred duty of everyone who works with neural networks. This method is the simplest, it is relatively easy to program, and it lets you study the training process of neural networks thoroughly. Meanwhile, backpropagation was invented in the early 1970s and became popular in the mid-1980s; since then, more advanced methods have appeared that can improve the quality of training severalfold. Better to use them.
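For example, scikit-learn's MLP can be switched from stochastic gradient descent to an L-BFGS quasi-Newton solver with a single parameter (the toy data are placeholders):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.random.rand(500, 10)             # placeholder data
y = np.random.randint(0, 2, size=500)

# solver="lbfgs" uses a quasi-Newton method from the BFGS family instead of
# plain gradient descent; on small, low-dimensional samples it often
# converges faster and to a better minimum.
model = MLPClassifier(hidden_layer_sizes=(20,), solver="lbfgs", max_iter=1000)
model.fit(X, y)
print(model.score(X, y))
```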

10. Train neural networks in MATLAB and similar friendly environments.
If you are not a scientist developing new training methods for neural networks but a practicing programmer, I would not recommend coding the training procedure yourself. There are a large number of software packages, mainly for MATLAB and Python, that let you train neural networks while monitoring training and testing, with convenient visualization and debugging tools. Use the heritage of humanity! I personally like the approach of "training in MATLAB with a good library, then implementing the trained model by hand"; it is quite powerful and flexible. An exception is the STATISTICA package, which contains advanced neural network training methods and can generate trained networks as C code, convenient for deployment.
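A minimal sketch of that "train in a library, implement the trained model by hand" approach, using scikit-learn in place of MATLAB (the toy data and network size are made up):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

X = np.random.rand(200, 3)
y = X.sum(axis=1)                        # toy regression target

# Train in a friendly environment...
net = MLPRegressor(hidden_layer_sizes=(10,), activation="tanh",
                   solver="lbfgs", max_iter=2000).fit(X, y)

# ...then re-implement the trained model "by hand" from its weight matrices,
# which is all that is needed to port it to C, an embedded target, etc.
(W1, W2), (b1, b2) = net.coefs_, net.intercepts_

def predict_by_hand(x):
    hidden = np.tanh(x @ W1 + b1)        # hidden layer
    return hidden @ W2 + b2              # linear output layer

print(predict_by_hand(X[:1]).ravel(), net.predict(X[:1]))
```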

