Thursday, August 9, 2007

Face Recognition with Neural Networks

This paper provides a brief introduction to neural networks and the task of face recognition with ANNs. It is based on the book "Machine Learning" by Tom M. Mitchell and on some resources from the World Wide Web.

See the chapter "References" for more information.

The aim of this paper is not to give a detailed introduction to neural networks, because there are many good books and online resources that do a great job of this. Just use Google to find some of them.

After a very brief introduction to gradient descent, backpropagation, and neural networks, I will show how to design a neural network to recognize faces. The data set (the images) can be found here: http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/faces.html, and the source code here: http://code.google.com/p/mindthegap/

1. Backpropagation and Gradient Descent
In the paper "Introduction" we developed a learner for the task of playing Tic Tac Toe. We used a linear combination of weights and features to represent the learning task and the LMS update rule to adjust the weights. With the LMS update rule we minimized the error between the actual output and the target output; the LMS update rule is a form of stochastic gradient descent. This linear representation of a learning task is only suitable for problems which are linearly separable. In many cases this is not possible, so different (more complex) representations have to be found. One such representation is the neural network. Depending on the structure of the network (the number of layers and units), neural networks can represent very complex nonlinear hypothesis spaces. But despite some differences, our simple Tic Tac Toe learner shares some basic concepts with neural networks: the basis of both are features and weights.
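
Written out (notation as in [1]), the linear hypothesis and the LMS update rule look like this, where \eta is the learning rate, t the target value, o the computed output and x_i the i-th feature:

o = w_0 + w_1 x_1 + \dots + w_n x_n

w_i \leftarrow w_i + \eta \, (t - o) \, x_i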

Backpropagation is a (supervised) learning algorithm for feed-forward networks. Like many other algorithms it uses gradient descent to find weights which minimize the error. Gradient descent involves computing the gradient of the error with respect to the weights and adjusting the weights in the direction of the negative gradient.
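For a feed-forward network of sigmoid units, the error function and update rules from [1] look like this (\eta is the learning rate, t_k the target and o_k the output of output unit k, and x_{ji} the i-th input of unit j):

E(\vec{w}) = \frac{1}{2} \sum_{d \in D} \sum_{k \in outputs} (t_{kd} - o_{kd})^2

\delta_k = o_k (1 - o_k)(t_k - o_k)   (error term of output unit k)

\delta_h = o_h (1 - o_h) \sum_{k \in outputs} w_{kh} \delta_k   (error term of hidden unit h)

w_{ji} \leftarrow w_{ji} + \eta \, \delta_j \, x_{ji}   (weight update)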

I've implemented some simple neural networks which learn some simple tasks; the source code is available at [2]. There are neural networks which learn simple identity functions, simple inverse functions etc. There is also a delta rule implementation which can be used to compare different types and parameters of gradient descent (a sketch of the core update loop follows the list below).
You can compare
• plain gradient descent
• stochastic gradient descent (approximation)
• different learning rates and
• decaying learning rates
For an overview of neural networks see [3].
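
A minimal sketch of what such a delta rule experiment can look like (illustrative Java only; the class name, the learning rate and the AND training data are my assumptions, not the actual code from [2]):

// Delta rule training of a single linear unit with stochastic
// gradient descent. Illustrative sketch, not the classes from [2].
public class DeltaRuleSketch {

    public static void main(String[] args) {
        // Training data: inputs (with constant bias input x0 = 1)
        // and targets for the logical AND function.
        double[][] inputs = { {1, 0, 0}, {1, 0, 1}, {1, 1, 0}, {1, 1, 1} };
        double[] targets  = { 0, 0, 0, 1 };

        double[] w = new double[3];   // weights, initialized to 0
        double eta = 0.1;             // learning rate

        for (int epoch = 0; epoch < 100; epoch++) {
            for (int d = 0; d < inputs.length; d++) {
                // Compute the linear output o = w . x
                double o = 0;
                for (int i = 0; i < w.length; i++)
                    o += w[i] * inputs[d][i];
                // Stochastic update: w_i += eta * (t - o) * x_i
                for (int i = 0; i < w.length; i++)
                    w[i] += eta * (targets[d] - o) * inputs[d][i];
            }
        }
        // The linear unit converges to the least-squares fit of AND.
        System.out.println(java.util.Arrays.toString(w));
    }
}

For plain (batch) gradient descent the updates would be summed over all training examples before the weights are changed; a decaying learning rate simply shrinks eta from epoch to epoch.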

2. Designing the ANN
Besides the implementation of the backpropagation algorithm, our neural network for face recognition requires some important design decisions. First, we have to design the overall structure of the network. We decide to use two sigmoid layers (one hidden layer and one output layer), each consisting of a (for now) unknown number of units. Second, we have to consider which features our network should learn. The images are a 32x30 grid of pixels, and each pixel has an integer value from 0 to 255 representing its grayscale intensity. An obvious solution for our problem domain is to use 960 input units, one per pixel value. Third, we have to decide how many hidden units we want to use. As suggested in [1], we use 20 units (but we can easily change this and experiment with more or fewer units). Fourth, we have to design the output layer. For example, we could use only one output unit to represent the final classification. But we again choose an obvious solution: for every person there should be one output unit (because we expect that the network can learn this output encoding more easily). The last decision concerns the target encoding, which determines the desired output against which the error of the network is computed. Because we have one output unit for every person, our target encoding for e.g. the first person (of seven) looks like this: 0.9, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1. So the high value stands for the target classification (0.9/0.1 are used instead of 1/0 because a sigmoid unit cannot actually reach the asymptotic values 0 and 1).
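
A minimal sketch of such a target encoding (a hypothetical helper for illustration, not the actual TargetEncoding class from [2]):

// Build the target vector for a given person index.
// Hypothetical sketch, not the TargetEncoding class from [2].
public static double[] targetFor(int person, int numPersons) {
    double[] target = new double[numPersons];
    java.util.Arrays.fill(target, 0.1);  // "off" value for all other persons
    target[person] = 0.9;                // "on" value for the target person
    return target;
}

// Example: targetFor(0, 7) yields [0.9, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]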

The final structure looks like this:

[Figure: feed-forward network with 960 input units (one per pixel), a hidden layer of 20 sigmoid units, and 7 sigmoid output units (one per person)]

3. The Implementation
The implementation of the network is straightforward. The class Unit represents a single sigmoid unit; it has methods for calculating the current error of the unit and for adjusting the weights. ImageNeuralNetwork represents the whole network; it defines the structure of the network and has methods for propagating the different layers and for training and testing the network. TargetEncoding defines the target encoding specified above. XVal is a simple cross-validation implementation. ImageReader reads the raw pixel values from the image files and stores them as integers in an array. Finally, ImageLearnerNN puts the pieces together: it uses the cross-validation implementation to train networks and looks for the 'best' network. The accuracy of the 'best' network is calculated, and this process is repeated ten times to get a mean result.
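
To give an impression of the structure, here is a minimal sketch of what a sigmoid unit can look like (an illustration of the concept, with assumed method names; not the actual Unit class from [2]):

// Sketch of a single sigmoid unit. Illustration of the concept,
// not the actual Unit class from [2].
public class SigmoidUnit {
    private final double[] weights;   // weights[0] is the bias weight
    private double output;            // last computed output o
    private double delta;             // last computed error term

    public SigmoidUnit(int numInputs) {
        weights = new double[numInputs + 1];
        // Small random initial weights, as usual for backpropagation
        for (int i = 0; i < weights.length; i++)
            weights[i] = Math.random() * 0.1 - 0.05;
    }

    // Propagate: o = sigma(w . x) with sigma(y) = 1 / (1 + e^-y)
    public double propagate(double[] inputs) {
        double sum = weights[0];
        for (int i = 0; i < inputs.length; i++)
            sum += weights[i + 1] * inputs[i];
        output = 1.0 / (1.0 + Math.exp(-sum));
        return output;
    }

    // Error term for an output unit: delta = o(1 - o)(t - o)
    public void computeOutputError(double target) {
        delta = output * (1 - output) * (target - output);
    }

    // Weight update: w_i += eta * delta * x_i
    public void adjustWeights(double[] inputs, double eta) {
        weights[0] += eta * delta;
        for (int i = 0; i < inputs.length; i++)
            weights[i + 1] += eta * delta * inputs[i];
    }
}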

References
[1] Tom M. Mitchell: Machine Learning. McGraw-Hill, 1997, ISBN 0-07-115467-1
[2] Source code: http://code.google.com/p/mindthegap/
[3] Artificial neural network: http://en.wikipedia.org/wiki/Artificial_neural_network
[4] Backpropagation: http://en.wikipedia.org/wiki/Backpropagation
[5] Face images (Mitchell): http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/faces.html