For a basic introduction to neural networks that also covers more advanced topics, the following two textbooks are a good starting point:
-  C. M. Bishop. Neural networks for pattern recognition. Clarendon Press, Oxford, 1995.
-  S. Haykin. Neural networks: a comprehensive foundation. Pearson Education, New Delhi, 2005.
Both books offer a thorough introduction to neural networks and their applications to pattern recognition without requiring prior knowledge of machine learning. The references used in these two books cover most of the important work on neural networks apart from very recent research; the following paragraphs give a more detailed overview.
Because the multilayer perceptron is an important neural network model, Rosenblatt's original perceptron paper is an interesting place to start (although the perceptron is covered in the above textbooks as well):
-  F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 1958.
The backpropagation algorithm, proposed by Rumelhart et al., is still an important tool for training neural networks. It allows the gradient of a chosen error measure to be evaluated efficiently, so that basic first-order optimization techniques can be used for training. This algorithm is also described in most textbooks on neural networks.
-  D. E. Rumelhart, G. E. Hinton, R. J. Williams. Learning representations by back-propagating errors. Nature, 323, 1986.
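As a rough illustration of how backpropagation supplies the gradient for a first-order update, here is a minimal sketch in plain Python. The network shape (one tanh hidden layer, a single linear output), the squared-error measure, and all function names are assumptions chosen for brevity; they are not taken from the paper above.

```python
import math

def forward(x, w1, w2):
    """One tanh hidden layer followed by a linear scalar output."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    y = sum(v * hj for v, hj in zip(w2, h))
    return h, y

def loss(x, t, w1, w2):
    """Squared error 0.5 * (y - t)^2 for a single training pair."""
    _, y = forward(x, w1, w2)
    return 0.5 * (y - t) ** 2

def backprop(x, t, w1, w2):
    """Gradient of the loss with respect to w1 and w2 via the chain rule."""
    h, y = forward(x, w1, w2)
    delta_out = y - t                                  # dE/dy
    grad_w2 = [delta_out * hj for hj in h]
    # Propagate the output error back through w2 and the tanh derivative.
    delta_hid = [delta_out * v * (1.0 - hj * hj) for v, hj in zip(w2, h)]
    grad_w1 = [[dj * xi for xi in x] for dj in delta_hid]
    return grad_w1, grad_w2

def sgd_step(x, t, w1, w2, lr=0.1):
    """One plain gradient-descent update using the backpropagated gradient."""
    g1, g2 = backprop(x, t, w1, w2)
    new_w1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(w1, g1)]
    new_w2 = [w - lr * g for w, g in zip(w2, g2)]
    return new_w1, new_w2
```

Comparing one entry of the returned gradient against a central finite difference of `loss` is a simple way to check such an implementation.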
Becker and LeCun discuss the advantages of second-order optimization methods for network training. When using second-order methods for training, the Hessian matrix of the chosen error measure needs to be evaluated. To accomplish this, Bishop describes an algorithm similar to error backpropagation:
-  S. Becker, Y. LeCun. Improving the convergence of back-propagation learning with second order methods. Proceedings of the 1988 Connectionist Models Summer School, pages 29-37, 1989.
-  C. M. Bishop. Exact calculation of the Hessian matrix for the multi-layer perceptron. Neural Computation, 4, 1992.
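To make the role of the Hessian concrete, the following sketch estimates it by central finite differences and uses it for a single Newton step. This is not the exact backpropagation-style algorithm from the paper above; it is a numerical stand-in that only scales to a handful of parameters. The function names and the two-parameter quadratic error are hypothetical.

```python
def numerical_gradient(f, w, eps=1e-5):
    """Central-difference estimate of the gradient of f at w."""
    grad = []
    for i in range(len(w)):
        wp, wm = list(w), list(w)
        wp[i] += eps
        wm[i] -= eps
        grad.append((f(wp) - f(wm)) / (2 * eps))
    return grad

def numerical_hessian(f, w, eps=1e-4):
    """Central-difference estimate of the Hessian of f at w."""
    n = len(w)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                v = list(w)
                v[i] += di
                v[j] += dj
                return f(v)
            H[i][j] = (shifted(eps, eps) - shifted(eps, -eps)
                       - shifted(-eps, eps) + shifted(-eps, -eps)) / (4 * eps * eps)
    return H

def newton_step(f, w):
    """One Newton update w - H^{-1} g, written out for two parameters only."""
    g = numerical_gradient(f, w)
    (a, b), (c, d) = numerical_hessian(f, w)
    det = a * d - b * c
    # Solve H * step = g with the closed-form 2x2 inverse.
    step = [(d * g[0] - b * g[1]) / det, (a * g[1] - c * g[0]) / det]
    return [w[0] - step[0], w[1] - step[1]]

def quadratic_error(w):
    """Illustrative error surface with Hessian [[4, 1], [1, 3]]."""
    return 2 * w[0] ** 2 + w[0] * w[1] + 1.5 * w[1] ** 2 - w[0]
```

On a quadratic error surface such as `quadratic_error`, a single Newton step lands on the minimum up to rounding, whereas plain gradient descent needs many small steps. That difference is the basic appeal of second-order methods.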
Hornik et al. explore the expressive power of multilayer perceptrons in the following two papers. In general, a multilayer perceptron with at least one hidden layer and sufficiently many hidden units is capable of approximating any continuous target function on a compact domain to arbitrary accuracy:
-  K. Hornik, M. Stinchcombe, H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2, 1989.
-  K. Hornik, M. Stinchcombe, H. White. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Networks, 3, 1990.
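Stated informally, and with notation chosen here for illustration rather than copied from the papers, the universal approximation property for a single hidden layer can be written as follows. Here $\sigma$ is assumed to be a squashing function (non-decreasing, with limits 0 and 1), $K \subset \mathbb{R}^d$ is compact, and $f$ is continuous on $K$.

```latex
\forall \varepsilon > 0 \;\; \exists N \in \mathbb{N},\;
v_j, b_j \in \mathbb{R},\; w_j \in \mathbb{R}^d \;\; (j = 1, \dots, N):
\qquad
\sup_{x \in K} \Bigl|\, f(x) - \sum_{j=1}^{N} v_j \,\sigma\bigl(w_j^\top x + b_j\bigr) \Bigr| < \varepsilon
```

The statement guarantees that suitable weights exist. It says nothing about how many hidden units $N$ are needed or whether training will find those weights.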
Although neural networks have also found extensive use in natural language processing, the convolutional network proposed by LeCun et al. focuses on applications in computer vision:
-  Y. LeCun, O. Matan, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel, H. S. Baird. Handwritten zip code recognition with multilayer networks. Proceedings of the International Conference on Pattern Recognition, pages 35 - 40, Atlantic City, 1990.
Experimental results for different neural network architectures applied to document recognition are discussed by LeCun et al. (1998). A discussion of generalization with respect to the constrained architecture of a convolutional network can be found in LeCun's 1989 technical report:
-  Y. LeCun. Generalization and network design strategies. Technical report, University of Toronto, 1989.
-  Y. LeCun, L. Bottou, Y. Bengio, P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, pages 2278 - 2324, 1998.
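The "constrained architecture" of a convolutional layer comes down to sharing one small set of weights across all image positions. The sketch below shows this with a single-channel "valid" convolution, implemented as cross-correlation without kernel flipping, as is common in convolutional networks. The function name and the example sizes are assumptions for illustration only.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Every output unit reuses the same kh * kw weights.
            row.append(sum(kernel[i][j] * image[r + i][c + j]
                           for i in range(kh) for j in range(kw)))
        output.append(row)
    return output

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
feature_map = conv2d_valid(image, kernel)  # 2 x 2 feature map
```

In this example the layer has 4 trainable weights, whereas a fully connected layer mapping the same 9 inputs to 4 outputs would have 36. This reduction in free parameters is one reason such networks generalize well and remain trainable with gradient descent.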
Further discussion of recent architectural changes in the use of convolutional networks for a variety of computer vision tasks can be found in Jarrett et al. and in LeCun et al. (2010), where different layer types and their usage are reviewed:
-  K. Jarrett, K. Kavukcuoglu, M. Ranzato, Y. LeCun. What is the best multi-stage architecture for object recognition? Proceedings of the International Conference on Computer Vision, pages 2146 - 2153, 2009.
-  Y. LeCun, K. Kavukcuoglu, C. Farabet. Convolutional networks and applications in vision. Proceedings of the International Symposium on Circuits and Systems, pages 253 - 256, 2010.
Deep learning, that is, training deep neural networks (in general, neural networks are considered deep if they have more than three layers), is still considered very difficult. Convolutional networks can be considered an exception: due to their constrained architecture, deep convolutional networks can be trained using traditional methods, namely gradient descent with error backpropagation. Bengio provides a thorough discussion of recent approaches to deep learning as well as its difficulties:
-  Y. Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 4, 2009.
In addition, Erhan et al. discuss unsupervised pre-training of deep networks, which can be interpreted both as a regularization method and as an optimization aid. Further details on unsupervised training of deep networks can be found in the references below:
-  D. Erhan, Y. Bengio, A. Courville, P.-A. Manzagol, P. Vincent, S. Bengio. Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research, 11, pages 625 - 660, 2010.
-  D. Erhan, P.-A. Manzagol, Y. Bengio, S. Bengio, P. Vincent. The difficulty of training deep architectures and the effect of unsupervised pre-training. Proceedings of the International Conference on Artificial Intelligence and Statistics, pages 153 - 160, 2009.
-  X. Glorot, Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. Proceedings of the International Conference on Artificial Intelligence and Statistics, pages 249 - 256, 2010.
Recent success of convolutional networks is reported by Krizhevsky et al., who apply convolutional networks to the task of image classification:
-  A. Krizhevsky, I. Sutskever, G. E. Hinton. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, pages 1097 - 1105, 2012.
Building on the architecture of Krizhevsky et al. and on an unsupervised model called the deconvolutional network, introduced by Zeiler et al. (unsupervised training is discussed in the references above), Zeiler and Fergus try to gain further insight into the internal operation of convolutional networks by visualizing the activations of higher convolutional layers:
-  M. D. Zeiler, D. Krishnan, G. W. Taylor, R. Fergus. Deconvolutional networks. Proceedings of the Conference on Computer Vision and Pattern Recognition, pages 2528 - 2535, 2010.
-  M. D. Zeiler, R. Fergus. Visualizing and understanding convolutional networks. Computing Research Repository, abs/1311.2901, 2013.
Tip: Another extensive reading list on deep learning and code examples can be found at deeplearning.net.
Note: Additional references covering an introduction to neural networks can also be found in my seminar paper which is available here.