Data mining and applications in science and technology

Computer Assignment on SVD and Character Recognition

Construct an algorithm in MATLAB for recognizing handwritten digits. Using a training set, compute an SVD of the matrix of training images for each digit. Use the first few (5-10) left singular vectors as a basis for that digit, and classify unknown test digits according to how well they can be represented in each of the ten bases (use the norm of the residual vector in the least squares problem as a measure). Try to tune the algorithm for classification accuracy. Check whether all digits are equally easy or difficult to classify. Report the number of incorrectly classified digits in a table. Also look at some of the misclassified digits, and note that in many cases they are very badly written.
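
The procedure above can be sketched as follows. This is only an illustration under stated assumptions, not a reference solution: the variable names (A, d, T, U, pred), the choice k = 5, and the synthetic data standing in for the real training and test sets are all made up for the example.

```matlab
% Sketch: classify digits by the residual against per-digit SVD bases.
% All data here are synthetic stand-ins; replace A, d, T and truth with
% the real training images, training labels, test images and test labels.
rng(0);                               % reproducible synthetic data
n = 256; k = 5; perClass = 40;        % k = number of basis vectors (tunable)
protos = randn(n, 10);                % hypothetical "ideal" image per digit
A = []; d = [];
for c = 0:9
    A = [A, protos(:, c+1) + 0.1*randn(n, perClass)];  % noisy copies
    d = [d, c*ones(1, perClass)];
end

% Training: for each digit, keep the first k left singular vectors.
U = cell(1, 10);
for c = 0:9
    [Uc, ~, ~] = svd(A(:, d == c), 'econ');
    U{c+1} = Uc(:, 1:k);
end

% Classification: the least squares problem min_x ||z - Uk*x||_2 has the
% residual z - Uk*(Uk'*z), because Uk has orthonormal columns.
T = protos + 0.1*randn(n, 10);        % one synthetic test image per digit
truth = 0:9;
pred = zeros(1, size(T, 2));
for j = 1:size(T, 2)
    z = T(:, j);
    res = zeros(1, 10);
    for c = 0:9
        res(c+1) = norm(z - U{c+1}*(U{c+1}'*z));
    end
    [~, idx] = min(res);              % smallest residual wins
    pred(j) = idx - 1;                % index 1..10 maps to digit 0..9
end

% Number of misclassified test images per digit, for the report table.
errors = zeros(1, 10);
for c = 0:9
    errors(c+1) = sum(pred(truth == c) ~= c);
end
```

Dividing each residual by norm(z) gives a relative residual, which may be easier to compare across test images; whether that helps accuracy on the real data is something to check when tuning.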

If time permits, examine the singular values for the different digits, and see whether it is justified to use different numbers of basis vectors for different digits.
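
One possible way to compare the singular values is sketched below. The two matrices are synthetic stand-ins for the training matrices of two digits, and the 99% threshold on the cumulative squared singular values is an arbitrary assumption chosen for illustration, not a criterion given in the assignment.

```matlab
% Sketch: compare singular value decay for two (synthetic) digit classes.
rng(1);
n = 256; m = 40;
% Stand-ins with different effective rank plus a little noise.
A_low  = randn(n, 2)*randn(2, m) + 0.01*randn(n, m);   % roughly rank 2
A_high = randn(n, 8)*randn(8, m) + 0.01*randn(n, m);   % roughly rank 8
s_low  = svd(A_low);                  % singular values, largest first
s_high = svd(A_high);

% Fraction of the squared singular value sum captured by the first k values.
energy = @(s) cumsum(s.^2) / sum(s.^2);

% Smallest k that reaches the (assumed) 99% threshold for each class.
k_low  = find(energy(s_low)  >= 0.99, 1);
k_high = find(energy(s_high) >= 0.99, 1);
```

Plotting the values, for example with semilogy(s_low, 'o-'), shows how quickly they decay; a digit whose singular values fall off slowly may need more basis vectors than one whose values drop sharply.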

The data are available at /mailocal/lab/numt/ngssc/digits/; see the README file there for details. The following files are needed:

  1. dzip.mat and azip.mat: the first is a vector that holds the digit labels (the number each image represents), and the second is an array of dimension 256 x 1707 that holds the training images. Each image is a vector of dimension 256, constructed from a 16 x 16 image.

  2. The test data are given in dtest.mat and testzip.mat.

  3. There is a function ima2.m that takes an image vector as input and displays it. If you are interested and good at colour maps, you are welcome to refine the given map.
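
As a rough illustration of what turning an image vector back into a picture involves, the sketch below reshapes a 256-vector into a 16 x 16 array. The vector v is a made-up stand-in for one column of the training array, and the transpose assumes the pixels were stored row by row; the actual storage order and the behaviour of ima2.m are not stated here, so check the README.

```matlab
% Sketch: turn one 256-element image vector back into a 16 x 16 array.
v = (1:256)';                 % hypothetical stand-in for one image column
I = reshape(v, 16, 16)';      % transpose assumes row-wise pixel storage
```

The array can then be shown with standard commands such as imagesc(I); colormap(gray); axis image off. If the picture appears rotated or mirrored, drop the transpose.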