Final Assignment

To receive the full number of ECTS credits for this course, you must write a short report [e.g. 3 to 5 pages] on the analysis of a data set using the approaches and tools presented in the course. The report should be finalized no later than one week after the course. The choice of data set can be discussed during the course, and the analysis can already be started as part of the laboratory work. Ph.D. students considering PRTools for their research may take a data set from their own Ph.D. project. Ph.D. students already using PRTools are encouraged to select a different data set to broaden their view. Note that some of the data sets listed below are too large to be analysed in a few days, so a motivated reduction is needed.

Send to

m.loog@tudelft.nl

Outline

Analyse a real‐world labeled dataset:

  1. Find a good initial representation for the dataset [features, dissimilarities]
  2. Study at least three different simple classifiers [learning curves, confusion matrices, ROC curves]
  3. Also build a neural network, a combined classifier, or a non‐linear support vector classifier
  4. Visualize
  5. Extract a small feature set
  6. Find the best classifier using a small subset of the data
  7. Evaluate on a larger test set
  8. Present the analysis and results in a report of three to five pages
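
Steps 2, 6 and 7 above can be sketched roughly as follows. This is only an illustration in plain Python, not PRTools code: the toy two-class data, the nearest-mean classifier standing in for a "simple classifier", and all function names are assumptions made for the example. It trains on growing subsets of a small training pool, evaluates on a larger held-out test set, and reports a confusion matrix and error rate for each training size, which together form a crude learning curve.

```python
import random
from collections import defaultdict


def make_toy_data(n_per_class, rng):
    """Hypothetical stand-in for a real labeled dataset: two 2-D Gaussian classes."""
    centers = {0: (0.0, 0.0), 1: (3.0, 3.0)}
    data = []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    rng.shuffle(data)
    return data


def train_nearest_mean(samples):
    """Estimate one mean vector per class from (features, label) pairs."""
    sums = {}
    counts = defaultdict(int)
    for x, y in samples:
        if y not in sums:
            sums[y] = [0.0] * len(x)
        for i, v in enumerate(x):
            sums[y][i] += v
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(means, x):
    """Assign x to the class whose mean is closest in squared Euclidean distance."""
    return min(means, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, means[y])))


def confusion_matrix(means, samples, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {lab: i for i, lab in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for x, y in samples:
        cm[index[y]][index[predict(means, x)]] += 1
    return cm


def error_rate(cm):
    """Fraction of test objects that fall off the diagonal."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return 1.0 - correct / total


rng = random.Random(0)  # fixed seed so the sketch is repeatable
data = make_toy_data(500, rng)
train_pool, test_set = data[:200], data[200:]  # small training pool, larger test set
labels = [0, 1]

learning_curve = []
for n in (10, 50, 200):  # growing training-set sizes
    means = train_nearest_mean(train_pool[:n])
    cm = confusion_matrix(means, test_set, labels)
    learning_curve.append((n, error_rate(cm)))
    print(n, cm, round(error_rate(cm), 3))
```

For the report, the same loop would be repeated for each classifier under study and over several random splits, so that the curves can be averaged and compared.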

Datasets

– If not your own data set…
a. NIST 16×16 pixels, normalized handwritten numbers, nist16, [10x200x256]
b. NIST 128×128, 2 class dissimilarities [2000×2000]
‐ dist_digit128x128_blured_dig38
‐ dist_digit128x128_hamming_dig38
‐ dist_digit128x128_hausd_dig38
‐ dist_digit128x128_modhausd_dig38
c. Chromosome banding patterns, cbands, [24x200x64]
d. KL‐features of hand printed numbers, mfeat‐kar [10x200x64]
e. Complete multi‐feature dataset mfeat‐xxx, [10x200x(216+76+64+6+240+47)]
f. Texture composite image, texturel [5x(128×128)x7], texturet [256x256x7]
g. Diabetes, 2 classes, 768 objects, 8 features
h. Breast, 2 classes, 699 objects, 10 features
i. Ionosphere, 2 classes, 351 objects, 34 features
j. Liver, 2 classes, 345 objects, 6 features
k. Kimia dataset of silhouettes, [reformatted], kimia [18x12x4096]
l. Zongker dissimilarity matrix, 10 classes, 2000×2000

Datafiles

m. ORL face database, 92×112 pixels, face_x, [40x10x10304]
n. Flowers, image database of 17 classes, 80 images per class
o. Delft Image Database, 256 images in 9 classes
p. NIST digits, raw images of 28000 handwritten numerals in 10 classes
q. Highway, 100 highway images given by 5 features and a desired pixel labelling

ASCI course A1 : Advanced Pattern Recognition