# Multimedia Information Retrieval

An introduction to retrieval from image, video, and audio databases, from a statistical, pattern recognition, and algorithmic point of view.

- Models of information retrieval
- Nearest neighbors, range queries, hash algorithms
- PLSA and topic models vector space models
- Text categorization, analysis, tagging, parsing
- Content based image and video retrieval
- Image retrieval based on color, texture, and shape
- Visual bag of words model
- Grouping, static and motion segmentation, scene cuts
- Geometric indexing, verification, and object recognition
- Automatic annotation and categorization
- HMM-based methods (audio, video, music, document retrieval)
- Generative and discriminative methods
- Selected special purpose applications, such as face detection, x-ray image analysis
- Performance evaluation and competitions
- Applications in consumer imaging, security, forensics, and copyright and plagiarism detection

# Materials

# Questions

## Lecture 1

- Name common commercial applications of multimedia information retrieval.
- Describe specific, common multimedia information retrieval tasks (what do queries look like? what is stored in the database?).
- Write a Python / NumPy / SciPy function to …
- Define: precision, recall, 11-point P/R graph, F-measure, mean average precision, true/false positive/negative, ROC curve, DET curve
- Describe the vector space model of text information retrieval.
- Describe the process of tokenization.
- Describe latent semantic analysis.

## Lecture 2

- Name and describe major image storage formats, their properties, and differences (e.g., PPM, PNG, JPEG, JPEG2000).
- Name and describe common command line tools and Python libraries for image processing (e.g., ImageMagick, SciPy, PIL, etc.)
- What is EXIF? What kind of information is stored in it?
- Given Python expressions to load images, display them, and convert them to grayscale.
- Describe JPEG compression.
- How do you perform in-memory JPEG compression of an image given as a Python array?
- What is an MD5 checksum? What is a common use for checksums in image databases?
- How are images stored in relational databases?
- What is the HDF5 database format? How does it differ from a relational database?
- What is OpenCV?
- How would you perform fade-to-black detection in a video?
- How would you perform scene cut detection in a video?
- What is keyframe detection? What is it used for?
- What is optical flow? How does it differ from object motion?

## Lecture 3

- What can you say about the error rate of nearest neighbor classifiers?
- How are nearest neighbor classifiers related to information retrieval applications?
- Why are nearest neighbor classifiers particularly suitable for information retrieval applications (compared to other kinds of classifiers)?
- What properties do dissimilarity functions commonly satisfy?
- What are the requirements for a distance in a metric space?
- What Python library functions are useful for performing nearest neighbor classification?
- Name some common color spaces.
- What is a color histogram?
- How are color histograms used for image retrieval?
- What is a perceptually uniform color space?
- What is image texture?
- How is image texture represented?
- How do you segment images by their texture?

## Lecture 4

- What do clustering algorithms do?
- Describe commonly used, different approaches to clustering.
- Describe the k-means clustering algorithm.
- What is Gaussian mixture modeling?
- When does GMM work better than k-means clustering?
- How can you determine the correct number of clusters in a clustering problem?
- What is hierarchical clustering?
- What is a dendrogram?
- What is single-linkage clustering? What is complete linkage clustering? What is average linkage clustering?
- How can you combine k-means clustering and PCA? Why would you?
- What is “hierarchical tree VQ”? What is hierarchical clustering?

## Lecture 5

- What are image patches? How are they used in MMIR?
- How are patches commonly represented?
- How does compression by vector quantization work?
- What is a VQ code histogram? What is the bag of visual words method?
- What is interest point detection?
- How are interest points used with patch descriptors?
- Name commonly used interest point detectors.
- Describe how you can tag/label images using a combination of interest point detection, patch descriptors, and logistic regression.
- Motivate and describe the Harris corner detector.
- Describe corner detection based on (1) median filters, (2) level curves, (3) orientation histograms, (4) morphological operations.
- What is scale space? How is scale space used in the SIFT interest point detector?

## Lecture 6

- Describe the abstraction used for interest point detection and descriptors used in OpenCV.
- Describe common applications for interest point detectors and descriptors.
- Describe the demands that each of these applications places on interest point detectors / descriptors.
- Name some commonly used interest point detectors and feature descriptors.
- What are SIFT, SURF, MSER, Harris corners?
- Give commonly desired properties for interest point detectors.
- Describe how we can benchmark/test/evaluate interest point detectors.
- Describe the SIFT feature descriptor.

## Lecture 7

- Describe the relationship between k-nearest neighbor classification and nonparametric density estimation.
- What happens with nearest neighbor classification in high dimensions?
- What are approximate nearest neighbor methods? Does such an approximation make sense in high dimensions?
- What is the intrinsic dimension of a data set?
- How can we determine the intrinsic dimension of a data set?
- What is the covering dimension?
- Describe a linear method for dimensionality reduction.
- Name some common nonlinear dimensionality reduction techniques.
- Given a video sequence of an object rotating in 3D, what is the intrinsic dimensionality of the video frames viewed as feature vectors?
- Name some commonly used techniques for fast approximate nearest neighbor retrieval.

## Lecture 8

- Explain the Hough transform for finding lines / the generalized Hough transform.
- Explain the RANSAC algorithm for finding lines / for finding general object instances.
- Explain the RAST algorithm for finding lines / for finding general object instances.
- Explain how 2D coordinates can be represented as complex numbers, and how common geometric transformations can be expressed as complex arithmetic.
- Describe how you can perform 2D object recognition using interest point detectors and geometric matching.
- Explain how SIFT or SURF descriptors can be used to improve / speed up geometric matching for object recognition.

## Lecture 9

- Explain segmentation by thresholding / adaptive thresholding.
- Discuss how thresholding might be used for color segmentation / texture segmentation.
- What is document binarization?
- How can the k-means algorithm be used for simple color-based segmentations of images?
- Which color space is better for performing color based segmentation, Lab or RGB? Why?
- What is the basic idea behind edge-based segmentation?
- Explain the watershed segmentation algorithm.
- Explain the idea behind the random-walk based segmentation algorithm.
- What are superpixels?
- Describe a simple k-means based implementation of superpixel segmentation.
- Describe the graph cut segmentation algorithm.
- What are active contours?
- Describe the GVF-based active contour segmentation algorithm.

## Lecture 10

- How is speech generated? What is the vocal tract? How does it generate sounds?
- What are formants?
- What is a spectrogram? How is it computed?
- What is a window function?
- What is the purpose of dynamic time warping?
- Describe the dynamic time warping algorithm.
- How can you perform speech recognition with dynamic time warping?
- What is a Markov chain?
- What is an n-gram?
- Define: accessibility / communicating states / irreducibility / transience / recurrence / positive recurrence / absorption / ergodicity / reversibility

## Lecture 11

- What is a Hidden Markov Model? How is it specified?
- How can OCR (optical character recognition) be carried out with an HMM?
- How can speech recognition be carried out with an HMM?
- Describe how a spectrogram or the image of a line of text is transformed (using k-means) into a symbol sequence suitable for modeling with an HMM.
- Construct a simple HMM to produce this ____ letter.
- What is a durational model?
- What is the durational model for a single HMM state with a self-recurrence and one transition to another state?
- How can you construct a durational model with an approximately normal distribution of durations out of an HMM?
- What does the forward algorithm compute?
- Describe the forward algorithm.
- Why do we implement most HMM algorithms using log probabilities?
- What is Viterbi decoding?
- Describe the Viterbi decoding algorithm.
- What is Viterbi training?
- What does the forward-backward algorithm compute?
- Describe the forward-backward algorithm.
- Describe Baum-Welch reestimation.
- What is the Bakis model?
- Why does the Bakis model often give better results than an unconstrained model with the same number of states?
- Why does the Bakis model result in better durational models than a model with the minimum number of states necessary to produce the output sequence?