Image Descriptors

Represent surrounding region mathematically

Color histogram
Mean dy/dx values
Texton histograms

Feature Descriptors should be

Invariant: shouldn’t change even if image is transformed
Discriminative: should be highly unique for each point

Raw Image Patch: Intensity

Not invariant to geometry and appearance change

Image Gradients

Sensitive to deformations

Colour Histogram

Invariant to changes in scale and rotation

Not sensitive to the spatial layout

Spatial Histogram

Retains rough spatial layout

Some invariance to deformations

Not invariant to rotation

Multi-Scale Oriented Patches (MOPS)

Take 40x40 window around detected interest point (e.g. Harris corner)
Subsample every 5th pixel $\rightarrow$ absorbs localization errors
Rotate to horizontal $\rightarrow$ robustness to photometric variations
Normalize intensity values within the window by subtracting the mean & dividing by the standard deviation.

GIST Descriptor

Divide image patch into 4x4 cells
Apply filter bank (Gabor filters) of directional edge detectors.
Compute filter response averages for each cell.
Size of descriptor is 4x4xN, where N is size of the filter bank.

GIST encoded rough spatial distribution of the image gradients!

Gabor filters

If scale is small compared to the frequency, the Gabor filters can be used as approximate derivative operators

odd Gabor filter looks a lot like Gaussian Derivative

even Gabor filter looks a lot like Laplacian (2nd derivative)

Scale Invariant Feature Transform (SIFT)

Take 16x16 window around detected interest point

window: normalized wrt scale, orientation, and intensity
Divide the 16x16 window into a 2x2 case

to add some sensitivity to spatial layout
Compute edge orientation (angle of the gradient - 90°) for each pixel
Throw out weak edges (threshold gradient magnitude)
Create histogram of surviving edge orientations for each cell
16 cells * 8 orientations = 128 dimensional descriptor

normalized to 1

Advantages:

Invariant to scale & rotation
Can handle changes in viewpoint
- Up to about 60 degree out of plane rotation
Can handle significant changes in illumination
- Sometimes even day vs. night (below)
Fast and efficient
- SURF is an even faster follow-up derived from SIFT

Feature Matching

Feature distance

ratio distance = $\parallel f_1-f_2 \parallel / \parallel f_1-f_2’ \parallel$

matching SIFT descriptors: threshold ratio of nearest to 2nd nearest descriptor

Evaluating Feature Matches

$recall = {#TP \over # TP + # FP} $

$specificity = {#FP \over # TN + # FN} $

$AUC$ Area Under the Receiver Operator Characteristic (ROC) Curve, 1 is the best

Image Projections

Image transformations

Filtering: $G(x) = h{F(x)}$ Changes range of image function (pixel values)

Wrapping: $G(x) = F(h{x})$ Changes domain of image function (pixel locations)

Homography

Convert to homogeneous coordinates
Multiply by the homography matrix

Special transformation with 8 degrees of freedom relating the two projections.
Convert back to heterogeneous coordinates

Direct Linear Transform (DLT)

$P’ = H \cdot P$

One way to find $H$ is via DLT.

$x’ = \alpha (h_1x + h_2y + h_3)$

$y’ = \alpha (h_4x + h_5y + h_6)$

$1 = \alpha (h_7x + h_8y + h_9)$

So that

$h_7xx’ + h_8yx’ + h_9x’ - h_1x - h_2y - h_3 = 0$

$h_7xy’ + h_8yy’ + h_9y’ - h_4x - h_5y - h_6 = 0$

Re-write in matrix form: $A_ih = 0$

Solving for H using DLT

Given ${x_i, x_i’}$ solve for H such that $x’ = Hx$

For each correspondence, create 2x9 matrix $A_i$
Concatenate into single 2n x 9 matrix $A$
Compute singular value decomposition (SVD) of $A=U\sum V^T$
Store singular vector of the smallest singular value $h=v_i$
Reshape to get H

Warping

Pixels may end up between two points

To determine the intensity of each point, we distribute color among neighboring pixels

If a pixel (x’, y’) receives intensity from more than one pixels (x, y), we average their intensity contributions.

Drawbacks

linear least squares estimation, works only when the actual transform between two functions is linear
doesn’t deal with outliers $\rightarrow$ RANSAC

Random Sample Consensus (RANSAC)

Sample (randomly) the number of points required to fit the model
Solve for model parameters using samples
Score by the fraction of inliers within a preset threshold of the model

Estimating homography with RANSAC

RANSAC loop

Get (min 4) point correspondences (randomly)
Compute H using DLT
Count inliers
Keep H if largest number of inliers

Recompute H using all inliers

PREVIOUSLocal Features

NEXTRecognition