Image Descriptors
Represent surrounding region mathematically
- Color histogram
- Mean dy/dx values
- Texton histograms
Feature Descriptors should be
- Invariant: shouldn’t change even if image is transformed
- Discriminative: should be highly unique for each point
Raw Image Patch: Intensity
Not invariant to geometry and appearance change
Image Gradients
Sensitive to deformations
Colour Histogram
Invariant to changes in scale and rotation
Not sensitive to the spatial layout
Spatial Histogram
Retains rough spatial layout
Some invariance to deformations
Not invariant to rotation
Multi-Scale Oriented Patches (MOPS)
-
Take 40x40 window around detected interest point (e.g. Harris corner)
-
Subsample every 5th pixel $\rightarrow$ absorbs localization errors
-
Rotate to horizontal $\rightarrow$ robustness to photometric variations
-
Normalize intensity values within the window by subtracting the mean & dividing by the standard deviation.
GIST Descriptor
- Divide image patch into 4x4 cells
- Apply filter bank (Gabor filters) of directional edge detectors.
- Compute filter response averages for each cell.
- Size of descriptor is 4x4xN, where N is size of the filter bank.
GIST encoded rough spatial distribution of the image gradients!
Gabor filters
If scale is small compared to the frequency, the Gabor filters can be used as approximate derivative operators
odd Gabor filter looks a lot like Gaussian Derivative
even Gabor filter looks a lot like Laplacian (2nd derivative)
Scale Invariant Feature Transform (SIFT)
-
Take 16x16 window around detected interest point
window: normalized wrt scale, orientation, and intensity
-
Divide the 16x16 window into a 2x2 case
to add some sensitivity to spatial layout
-
Compute edge orientation (angle of the gradient - 90°) for each pixel
-
Throw out weak edges (threshold gradient magnitude)
-
Create histogram of surviving edge orientations for each cell
-
16 cells * 8 orientations = 128 dimensional descriptor
normalized to 1
Advantages:
-
Invariant to scale & rotation
- Can handle changes in viewpoint
- Up to about 60 degree out of plane rotation
- Can handle significant changes in illumination
- Sometimes even day vs. night (below)
- Fast and efficient
- SURF is an even faster follow-up derived from SIFT
Feature Matching
Feature distance
ratio distance = $\parallel f_1-f_2 \parallel / \parallel f_1-f_2’ \parallel$
matching SIFT descriptors: threshold ratio of nearest to 2nd nearest descriptor
Evaluating Feature Matches
$recall = {#TP \over # TP + # FP} $
$specificity = {#FP \over # TN + # FN} $
$AUC$ Area Under the Receiver Operator Characteristic (ROC) Curve, 1 is the best
Image Projections
Image transformations
Filtering: $G(x) = h{F(x)}$ Changes range of image function (pixel values)
Wrapping: $G(x) = F(h{x})$ Changes domain of image function (pixel locations)
Homography
-
Convert to homogeneous coordinates
-
Multiply by the homography matrix
Special transformation with 8 degrees of freedom relating the two projections.
-
Convert back to heterogeneous coordinates
Direct Linear Transform (DLT)
$P’ = H \cdot P$
One way to find $H$ is via DLT.
$x’ = \alpha (h_1x + h_2y + h_3)$
$y’ = \alpha (h_4x + h_5y + h_6)$
$1 = \alpha (h_7x + h_8y + h_9)$
So that
$h_7xx’ + h_8yx’ + h_9x’ - h_1x - h_2y - h_3 = 0$
$h_7xy’ + h_8yy’ + h_9y’ - h_4x - h_5y - h_6 = 0$
Re-write in matrix form: $A_ih = 0$
Solving for H using DLT
Given ${x_i, x_i’}$ solve for H such that $x’ = Hx$
- For each correspondence, create 2x9 matrix $A_i$
- Concatenate into single 2n x 9 matrix $A$
- Compute singular value decomposition (SVD) of $A=U\sum V^T$
- Store singular vector of the smallest singular value $h=v_i$
- Reshape to get H
Warping
Pixels may end up between two points
To determine the intensity of each point, we distribute color among neighboring pixels
If a pixel (x’, y’) receives intensity from more than one pixels (x, y), we average their intensity contributions.
Drawbacks
- linear least squares estimation, works only when the actual transform between two functions is linear
- doesn’t deal with outliers $\rightarrow$ RANSAC
Random Sample Consensus (RANSAC)
- Sample (randomly) the number of points required to fit the model
- Solve for model parameters using samples
- Score by the fraction of inliers within a preset threshold of the model
Estimating homography with RANSAC
RANSAC loop
- Get (min 4) point correspondences (randomly)
-
Compute H using DLT
-
Count inliers
- Keep H if largest number of inliers
Recompute H using all inliers