| | Feature Matching | Object Detection |
|---|---|---|
| scoring | Scores come from matching features across pairs of images, not from feature detection (e.g. cornerness score). A lower score is a better match, since a distance measure is used for comparison. | Higher -> more likely to be an object |
| threshold | Keep matches if below some threshold | Keep detections if above some threshold |
| Precision $TP \over TP+FP$ | How accurate are the feature pairs declared as matches? Lower threshold -> fewer FP (TP also drops, but usually more slowly) -> higher precision | How accurate are the detections? Higher threshold -> fewer FP -> higher precision |
| Recall $TP \over TP+FN$ | Was the algorithm able to find all the actual pairs of features? Higher threshold -> more TP and fewer FN ($TP+FN$ is fixed) -> higher recall | Could we find all the objects? Lower threshold -> fewer FN -> higher recall |
| Specificity $TN \over TN+FP$ | Can the algorithm correctly disregard the features which are not part of any pair? Lower threshold -> fewer FP and more TN ($TN+FP$ is fixed) -> higher specificity | |
If the threshold is too high (object detection, where detections above the threshold are kept):
- high precision: few false positives
- low recall: many false negatives
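The threshold trade-off can be checked on a toy example. The sketch below is only an illustration: the scores and labels are made up, and the function names and the `keep_below` flag are hypothetical helpers, not part of any library.

```python
def confusion_counts(scores, labels, threshold, keep_below):
    """Count (TP, FP, FN, TN) for a thresholded decision.

    scores: one score per candidate (match distance or detection confidence).
    labels: True where the candidate is a genuine match / object.
    keep_below: True for feature matching (low distance is good),
                False for object detection (high score is good).
    """
    tp = fp = fn = tn = 0
    for score, is_positive in zip(scores, labels):
        accepted = score < threshold if keep_below else score > threshold
        if accepted and is_positive:
            tp += 1
        elif accepted:
            fp += 1
        elif is_positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_specificity(tp, fp, fn, tn):
    """Return (precision, recall, specificity); an empty denominator gives 0.0."""
    def ratio(num, den):
        return num / den if den else 0.0
    return ratio(tp, tp + fp), ratio(tp, tp + fn), ratio(tn, tn + fp)


# Made-up detector confidences with ground-truth labels (assumption).
scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.30, 0.20, 0.10]
labels = [True, True, False, True, False, True, False, False]

low_counts = confusion_counts(scores, labels, 0.25, keep_below=False)
high_counts = confusion_counts(scores, labels, 0.85, keep_below=False)
low = precision_recall_specificity(*low_counts)
high = precision_recall_specificity(*high_counts)
```

On this data the high detection threshold trades recall for precision: `high` has better precision and specificity than `low`, but worse recall. Passing `keep_below=True` reverses the direction of the threshold for distance-based matching.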
Motion Estimation
Key Assumptions
- Color Constancy
  - Brightness constancy for intensity images
  - Implication: allows for pixel-to-pixel comparison (not image features)
  - $I(x(t), y(t), t) = C$
- Small Motion
  - Pixels only move a little bit
  - Implication: linearization of the brightness constancy constraint
  - $I(x + u\delta t, y+v\delta t, t + \delta t) = I(x, y, t)$
Approach
Look for nearby pixels with the same color
$I(x + u\delta t, y+v\delta t, t + \delta t) = I(x, y, t)$
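First-order Taylor expansion of the left-hand side, with $\delta x = u\delta t$ and $\delta y = v\delta t$ (higher-order terms dropped):
$I(x, y, t) + {\partial I \over \partial x}\delta x + {\partial I \over \partial y}\delta y + {\partial I \over \partial t}\delta t = I(x, y, t)$
${\partial I \over \partial x}\delta x + {\partial I \over \partial y}\delta y + {\partial I \over \partial t}\delta t = 0$
Dividing by $\delta t$, with $u = \delta x / \delta t$ and $v = \delta y / \delta t$:
$I_xu + I_yv + I_t = 0$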
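A quick numerical check of the linearized constraint, as a minimal sketch: the intensity pattern, its velocity, and the function names are all made up for illustration. The derivatives are approximated with central differences.

```python
import math


def intensity(x, y, t, u, v):
    """Hypothetical smooth pattern that translates with velocity (u, v)."""
    return math.sin(0.2 * (x - u * t)) + math.cos(0.15 * (y - v * t))


def flow_constraint_residual(x, y, t, true_flow, tested_flow, h=1e-3):
    """Evaluate I_x*u + I_y*v + I_t at (x, y, t) for a candidate flow."""
    def image(xx, yy, tt):
        return intensity(xx, yy, tt, *true_flow)

    # Central differences approximate the three partial derivatives.
    ix = (image(x + h, y, t) - image(x - h, y, t)) / (2 * h)
    iy = (image(x, y + h, t) - image(x, y - h, t)) / (2 * h)
    it = (image(x, y, t + h) - image(x, y, t - h)) / (2 * h)
    u, v = tested_flow
    return ix * u + iy * v + it


true_flow = (0.3, -0.2)
at_true_flow = flow_constraint_residual(5.0, 7.0, 0.0, true_flow, true_flow)
at_zero_flow = flow_constraint_residual(5.0, 7.0, 0.0, true_flow, (0.0, 0.0))
```

The residual is close to zero for the true flow and clearly non-zero for the wrong candidate `(0, 0)`. Note that one equation per pixel cannot determine both `u` and `v`; other candidates on the same constraint line would also give a zero residual.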
Horn-Schunck Optical Flow | Lucas-Kanade Optical Flow |
---|---|
brightness constancy, small motion | method of differences |
smooth flow (flow can vary from pixel to pixel) | constant flow (flow is constant for all pixels) |
global method (dense) | local method (sparse) |
Direct, dense methods - Directly recover image motion at each pixel from spatio-temporal image brightness variations - Dense motion fields, but sensitive to appearance variations - Suitable for video and when image motion is small | Feature-based methods - Extract visual features (corners, textured areas) and track them over multiple frames - Sparse motion fields, but more robust tracking - Suitable when image motion is large (10s of pixels) |
Lucas-Kanade Optical Flow
Assumption
$I_xu + I_yv + I_t = 0$
Assume that the surrounding patch (e.g. 5x5 = 25 pixels $p_1, \dots, p_{25}$) has constant flow
$\begin{bmatrix} I_x(p_1) & I_y(p_1) \\ I_x(p_2) & I_y(p_2) \\ \vdots &\vdots \\ I_x(p_{25}) & I_y(p_{25})\end{bmatrix} \begin{bmatrix}u \\ v\end{bmatrix} = -\begin{bmatrix} I_t(p_1) \\ I_t(p_2) \\ \vdots \\ I_t(p_{25}) \end{bmatrix}$
$Ax = b$
Least Squares Approximation: $A^TA\hat x = A^Tb$
$\hat x = (A^TA)^{-1}A^Tb$
- $A^TA$ should be invertible
- $A^TA$ shouldn’t be too small ($\lambda_1$ and $\lambda_2$ shouldn’t be too small)
- $A^TA$ should be well conditioned ($\lambda_1 / \lambda_2$ shouldn’t be too large)
$A^TA$ is the same matrix that appears in the Harris Corner Detector
- Corners are where $\lambda_1$, $\lambda_2$ are both big; this is also when Lucas-Kanade optical flow works best
- Corners are regions with (at least) two different gradient directions
- Corners are good places to compute flow!
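The least-squares solve and the eigenvalue check can be sketched in plain Python. This is an illustrative sketch, not a reference implementation: the function names, the `min_eig` cutoff, and the synthetic paraboloid image are assumptions. The patch is deliberately centred on the blob so that the estimate comes out exact; on real images the result is only approximate.

```python
import math


def lucas_kanade_patch(frame0, frame1, cx, cy, half=2, min_eig=1e-6):
    """Estimate one flow vector for the (2*half+1)^2 patch centred at (cx, cy).

    frame0, frame1: images as 2D lists indexed [y][x].
    Returns (u, v, lam_small, lam_large); raises ValueError when A^T A is
    (close to) singular, i.e. the patch has no usable texture.
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            ix = (frame0[y][x + 1] - frame0[y][x - 1]) / 2.0  # central difference
            iy = (frame0[y + 1][x] - frame0[y - 1][x]) / 2.0
            it = frame1[y][x] - frame0[y][x]                  # temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it

    # Eigenvalues of A^T A = [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2.0
    det = sxx * syy - sxy * sxy
    root = math.sqrt(max(half_trace * half_trace - det, 0.0))
    lam_small, lam_large = half_trace - root, half_trace + root
    if lam_small < min_eig:
        raise ValueError("A^T A is not invertible for this patch")

    # Solve A^T A [u, v]^T = A^T b with b = -I_t (2x2 inverse written out).
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v, lam_small, lam_large


def blob(size, cx, cy):
    """Synthetic paraboloid image centred at (cx, cy)."""
    return [[(x - cx) ** 2 + (y - cy) ** 2 for x in range(size)]
            for y in range(size)]


true_u, true_v = 0.5, -0.25
frame0 = blob(17, 8.0, 8.0)
frame1 = blob(17, 8.0 + true_u, 8.0 + true_v)  # same blob shifted by (u, v)
u, v, lam_small, lam_large = lucas_kanade_patch(frame0, frame1, 8, 8)
```

Here both eigenvalues are large and equal, the well-conditioned case described above, and the recovered `(u, v)` matches the shift used to build `frame1`.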
Aperture Problem
A small visible patch of a line cannot reveal the true direction of movement; only the motion component perpendicular to the line is observable
Want patches with different gradient directions to avoid the aperture problem
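The aperture problem shows up directly in the eigenvalues of $A^TA$. The sketch below uses two idealized 5x5 patches whose gradients are written down by hand (an assumption made for illustration): a straight edge $I = x + y$ and a blob $I = x^2 + y^2$.

```python
import math


def structure_tensor_eigenvalues(gradients):
    """Eigenvalues (smallest first) of A^T A for a list of (I_x, I_y) pairs."""
    sxx = sum(ix * ix for ix, _ in gradients)
    sxy = sum(ix * iy for ix, iy in gradients)
    syy = sum(iy * iy for _, iy in gradients)
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(max(half_trace ** 2 - (sxx * syy - sxy ** 2), 0.0))
    return half_trace - root, half_trace + root


offsets = range(-2, 3)  # 5x5 patch centred at the origin

# Straight edge I = x + y: the gradient is (1, 1) at every pixel.
edge = [(1.0, 1.0) for _ in offsets for _ in offsets]
# Blob I = x^2 + y^2: the gradient (2x, 2y) points in many directions.
blob = [(2.0 * x, 2.0 * y) for y in offsets for x in offsets]

edge_eigs = structure_tensor_eigenvalues(edge)  # (0.0, 50.0): singular
blob_eigs = structure_tensor_eigenvalues(blob)  # (200.0, 200.0): invertible


def edge_temporal_difference(u, v):
    """I_t at every pixel of the edge I = x + y after shifting it by (u, v)."""
    return [((x - u) + (y - v)) - (x + y) for y in offsets for x in offsets]


# Two different motions of the edge produce exactly the same observations.
ambiguous = edge_temporal_difference(1.0, 0.0) == edge_temporal_difference(0.0, 1.0)
```

For the edge, one eigenvalue is zero, so $A^TA$ is not invertible, and a shift of `(1, 0)` is indistinguishable from `(0, 1)` inside the patch.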
Aliasing
Temporal aliasing causes ambiguities: an image can have many pixels with the same intensity, which can lead to wrong ‘correspondences’
Coarse-to-Fine Optical Flow Estimation
Run iterative L-K -> warp & upsample -> run iterative L-K -> …
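The control flow of this loop can be sketched on a 1D signal with a single global shift. Everything here is a simplifying assumption made for illustration: the 1D setting, pair-averaging as the pyramid filter, linear interpolation as the warp, and all function names. A real implementation works on 2D images with a per-pixel or per-patch flow.

```python
import math


def downsample(signal):
    """Halve the resolution by averaging neighbouring pairs of samples."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal) - 1, 2)]


def sample(signal, pos):
    """Linear interpolation, clamped at the borders."""
    pos = min(max(pos, 0.0), len(signal) - 1.0)
    i = min(int(pos), len(signal) - 2)
    frac = pos - i
    return (1.0 - frac) * signal[i] + frac * signal[i + 1]


def lk_refine(frame0, frame1, shift, iterations=5):
    """Iterative 1D Lucas-Kanade: warp frame1 by the current shift, re-solve."""
    for _ in range(iterations):
        num = den = 0.0
        for i in range(1, len(frame0) - 1):
            ix = (frame0[i + 1] - frame0[i - 1]) / 2.0
            it = sample(frame1, i + shift) - frame0[i]  # warped frame1 minus frame0
            num += ix * it
            den += ix * ix
        if den == 0.0:
            break
        shift -= num / den  # least-squares solution of I_x * r + I_t = 0
    return shift


def coarse_to_fine_shift(frame0, frame1, levels=3):
    """Estimate the shift at the coarsest level, then upsample and refine."""
    pyramid = [(frame0, frame1)]
    for _ in range(levels - 1):
        a, b = pyramid[-1]
        pyramid.append((downsample(a), downsample(b)))
    shift = 0.0
    for level, (a, b) in enumerate(reversed(pyramid)):
        if level > 0:
            shift *= 2.0  # a shift of s pixels is 2s pixels one level finer
        shift = lk_refine(a, b, shift)
    return shift


# Made-up test signal: a Gaussian bump that moves 12 pixels to the right.
frame0 = [math.exp(-((i - 58) ** 2) / 72.0) for i in range(128)]
frame1 = [math.exp(-((i - 70) ** 2) / 72.0) for i in range(128)]

estimate = coarse_to_fine_shift(frame0, frame1)
single_step = lk_refine(frame0, frame1, 0.0, iterations=1)
```

For comparison, `single_step` is one linearized solve at full resolution. It noticeably underestimates the 12-pixel motion, while `estimate` ends up close to it.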
Horn-Schunck Optical Flow
For every pixel,
Enforce brightness constancy: $\min_{u,v}[I_xu_{i,j} + I_yv_{i,j} + I_t]^2$
Enforce smooth flow field: $\min_u(u_{i,j}-u_{i+1,j})^2$
$\min_{u,v}\sum_{i,j}\{E_s(i,j)+\lambda E_d(i,j)\}$
$E_s(i,j)$: smoothness
$E_d(i,j)$: brightness constancy
$\lambda$: weight
Compute partial derivative, derive update equations
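As a sketch of that step, assume (this exact discretization is not stated above) that the smoothness term compares each pixel with its right and lower neighbour, and that the data term is the squared brightness constancy residual:
$E_s(i,j) = {1 \over 4}[(u_{i,j}-u_{i+1,j})^2 + (u_{i,j}-u_{i,j+1})^2 + (v_{i,j}-v_{i+1,j})^2 + (v_{i,j}-v_{i,j+1})^2]$
$E_d(i,j) = (I_xu_{i,j} + I_yv_{i,j} + I_t)^2$
Setting the partial derivatives of the total energy with respect to $u_{i,j}$ and $v_{i,j}$ to zero gives
$2(u_{i,j} - \bar u_{i,j}) + 2\lambda(I_xu_{i,j} + I_yv_{i,j} + I_t)I_x = 0$
$2(v_{i,j} - \bar v_{i,j}) + 2\lambda(I_xu_{i,j} + I_yv_{i,j} + I_t)I_y = 0$
where $\bar u_{i,j}$ and $\bar v_{i,j}$ are the averages of $u$ and $v$ over the four neighbours of pixel $(i,j)$. Solving this 2x2 system, with the neighbour averages taken from the previous iteration, gives the update equations
$\hat u_{i,j} = \bar u_{i,j} - {I_x\bar u_{i,j} + I_y\bar v_{i,j} + I_t \over \lambda^{-1} + I_x^2 + I_y^2}I_x$
$\hat v_{i,j} = \bar v_{i,j} - {I_x\bar u_{i,j} + I_y\bar v_{i,j} + I_t \over \lambda^{-1} + I_x^2 + I_y^2}I_y$
Under these assumptions, a large $\lambda$ lets the data term dominate, while a small $\lambda$ pulls each flow vector towards the average of its neighbours.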
Applications for Optical Flow
- Segmentation of objects in space or time
- Estimating 3D structure
- Learning dynamical models – how things move
- Recognizing events and activities
- Improving video quality
Errors in assumptions
- A point does not move like (all) its neighbors, e.g. at object boundaries
- Brightness constancy does not (always) hold
- The motion is large (larger than a pixel)
  - Non-linear: iterative refinement
  - Local minima: coarse-to-fine estimation