Boundaries
Why extract edges?
Resilient to lighting and color -> useful for recognition
Cue for shape and geometry -> useful for recognition, understanding 3D
What are image edges?
Very sharp discontinuities in the intensity
How to detect edges?
Edges correspond to extrema of derivative. Since images are discrete, we need to take finite differences.
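A minimal sketch of this idea (assuming NumPy; the step signal below is made up): the extremum of the finite difference marks the edge.

```python
import numpy as np

# Hypothetical 1-D intensity profile containing a step edge
f = np.array([10, 10, 10, 10, 80, 80, 80, 80], dtype=float)

# Forward finite difference approximates the derivative: df[i] = f[i+1] - f[i]
df = np.diff(f)
print(df)                      # [ 0.  0.  0. 70.  0.  0.  0.]
print(np.argmax(np.abs(df)))   # 3 -> location of the step
```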
Filters
The Sobel Filter
- Horizontal Sobel filter
Horizontal gradients -> vertical lines
$\begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} * \begin{bmatrix} 1 & 0 & -1\end{bmatrix}$
$\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$ is the 1D blurring filter
$\begin{bmatrix} 1 & 0 & -1\end{bmatrix}$ is the 1D derivative filter
- Vertical Sobel filter
Vertical gradients -> horizontal lines
$\begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} * \begin{bmatrix} 1 & 2 & 1\end{bmatrix}$
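A quick check of this factorization (assuming NumPy): both Sobel kernels are outer products of the 1D blurring and derivative filters.

```python
import numpy as np

blur = np.array([1, 2, 1])      # 1D blurring filter
deriv = np.array([1, 0, -1])    # 1D derivative filter

sobel_x = np.outer(blur, deriv)   # horizontal Sobel: blur vertically, differentiate horizontally
sobel_y = np.outer(deriv, blur)   # vertical Sobel:   blur horizontally, differentiate vertically
print(sobel_x)   # [[ 1  0 -1] [ 2  0 -2] [ 1  0 -1]]
print(sobel_y)   # [[ 1  2  1] [ 0  0  0] [-1 -2 -1]]
```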
Computing Image Gradients
- Select derivative filters.
- Convolve them with the image to compute derivatives.
$\frac{\partial f}{\partial x} = S_x \otimes f$
$\frac{\partial f}{\partial y} = S_y \otimes f$
- Form the image gradient, and compute its direction and magnitude.
gradient: $\nabla f = [\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}]$
A vector that points in the direction of most rapid change in intensity
gradient direction: $\theta = \tan^{-1} (\frac{\partial f}{\partial y}/\frac{\partial f}{\partial x})$
gradient magnitude: $|\nabla f| = \sqrt{(\frac{\partial f}{\partial x})^2 + (\frac{\partial f}{\partial y})^2}$
Edge strength
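A rough sketch of these three steps (assuming NumPy and SciPy; `image` stands for a hypothetical grayscale float array):

```python
import numpy as np
from scipy.ndimage import convolve

def image_gradient(image):
    """Convolve with the Sobel filters, then form the gradient magnitude and direction."""
    sobel_x = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float)
    sobel_y = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=float)

    dfdx = convolve(image, sobel_x)       # partial f / partial x
    dfdy = convolve(image, sobel_y)       # partial f / partial y

    magnitude = np.hypot(dfdx, dfdy)      # edge strength |grad f|
    direction = np.arctan2(dfdy, dfdx)    # gradient orientation theta (radians)
    return magnitude, direction
```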
Effects of Noise
- Noise is high frequency
- Differentiation accentuates noise, very sensitive to noise
Must blur first before using derivative filters, then look for peaks in $\frac {\partial}{\partial x} (h \otimes f)$
However, smoothing removes the noise but also blurs the edges.
$\because \frac {\partial}{\partial x} (h \otimes f) = (\frac {\partial}{\partial x} h) \otimes f$ by the differentiation property of convolution, we can save operations by differentiating the Gaussian first.
$\because$ Differentiation is itself a convolution
$\therefore$ we can compute $(h \otimes g) \otimes f$ where $g = \begin{bmatrix}1 & 0 & -1\end{bmatrix}$, i.e. convolve the image once with the precomputed derivative-of-Gaussian filter $h \otimes g$.
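A 1-D illustration of this equivalence (assuming SciPy; the noisy step signal is made up): smoothing then differentiating, and filtering once with a derivative of Gaussian, locate the same edge.

```python
import numpy as np
from scipy.ndimage import convolve1d, gaussian_filter1d

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
f = (x > 0.5).astype(float) + 0.05 * rng.standard_normal(200)   # noisy step edge
sigma = 3.0

# Option 1: smooth with a Gaussian, then take the derivative
d_smoothed = convolve1d(gaussian_filter1d(f, sigma), np.array([1.0, 0.0, -1.0]) / 2)

# Option 2: convolve once with the derivative of the Gaussian (order=1)
d_of_gauss = gaussian_filter1d(f, sigma, order=1)

# Both peak at the same edge location (index ~100)
print(np.argmax(np.abs(d_smoothed)), np.argmax(np.abs(d_of_gauss)))
```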
Which $\sigma$ should we use for the Gaussian kernel?
A larger $\sigma$ detects larger-scale edges; a smaller $\sigma$ detects finer edges.
Large $\sigma \rightarrow$ good detection (high SNR), but poor localization.
Small $\sigma \rightarrow$ poor detection (low SNR), but good localization.
Smooth to suppress noise; increase contrast to enhance edges.
Canny Edge Detection
Edge Thinning
Non-Maximum Suppression
For each pixel, check if it is a local max along the gradient direction. If so, keep as edge, if not, discard (set to 0).
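A plain-loop sketch of this test (assuming NumPy; `magnitude` and `direction` come from the gradient step above, and a practical version would be vectorized):

```python
import numpy as np

def non_max_suppression(magnitude, direction):
    """Keep a pixel only if it is a local maximum along its (quantized) gradient direction."""
    H, W = magnitude.shape
    out = np.zeros_like(magnitude)
    angle = np.rad2deg(direction) % 180          # fold orientations into [0, 180)
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            a = angle[i, j]
            if a < 22.5 or a >= 157.5:           # ~0 deg: compare left/right neighbours
                nbrs = magnitude[i, j - 1], magnitude[i, j + 1]
            elif a < 67.5:                       # ~45 deg: compare one diagonal
                nbrs = magnitude[i - 1, j + 1], magnitude[i + 1, j - 1]
            elif a < 112.5:                      # ~90 deg: compare up/down neighbours
                nbrs = magnitude[i - 1, j], magnitude[i + 1, j]
            else:                                # ~135 deg: compare the other diagonal
                nbrs = magnitude[i - 1, j - 1], magnitude[i + 1, j + 1]
            if magnitude[i, j] >= max(nbrs):     # local max along gradient -> keep as edge
                out[i, j] = magnitude[i, j]
    return out
```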
Hysteresis Thresholding
Use a high threshold to start edge curves, and a low threshold to continue growing them.
- Mark pixels that clear high threshold as boundary
- If there are pixels that clear low threshold and are connected to a boundary, mark them as boundary too.
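A sketch of hysteresis thresholding under these two rules (assuming SciPy; low-threshold pixels are kept only if their connected component touches a high-threshold pixel):

```python
import numpy as np
from scipy.ndimage import label

def hysteresis_threshold(strength, low, high):
    """Keep pixels above the high threshold, plus low-threshold pixels connected to them."""
    strong = strength >= high
    weak = strength >= low                               # strong pixels are a subset of weak
    labels, n = label(weak, structure=np.ones((3, 3)))   # 8-connected components of weak mask
    keep = np.zeros(n + 1, dtype=bool)
    keep[np.unique(labels[strong])] = True               # components containing a strong pixel
    keep[0] = False                                      # background label is never an edge
    return keep[labels]
```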
Region
K-means Clustering
Randomly initialize the k cluster centers, then iterate between assigning points to clusters and recomputing the cluster centers (sketched below, after the pros and cons).
Pros
- Simple, fast to compute
- Converges to local minimum of within-cluster squared error
Cons/issues:
- Setting k
- Sensitive to initial centers
- Sensitive to outliers
- Detects spherical clusters
- Assumes means can be computed
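A compact sketch of the assign/re-center loop described above (assuming NumPy; empty clusters are left unhandled):

```python
import numpy as np

def kmeans(points, k, n_iters=100, seed=0):
    """Randomly initialize k centers, then alternate assignment and re-centering."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center becomes the mean of its members
        new_centers = np.array([points[labels == c].mean(axis=0) for c in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```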
Mean Shift Algorithm
Find modes or local maxima of density in the feature space
Kernel Density Estimation
A kernel is a function that satisfies:
- $\int_{\mathbb{R}^d} \phi(x)\,dx = 1$
- $\phi(x) \geq 0$
Examples:
- Rectangular: $\phi(x) = 1$ for $a ≤x≤b$
- Gaussian: $\phi(x) = e^{-{x^2 \over 2\sigma^2}}$
- Epanechnikov: $\phi(x) = {3 \over 4}(1-x^2)$ for $\mid x\mid ≤1$
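A small sketch of the kernel density estimate $\hat f_K$ defined in the Technical Details below, in 1-D (assuming NumPy; the kernels here are normalized versions of the profiles above so the estimate integrates to 1):

```python
import numpy as np

def kde_1d(x, samples, h, profile="gaussian"):
    """Estimate the density at points x from 1-D samples with bandwidth h."""
    u = (x[:, None] - samples[None, :]) / h
    if profile == "gaussian":
        phi = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    elif profile == "epanechnikov":
        phi = np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)
    else:                                   # rectangular kernel on [-1, 1]
        phi = 0.5 * (np.abs(u) <= 1)
    return phi.sum(axis=1) / (len(samples) * h)
```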
Technical Details
$\hat{f}_K(x) = {1 \over nh^d} \sum_{i=1}^nK({x-x_i\over h})$
$K(x) = c_{k,d}\,k(| x | ^2)$
$\nabla \hat f(x) = {2c_{k,d} \over nh^{d+2}} [\sum_{i=1}^ng(| {x-x_i \over h} |^2)][{\sum_{i=1}^n x_ig(| {x-x_i \over h} |^2) \over \sum_{i=1}^n g(| {x-x_i \over h} |^2)} -x]$
$K(x)$: the radially symmetric kernel to estimate density
$h$: bandwidth, defines the radius of kernel
$c_{k,d}$: a normalization constant
$g(x) = -k'(x)$, where $k'$ denotes the derivative of the selected kernel profile
At a mode we expect $\nabla \hat f(x) = 0$.
${2c_{k,d} \over nh^{d+2}} [\sum_{i=1}^ng(| {x-x_i \over h} |^2)]$ is unlikely to be 0, because it is proportional to the density estimate at $x$, so the second factor (the mean shift vector) must vanish at a mode.
The mean shift procedure from a given point $x_i^t$ is:
- Compute the mean shift vector $m$: $[{\sum_{i=1}^n x_ig(| {x-x_i \over h} |^2) \over \sum_{i=1}^n g(| {x-x_i \over h} |^2)} -x]$
- Translate the density window: $x_i^{t+1} = x_i^t + m(x_i^t)$
- Iterate steps 1 and 2 until convergence, i.e. $\nabla \hat f(x_i) = 0$
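A sketch of this procedure for a single starting point (assuming NumPy and a flat, Epanechnikov-profile kernel, so $g$ is simply an indicator on the window of radius $h$):

```python
import numpy as np

def mean_shift_point(x, points, h, tol=1e-5, max_iters=200):
    """Run the mean shift iteration from x with a flat (Epanechnikov-profile) kernel."""
    for _ in range(max_iters):
        in_window = np.linalg.norm(points - x, axis=1) <= h   # g((x - x_i)/h) as 0/1 weights
        m = points[in_window].mean(axis=0) - x                # mean shift vector
        x = x + m                                             # translate the density window
        if np.linalg.norm(m) < tol:                           # shift ~ 0 -> gradient ~ 0 (a mode)
            break
    return x
```

Running this from every data point and merging end points that fall close together yields the modes used as segment labels.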
Pros
- General algorithm for mode-finding (we apply it towards segmentation)
- Model-free; no prior assumptions on cluster shape or distribution of data (spherical, elliptical, etc.)
- One parameter (window size / bandwidth h, which has a physical meaning)
- Finds variable number of modes (vs. pre-specified k in k-means)
- Robust to outliers
Cons
- Output depends on h and selecting h is non-trivial
- Computationally expensive
- Scales poorly with feature space dimension
Computational Complexity Improvement
- Assign all points within radius $r$ of the search path's end point to that mode
- Assign all points within radius $r/c$ of the search path to that mode