a.k.a. Scale-Invariant Feature Transform

Local feature matching that is invariant, or at least robust, to

  • scale
  • orientation/rotation
  • illumination/brightness
  • partial occlusion
  • noise
  • small changes in viewpoint

Keypoint Localization

See Laplacian of Gaussian Edge Detector

  • build a scale space by increasing σ (scale parameter)
  • for each blob, as we increase σ, the filter response rises to a peak and then fades away
  • apply the σ-normalized Laplacian of Gaussian (NLoG) using multiple values of σ
  • at some scale, the magnitude of the output attains a peak; that σ is the blob's characteristic scale
  • characteristic scale (σ) ∝ size of the blob
  • Selection of σ:
    • σ_k = σ_0 · s^k, k = 0, 1, 2, 3, …
    • s = constant multiplier
    • σ_0 = initial scale
  • Fast approximation → DoG
    • DoG ≈ (s − 1) · NLoG (first-order approximation, good when s is close to 1)
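
The scale selection above can be sketched numerically. This is a minimal illustration, not a reference implementation: the helper names (scale_schedule, nlog_kernel), the synthetic Gaussian blob, and the values σ_0 = 1, s = √2 are all assumptions chosen for the example. It evaluates the NLoG response at the blob centre for each σ_k and picks the scale where the magnitude peaks.

```python
import numpy as np

def scale_schedule(sigma0, s, count):
    """Scales sigma_k = sigma0 * s**k for k = 0 .. count-1."""
    return [sigma0 * s**k for k in range(count)]

def nlog_kernel(sigma, radius):
    """Sampled sigma-normalized LoG: sigma^2 * Laplacian of a 2D Gaussian."""
    ax = np.arange(-radius, radius + 1, dtype=float)
    x, y = np.meshgrid(ax, ax)
    r2 = x**2 + y**2
    gauss = np.exp(-r2 / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return sigma**2 * (r2 - 2 * sigma**2) / sigma**4 * gauss

# Synthetic test image (assumption): one Gaussian blob of width 4 px at the centre.
radius = 40
ax = np.arange(-radius, radius + 1, dtype=float)
x, y = np.meshgrid(ax, ax)
blob_width = 4.0
image = np.exp(-(x**2 + y**2) / (2 * blob_width**2))

sigmas = scale_schedule(sigma0=1.0, s=2**0.5, count=8)
# Filter response at the blob centre = sum of image * kernel (the kernel is symmetric).
responses = [abs(np.sum(image * nlog_kernel(sig, radius))) for sig in sigmas]
characteristic_sigma = sigmas[int(np.argmax(responses))]
print(characteristic_sigma)
```

For a Gaussian blob the response magnitude peaks when σ equals the blob width, so a wider blob gives a proportionally larger characteristic scale.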

see Difference of Gaussians Edge Detector
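
The DoG approximation follows from ∂G/∂σ = σ∇²G, so G(sσ) − G(σ) ≈ (s − 1)σ²∇²G = (s − 1) · NLoG to first order in (s − 1). The sketch below checks this numerically; the function names and the choice σ = 3 are assumptions made for illustration only.

```python
import numpy as np

def gaussian(sigma, r2):
    """2D Gaussian evaluated at squared radius r2."""
    return np.exp(-r2 / (2 * sigma**2)) / (2 * np.pi * sigma**2)

def dog_vs_nlog_error(sigma, s, radius=20):
    """Max |DoG - (s-1)*NLoG|, relative to the peak of |(s-1)*NLoG|."""
    ax = np.arange(-radius, radius + 1, dtype=float)
    x, y = np.meshgrid(ax, ax)
    r2 = x**2 + y**2
    dog = gaussian(s * sigma, r2) - gaussian(sigma, r2)
    nlog = sigma**2 * (r2 - 2 * sigma**2) / sigma**4 * gaussian(sigma, r2)
    approx = (s - 1) * nlog
    return float(np.max(np.abs(dog - approx)) / np.max(np.abs(approx)))

# The error shrinks as s approaches 1.
for s in (1.01, 1.1, 2 ** (1 / 3)):
    print(f"s = {s:.3f}: relative error = {dog_vs_nlog_error(3.0, s):.3f}")
```

The DoG keeps the same shape and zero crossings closely enough for blob detection, and it only needs Gaussian-blurred images that are computed anyway, which is why it serves as the fast substitute.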

  • detect local extrema in the resulting stack of responses

  • Non-maximal suppression: slide an N×N×N window (e.g. 3×3×3) over the stack. If the absolute value of the center pixel is significantly larger than the absolute values of all its neighbours in its own scale and in the neighbouring scales, it is declared to be an extremum.

See Non-Maximal Suppression

  • apply a contrast threshold to remove weak extrema
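
A minimal sketch of the extremum search with N = 3, assuming the filter responses are stored as a NumPy array of shape (scales, rows, cols). The function name find_extrema and the rule used here (strict maximum of the absolute value, plus a contrast threshold) are simplifications chosen for the example; "significantly larger" could be made stricter with a margin.

```python
import numpy as np

def find_extrema(stack, contrast_threshold):
    """Return (scale, row, col) indices whose absolute response is the strict
    maximum of its 3x3x3 neighbourhood and is at least contrast_threshold."""
    mag = np.abs(stack)
    n_scales, n_rows, n_cols = mag.shape
    keypoints = []
    # Border voxels are skipped because they lack a full 3x3x3 neighbourhood.
    for k in range(1, n_scales - 1):
        for i in range(1, n_rows - 1):
            for j in range(1, n_cols - 1):
                center = mag[k, i, j]
                if center < contrast_threshold:
                    continue  # weak extremum: removed by contrast thresholding
                cube = mag[k - 1:k + 2, i - 1:i + 2, j - 1:j + 2]
                is_max = center == cube.max()
                is_unique = np.count_nonzero(cube == center) == 1
                if is_max and is_unique:
                    keypoints.append((k, i, j))
    return keypoints

# Toy stack (assumption): one strong negative response and one weak positive one.
stack = np.zeros((3, 5, 5))
stack[1, 2, 2] = -5.0
stack[0, 1, 1] = 0.5
print(find_extrema(stack, contrast_threshold=1.0))
```

The triple loop is written for readability; a vectorized maximum filter would be the usual choice on real images.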

Orientation invariant Region Selection

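  • consider a square window of pixels around the keypoint, sized according to the blob's scale
  • compute the image gradient direction at each pixel in the window
  • principal orientation: most common gradient direction (the peak of the direction histogram); measuring all later directions relative to it gives rotation invariance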
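
The principal orientation can be sketched as the peak of a gradient direction histogram. This example makes several assumptions that are not in the notes: 36 bins of 10°, votes weighted by gradient magnitude, and the hypothetical function name principal_orientation.

```python
import numpy as np

def principal_orientation(patch, num_bins=36):
    """Return the dominant gradient direction of a patch, in degrees.

    Angles are measured in image coordinates (rows increase downward),
    so 90 degrees means intensity increases down the rows.
    """
    gy, gx = np.gradient(patch.astype(float))  # derivatives along rows, cols
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 360.0
    bin_width = 360.0 / num_bins
    # Bins are centred on multiples of bin_width; the modulo wraps 360 back to 0.
    bins = np.round(angle / bin_width).astype(int) % num_bins
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=num_bins)
    return float(np.argmax(hist) * bin_width)

# Intensity ramps (assumption): brightness increases along columns, then along rows.
ramp_x = np.tile(np.arange(8.0), (8, 1))
ramp_y = ramp_x.T
print(principal_orientation(ramp_x), principal_orientation(ramp_y))
```

Rotating the window so that this direction points to a fixed reference before building the descriptor is what makes the descriptor independent of image rotation.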

SIFT Descriptors

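  • divide the oriented window into a grid of cells (4 × 4 in standard SIFT; a 2 × 2 split into four quadrants is a simplified illustration), calculate the gradient orientation histogram of each cell, and concatenate them → normalized histogram = SIFT descriptor
  • histograms have 8 directions
  • descriptor = 4 × 4 cells × 8 directions = 128 elements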
  • Comparing SIFT descriptors
    • L2 Norm
      • smaller value = better match
    • Normalized Correlation
      • larger value = better match; perfect match when d(H1,H2) = 1
    • Intersection
      • larger value = better match
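
A simplified sketch of the descriptor and the three comparison measures, assuming a 16 × 16 window split into 4 × 4 cells with 8 directions each (16 × 8 = 128). The function names are hypothetical, and the sketch leaves out refinements such as Gaussian weighting, interpolation between bins, and rotation to the principal orientation.

```python
import numpy as np

def sift_like_descriptor(patch, grid=4, num_bins=8):
    """Concatenate per-cell gradient orientation histograms and L2-normalize.
    Assumes a square patch whose side is divisible by grid."""
    gy, gx = np.gradient(patch.astype(float))
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % (2 * np.pi)
    bins = np.floor(angle / (2 * np.pi) * num_bins).astype(int) % num_bins
    cell = patch.shape[0] // grid
    hists = []
    for r in range(grid):
        for c in range(grid):
            rows = slice(r * cell, (r + 1) * cell)
            cols = slice(c * cell, (c + 1) * cell)
            hists.append(np.bincount(bins[rows, cols].ravel(),
                                     weights=magnitude[rows, cols].ravel(),
                                     minlength=num_bins))
    descriptor = np.concatenate(hists)
    norm = np.linalg.norm(descriptor)
    return descriptor / norm if norm > 0 else descriptor

def l2_distance(h1, h2):
    """Smaller value = better match."""
    return float(np.linalg.norm(h1 - h2))

def normalized_correlation(h1, h2):
    """Larger value = better match; equals 1 for a perfect match."""
    a, b = h1 - h1.mean(), h2 - h2.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2)))

def intersection(h1, h2):
    """Larger value = better match."""
    return float(np.sum(np.minimum(h1, h2)))

# Random patches stand in for real image windows (assumption).
rng = np.random.default_rng(0)
d1 = sift_like_descriptor(rng.random((16, 16)))
d2 = sift_like_descriptor(rng.random((16, 16)))
print(d1.shape, l2_distance(d1, d2), normalized_correlation(d1, d2), intersection(d1, d2))
```

Note that the three measures rank matches in opposite directions: the L2 norm is a distance, while normalized correlation and intersection are similarities.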

Applications

Limitations

  • 3D objects (appearance changes as the object rotates out of the image plane)
  • large changes in viewpoint or viewing angle