Introduction
Finding corresponding points in image pairs or image sequences is a central
problem in computer vision. Most classical methods assume brightness constancy,
and perform best when tracking high-contrast regions that lie on a single
surface. However, many images have visually important features that violate
this assumption. Developing methods to track corresponding points which
lie on occluding boundaries is necessary if one is to track complicated
objects with multiple articulated surfaces, such as the human face.
Figure 1: Correspondence is difficult when a uniform surface moves across
different background patterns. Consider the correspondence of window A with
windows B or C; traditional robust methods equate the match between A:B and
A:C, since the ``outlier'' regions in each case are equally different.
In recent years, robust estimation methods have been applied to image correspondence,
and have been shown to considerably improve performance in cases of occlusion.
Black and Anandan pioneered robust optic flow using redescending error
norms that substantially discount the effect of outliers [1].
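To make the idea of a redescending error norm concrete, the following sketch (ours, not code from [1]) compares a quadratic cost with the Geman-McClure norm, a standard redescending choice; the function names and numbers are illustrative:

```python
# Sketch: quadratic vs. redescending error norms on a set of residuals.
# With the quadratic norm, one gross outlier dominates the total cost;
# the Geman-McClure norm saturates, bounding each residual's influence.

def quadratic(r):
    return r * r

def geman_mcclure(r, sigma=1.0):
    # Redescending norm: rho(r) approaches sigma^2 as |r| grows,
    # so no single residual can contribute more than a bounded amount.
    return (r * r) / (r * r + sigma * sigma) * sigma * sigma

residuals = [0.1, -0.2, 0.1, 50.0]   # three inliers plus one gross outlier

quad_cost = sum(quadratic(r) for r in residuals)
robust_cost = sum(geman_mcclure(r) for r in residuals)
# The outlier contributes 2500 to the quadratic cost,
# but less than sigma^2 = 1 to the robust cost.
```

Minimizing such a cost over candidate displacements lets a small fraction of occluded pixels be discounted rather than dragging the estimate away from the true match.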
Shizawa and Mase derived methods for transparent local flow estimation
[2]. Bhat and Nayar have advocated the use
of rank statistics for robust correspondence [4];
Zabih and Woodfill use ordering statistics combined with spatial structure
in the CENSUS transform [5]. Several authors
have explored methods of finding image "layers" to pool motion information
over arbitrarily shaped regions of support and to iteratively refine parameter
estimates [6,8,7],
but these methods generally assume models of global object motion to define
coherence.
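As a concrete example of the ordering-statistics approach, the following sketch (in the spirit of Zabih and Woodfill's CENSUS transform [5], though simplified to a single 3x3 neighborhood) replaces each pixel by a bit string of local intensity orderings and compares windows by Hamming distance, which makes the match insensitive to monotonic brightness changes:

```python
# Simplified 3x3 census transform: each pixel becomes a bit string
# recording, for each neighbor, whether it is darker than the center.
# Matching then uses Hamming distance on these bit strings, which depends
# only on local intensity orderings, not absolute brightness.

def census_3x3(img, y, x):
    """Bit string for the 3x3 neighborhood of (y, x): 1 where neighbor < center."""
    c = img[y][x]
    bits = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            bits = (bits << 1) | (1 if img[y + dy][x + dx] < c else 0)
    return bits

def hamming(a, b):
    return bin(a ^ b).count("1")

img = [[10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]]
# A gain/offset change preserves every ordering, hence the census code.
brighter = [[2 * v + 5 for v in row] for row in img]
```

Here `census_3x3(img, 1, 1)` and `census_3x3(brighter, 1, 1)` are identical, so the Hamming distance between the two windows is zero despite the brightness change.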
Figure 2: Finding local correspondences in regions with occlusion is a
difficult challenge. (a,e) and (c,g) are images taken before and after the
user's expression changes; (b,f) and (d,h) are enlarged views of
corresponding points, with a cross drawn to indicate the center point of
each window. Traditional correspondence methods have difficulty at points
such as these, where there is little foreground texture, substantial
occlusion, and variable sign of contrast at the occlusion boundary.
However, these methods make a critical assumption: that there will be sufficient
contrast in the foreground (``inlier'') portion of an analysis window to
localize the correspondence match. This is often not true, due either to
a uniform foreground surface or low-resolution video sampling. This problem
is illustrated in Figure 1, which shows a foreground
region with zero contrast in front of two different background regions;
note that the sign of contrast changes at the occlusion boundary between
the two frames. An example in real imagery is shown in Figure 2;
the marked locations pose a considerable challenge for existing robust
correspondence methods, since any window large enough to include substantial
foreground contrast will include a very large percentage of outliers.
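The ambiguity of Figure 1 can be reproduced numerically. In the following sketch (hypothetical one-dimensional windows, not the paper's data), window A is a uniform foreground over one background, and candidates B and C share A's foreground but carry different backgrounds; a truncated-quadratic robust cost saturates on both background halves and so cannot distinguish them:

```python
# Why robust matching equates A:B and A:C when the foreground is uniform:
# every background ("outlier") disagreement is clipped to the same maximum
# penalty, so two very different backgrounds yield identical costs.

def truncated_quadratic(r, tau=10.0):
    # Clip each squared residual at tau^2, as in a truncated-quadratic norm.
    return min(r * r, tau * tau)

def robust_cost(w1, w2):
    return sum(truncated_quadratic(a - b) for a, b in zip(w1, w2))

fg = [100, 100, 100, 100]        # uniform foreground: zero contrast
A = fg + [0, 0, 0, 0]            # foreground over background pattern 1
B = fg + [200, 200, 200, 200]    # same foreground, background pattern 2
C = fg + [50, 60, 50, 60]        # same foreground, background pattern 3
```

Since every background residual exceeds tau, `robust_cost(A, B)` equals `robust_cost(A, C)` exactly; the robust norm has discarded precisely the information (the occlusion boundary's relation to each background) that could localize the match.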
Most robust and non-robust correspondence methods fail when there is
no coherent foreground contrast. Transparent-motion analysis [2,3,9,10]
can potentially detect motion in these difficult cases, but has not, to
date, been able to provide precise spatial localization of corresponding
points. Smoothing methods such as regularization or parametric motion constraints
(affine [11,12,13]
or learned from examples [14]) can
provide approximate localization when good motion estimates are available
in nearby image regions, but this is not always the case. If a corpus of
training images is available, techniques for feature or appearance modeling
can solve these problems; cf. [18,19].
For many detailed image analysis/synthesis tasks, finding precise correspondences
such as shown in these figures is extremely important. Image compositing
[15], automatic morphing [16],
and video resynthesis [17] all require
accurate correspondence; slight flaws can yield perceptually significant
errors. To obtain good results, authors of these methods have relied on
either extreme redundancy of measurement, human-assisted tracking, substantial
smoothing, or domain-specific feature-appearance models.
In this paper, we describe a new method that can solve the correspondence
tasks illustrated in Figures 1 and 2
using purely local image analysis, without prior training, and without
smoothing or pooling of motion estimates. Our approach defines an image
transform; this transform characterizes the local structure of an image
in a manner insensitive to points in an occluded region (i.e., outliers),
but which is sensitive to the shape of the occlusion boundary itself.
In essence, our method is to perform matching on a redundant, local representation
of image homogeneity. In this paper we show examples where color is the
attribute analyzed for homogeneity, but our method is applicable to other
local image characteristics (such as texture, range data, or simply image
intensity). While we only show sparse tracking results, our method can
readily yield dense correspondences, assuming sufficient image contrast.
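As a rough illustration of matching on a local homogeneity representation (our simplification for exposition, not the transform defined in the next section), consider a binary map marking which pixels in a window are similar to the center pixel. Such a map captures the shape of the occlusion boundary but ignores what lies beyond it:

```python
# A binary homogeneity map: 1 where a pixel's value is close to the window's
# center value, 0 elsewhere. The map encodes the foreground's local shape
# (hence the occlusion boundary) but is identical for the same foreground
# over two very different backgrounds.

def homogeneity_map(window, thresh=20):
    """1 where a pixel is within `thresh` of the window's center value."""
    c = window[len(window) // 2][len(window[0]) // 2]
    return [[1 if abs(v - c) <= thresh else 0 for v in row] for row in window]

# Uniform foreground (value 100) occupying the left two columns,
# in front of a dark and a light background respectively.
over_dark  = [[100, 100, 0],   [100, 100, 0],   [100, 100, 0]]
over_light = [[100, 100, 255], [100, 100, 255], [100, 100, 255]]
```

The two windows produce identical maps, so matching on this representation is unaffected by the background's appearance or the sign of contrast at the boundary, which is exactly the invariance the scenarios of Figures 1 and 2 require.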
Trevor Darrell
9/9/1998