Combining Multiple Manifold-valued Descriptors for
Improved Object Recognition
Sadeep Jayasumana^{1,2}, Richard Hartley^{1,2}, Mathieu Salzmann^2, Hongdong Li^1, and Mehrtash Harandi^2
^1 Australian National University, Canberra   ^2 NICTA, Canberra
sadeep.jayasumana@anu.edu.au
Abstract—We present a learning method for classification
using multiple manifold-valued features. Manifold techniques
are becoming increasingly popular in computer vision since
Riemannian geometry often comes up as a natural model for
many descriptors encountered in different branches of computer
vision. We propose a feature combination and selection method
that optimally combines descriptors lying on different manifolds
while respecting the Riemannian geometry of each underlying
manifold. We use our method to improve object recognition by
combining HOG [1] and Region Covariance [2] descriptors that
reside on two different manifolds. To this end, we propose a
kernel on the n-dimensional unit sphere and prove its positive
definiteness. Our experimental evaluation shows that combining
these two powerful descriptors using our method results in
significant improvements in recognition accuracy.
I. INTRODUCTION
Nonlinear data that lack a vector space structure are
commonly encountered in many branches of computer vision.
Examples include normalized histogram vectors [1], [3]
and covariance descriptors [2], [4] found in object detection/recognition,
diffusion tensors in biomedical image analysis
[5], [6], 3D rotation matrices in geometrical computer
vision [7], and linear subspaces of the n-dimensional Euclidean
space in video-based vision [8]. The spaces where such
nonlinear data lie lack a vector space structure in the sense
that they are either not closed under vector addition and scalar
multiplication, or these operations are not defined in them at
all. For example, d × d covariance matrices, commonly used
as region descriptors in object detection, form a convex cone
in the d(d+1)/2-dimensional Euclidean space. This convex cone
is not closed under multiplication by negative scalars and is
therefore not a vector space.
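The following minimal numpy sketch (ours, purely illustrative; the matrix values are arbitrary) shows this failure concretely: negating an SPD matrix flips its eigenvalues, taking it out of the cone.

# Illustration: the SPD cone is not closed under scalar multiplication.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
S = A @ A.T + 3.0 * np.eye(3)          # a 3 x 3 SPD matrix

print(np.linalg.eigvalsh(S))            # all positive: S is in the cone
print(np.linalg.eigvalsh(-1.0 * S))     # all negative: -S is not SPD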
Although the nonlinear data classes stated above do not
have a vector space structure and do not adhere to Euclidean
geometry, many of them do possess interesting geometries
that are studied under a separate branch of mathematics:
Riemannian geometry. Riemannian geometry provides tools to
extend Euclidean notions, such as inner products and
angles between curves, to nonlinear manifolds. Conventionally,
computer vision and machine learning algorithms are
developed for Euclidean spaces, assuming a linear (vector space)
structure of the data. Utilizing these Euclidean techniques on
manifold-valued data is not always straightforward. However,
when the nonlinear data at hand lie on a Riemannian manifold,
some Euclidean methods can be generalized to the manifold-valued
data using tools provided by Riemannian geometry.
(Footnote: NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the ARC through the ICT Centre of Excellence program. This work was supported in part by an ARC grant.)
A common technique used to generalize Euclidean algorithms
to Riemannian manifolds is to first obtain a Euclidean
representation of the manifold-valued data by approximating
the manifold by the tangent space at some point (usually the
sample mean) on the manifold [4]. However, this technique
only gives a first order approximation of the manifold and
hence results in significant distortion of the original data
distribution, especially in areas far away from the point
whose tangent space is used. Moreover, the extensive use of
Riemannian tools such as exponential maps and logarithmic
maps in such algorithms makes them inefficient.
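As a concrete instance of this tangent-space technique, here is a hedged numpy sketch for the SPD manifold, using the matrix logarithm and exponential maps at the identity (the log-Euclidean simplification; the algorithms discussed above typically work in the tangent space at the sample mean, and this sketch is ours rather than any particular cited method).

# Tangent-space technique on Sym_d^+ via the log/exp maps at the
# identity (log-Euclidean view; illustrative sketch only).
import numpy as np

def spd_log(S):
    """Logarithmic map: SPD matrix -> symmetric matrix (tangent space)."""
    w, U = np.linalg.eigh(S)
    return (U * np.log(w)) @ U.T

def spd_exp(V):
    """Exponential map: symmetric tangent vector -> SPD matrix."""
    w, U = np.linalg.eigh(V)
    return (U * np.exp(w)) @ U.T

S = np.array([[2.0, 0.5], [0.5, 1.0]])    # an SPD descriptor
V = spd_log(S)                             # Euclidean representation
print(np.allclose(spd_exp(V), S))          # True: exp inverts log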
An alternative approach to generalizing Euclidean algorithms
to a given manifold is to embed the manifold in a high
dimensional Reproducing Kernel Hilbert Space (RKHS) using
a positive definite kernel defined on the manifold. This method
has drawn significant attention in recent years [9]–[12]. In this
approach, the manifold is embedded in a linear Hilbert space,
making it possible to utilize Euclidean methods on manifold-valued
data, while simultaneously obtaining a richer, higher
dimensional representation of the original data distribution.
This approach has been shown to perform better than tangent-space
methods in many instances [10], [12].
In this paper, we address the problem of combining multiple
manifold-valued descriptors for improved object recognition.
It is well known that using more than one descriptor in a
feature selection framework enhances the recognition/detection
accuracy significantly [4], [13], [14]. For vector-valued descriptors,
a number of feature combination and selection methods,
ranging from simple concatenation of multiple features
into a single vector to boosting, are available and commonly
used. However, combining multiple features lying on different
manifolds while respecting the true geometries of the underlying
manifolds is not straightforward and has received little
attention.
Here, we propose a method for combining multiple
manifold-valued descriptors via RKHS embedding. We use
positive definite kernels defined on the manifolds that account
for their true geometries to embed the manifolds in Hilbert
spaces and combine features in those Hilbert spaces. As a
concrete example, we consider two specific manifolds, the unit
n-sphere S^n in the (n+1)-dimensional Euclidean space and the
Riemannian manifold of d × d Symmetric Positive Definite
(SPD) matrices Sym_d^+, and show how our method can be used
to combine descriptors sampled from these two manifolds.
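To make the combination step concrete, the following numpy sketch (not the paper's exact formulation; the kernels, Gram matrices and weights are placeholders) relies only on the standard fact that a nonnegative weighted sum of positive definite kernels is itself positive definite, so Gram matrices computed on S^n and Sym_d^+ can be fused into a single kernel machine.

# Fusing two manifold kernels: a nonnegative weighted sum of positive
# definite kernels is again positive definite, so the combined Gram
# matrix can be fed to any standard kernel machine (e.g. an SVM).
import numpy as np

def combine_gram_matrices(gram_matrices, weights):
    """Weighted sum of per-manifold Gram matrices; weights must be >= 0."""
    assert all(w >= 0 for w in weights)
    return sum(w * K for w, K in zip(weights, gram_matrices))

# Example with placeholder (N, N) Gram matrices K_sphere (from a kernel
# on S^n) and K_spd (from a kernel on Sym_d^+):
# K = combine_gram_matrices([K_sphere, K_spd], weights=[0.7, 0.3])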
Contributions: The present paper makes two main contributions:
First, it proposes a new, provably positive definite
kernel on S^n which, unlike the usual Gaussian RBF kernel,
accounts for the true geometry of the sphere. Second, this paper
introduces a method to optimally combine two well-known
and extremely successful region descriptors, namely Histogram
of Oriented Gradients (HOG) [1] and Region Covariance
descriptors [2], [4], in order to improve object recognition.
These two descriptor types lie on two different Riemannian
manifolds. Our method optimally combines the two manifold-valued
descriptors in order to maximize the object recognition
accuracy. It will be shown in our experiments that optimally
combining these two powerful region descriptors results in
significant improvements in object recognition.
II. RELATED WORK
A significant amount of research has been done in recent
years on generalizing Euclidean computer vision and machine
learning techniques to Riemannian manifolds. This includes
works in binary classification on a manifold [4], multi-class
classification on a manifold [9], [15], clustering [8], dimensionality
reduction [12] and interpolation [6]. Most of these
works focus on a single manifold with a specific geometry. For
example, [4], [6] consider the Riemannian manifold of SPD
matrices, while [8], [9] consider the Grassmann manifold. Combining
or selecting features lying on different manifolds has received
very little attention.
Two specific Riemannian manifolds encountered very often
in computer vision are the unit n-sphere S^n and the Riemannian
manifold of SPD matrices Sym_d^+. Some examples of descriptors
lying on S^n are the famous SIFT descriptors [3],
HOG descriptors [1], Local Binary Patterns (LBP) descriptors
[16] and, in general, any histogram representation that
is subjected to direct or block-wise l2 normalization. Examples of
descriptors sampled from Sym_d^+ include Region Covariance
descriptors [2], diffusion tensors [6] and structure tensors [12].
Kernels on Sym_d^+ that account for the Riemannian geometry
of the manifold have been proposed in [12]. However,
for descriptors lying on S^n, which is also a Riemannian
manifold, the traditional Euclidean Gaussian RBF kernel is usually
employed [1], neglecting the true geometry of the manifold.
Histogram of Oriented Gradients (HOG) descriptors were
first proposed in [1] for human detection and have subsequently
become very popular as region descriptors for object
classification and detection. After the mandatory block
normalization step, HOG descriptors lie on the n-dimensional
sphere of some fixed radius, whose geometry is the same as
that of the unit sphere S^n. As an alternative region/object
descriptor, Region Covariance descriptors first emerged in [2]
and thereafter found applications in texture recognition [17],
face recognition [17], action recognition [18] and tracking [18].
Covariance descriptors, being SPD matrices, lie on the Riemannian
manifold Sym_d^+. It has been shown on many occasions
that accounting for the geometry of Sym_d^+ is key to the
success of algorithms operating on covariance descriptors [4],
[10], [12].
Kernel methods are extensively used in Euclidean spaces
mainly for classification with SVM and also for clustering,
dimensionality reduction and regression [19], [20]. In recent
years, there have been a series of works targeting the generalization
of kernel methods to Riemannian manifolds [9]–[12].
The main challenge in generalizing kernel methods to a given
Riemannian manifold lies in defining a kernel on the manifold
that encodes the nonlinear geometry of the manifold while
being positive definite. According to Mercer's theorem [19],
only a positive definite kernel yields a valid embedding in
an RKHS. Moreover, the positive definiteness of the kernel(s)
being used is a requirement for the convergence of many
popular learning algorithms [14], [21].
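As a quick numerical illustration of this requirement (ours, not part of any cited work): positive definiteness in the Mercer sense means that every Gram matrix the kernel produces must be positive semi-definite, which can be probed through the smallest eigenvalue.

# Numerical probe of Mercer positive definiteness: for sampled points X,
# the Gram matrix K[i, j] = k(x_i, x_j) must be positive semi-definite.
import numpy as np

def min_gram_eigenvalue(kernel, X):
    K = np.array([[kernel(x, y) for y in X] for x in X])
    return np.linalg.eigvalsh(K).min()   # >= 0 up to noise if k is PD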
III. MANIFOLDS AND KERNELS
In this section, we briefly review the two manifolds used
in the paper and the positive definite kernels defined on them
that permit us to embed each manifold in a high-dimensional
Hilbert space.
In differential geometry, a topological manifold, also
known simply as a manifold, is a topological space (a set
with the notion of neighborhood or open sets) which is locally
similar to some Euclidean space. A differentiable manifold
is a topological manifold equipped with a globally defined
differential structure that allows one to perform calculus on
the manifold. Finally, a Riemannian manifold is defined as a
differentiable manifold with a smoothly varying inner product
defined on the tangent bundle.
The geodesic between two points on a Riemannian manifold
can be thought of as the shortest curve connecting the
two points without leaving the manifold. Geodesics correspond
to straight lines in Euclidean spaces. Therefore, the length
of the connecting geodesic, dubbed geodesic distance, is the
most suitable distance measure between two points lying on a
Riemannian manifold.
A. The Unit n-sphere
The n-dimensional sphere that has unit radius and is
centered at the origin of the (n+1)-dimensional Euclidean space,
denoted by S^n, is perhaps the simplest Riemannian manifold
after the Euclidean space itself. It inherits a Riemannian metric
from its embedding in R^(n+1). Under this Riemannian metric,
the geodesic distance d_g between two points x, y ∈ S^n is
simply the great-circle distance between the two points, which
is defined formally as

    d_g(x, y) = arccos(x^T y),    (1)

where arccos : [−1, 1] → [0, π] is the usual inverse cosine
function.
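Eq. (1) translates directly into numpy (our illustration; np.clip is added to guard against dot products drifting just outside [−1, 1] in floating point):

# Great-circle (geodesic) distance on the unit sphere, Eq. (1).
import numpy as np

def geodesic_distance(x, y):
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))

x = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])
print(geodesic_distance(x, y))   # pi/2: orthogonal points on S^2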
Almost every descriptor used in computer vision that is
derived from a histogram is ultimately normalized, either fully
or block-wise, using the l2 norm [1], [3], [16]. The resulting
descriptors therefore lie on S^n, for some n. When block
normalization is used, the radius of the sphere might not be
unity, but since any n-dimensional sphere centered at the origin
is homeomorphic to S^n, their geometries turn out to be exactly
the same. For simplicity, one can think of this as scaling the
block-normalized vectors by a constant, which does not alter
the data in any way.
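A two-line numpy illustration of this point (ours; the histogram values are arbitrary):

# l2-normalizing a histogram-style descriptor places it on the unit
# sphere S^n, where n = len(h) - 1.
import numpy as np

h = np.array([4.0, 1.0, 0.0, 3.0])          # a raw histogram
x = h / np.linalg.norm(h)                    # l2 normalization
print(np.isclose(np.linalg.norm(x), 1.0))    # True: x lies on S^3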
Although the actual geometry of S^n is not Euclidean,
conventionally only Euclidean kernels, such as the linear kernel
and the Gaussian RBF kernel with the Euclidean distance, have
been used to perform kernel methods on descriptors lying on
S^n [1]. In this paper, we propose a provably positive definite
kernel that is derived from the geodesic distance on S^n. This
proposed kernel, named the geodesic exponential kernel, permits