Combining Multiple Manifold-valued Descriptors for
Improved Object Recognition
Sadeep Jayasumana^{1,2}, Richard Hartley^{1,2}, Mathieu Salzmann^2, Hongdong Li^1, and Mehrtash Harandi^2
^1 Australian National University, Canberra   ^2 NICTA, Canberra
sadeep.jayasumana@anu.edu.au
Abstract—We present a learning method for classification
using multiple manifold-valued features. Manifold techniques
are becoming increasingly popular in computer vision since
Riemannian geometry often comes up as a natural model for
many descriptors encountered in different branches of computer
vision. We propose a feature combination and selection method
that optimally combines descriptors lying on different manifolds
while respecting the Riemannian geometry of each underlying
manifold. We use our method to improve object recognition by
combining HOG [1] and Region Covariance [2] descriptors that
reside on two different manifolds. To this end, we propose a
kernel on the n-dimensional unit sphere and prove its positive
definiteness. Our experimental evaluation shows that combining
these two powerful descriptors using our method results in
significant improvements in recognition accuracy.
I. INTRODUCTION
Nonlinear data that lack a vector space structure are
commonly encountered in many branches of computer vision.
Examples include normalized histogram vectors [1], [3]
and covariance descriptors [2], [4] found in object detection/recognition,
diffusion tensors in biomedical image analysis
[5], [6], 3D rotation matrices in geometrical computer
vision [7], and linear subspaces of the n-dimensional Euclidean
space in video-based vision [8]. The spaces where such
nonlinear data lie lack a vector space structure in the sense
that they are either not closed under vector addition and scalar
multiplication, or these operations are not defined in them at
all. For example, d × d covariance matrices, commonly used
as region descriptors in object detection, form a convex cone
in the d(d+1)/2-dimensional Euclidean space. This convex cone
is not closed under multiplication by negative scalars and is
therefore not a vector space.
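The following minimal numpy sketch (ours, purely illustrative; the matrix values are arbitrary) shows this failure concretely: negating an SPD matrix flips its eigenvalues, taking it out of the cone.

# Illustration: the SPD cone is not closed under scalar multiplication.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
S = A @ A.T + 3.0 * np.eye(3)          # a 3 x 3 SPD matrix

print(np.linalg.eigvalsh(S))            # all positive: S is in the cone
print(np.linalg.eigvalsh(-1.0 * S))     # all negative: -S is not SPD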
Although the nonlinear data classes stated above do not
have a vector space structure and do not adhere to Euclidean
geometry, many of them do possess interesting geometries
that are studied under a separate branch of mathematics:
Riemannian geometry. Riemannian geometry provides tools to
extend Euclidean notions, such as inner products and
angles between curves, to nonlinear manifolds. Conventionally,
computer vision and machine learning algorithms are
developed for Euclidean spaces, assuming a linear (vector space)
structure of the data. Utilizing these Euclidean techniques on
manifold-valued data is not always straightforward. However,
when the nonlinear data at hand lie on a Riemannian manifold,
some Euclidean methods can be generalized to the manifold-valued
data using tools provided by Riemannian geometry.
(Footnote: NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the ARC through the ICT Centre of Excellence program. This work was supported in part by an ARC grant.)
A common technique used to generalize Euclidean algorithms
to Riemannian manifolds is to first obtain a Euclidean
representation of the manifold-valued data by approximating
the manifold by the tangent space at some point (usually the
sample mean) on the manifold [4]. However, this technique
only gives a first order approximation of the manifold and
hence results in significant distortion of the original data
distribution, especially in areas far away from the point
whose tangent space is used. Moreover, the extensive use of
Riemannian tools such as exponential maps and logarithmic
maps in such algorithms makes them inefficient.
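As a concrete instance of this tangent-space technique, here is a hedged numpy sketch for the SPD manifold, using the matrix logarithm and exponential maps at the identity (the log-Euclidean simplification; the algorithms discussed above typically work in the tangent space at the sample mean, and this sketch is ours rather than any particular cited method).

# Tangent-space technique on Sym_d^+ via the log/exp maps at the
# identity (log-Euclidean view; illustrative sketch only).
import numpy as np

def spd_log(S):
    """Logarithmic map: SPD matrix -> symmetric matrix (tangent space)."""
    w, U = np.linalg.eigh(S)
    return (U * np.log(w)) @ U.T

def spd_exp(V):
    """Exponential map: symmetric tangent vector -> SPD matrix."""
    w, U = np.linalg.eigh(V)
    return (U * np.exp(w)) @ U.T

S = np.array([[2.0, 0.5], [0.5, 1.0]])    # an SPD descriptor
V = spd_log(S)                             # Euclidean representation
print(np.allclose(spd_exp(V), S))          # True: exp inverts log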
An alternative approach to generalizing Euclidean algorithms
to a given manifold is to embed the manifold in a high
dimensional Reproducing Kernel Hilbert Space (RKHS) using
a positive definite kernel defined on the manifold. This method
has drawn significant attention in recent years [9]–[12]. In this
approach, the manifold is embedded in a linear Hilbert space,
making it possible to utilize Euclidean methods on manifold-valued
data, while simultaneously obtaining a richer, higher
dimensional representation of the original data distribution.
This approach has been shown to perform better than tangent-space
methods in many instances [10], [12].
In this paper, we address the problem of combining multiple
manifold-valued descriptors for improved object recognition.
It is well known that using more than one descriptor in a
feature selection framework enhances the recognition/detection
accuracy significantly [4], [13], [14]. For vector-valued descriptors,
a number of feature combination and selection methods,
ranging from simple concatenation of multiple features
into a single vector to boosting, are available and commonly
used. However, combining multiple features lying on different
manifolds while respecting the true geometries of the underlying
manifolds is not straightforward and has received little
attention.
Here, we propose a method for combining multiple
manifold-valued descriptors via RKHS embedding. We use
positive definite kernels defined on the manifolds that account
for their true geometries to embed the manifolds in Hilbert
spaces and combine features in those Hilbert spaces. As a
concrete example, we consider two specific manifolds, the unit
n-sphere S^n in the (n+1)-dimensional Euclidean space and the
Riemannian manifold of d × d Symmetric Positive Definite
(SPD) matrices Sym_d^+, and show how our method can be used
to combine descriptors sampled from these two manifolds.
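To make the combination step concrete, the following numpy sketch (not the paper's exact formulation; the kernels, Gram matrices and weights are placeholders) relies only on the standard fact that a nonnegative weighted sum of positive definite kernels is itself positive definite, so Gram matrices computed on S^n and Sym_d^+ can be fused into a single kernel machine.

# Fusing two manifold kernels: a nonnegative weighted sum of positive
# definite kernels is again positive definite, so the combined Gram
# matrix can be fed to any standard kernel machine (e.g. an SVM).
import numpy as np

def combine_gram_matrices(gram_matrices, weights):
    """Weighted sum of per-manifold Gram matrices; weights must be >= 0."""
    assert all(w >= 0 for w in weights)
    return sum(w * K for w, K in zip(weights, gram_matrices))

# Example with placeholder (N, N) Gram matrices K_sphere (from a kernel
# on S^n) and K_spd (from a kernel on Sym_d^+):
# K = combine_gram_matrices([K_sphere, K_spd], weights=[0.7, 0.3])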
Contributions: The present paper makes two main contributions:
First, it proposes a new, provably positive definite
kernel on S^n which, unlike the usual Gaussian RBF kernel,
accounts for the true geometry of the sphere. Second, this paper
introduces a method to optimally combine two well-known
and extremely successful region descriptors, namely Histogram
of Oriented Gradients (HOG) [1] and Region Covariance
descriptors [2], [4], in order to improve object recognition.
These two descriptor types lie on two different Riemannian
manifolds. Our method optimally combines the two manifold-valued
descriptors in order to maximize the object recognition
accuracy. It will be shown in our experiments that optimally
combining these two powerful region descriptors results in
significant improvements in object recognition.
II. RELATED WORK
A significant amount of research has been done in recent
years on generalizing Euclidean computer vision and machine
learning techniques to Riemannian manifolds. This includes
works in binary classification on a manifold [4], multi-class
classification on a manifold [9], [15], clustering [8], dimensionality
reduction [12] and interpolation [6]. Most of these
works focus on a single manifold with a specific geometry. For
example, [4], [6] consider the Riemannian manifold of SPD
matrices, while [8], [9] consider the Grassmann manifold. Combining
or selecting features lying on different manifolds has received
very little attention.
Two specific Riemannian manifolds encountered very often
in computer vision are the unit n-sphere S^n and the Riemannian
manifold of SPD matrices Sym_d^+. Some examples of descriptors
lying on S^n are the famous SIFT descriptors [3],
HOG descriptors [1], Local Binary Patterns (LBP) descriptors
[16] and, in general, any histogram representation that
is subjected to direct or block-wise l2 normalization. Examples of
descriptors sampled from Sym_d^+ include Region Covariance
descriptors [2], diffusion tensors [6] and structure tensors [12].
Kernels on Sym_d^+ that account for the Riemannian geometry
of the manifold have been proposed in [12]. However,
for descriptors lying on S^n, which is also a Riemannian
manifold, the traditional Euclidean Gaussian RBF kernel is usually
employed [1], neglecting the true geometry of the manifold.
Histogram of Oriented Gradients (HOG) descriptors were
first proposed in [1] for human detection and have subsequently
become very popular as region descriptors for object
classification and detection. After the mandatory block
normalization step, HOG descriptors lie on the n-dimensional
sphere of some fixed radius, whose geometry is the same as
that of the unit sphere S^n. As an alternative region/object
descriptor, Region Covariance descriptors first emerged in [2]
and thereafter found applications in texture recognition [17],
face recognition [17], action recognition [18] and tracking [18].
Covariance descriptors, being SPD matrices, lie on the Riemannian
manifold Sym_d^+. It has been shown on many occasions
that accounting for the geometry of Sym_d^+ is key to the
success of algorithms operating on covariance descriptors [4],
[10], [12].
Kernel methods are extensively used in Euclidean spaces
mainly for classification with SVM and also for clustering,
dimensionality reduction and regression [19], [20]. In recent
years, there have been a series of works targeting the generalization
of kernel methods to Riemannian manifolds [9]–[12].
The main challenge in generalizing kernel methods to a given
Riemannian manifold lies in defining a kernel on the manifold
that encodes the nonlinear geometry of the manifold while
being positive definite. According to Mercer's theorem [19],
only a positive definite kernel yields a valid embedding in
an RKHS. Moreover, the positive definiteness of the kernel(s)
being used is a requirement for the convergence of many
popular learning algorithms [14], [21].
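As a quick numerical illustration of this requirement (ours, not part of any cited work): positive definiteness in the Mercer sense means that every Gram matrix the kernel produces must be positive semi-definite, which can be probed through the smallest eigenvalue.

# Numerical probe of Mercer positive definiteness: for sampled points X,
# the Gram matrix K[i, j] = k(x_i, x_j) must be positive semi-definite.
import numpy as np

def min_gram_eigenvalue(kernel, X):
    K = np.array([[kernel(x, y) for y in X] for x in X])
    return np.linalg.eigvalsh(K).min()   # >= 0 up to noise if k is PD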
III. MANIFOLDS AND KERNELS
In this section, we briefly review the two manifolds used
in the paper and the positive definite kernels defined on them
that permit us to embed each manifold in a high-dimensional
Hilbert space.
In differential geometry, a topological manifold, also
known simply as a manifold, is a topological space (a set
with the notion of neighborhood or open sets) which is locally
similar to some Euclidean space. A differentiable manifold
is a topological manifold equipped with a globally defined
differential structure that allows one to perform calculus on
the manifold. Finally, a Riemannian manifold is defined as a
differentiable manifold with a smoothly varying inner product
defined on the tangent bundle.
The geodesic between two points on a Riemannian manifold
can be thought of as the shortest curve connecting the
two points without leaving the manifold. Geodesics correspond
to straight lines in Euclidean spaces. Therefore, the length
of the connecting geodesic, dubbed geodesic distance, is the
most suitable distance measure between two points lying on a
Riemannian manifold.
A. The Unit n-sphere
The n-dimensional sphere that has unit radius and is
centered at the origin of the (n+1)-dimensional Euclidean space,
denoted by S^n, is perhaps the simplest Riemannian manifold
after the Euclidean space itself. It inherits a Riemannian metric
from its embedding in R^(n+1). Under this Riemannian metric,
the geodesic distance d_g between two points x, y ∈ S^n is
simply the great-circle distance between the two points, which
is defined formally as

    d_g(x, y) = arccos(x^T y),    (1)

where arccos : [−1, 1] → [0, π] is the usual inverse cosine
function.
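Eq. (1) translates directly into numpy (our illustration; np.clip is added to guard against dot products drifting just outside [−1, 1] in floating point):

# Great-circle (geodesic) distance on the unit sphere, Eq. (1).
import numpy as np

def geodesic_distance(x, y):
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))

x = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])
print(geodesic_distance(x, y))   # pi/2: orthogonal points on S^2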
Almost every descriptor used in computer vision that is
derived from a histogram is ultimately normalized, either fully
or block-wise, using the l2 norm [1], [3], [16]. The resulting
descriptors therefore lie on S^n, for some n. When block
normalization is used, the radius of the sphere might not be
unity, but since any n-dimensional sphere centered at the origin
is homeomorphic to S^n, their geometries turn out to be exactly
the same. For simplicity, one can think of this as scaling the
block-normalized vectors by a constant, which does not alter
the data in any way.
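A two-line numpy illustration of this point (ours; the histogram values are arbitrary):

# l2-normalizing a histogram-style descriptor places it on the unit
# sphere S^n, where n = len(h) - 1.
import numpy as np

h = np.array([4.0, 1.0, 0.0, 3.0])          # a raw histogram
x = h / np.linalg.norm(h)                    # l2 normalization
print(np.isclose(np.linalg.norm(x), 1.0))    # True: x lies on S^3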
Although the actual geometry of S^n is not Euclidean,
conventionally only Euclidean kernels, such as the linear kernel
and the Gaussian RBF kernel with the Euclidean distance, have
been used to perform kernel methods on descriptors lying on
S^n [1]. In this paper, we propose a provably positive definite
kernel that is derived from the geodesic distance on S^n. This
proposed kernel, named the geodesic exponential kernel, permits