Tuesday, March 22, 2016

Manifold primer. For a real-valued function $f$ with domain $S$, $\arg\min_{x\in S} f(x)$ is the set of elements in $S$ that achieve the global minimum in $S$.

Manifold primer


algorithms - Explanation on arg min - Mathematics Stack ...

math.stackexchange.com/.../explanation-on-arg-min
Stack Exchange
Nov 2, 2012 - arg min is argument of the minimum. The simplest example is. $\arg\min_{x} f(x)$ is the value of $x$ for which $f(x)$ attains its minimum.

optimization - Notation: what is "arg min" - Mathematics ...

math.stackexchange.com/.../notation-what-is-arg-min
Stack Exchange
May 8, 2014 - There is an explanation on arg min in our archives. The answer may be useful to you. What is the language? Just curious. That would indicate ...

Arg max - Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Arg_max
Wikipedia
arg min (or argmin) stands for argument of the minimum, and is defined analogously. For instance, are points x for which f(x) attains its smallest value. The complementary operator is, of course, min.

arg min and arg max | planetmath.org

planetmath.org/argminandargmax
Jul 8, 2004 - $\arg\min_{x\in S} f(x) = \{x \in S : f(x) = \min_{y\in S} f(y)\}$ ...


How to Add the arg max or arg min Functions to Lyx ...

www.briandalessandro.com/.../how-to-add-the-arg-max-and-arg-min-fu...
Oct 30, 2011 - The arg max and arg min functions are not standard as math functions in LaTeX. As a result, they are not straightforward to use in Lyx either.

An example of implementing the arg min operator in AMPL

lyle.smu.edu/~olinick/emis8373/lectures/ampl/argmin.txt
# An example of implementing the arg min operator in AMPL set S := 1 .. 10; param b {S}; let b[1] := 2; let b[2] := 1; let b[3] := 4; let b[4] := 7; let b[5] := 6; let b[6] := 8 ...

How to get arg min of an N-dimensional Matrix ? - MATLAB ...

www.mathworks.com/.../72137-how-to-get-arg-min-of-an-n...
MathWorks
Apr 15, 2013 - [m n] = min(M(:)); [x y z] = ind2sub(size(M),n);. x, y, z will be the indexes (i.e., argmin) of the minimum value of your 3D array M.

can you tell me what is this math equation? "argmin" - Forum for ...

www.edaboard.com › ... › Digital communication
Oct 3, 2011 - 1 post
... can you tell me what is this math equation? "argmin". when you say that x=arg minf(y) this means that take x=y such that f(y) is the minimum.

What does arg min mean? - 杰 - C++博客 (C++ Blog)

www.cppblog.com/guijie/archive/2010/.../136273.asp...
Dec 13, 2010 - arg min (or argmin) is defined analogously. Note also that functions do not in general attain a maximum value, and hence will in general not ...

LaTeX Best Practice: How to write argmax and argmin

latexbestpractice.blogspot.com/.../how-to-write-argmax-and-argmin.html
May 31, 2012 - Note that the first version is more for inline code, while the second is rather for equation. Naturally, arg min is written analogously. Don't forget to ...

arg min and arg max

For a real-valued function $f$ with domain $S$, $\arg\min_{x\in S} f(x)$ is the set of elements in $S$ that achieve the global minimum in $S$,

$$\arg\min_{x\in S} f(x) = \{x\in S : f(x) = \min_{y\in S} f(y)\}.$$

$\arg\max_{x\in S} f(x)$ is the set of elements in $S$ that achieve the global maximum in $S$,

$$\arg\max_{x\in S} f(x) = \{x\in S : f(x) = \max_{y\in S} f(y)\}.$$
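
The set-valued definition above can be sketched in a few lines of Python for a finite domain. This is only an illustration: the helper names `argmin_set` and `argmax_set` are my own and do not come from any of the pages quoted here, and on an infinite domain the minimum may not even be attained.

```python
def argmin_set(f, domain):
    """Return {x in domain : f(x) == min over y in domain of f(y)} for a finite domain."""
    points = list(domain)
    if not points:
        raise ValueError("domain must be non-empty")
    best = min(f(x) for x in points)
    # Keep every minimizer, not just the first one: arg min is a set.
    return {x for x in points if f(x) == best}


def argmax_set(f, domain):
    """arg max of f is arg min of -f."""
    return argmin_set(lambda x: -f(x), domain)


S = range(-3, 4)
print(sorted(argmin_set(lambda x: x * x, S)))            # [0]
print(sorted(argmin_set(lambda x: abs(abs(x) - 2), S)))  # [-2, 2]
print(sorted(argmax_set(lambda x: x * x, S)))            # [-3, 3]
```

The second call shows why the definition returns a set: the minimum is attained at two points.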


Jean-Luc Starck, Fionn Murtagh, Emmanuel J. Candes, and David L. Donoho. Gray and color image contrast enhancement by the curvelet transform. IEEE Transactions on Image Processing, 12(6):706–717, 2003.

wavelet coefficients, wavelet compression

A wavelet is a wave-like oscillation with an amplitude that begins at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" like one might see recorded by a seismograph or heart monitor.

MULTISCALE REPRESENTATIONS FOR MANIFOLD-VALUED DATA
Inam Ur Rahman, Iddo Drori, Victoria C. Stodden, David L. Donoho, Peter Schröder

Abstract. We describe multiscale representations for data observed on equispaced grids and taking values in manifolds such as: the sphere $S^2$, the special orthogonal group $SO(3)$, the positive definite matrices $SPD(n)$, and the Grassmann manifolds $G(n, k)$. The representations are based on the deployment of Deslauriers-Dubuc and Average-Interpolating pyramids ‘in the tangent plane’ of such manifolds, using the Exp and Log maps of those manifolds. The representations provide ‘wavelet coefficients’ which can be thresholded, quantized, and scaled much as traditional wavelet coefficients. Tasks such as compression, noise removal, contrast enhancement, and stochastic simulation are facilitated by this representation. The approach applies to general manifolds, but is particularly suited to the manifolds we consider, i.e. Riemannian symmetric spaces, such as $S^{n-1}$, $SO(n)$, $G(n, k)$, where the Exp and Log maps are effectively computable. Applications to manifold-valued data sources of a geometric nature (motion, orientation, diffusion) seem particularly immediate. A software toolbox, SymmLab, can reproduce the results discussed in this paper.


"The principle of general relativity (Weyl, further developed by Yang; gauge fields invariant under local gauge transformations) places some broad yet definite restrictions on the laws of nature."

Put very simply: in a manifold such as the Lorentzian manifold of general relativity, the principle of general relativity means that some kind of gauge field exists. A local observer, however, can only observe through a locally transformed gauge, and therefore his observations have to be corrected to be "right" in the context of this gauge field.

Now the manifold has found its way into machine learning, AI and everything else, as it should.

My near term goal is to have your lab boss (or somebody else) start to appreciate my work and hire me as a contractor/consultant, as a start. 

Part I. manifold in a nutshell 

1.     General concepts
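In differential geometry, Riemannian geometry studies smooth manifolds equipped with a Riemannian metric, that is, a choice of quadratic form on each tangent space of the manifold. It is particularly concerned with angles, arc lengths and volumes: global quantities are obtained by adding up the contributions of every infinitesimal part.

The systematic study of quadratic forms began in the 18th century. It grew out of the problem of classifying quadratic curves (conics) and quadric surfaces: the equation of the curve or surface is transformed by choosing the principal axes as coordinate axes, which simplifies the form of the equation. This problem was introduced in the 18th century.
Gauss independently discovered the general form of the binomial theorem, the law of quadratic reciprocity in number theory, the prime number theorem, and the arithmetic-geometric mean.
Cauchy concluded in his work that, when the equation is in standard form, quadric surfaces can be classified by the signs of the quadratic form.

Loosely speaking, Riemannian geometry (the Riemannian manifold) studies the quadratic-form metric of a "high-dimensional curved space"; if the quadratic-form metric is not that of a "curved high-dimensional space", then we are in a Hilbert space, etc.

I arrived at this beautiful concept-level characterization of the Riemannian manifold after years of hard study of physics and math, covering almost all of their major areas.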


The following is a Google translation of the passage above:

Differential geometry, or more precisely Riemannian geometry, the study of manifolds with a Riemannian metric, is all about the "quadratic form". It is particularly concerned with angles, arc lengths and volumes in a high-dimensional, curved space such as the Lorentzian manifold of general relativity.

The study of the "quadratic form" started in the 18th century. Gauss independently discovered the general form of the binomial theorem, the law of quadratic reciprocity in number theory, the prime number theorem, and the arithmetic-geometric mean. Then there is the famous Cauchy-Schwarz inequality, etc.

A Hilbert space carries the "quadratic form" of a high-dimensional but flat (not curved) space. However, biology and many social systems reside in high-dimensional and curved spaces, and we have to figure out how to model them.

2. How statistics got started: modeling and measuring of linear systems experiencing disturbances and stresses 

(https://dspace.lboro.ac.uk/dspace-jspui/bitstream/2134/20203/1/Thesis-2016-Oltean.pdf)
"
· unsuccessful predictions of stock prices made by Sir Isaac Newton and, consequently, his terrible loss in 1720 of 20000 pounds in South Sea speculation bubble.
· in 1738, Daniel Bernoulli introduced the idea of utility in order to describe preferences of people and consumer satisfaction.
· successful management of the fund for the widows of Goettingen professors, performed by Carl Friedrich Gauss.
· Giovanni Ceva published an essay “A mathematical approach of money” in 1711.
· Laplace in his work “Essai philosophique sur les probabilites” (1812) showed that what apparently might seem random and unpredictable (such as number of letters in the Paris dead-letter office) is predictable and obeys a simple law.
· Adolphe Quetelet (a former student of Fourier) studied the existence of patterns in data sets ranging from the frequency of different methods for committing murder to the chest size of Scottish men. It was him who coined the term “social physics” in 1835.
· explanation of the Brownian random walk and the formulation of the Chapman-Kolmogorov condition for Markovian processes by Louis Bachelier in his PhD thesis on the theory of speculation. This was done 5 years before the work of Smoluchowski and Einstein on diffusion, based on the observations of price changes at Paris stockmarket.
· Italian physicist Ettore Majorana wrote in 1936 a paper based on analogies between statistical physics laws and the ones from social sciences
"

Boltzmann was very explicit: “The molecules are like individuals, ... and the properties of gases only remain unaltered, because the number of these molecules, which on average has a given state, is constant.” In “Populäre Schriften”, Boltzmann said: “This opens a broad perspective, if we do not only think of mechanical objects. Let us consider the application of this method to the statistics of living beings, society, sociology and so forth.”


Regression in general
https://www-sop.inria.fr/asclepios/events/MFCA11/Proceedings/MFCA11_3_1

2 Multiple Linear Regression

Before formulating geodesic regression on general manifolds, we begin by reviewing multiple linear regression in $\mathbb{R}^n$. Here we are interested in the relationship between a non-random independent variable $X \in \mathbb{R}$ and a random dependent variable $Y$ taking values in $\mathbb{R}^n$. A multiple linear model of this relationship is given by

$$Y = \alpha + X\beta + \epsilon, \qquad (1)$$

where $\alpha \in \mathbb{R}^n$ is an unobservable intercept parameter, $\beta \in \mathbb{R}^n$ is an unobservable slope parameter, and $\epsilon$ is an $\mathbb{R}^n$-valued, unobservable random variable representing the error. Geometrically, this is the equation of a one-dimensional line through $\mathbb{R}^n$ (plus noise), parameterized by the scalar variable $X$. For the purposes of generalizing to the manifold case, it is useful to think of $\alpha$ as the starting point of the line and $\beta$ as a velocity vector.

Given realizations of the above model, i.e., data $(x_i, y_i) \in \mathbb{R} \times \mathbb{R}^n$, for $i = 1, \ldots, N$, the least squares estimates, $\hat\alpha, \hat\beta$, for the intercept and slope are computed by solving the minimization problem

$$(\hat\alpha, \hat\beta) = \arg\min_{(\alpha,\beta)} \sum_{i=1}^{N} \| y_i - \alpha - x_i\beta \|^2. \qquad (2)$$

This equation can be solved analytically, yielding

$$\hat\beta = \frac{\frac{1}{N}\sum x_i y_i - \bar{x}\,\bar{y}}{\frac{1}{N}\sum x_i^2 - \bar{x}^2}, \qquad \hat\alpha = \bar{y} - \bar{x}\,\hat\beta,$$

where $\bar{x}$ and $\bar{y}$ are the sample means of the $x_i$ and $y_i$, respectively. If the errors in the model are drawn from distributions with zero mean and finite variance, then these estimators are unbiased and consistent.

Fig. 1. Schematic of the geodesic regression model, $f(x) = \mathrm{Exp}(p, xv)$.
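
The closed-form solution of the minimization problem (2) can be checked numerically. The sketch below is my own illustration in NumPy, not code from the quoted paper; the function name `fit_line` and the synthetic data are assumptions made for the example.

```python
import numpy as np


def fit_line(x, y):
    """Closed-form least squares for y_i ~ alpha + x_i * beta, x_i scalar, y_i in R^n."""
    x = np.asarray(x, dtype=float)            # shape (N,)
    y = np.asarray(y, dtype=float)
    if y.ndim == 1:
        y = y[:, None]                        # treat scalar responses as R^1
    x_bar = x.mean()
    y_bar = y.mean(axis=0)
    # beta = (mean(x_i y_i) - x_bar y_bar) / (mean(x_i^2) - x_bar^2)
    beta = ((x[:, None] * y).mean(axis=0) - x_bar * y_bar) / ((x ** 2).mean() - x_bar ** 2)
    alpha = y_bar - x_bar * beta
    return alpha, beta


# Synthetic data (hypothetical): a noisy line through R^2.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
true_alpha = np.array([1.0, -2.0])
true_beta = np.array([0.5, 3.0])
y = true_alpha + x[:, None] * true_beta + 0.01 * rng.standard_normal((50, 2))

alpha_hat, beta_hat = fit_line(x, y)
print(alpha_hat, beta_hat)
```

In the geometric reading of the excerpt, `alpha_hat` is the estimated starting point of the line and `beta_hat` its velocity vector.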


3. It is still all about "quadratic form": sigmoid function as the backbone of logistic regression

The probability distribution used is cumulative logistic distribution which is applied to cumulated income, expenditure, or wealth on one hand and also to cumulated probabilities on the other hand. Logistic function or sigmoid function is defined as

$$f(x) = \frac{L}{1 + e^{-k(x - x_0)}} \qquad (2.1)$$

where $L$ is the curve's maximum value, $x_0$ is the x-value of the sigmoid's midpoint, and $k$ is the steepness of the curve [85]. Logistic map, which is the basis for logistic function, is used to show how complex, chaotic behaviour can arise from very simple non-linear dynamical equations [86]. We use logistic cumulative probability distribution $C(x)$, which is defined as the integral

$$C(x) = \int_{-\infty}^{x} P(x')\,dx' \qquad (2.2)$$
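
As a small illustration of (2.1) and (2.2), here is a sketch of the logistic function and of the density whose integral gives the cumulative curve when $L = 1$. The function names are my own, and the two-branch evaluation is just one common way to avoid floating-point overflow; it is not taken from the quoted thesis.

```python
import math


def logistic(x, L=1.0, k=1.0, x0=0.0):
    """f(x) = L / (1 + exp(-k (x - x0))), evaluated without overflow."""
    z = k * (x - x0)
    if z >= 0:
        return L / (1.0 + math.exp(-z))
    ez = math.exp(z)              # for very negative z, exp(-z) would overflow
    return L * ez / (1.0 + ez)


def logistic_density(x, k=1.0, x0=0.0):
    """P(x) = dC/dx for C(x) = logistic(x, L=1): P = k * C * (1 - C)."""
    c = logistic(x, 1.0, k, x0)
    return k * c * (1.0 - c)


for x in (-2.0, 0.0, 2.0):
    print(x, logistic(x), logistic_density(x))
```

At the midpoint `x0` the function takes the value `L / 2`, and the density is largest there, which is what "steepness" `k` controls.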


Quadratic Polynomials Learn Better Image Features

www.iro.umontreal.ca/~lisa/publications2/.../205
Université de Montréal
Apr 3, 2009 - quadratic units was strongest in conjunction with sparse and ... Equation 1 looks sigmoidal as a function of E, but the sharpness of the  ...

It's not exponential, it's sigmoidal - O'Reilly Radar

radar.oreilly.com/.../its-not-exponential-its-sigmoi.ht...
O’Reilly Media, Inc.
Nov 26, 2007 - Exponential vs. linear or quadratic curves. ... In fact, one of the most important sigmoidal functions is the logistic function, originally developed to  .


4. Algorithms of machine learning and AI: putting it all together: statistics on manifolds.


"As is common in applications, we use the Karcher mean. In practice, the Karcher mean can be efficiently computed using an iterative algorithm [113]. Let $\mu$ denote the intrinsic mean. The value the (sample) Fréchet function attains at $\mu$,

$$\frac{1}{N}\sum_{i=1}^{N} d(\mu, p_i)^2, \qquad (2.82)$$

is called the geodesic variance. The (sample) covariance is defined through the Euclidean (sample) covariance of the data as expressed in $T_\mu M$:

$$\mathrm{Cov}(\{p_i\}_{i=1}^{N}) \overset{\mathrm{def}}{=} \frac{1}{N-1}\sum_{i=1}^{N} \mathrm{Log}_\mu(p_i)\,\mathrm{Log}_\mu(p_i)^T. \qquad (2.83)$$

Note that the point of tangency is the intrinsic mean, $\mu$. This echoes (and in fact, generalizes) the construction of the Euclidean (sample) covariance in a Euclidean space, which is built from summing outer-products of vectors following the subtraction of the Euclidean (sample) mean:

$$\mathrm{Cov}(\{p_i\}_{i=1}^{N}) \overset{\mathrm{def}}{=} \frac{1}{N-1}\sum_{i=1}^{N} \Big(p_i - \frac{1}{N}\sum_{j=1}^{N} p_j\Big)\Big(p_i - \frac{1}{N}\sum_{j=1}^{N} p_j\Big)^T$$"


I have more or less finished reading "dam.brown.edu/people/freifeld//phd/ThesisOrenFreifeld.pdf", which I started last Friday; I went through most of it on Saturday at Oakland airport.

The paragraph I quoted really rings a bell. Basically, geometrical "distance" is the backbone of the "statistical mean", and all kinds of statistical variances can be expressed as some kind of product of vectors. The inner product of vectors defines the "distance" measure in a scalar field or a classical vector field, while the outer product of vectors becomes more challenging in a vector field such as Maxwell's electromagnetic field.
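
To make the quoted formulas concrete, here is a minimal sketch on the unit sphere $S^2$, where the Exp and Log maps have simple closed forms. It is my own illustration, not code from the thesis: the function names are hypothetical, the plain fixed-point iteration is only one simple way to approximate the Karcher mean, and antipodal points are not handled.

```python
import numpy as np


def sphere_exp(p, v):
    """Exp map on the unit sphere: start at p and move along tangent vector v."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return p
    return np.cos(norm_v) * p + np.sin(norm_v) * v / norm_v


def sphere_log(p, q):
    """Log map: tangent vector at p pointing toward q with length d(p, q)."""
    cos_theta = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(cos_theta)              # geodesic distance d(p, q)
    w = q - cos_theta * p                     # part of q orthogonal to p
    norm_w = np.linalg.norm(w)
    if norm_w < 1e-12:
        return np.zeros_like(p)               # q coincides with p
    return theta * w / norm_w


def karcher_mean(points, iters=100, tol=1e-12):
    """Iterate mu <- Exp_mu(mean of Log_mu(p_i)) until the update is tiny."""
    mu = points[0]
    for _ in range(iters):
        step = np.mean([sphere_log(mu, p) for p in points], axis=0)
        mu = sphere_exp(mu, step)
        mu = mu / np.linalg.norm(mu)          # guard against rounding drift
        if np.linalg.norm(step) < tol:
            break
    return mu


def geodesic_variance(mu, points):
    """(1/N) sum_i d(mu, p_i)^2, as in (2.82)."""
    return np.mean([np.linalg.norm(sphere_log(mu, p)) ** 2 for p in points])


def tangent_covariance(mu, points):
    """(1/(N-1)) sum_i Log_mu(p_i) Log_mu(p_i)^T, as in (2.83)."""
    logs = np.array([sphere_log(mu, p) for p in points])
    return logs.T @ logs / (len(points) - 1)


# Four points placed symmetrically at angle a around the north pole.
a = 0.3
angles = [0.0, 0.5 * np.pi, np.pi, 1.5 * np.pi]
points = [np.array([np.sin(a) * np.cos(t), np.sin(a) * np.sin(t), np.cos(a)])
          for t in angles]
mu = karcher_mean(points)
print(mu)
print(geodesic_variance(mu, points))
print(tangent_covariance(mu, points))
```

By symmetry the mean of this toy data set should sit at the north pole, and the covariance lives entirely in the tangent plane there, which is the sense in which "distance" and vector products carry the statistics.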


As I wrote before, when modeling a system residing in a high-dimensional, curved space, one has to extend the study of scalar and vector fields to Riemannian manifolds, and extend statistics as we know it to Riemannian manifolds. This is exactly what is starting to happen now, with still very little literature available.

One of the difficulties, as I wrote before, is that the differentiable Riemannian manifold (smooth enough that we can perform calculus and statistics on it) is not math only. It is a multidisciplinary subject of advanced mathematics and physics, such as general relativity and gauge field theory, which is still mind-challenging to most physicists in the linear physics disciplines, not to mention all other scientists outside of physics.

The world of science is actually very interesting, with many scientists not really knowing what they are doing.

5. topological approach to manifold


http://stat.fsu.edu/~anuj/pdf/papers/Y2009/TuragaChapterVideoManifolds.pdf
(Statistical Analysis on Manifolds and its applications to Video Analysis)

Boothby, W.M.: An introduction to differentiable manifolds and Riemannian geometry. Academic Press Inc (1975)
Spivak, M.: A Comprehensive Introduction to Differential Geometry, Volume 1. Publish or Perish, Inc (1970)

3 Introduction to Manifolds

We shall first start with the topological definition of a manifold in terms of charts and atlases. Using them, we will show that $\mathbb{R}^n$ is indeed a differentiable manifold. Then, we state a theorem that defines sub-manifold of a manifold as a solution of an equation. This shall be specialized to the case of manifolds that are actually sub-manifolds of $\mathbb{R}^n$, arising as solutions of an equation in $\mathbb{R}^n$ with some conditions. Furthermore, we will establish the notions of tangent vectors and tangent spaces on non-Euclidean manifolds. This will then allow the use of classical statistical methods on the tangent planes via the exponential map and its inverse. We shall provide specific examples to illustrate these notions.

3.1 General Background from Differential Geometry

We start by considering the definition of a general differentiable manifold. The material provided here is brief and by no means comprehensive. We refer the interested readers to two excellent books [9][38] for a more detailed introduction to differential geometry and manifold analysis.

A topological space $M$ is called a differentiable manifold if, amongst other properties, it is locally Euclidean. This means that for each $p \in M$, there exists an open neighborhood $U$ of $p$ and a mapping $\phi : U \to \mathbb{R}^n$ such that $\phi(U)$ is open in $\mathbb{R}^n$ and $\phi : U \to \phi(U)$ is a diffeomorphism. The pair $(U, \phi)$ is called a coordinate chart for the points that fall in $U$; for any point $y \in U$, one can view the Euclidean coordinates $\phi(y) = (\phi_1(y), \phi_2(y), \ldots, \phi_n(y))$ as the coordinates of $y$. The dimension of the manifold $M$ is $n$. This is a way of flattening the manifold locally. Using $\phi$ and $\phi^{-1}$, one can move between the sets $U$ and $\phi(U)$ and perform calculations in the more convenient Euclidean space. If there exist multiple such charts, then they are compatible, i.e. their compositions are smooth. We look at some simple manifolds as examples.

Example 1. ($\mathbb{R}^n$ is a manifold)
1. The Euclidean space $\mathbb{R}^n$ is an n-dimensional differentiable manifold which can be covered by the single chart $(\mathbb{R}^n, \phi)$, $\phi(x) = x$.
2. Any open subset of a differentiable manifold is itself a differentiable manifold. A well known example of this idea comes from linear algebra. Let $M(n)$ be the set of all $n \times n$ matrices; $M(n)$ can be identified with the set $\mathbb{R}^{n \times n}$ and is, therefore, a differentiable manifold. Define the subset $GL(n)$ as the set of non-singular matrices, i.e. $GL(n) = \{A \in M(n) \mid \det(A) \neq 0\}$, where $\det(\cdot)$ denotes the determinant of a matrix. Since $GL(n)$ is an open subset of $M(n)$, it is also a differentiable manifold.

Fig. 1 Figure illustrating the notions of tangent spaces, tangent vectors, and geodesics

In order to perform differential calculus, i.e. to compute gradients, directional derivatives, critical points, etc., of functions on manifolds, one needs to understand the tangent structure of those manifolds. Although there are several ways to define tangent spaces, one intuitive approach is to consider differentiable curves on the manifold passing through the point of interest, and to study the velocity vectors of these curves at that point. To help visualize these ideas, we illustrate the notions of tangent planes, geodesics in figure 1.

More formally, let $M$ be an n-dimensional manifold and, for a point $p \in M$, consider a differentiable curve $\gamma : (-\epsilon, \epsilon) \to M$ such that $\gamma(0) = p$. The velocity $\dot\gamma(0)$ denotes the velocity of $\gamma$ at $p$. This vector has the same dimension as the manifold $M$ itself and is an example of a tangent vector to $M$ at $p$. The set of all such tangent vectors is called the tangent space to $M$ at $p$. Even though the manifold $M$ may be nonlinear, the tangent space $T_p(M)$ is always linear and one can impose probability models on it using more traditional approaches.

Example 2. 1. In case of the Euclid
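
The "velocity of a curve" definition of a tangent vector quoted above can be tried out numerically on the unit sphere. The sketch below is my own illustration, not part of the quoted chapter: `great_circle` and `velocity_at_zero` are hypothetical helper names, and the finite-difference step `h` is an arbitrary choice.

```python
import numpy as np


def great_circle(p, v):
    """Curve gamma(t) on the unit sphere with gamma(0) = p and velocity v at t = 0."""
    speed = np.linalg.norm(v)
    direction = v / speed
    return lambda t: np.cos(speed * t) * p + np.sin(speed * t) * direction


def velocity_at_zero(gamma, h=1e-6):
    """Central finite-difference estimate of the velocity of gamma at t = 0."""
    return (gamma(h) - gamma(-h)) / (2.0 * h)


p = np.array([0.0, 0.0, 1.0])          # point on the sphere
v = np.array([0.3, -0.4, 0.0])         # tangent at p, since v . p = 0
gamma = great_circle(p, v)

v_est = velocity_at_zero(gamma)
print(v_est)                           # numerically close to v
print(np.dot(v_est, p))                # numerically close to 0
```

The curve itself bends with the sphere, but its velocity at `p` lies in the flat plane orthogonal to `p`, which is the linear tangent space on which ordinary statistics can be done.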



6. algebra approach to manifold



3.1 Manifolds

Everyone knows what a curve is, until he has studied enough mathematics to become confused through the countless number of possible exceptions. (Felix Klein)

In this section we introduce one more actor in multivariable calculus. So far, our mappings have been first linear, then nonlinear with good linear approximations. But the domain and codomain of our mappings have been “flat” open subsets of $\mathbb{R}^n$. Now we want to allow “nonlinear $\mathbb{R}^n$’s,” called smooth manifolds.

These familiar objects are by no means simple: already, the theory of soap bubbles is a difficult topic, with a complicated partial differential equation controlling the shape of the film.

Manifolds are a generalization of the familiar curves and surfaces of everyday experience. A one-dimensional manifold is a smooth curve; a two-dimensional manifold is a smooth surface. Smooth curves are idealizations of things like telephone wires or a tangled garden hose. Particularly beautiful smooth surfaces are produced when you blow soap bubbles that wobble and slowly vibrate as they drift through the air. Other examples are shown in figure 3.1.2.

Figure 3.1.1. Felix Klein (1849–1925). Klein’s work in geometry “has become so much a part of our present mathematical thinking that it is hard for us to realise the novelty of his results.” (From a biographical sketch by J. O’Connor and E. F. Robertson.) Klein was also instrumental in developing Mathematische Annalen into one of the most prestigious mathematical journals.

Figure 3.1.2. Four surfaces in $\mathbb{R}^3$. The top two are graphs of functions. The bottom two are locally graphs of functions. All four qualify as smooth surfaces (two-dimensional manifolds) under definition 3.1.2.

We will define smooth manifolds mathematically, excluding some objects that we might think of as smooth: a figure eight, for example. We will see how to use the implicit function theorem to tell whether the locus defined by an equation is a smooth manifold. Finally, we will compare knowing a manifold by equations, and knowing it by a parametrization.

Smooth manifolds in $\mathbb{R}^n$

When is a subset $X \subset \mathbb{R}^n$ a smooth manifold? Our definition is based on the notion of graph.

Remember from the discussion of set theory notation (section 0.3) that $A \times B$ is the set of pairs $(a, b)$ with $a \in A$ and $b \in B$. Here $x$ is a point in $\mathbb{R}^n$ and $y$ is a point in $\mathbb{R}^m$. The graph of such a function consists of points $\begin{pmatrix} x \\ f(x) \end{pmatrix}$ in $\mathbb{R}^{n+m}$. (This is the simplest way to think of it, with the n active variables coming first, followed by the m passive variables.)

A manifold $M$ embedded in $\mathbb{R}^n$, denoted $M \subset \mathbb{R}^n$, is sometimes called a submanifold of $\mathbb{R}^n$. Strictly speaking, it should not be referred to simply as a “manifold,” which can mean an abstract manifold, not embedded in any space. The manifolds in this book are all submanifolds of $\mathbb{R}^n$.

With this definition, which depends on chosen coordinates, it isn’t obvious that if you rotate a smooth manifold it is still smooth. We will see in theorem 3.1.16 that it is.

Definition 3.1.1 (Graph). The graph $\Gamma(f)$ of a function $f : \mathbb{R}^n \to \mathbb{R}^m$ is the set of pairs $(x, y) \in (\mathbb{R}^n \times \mathbb{R}^m)$ such that $f(x) = y$.

You are familiar with graphs of functions $f : \mathbb{R} \to \mathbb{R}$; most often we graph such functions with the horizontal x-axis corresponding to the input, and the vertical axis corresponding to values of $f$ at different $x$. Note that the graph of such a function is a subset of $\mathbb{R}^2$. For example, the graph of $f(x) = x^2$ consists of the points $\begin{pmatrix} x \\ f(x) \end{pmatrix} \in \mathbb{R}^2$, i.e., the points $\begin{pmatrix} x \\ x^2 \end{pmatrix}$.

The top two surfaces shown in figure 3.1.2 are graphs of functions from $\mathbb{R}^2$ to $\mathbb{R}$: the surface on the left is the graph of $f\begin{pmatrix} x \\ y \end{pmatrix} = x^3 - 2xy^2$; that on the right is the graph of $f\begin{pmatrix} x \\ y \end{pmatrix} = x^2 + y^4$. Although we depict these graphs on a flat piece of paper, they are actually subsets of $\mathbb{R}^3$. The first consists of the points $\begin{pmatrix} x \\ y \\ x^3 - 2xy^2 \end{pmatrix}$, the second of the points $\begin{pmatrix} x \\ y \\ x^2 + y^4 \end{pmatrix}$.

More generally, the graph of a function $f$ lives in a space whose dimension is the sum of the dimensions of the domain and codomain of $f$: the graph of a function $f : \mathbb{R}^n \to \mathbb{R}^m$ is a subset of $\mathbb{R}^n \times \mathbb{R}^m = \mathbb{R}^{n+m}$.

Definition 3.1.2 says that if such a function $f : \mathbb{R}^n \to \mathbb{R}^m$ is $C^1$, then its graph is a smooth n-dimensional manifold in $\mathbb{R}^{n+m}$. Thus the top two graphs shown in figure 3.1.2 are two-dimensional manifolds in $\mathbb{R}^3$. But the torus and helicoid shown in figure 3.1.2 are also two-dimensional manifolds. Neither one is the graph of a single function expressing one variable in terms of the other two. But both are locally graphs of functions.

Definition 3.1.2 (Smooth manifold in $\mathbb{R}^n$). A subset $M \subset \mathbb{R}^n$ is a smooth k-dimensional manifold if locally it is the graph of a $C^1$ mapping expressing $n - k$ variables as functions of the other $k$ variables.

Generally, “smooth” means “as many times differentiable as is relevant to the problem at hand.” In this and the next section, it means “of class $C^1$.” (Some authors use “smooth” to mean $C^\infty$: “infinitely many times differentiable.” For our purposes this is overkill.) When speaking of smooth manifolds, we often omit the word smooth.

Especially in higher dimensions, making some kind of global sense of a patchwork of graphs of functions can be quite challenging indeed; a mathematician trying to picture a manifold is rather like a blindfolded person who has never met or seen a picture of an elephant seeking to identify one by patting first an ear, then the trunk or a leg. It is a subject full of open questions, some fully as interesting and demanding as, for example, Fermat’s last theorem, whose solution after more than three centuries aroused such passionate interest. Three-dimensional and four-dimensional manifolds are of particular interest, in part because of applications in representing spacetime.

“Locally” means that every point $x \in M$ has a neighborhood $U \subset \mathbb{R}^n$ such that $M \cap U$ (the part of $M$ in $U$) is the graph of a mapping expressing $n - k$ of the coordinates of each point in $M \cap U$ in terms of the other $k$. This may sound like an unwelcome complication, but if we omitted the word “locally” then we would exclude from our definition most interesting manifolds. We already saw that neither the torus nor the helicoid of figure 3.1.2 is the graph of a single function expressing one variable as a function of the other two. Even such a simple curve as the unit circle is not the graph of a single function expressing one variable in terms of the other. In figure 3.1.3 we show another smooth curve that would not qualify as a manifold if we required it to be the graph of a single function expressing one variable in terms of the other; the caption justifies our claim that this curve is a smooth curve.

Figure 3.1.3. Above, $I$ and $I_1$ are intervals on the x-axis; $J$ and $J_1$ are intervals on the y-axis. The darkened part of the curve in the shaded rectangle $I \times J$ is the graph of a function expressing $x \in I$ as a function of $y \in J$, and the darkened part of the curve in $I_1 \times J_1$ is the graph of a function expressing $y \in J_1$ as a function of $x \in I_1$. (By decreasing the size of $J_1$ a bit, we could also think of the part of the curve in $I_1 \times J_1$ as the graph of a function expressing $x \in I_1$ as a function of $y \in J_1$.) But we cannot think of the darkened part of the curve in $I \times J$ as the graph of a function expressing $y \in J$ as a function of $x \in I$; there are values of $x$ that give two different values of $y$, and others that give none, so such a “function” is not well defined.
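
The unit circle mentioned in the excerpt makes a handy worked example of "locally a graph". The sketch below is my own and not from the textbook: it treats the circle as the locus $F(x, y) = x^2 + y^2 - 1 = 0$ and, in the spirit of the implicit function theorem, solves for whichever variable has the larger partial derivative at the chosen point. The names `F` and `local_graph` are hypothetical.

```python
import math


def F(x, y):
    """The unit circle is the locus F(x, y) = 0."""
    return x * x + y * y - 1.0


def local_graph(x0, y0):
    """Near the circle point (x0, y0), return which variable is written as a
    function of the other, together with that function."""
    dFdx, dFdy = 2.0 * x0, 2.0 * y0
    if abs(dFdy) >= abs(dFdx):
        # dF/dy != 0 here, so y is locally a function of x.
        sign = 1.0 if y0 >= 0 else -1.0
        return "y", lambda x: sign * math.sqrt(1.0 - x * x)
    # Otherwise dF/dx != 0, so x is locally a function of y.
    sign = 1.0 if x0 >= 0 else -1.0
    return "x", lambda y: sign * math.sqrt(1.0 - y * y)


name, g = local_graph(0.0, 1.0)      # top of the circle: y = sqrt(1 - x^2)
print(name, g(0.1))

name, g = local_graph(1.0, 0.0)      # right-most point: x = sqrt(1 - y^2)
print(name, g(0.2))
```

No single choice works around the whole circle: at (1, 0) the curve is vertical, so y cannot be a function of x there, which is exactly why the definition says "locally".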

爱: "广义相对性原理 (wyle, 杨进一步发展, (微观)局域规范变换不变規範場)对自然界定律作了一些广泛而具明确性的限制"

super simply speaking: 广义相对性原理 in a manifold such as GR lorentz manifold, means some kind of gauge field exists, but for a local observer, all he can observe is using a locally transformed gauge and and there for his observations have to be modified to be "right", in the context of this "gauge field"

now, this manifold has find its way into machine learning, AI and everything else, as it should be.

My near term goal is to have your lab boss (or somebody else) start to appreciate my work and hire me as a contractor/consultant, as a start. 

1. manifold in a nutshell 

微分几何中,黎曼几何研究具有黎曼度量的光滑流形,即流形切空间上二次形式的选择。它特别关注于角度、弧线长度及体积。把每个微小部分加起来而得出整体的数量. 

二次型的系统研究是从18世纪开始的,它起源于对二次曲线和二次曲面的分类问题的讨论,将二次曲线和二次曲面的方程变形,选有主轴方向的轴作为坐标轴以简化方程的形状,这个问题是在18世纪引进的。
高斯独立发现了二项式定理的一般形式、数论上的“二次互反律”、素数定理、及算术-几何平均数. 
柯西在其著作中给出结论:当方程是标准型时,二次曲面用二次型的符号来进行分类.

非严谨地说, 黎曼几何, Riemann manifold, 研究"高维弯曲空间"二次度量, 如果不是"弯曲高维空间"二次度量,  那就是希尔伯特空间, etc.

I have managed to come to and earning this beautiful  characterization of Riemann manifold at concept level  after years of hard studying of physics and math, covering almost all major areas of them. 


the following is a google translation:

Differential geometry, or Riemann geometry,  Riemann metric  manifold is all about the "quadratic form". It is particularly concerned about the angle, arc length and volume in a high dimensional and curved space such as that of general relativity  Lorentz manifold. 

Studies of  "quadratic form" started at the beginning of the 18th century, 
Gauss independently discovered the general form of the binomial theorem, "Quadratic Reciprocity Law" on number theory, prime number theorem, and arithmetic - geometric mean.
, and the famous CauchySchwarz inequality, etc.  

Hilbert space studies the "quadratic form" in a "high-dimensional but curved space", however,  biology and many social systems all reside in high-dimensional and curved space, and we have to figure out how to model it. 

2. How statistics got started: modeling and measuring of linear systems experiencing disturbances and stresses 

(https://dspace.lboro.ac.uk/dspace-jspui/bitstream/2134/20203/1/Thesis-2016-Oltean.pdf)
" unsuccessful predictions of stock prices made by sir Isaac Newton and, consequently, his terrible loss in 1720 of 20000 pounds in South Sea speculation bubble.  in 1738, Daniel Bernoulli introduced the idea of utility in order to describe preferences of people and consumer satisfaction.  successful management of the fund for the widows of Goettingen professors, performed by Carl Friedrich Gauss.  Giovanni Ceva published an essay “A mathematical approach of money” in 1711.  Laplace in his work “Essai philosophique sur les probabilites” (1812) showed that what apparently might seem random and unpredictable (such as number of letters in the Paris dead-letter office) is predictable and obeys a simple law.  Adolphe Quetelet (a former student of Fourier) studied the existence of patterns in data sets ranging from the frequency of different methods for committing murder to the chest size of Scottish men. It was him who coined the term “social physics” in 1835. 2  explanation of the Brownian random walk and the formulation of the ChapmanKolmogorov condition for Markovian processes by Louis Bachelier in his PhD thesis on the theory of speculation. This was done 5 years before the work of Smoluchowski and Einstein on diffusion, based on the observations of price changes at Paris stockmarket.  Italian physicist Ettore Majorana wrote in 1936 a paper based on analogies between statistical physics laws and the ones from social sciences  "

Boltzmann was very explicit: “The molecules are like individuals, ... and the properties of gases only remain unaltered, because the number of these molecules, which on average has a given state, is constant. In “Populäre Schriften” , Boltzmann said “This opens a broad perspective, if we do not only think of mechanical objects. Let us consider the application of this method to the statistics of living beings, society, sociology and so forth.”


3. It is still all about "quadratic form": sigmoid function as the backbone of logistic regression

The probability distribution used is cumulative logistic distribution which is applied to cumulated income, expenditure, or wealth on one hand and also to cumulated probabilities on the other hand. Logistic function or sigmoid function is defined as 𝑓(𝑥) = 𝐿 1+𝑒𝑥𝑝−𝑘(𝑥−𝑥0) (2.1) where L is the curve's maximum value, x0 is the x-value of the sigmoid's midpoint, and k = the steepness of the curve[85]. Logistic map, which is the basis for logistic function, is used to show how complex, chaotic behaviour can arise from very simple non-linear dynamical equations [86]. We use logistic cumulative probability distribution C(x), which is defined as the integral C(x) = ∫ P(x)dx x −∞ (2.2)


Quadratic Polynomials Learn Better Image Features

www.iro.umontreal.ca/~lisa/publications2/.../205

Université de Montréal
Apr 3, 2009 - quadratic units was strongest in conjunction with sparse and ... Equation 1 looks sigmoidal as a function of E, but the sharpness of the  ...

It's not exponential, it's sigmoidal - O'Reilly Radar

radar.oreilly.com/.../its-not-exponential-its-sigmoi.ht...

O’Reilly Media, Inc.
Nov 26, 2007 - Exponential vs. linear or quadratic curves. ... In fact, one of the most important sigmoidal functions is the logistic function, originally developed to  .

4. algorithm of machine learning and AI: putting it all together: statistics on manifold.


"As is common in applications, we use the Karcher mean. In practice, the Karcher mean can be efficiently computed using an iterative algorithm [113]. Let µ denote the intrinsic mean. The value the (sample) Fréchet function attains at µ,

$$\frac{1}{N}\sum_{i=1}^{N} d(\mu, p_i)^2, \qquad (2.82)$$

is called the geodesic variance. The (sample) covariance is defined through the Euclidean (sample) covariance of the data as expressed in $T_\mu M$:

$$\mathrm{Cov}(\{p_i\}_{i=1}^{N}) \overset{\mathrm{def}}{=} \frac{1}{N-1}\sum_{i=1}^{N} \mathrm{Log}_\mu(p_i)\,\mathrm{Log}_\mu(p_i)^T. \qquad (2.83)$$

Note that the point of tangency is the intrinsic mean, µ. This echoes (and in fact, generalizes) the construction of the Euclidean (sample) covariance in a Euclidean space, which is built from summing outer-products of vectors following the subtraction of the Euclidean (sample) mean:

$$\mathrm{Cov}(\{p_i\}_{i=1}^{N}) \overset{\mathrm{def}}{=} \frac{1}{N-1}\sum_{i=1}^{N} \Big(p_i - \frac{1}{N}\sum_{j=1}^{N} p_j\Big)\Big(p_i - \frac{1}{N}\sum_{j=1}^{N} p_j\Big)^T$$"

I have more or less finished reading "dam.brown.edu/people/freifeld//phd/ThesisOrenFreifeld.pdf", which I started reading last Friday; I went through most of it on Saturday at the Oakland airport.

The paragraph I quoted really "rings the bell": basically, geometrical "distance" is the backbone of the "statistical mean", and all kinds of statistical variances can be expressed as some kind of product of vectors, with the inner product of vectors defining your "distance" measure in a scalar field or classical vector field; the outer product of vectors becomes more challenging in a vector field such as Maxwell's electromagnetic field.

As I wrote before, when modeling a system residing in a high-dimensional and curved space, one has to advance the study of scalar and vector fields onto Riemannian manifolds, and advance statistics as we know it onto Riemannian manifolds as well. This is exactly what is starting to happen now, with still very little literature available.

One of the difficulties is, as I wrote before, that a Riemannian differentiable manifold (one smooth enough that we can perform calculus and statistics on it) is not math only; it is a multidisciplinary subject of advanced mathematics and physics such as general relativity and gauge field theory, which is still mind-challenging to most physicists of linear physics disciplines, not to mention all other scientists outside of physics.

The world of science is actually very interesting, with many scientists not really knowing what they are doing.

----------------------attachments--------


Mathematical descriptions of the electromagnetic field ...

https://en.wikipedia.org/.../Mathematical_descriptions_of_the_...

Wikipedia
1.1 Maxwell's equations in the vector field approach. 2 Potential ... 7.1 Potential formulation; 7.2 Manifestly covariant (tensor) approach .... In three dimensions, the derivative has a special structure allowing the introduction of a cross product:.



The Geometry of Minkowski Spacetime: An Introduction to ...

https://books.google.com/books?isbn=1441978380
Gregory L. Naber - 2012 - ‎Mathematics
As sample solutions to Maxwell's equations we consider the Coulomb field, the field of a uniformly moving charge, and a rather complete discussion of simple, plane electromagnetic waves. ... outer product of a spin vector and its conjugate.


Outer product - Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Outer_product

Wikipedia
The outer product of vectors can be also regarded as a special case of the ....analysis for computing the covariance and auto-covariance matrices for two  ...

[PDF]Linear Algebra & Properties of the Covariance Matrix

www.maths.usyd.edu.au/u/alpapani/.../lecture6.pdf

University of Sydney
Oct 3, 2012 - covariance matrix it must be symmetric positive definite (SPD). ... The vector x ∈ RN inner product with another vector y ∈ RN is. x y = y x  ...

Geometric intuition for why an outer product of two vectors ...

stats.stackexchange.com/.../geometric-intuition-for-why-an-outer-produc...

Sep 14, 2012 - I understand that the outer product of two vectors, say representing two detrended time series, can represent a cross-correlation (well ...

covariance of cross product of two vectors - Mathematics ...

math.stackexchange.com/.../covariance-of-cross-product...

Stack Exchange
Jun 18, 2014 - I have two independent vectors in 3D and know the covariance matrix of each. What will be the covariance of cross products of above vectors.
2.5.2 Basic Concepts

An introduction to statistics on Riemannian manifolds can be found in [113]. See also [10] for a recent book on nonparametric statistics on manifolds. For the remainder of Section 2.5, we will assume that M is a D-dimensional geodesically-complete Riemannian manifold, and that we have an M-valued dataset denoted by $p_1, p_2, \ldots, p_N$. We start with generalizing the Euclidean notion of sum of squared distances to sum of squared geodesic distances.

Definition 2.5.1 (The sample Fréchet function). Let p be a point in M. The sum of squared distances between the data and p is called the sample Fréchet function, defined by

$$\mathrm{SSGD}(p) \overset{\mathrm{def}}{=} \frac{1}{N}\sum_{i=1}^{N} d(p, p_i)^2. \qquad (2.81)$$

Next we generalize the Euclidean notion of the sample mean.

Definition 2.5.2 (Intrinsic mean). The unique global minimizer of the function $\mathrm{SSGD} : M \to \mathbb{R}^+$, if it exists, is called the (sample) intrinsic mean. It is also known as the (sample) Fréchet mean. Any local minimizer is called the Karcher mean [77].

As is common in applications, we use the Karcher mean. In practice, the Karcher mean can be efficiently computed using an iterative algorithm [113]. Let µ denote the intrinsic mean. The value the (sample) Fréchet function attains at µ,

$$\frac{1}{N}\sum_{i=1}^{N} d(\mu, p_i)^2, \qquad (2.82)$$

is called the geodesic variance. The (sample) covariance is defined through the Euclidean (sample) covariance of the data as expressed in $T_\mu M$:

$$\mathrm{Cov}(\{p_i\}_{i=1}^{N}) \overset{\mathrm{def}}{=} \frac{1}{N-1}\sum_{i=1}^{N} \mathrm{Log}_\mu(p_i)\,\mathrm{Log}_\mu(p_i)^T. \qquad (2.83)$$

Note that the point of tangency is the intrinsic mean, µ. This echoes (and in fact, generalizes) the construction of the Euclidean (sample) covariance in a Euclidean space, which is built from summing outer-products of vectors following the subtraction of the Euclidean (sample) mean:

$$\mathrm{Cov}(\{p_i\}_{i=1}^{N}) \overset{\mathrm{def}}{=} \frac{1}{N-1}\sum_{i=1}^{N} \Big(p_i - \frac{1}{N}\sum_{j=1}^{N} p_j\Big)\Big(p_i - \frac{1}{N}\sum_{j=1}^{N} p_j\Big)^T$$
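The definitions above can be tried out on the simplest curved example I can think of, the unit sphere in 3D. The following is only a sketch under my own assumptions: the manifold is the unit sphere, the "iterative algorithm" is taken to be the common fixed-point scheme (average the Log-map vectors, then step with the Exp map), and all function names are hypothetical. It is not the algorithm of reference [113] or of the thesis.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def normalize(u):
    n = norm(u)
    return [a / n for a in u]

def log_map(mu, p):
    """Log_mu(p) on the unit sphere: tangent vector at mu whose length
    is the geodesic distance d(mu, p). Antipodal points are not handled."""
    c = max(-1.0, min(1.0, dot(mu, p)))
    theta = math.acos(c)
    w = [pi - c * mi for pi, mi in zip(p, mu)]  # part of p orthogonal to mu
    n = norm(w)
    if n < 1e-15:
        return [0.0, 0.0, 0.0]
    return [theta * wi / n for wi in w]

def exp_map(mu, v):
    """Exp_mu(v) on the unit sphere: walk along the geodesic from mu in direction v."""
    t = norm(v)
    if t < 1e-15:
        return list(mu)
    return [math.cos(t) * mi + math.sin(t) * vi / t for mi, vi in zip(mu, v)]

def karcher_mean(points, iters=100, tol=1e-12):
    """Fixed-point iteration: move mu along the mean of the Log-map vectors."""
    mu = normalize(points[0])
    for _ in range(iters):
        logs = [log_map(mu, p) for p in points]
        step = [sum(col) / len(points) for col in zip(*logs)]
        mu = normalize(exp_map(mu, step))
        if norm(step) < tol:
            break
    return mu

def geodesic_variance(mu, points):
    """Value of the sample Frechet function at mu, as in (2.82)."""
    return sum(norm(log_map(mu, p)) ** 2 for p in points) / len(points)

def tangent_covariance(mu, points):
    """Sum of outer products of Log_mu(p_i), divided by N - 1, as in (2.83).
    Expressed in ambient 3D coordinates, so the matrix has rank at most 2."""
    logs = [log_map(mu, p) for p in points]
    n = len(points)
    return [[sum(v[i] * v[j] for v in logs) / (n - 1) for j in range(3)]
            for i in range(3)]
```

For four points placed symmetrically around the north pole at geodesic distance 0.3, the iteration should settle on the pole, and the geodesic variance should come out as 0.3 squared.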

------------------------


I understand that the outer product of two vectors, say representing two detrended time series, can represent a cross-correlation (well covariance) matrix.
I also know that the inverse of a correlation matrix represents the partial correlations between two variables. Geometrically, I know that the partial correlation between two variables is the angle formed by the projection of their residuals when regressed against all other variables onto the surface perpendicular to all other variables.
I'm wondering how these two relate. That is, I know the interpretation of the inverse of matrix (partial correlation) but not the matrix or its construction.

closed as not a real question by whuber Sep 29 '12 at 21:44

Your reference to "outer product" does not accord with my understanding of this operation, which I believe is a conventional one. The rank of any outer product is at most one, which would produce a highly degenerate matrix: that's not what one expects of a covariance. Could you indicate what your "outer product" operation is? – whuber Sep 14 '12 at 22:27
am sure he refers to the sum of all the outer products of x_i with itself over i. That is just the sum of cross-products ... – kjetil b halvorsen Sep 14 '12 at 23:09
By outer product, I mean xy^T. I thought that for two random vectors, x and y, the covariance matrix would be E(xy^T) = E(x)E(y)^T. – mac389 Sep 15 '12 at 2:58
Maybe I don't understand. Suppose that the random vectors X and Y have mean vectors u = E[X] and v = E[Y], respectively. The matrix A = uv^T can't be, in general, a covariance matrix, because, suppose that X and Y are such that the inner product ⟨u, v⟩ < 0. Then, we have ⟨u, Au⟩ = u^T A u = u^T u v^T u = ||u||^2 ⟨v, u⟩ < 0, and A is not positive definite. – Zen Sep 16 '12 at 2:23
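The three points made in the comments (a single outer product has rank at most one, a sum of outer products of centered data gives the sample covariance, and uv^T can fail to be positive definite) can be checked with a few lines of plain Python. This is only an illustrative sketch; the helper names and the 2x2 determinant check are my own, not from the thread.

```python
def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[a * b for b in v] for a in u]

def sample_covariance(rows):
    """Sum of outer products of mean-centered rows, divided by n - 1."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        c = [r[j] - mean[j] for j in range(d)]
        o = outer(c, c)
        for i in range(d):
            for j in range(d):
                cov[i][j] += o[i][j] / (n - 1)
    return cov

def det2(m):
    """Determinant of a 2x2 matrix; zero means rank at most one."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def quad_form(u, a):
    """u^T A u."""
    return sum(u[i] * a[i][j] * u[j] for i in range(len(u)) for j in range(len(u)))
```

For example, `det2(outer([1, 2], [3, 4]))` is zero, while the covariance of four points spread along both axes has a positive determinant; with u = [1, 0] and v = [-1, 0], `quad_form(u, outer(u, v))` is negative, which is Zen's counterexample in miniature.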







--------------


Geometric Solutions of Quadratic and Cubic Equations

by
David W. Henderson
Department of Mathematics, Cornell University
Ithaca, NY, 14853-7901, USA

I am ready to lead you, the reader, on a path through part of the forest of mathematics - a path that has delighted me many times - and surprised me. Every time I walk along it I see something I had not seen before. We will bring with us the question: What are square roots? We will find what is one of the oldest written mathematical proofs, still very much alive, right alongside some new results never before published.
These will be combined to solve quadratic equations by "completing the square" - a real square. These in turn lead to conic sections and cube roots and culminate in the beautiful general method from Omar al'Khayyam, the Persian geometer, philosopher, poet, which can be used to find all the real roots of cubic equations. Along the way we shall clearly see some of the ancestral forms of our modern Cartesian coordinates and analytic geometry. I will point out several inaccuracies and misconceptions that have crept into the modern historical accounts of these matters. But I urge you to not look at this only for its historical interest but rather look for the meaning it has in our current-day understanding of mathematics. This path is not through a dead museum or petrified forest; this path passes through ideas which are very much alive and which have something to say to our modern technological, increasingly numerical, world.

1. The Beginning of the Path
For me the path started in eighth grade when I asked my teacher - "What is the square root?" I knew that the square root of N was a number whose square was equal to N, but where can I find it? (Hidden in that question is "How do I know it always exists?") I knew what the square roots of 4 and 9 were - no problem there.
I even knew that √2 was the length of the diagonal of a unit square, but what of √2.5 or √π? At first the teacher showed me a Square Root Table (a table of numerical square roots), but I soon discovered that if I took the number listed in the table as √2 and squared it I got 1.999396, not 2. (Modern-day pocket calculators give rise to the same problem.) So I persisted asking my question - What is the square root? Then the teacher answered by giving me THE ANSWER - the Square Root Algorithm. Do you remember the Square Root Algorithm - that procedure, similar to long division, by which it is possible to calculate the square root? Or perhaps more recently you were taught the "Divide and Average" Method, which goes like this: If A1 is an approximation of √N, then the average of A1 and N/A1 is an even better approximation, which we could call A2. And then the next approximation A3 is the average of A2 and N/A2. In equation form this becomes A(n+1) = (1/2)(A(n) + N/A(n)). For example, if A1 = 1.5 is an approximation of √2, then A2 = 1.417···, A3 = 1.414216··· and so forth are better and better approximations. But wait! Most of the time these algorithms do not calculate the square root - they only calculate approximations to the square root. The algorithms have an advantage over the tables because I could, at least in theory, calculate approximations as close as I wished. However, they are still only approximations and my question still remained - What is this square root which these algorithms approximate?
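The "Divide and Average" method described above is easy to try out. A minimal Python sketch follows; the function name and the choice to return the whole list of approximations are mine, for illustration only.

```python
def divide_and_average(n, a1, steps):
    """Return [A1, A2, ...] where each next term is the average of A and n / A."""
    approximations = [a1]
    for _ in range(steps):
        a = approximations[-1]
        approximations.append(0.5 * (a + n / a))
    return approximations
```

Calling `divide_and_average(2, 1.5, 2)` reproduces the A2 and A3 of the example in the text, and, as the text stresses, every term is still only an approximation.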
My eighth grade teacher then gave up, but later in college I found out that modern mathematics answers: "We make an assumption (The Completeness Axiom) which implies that the sequence of approximations from the Square Root Algorithm must converge to some real number." And, when I continued to ask my question, I found that in modern mathematics the square root is a certain equivalence class of Cauchy sequences of rational numbers or a certain Dedekind cut. I then let go of my question and forgot it in the turmoil of graduate school, writing my thesis and beginning my mathematical career.
Later, I started teaching a geometry course for mathematics majors and one of the topics was Dissection Theory which leads (among other things) to the result that every polygonal region in the plane can be cut up (dissected) into a finite number of pieces which can then be rearranged to form a square. In this case we say that the polygonal region is equivalent by dissection to a square. A preliminary step to the general result is the:

Theorem 1. Every rectangle is equivalent by dissection to a square.
I presented to the class the following proof, which I found slightly modified in a standard geometry textbook, Eves (1963):
"Let s = √(ab) be the side of the square equivalent to the rectangle with sides a and b. Place the square, AEFH, on the rectangle, ABCD, as shown in [the figure]. Draw ED to cut BC in R and HF in K. Let BC cut HF in G. From the similar triangles KDH and EDA we have HK/AE = HD/AD, or
HK = (AE)(HD)/AD = s(a - s)/a = s - s²/a = s - b.
Therefore, ... we have △EFK ≅ △RCD, △EBR ≅ △KHD."
(In case that ABCD is so long and skinny that K ends up between and F we can, by cutting ABCD in half and stacking the halves, reduce the proof to the above case.)
I was satisfied with the proof until in the second year of the course when I started sensing student uneasiness with the proof. As I listened to their uneasiness there started to come up the question - What is √(ab)? How do you find it? -- Oh, yes, I remember -- that used to be my question!
The students and I also noticed that the facts used about similar triangles in the above proof are usually proved using the theory of areas of triangles and thus that this proof could not be used as part of a concrete theory of areas of polygons, which was our purpose in studying Dissection Theory in the first place.
That started me off on an exploration which continued on and off over many years. Some of what I found I will now show you (but in a different order from the order I first saw them).

2. What is a Square Root?
While reading an article about something else I ran across an item that said that the problem of changing a rectangle into a square appeared in the Sulbasutram by Baudhayana (see Prakash (1968)). "Sulbasutram" means "Rules of the cord" and is an ancient (at least 600 BC) book written in Sanskrit as a handbook for people who were building altars and temples. Most of the book gives detailed instructions on temple construction and design, but the first chapter is a geometry textbook which contains geometric statements called "Sutra". Sutra 54 is: (Here "oblong" means "rectangle".)
"If you wish to turn an oblong into a square, take the tiryanmani, i.e. the shorter side of the oblong for the side of square. Divide the remainder (that part of the oblong which remains after the square has been cut off) into two parts and inverting (their places) join those two parts to two sides of the square. (We get thus a large square out of one corner of which a small square is cut out as it were.) Fill the empty place (in the corner) by adding a piece (a small square). It has been taught how to deduct it (the added piece).
"By adding the small square in the corner we get a large square which is equal to the oblong plus the small square, therefore we must deduct the small square from the large square (see Sutra 51) and then we have as remainder a square which is equal to the oblong."
Here is a diagram for Sutra 54: 
So our rectangle has been changed into a large square from which a small square has been removed (or deducted). Now Sutra 51:
"If you wish to deduct one square from another Square, cut off a piece from the larger square by making a mark on the ground with the side of the smaller square which you wish to deduct (the process is the same as that described in Sutra 50; an oblong is cut off, the sides of which are equal to the sides of the two given squares); draw one of the sides (THE CORD REPRESENTING one OF THE longer SIDES of the oblong) across the oblong so that it touches the other side; where it touches (the other side), by this line which has been cut off the small square is deducted from the large one (i.e. the cutoff line is the side of a square the area of which is equal to the difference of the two squares.)"
This last assertion follows from sutra 50:
"If you wish to combine two squares of different size into one, scratch up with the side of the smaller square a piece cutoff from the larger one (i.e. cut off a piece from the larger square by scratching up the ground - or making a mark upon the ground - at a distance from one end of a side of the large square, which is equal to the length of the side of the smaller square; and by repeating this process on the opposite side of the larger square and joining the two marks on the ground by a line or cord, an oblong is cut off, of which the two longer sides are equal to the side of the large square). The diagonal of this cutoff piece is the side of the combined squares (of the square which combines the two squares)."
Does sutra 50 sound familiar? It should - it is a clear statement of what we call the Pythagorean Theorem, written before Pythagoras was born!

A. Seidenberg (1961) in an article entitled The Ritual Origin of Geometry gives a detailed discussion of the significance of the Sulbasutram. He argues that it was written before 600 BC (Pythagoras lived about 500 BC and Euclid about 300 BC). He gives evidence to support his claim that it contains codification of knowledge going "far back of 1700 BC" and that this knowledge was the common source of Indian, Egyptian, Babylonian and Greek mathematics. Combined together, sutras 50, 51 and 54 describe a construction of a square with the same area as a given rectangle (oblong) and a proof (based on the Pythagorean Theorem) that this construction is correct. You can find stated in many books and articles that the ancient Hindus, in general, and the Sulbasutram in particular, did not have proofs or demonstrations, or they are dismissed as being "rare". I suggest you decide for yourself.
Baudhayana avoids the Completeness Axiom by giving an explicit construction of the side of the square. The construction can be summarized in the diagram:
This is the same as Euclid's construction in Proposition II - 14 (see Heath (1956), page 409). But Euclid's proof is much more complicated.
Note that neither Baudhayana nor Euclid gives a proof of Theorem 1 because the use of the Pythagorean Theorem obscures the dissection. However, they do give a concrete construction and a proof that the construction works. In addition, if supplemented with a dissection proof of the Pythagorean Theorem such as in Theorem 3, below, both Baudhayana's and Euclid's methods prove (without using completeness):

Theorem 2: For every rectangle R there are squares S1 and S2 such that R + S2 is equivalent by dissection to S1 + S2 and thus R and S1 have the same area.

Theorem 3: (Dissection version of Pythagorean Theorem). In any right triangle, the union of the squares on the two sides is equivalent by dissection to the square on the hypotenuse.

Proof of Theorem 3 known to the ancient Chinese:
About a dozen different (and correct) proofs of Theorem 1 have been found by the students in my geometry course. One particularly clear one follows: (As far as I know this proof has never before been published.)
Let ABGH be the rectangle and extend the line AB to C so that BC ≅ BH. Draw the semicircle S on AC and let D be the intersection of S with the extension of BH.
Then ∠ADC is a right angle and the angles are congruent as indicated in the diagram. Construct the square DBEF. Then △DBC ≅ △IGA and △DBC ≅ △DFJ, both by Angle-Side-Angle. Easily, △ AE ≅ △IHD. Thus the rectangle ABHG is equivalent by dissection to the square DBEF.

Notice that this proof avoids assuming that the square root exists (and thus avoids the Completeness Axiom) and avoids using any facts about similar triangles. The proof explicitly constructs the square and shows in an elementary way that its area is the same as the area of the rectangle. There is no need for the area or the sides of the rectangle to be expressed in numbers. Also given a real number, b, the square root of b can be constructed by using a rectangle with sides b and 1.
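The semicircle construction used in the proof (extend AB to C with BC equal to the other side of the rectangle, draw the semicircle on AC, and raise the perpendicular at B to meet it at D) can be checked numerically in coordinates. This is only a sanity check under my own coordinate setup, with A at the origin and AC along the x-axis; it uses `math.sqrt` to locate D, so it illustrates the construction rather than replacing the geometric argument.

```python
import math

def square_side_from_rectangle(a, b):
    """Length BD, where D lies on the semicircle over AC (|AC| = a + b)
    directly above B, and |AB| = a, |BC| = b."""
    r = (a + b) / 2.0                      # radius; center is at (r, 0)
    return math.sqrt(r * r - (a - r) ** 2)

def angle_adc_is_right(a, b, tol=1e-9):
    """Check that DA and DC are perpendicular, i.e. angle ADC is a right angle."""
    y = square_side_from_rectangle(a, b)
    da = (0.0 - a, 0.0 - y)                # vector from D = (a, y) to A = (0, 0)
    dc = (a + b - a, 0.0 - y)              # vector from D to C = (a + b, 0)
    return abs(da[0] * dc[0] + da[1] * dc[1]) < tol
```

With sides 4 and 9 the side of the square comes out as 6, and with sides 2 and 1 the square of BD is 2, matching the remark that the square root of b can be constructed from a rectangle with sides b and 1.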
So, finally, I have an answer to my question - What is a square root? I say "an answer" because every year I see more or see it from a different point of view.

3. Quadratic Equations
Finding square roots is the simplest case of solving quadratic equations. If you look in some history of mathematics books, you will find that quadratic equations were extensively solved by the Babylonians (numerically) and by the Greeks (geometrically). However, the earliest known general discussion of quadratic equations took place between 800 and 1100 AD in the Muslim Empire. Best known are Mohammed Ibn Musa al'Khowarizmi (who lived in Baghdad) and Omar al'Khayyam (who lived in Persia, now Iran, and is mostly known in the West for his poetry The Rubaiyat). Both wrote books entitled Al-jabr W'al muqabalah, al'Khowarizmi about 820 AD and al'Khayyam about 1100 AD. From al'Khowarizmi we get our word "algorithm" and from the title of their books our word "algebra". An English translation of both books is available in many libraries, if you can figure out whose name it is catalogued under (see References, Karpinski (1915) and Kasir (1931)).
In these books you find geometric and numerical solutions to quadratic equations and geometric proofs of these solutions. But the first thing that you notice is that there is not one general quadratic equation as we are used to it: ax² + bx + c = 0. Rather, because the use of negative coefficients and negative roots was avoided, they list six types of quadratic equations (we follow al'Khayyam's lead and set the leading coefficient equal to 1):
  (I) x = c, which needs no solution,
  (II) x² = bx, which is easily solved,
  (III) x² = c, which has root x = √c,
  (IV) x² + bx = c, with root x = √[(b/2)² + c] - b/2,
  (V) x² + c = bx, with roots x = b/2 ± √[(b/2)² - c], if c < (b/2)², and
  (VI) x² = bx + c, with root x = b/2 + √[(b/2)² + c].
Here b and c are always positive numbers or a geometric length (b) and area (c). These types are the only possibilities with positive coefficients and positive roots. (x² + bx + c = 0 has no positive roots.)
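The root formulas for the last three types in the list can be checked by substitution. A small Python sketch follows; the function names are hypothetical, and b and c are assumed positive as in the text.

```python
import math

def root_x2_plus_bx_eq_c(b, c):
    """Positive root of x^2 + bx = c."""
    return math.sqrt((b / 2) ** 2 + c) - b / 2

def roots_x2_plus_c_eq_bx(b, c):
    """Positive roots of x^2 + c = bx; empty list when c > (b/2)^2."""
    disc = (b / 2) ** 2 - c
    if disc < 0:
        return []
    r = math.sqrt(disc)
    return [b / 2 - r, b / 2 + r]

def root_x2_eq_bx_plus_c(b, c):
    """Positive root of x^2 = bx + c."""
    return b / 2 + math.sqrt((b / 2) ** 2 + c)
```

For instance, x² + 10x = 39 gives x = 3, x² + 21 = 10x gives x = 3 and x = 7, and x² = 3x + 4 gives x = 4; each can be confirmed by plugging the value back in.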

But why did mathematicians avoid negative numbers? The avoidance of negative numbers was widespread until a few hundred years ago. In the Sixteenth Century, European mathematicians called the negative numbers that appeared as roots of equations, "numeri fictici" - fictitious numbers (see Witmer (1968), page 11).
To get a feeling for why, think about the meaning of 2 x 3 as two 3's and 3 x 2 as three 2's and then try to find a meaning for 3 x (-2) and -2 x (+3). Another answer is found in the reliance on geometric justifications, as al'Khayyam wrote (see Amir-Moez (1963), page 329):
"Whoever thinks algebra is a trick in obtaining unknowns has thought it in vain. No attention should be paid to the fact that algebra and geometry are different in appearance. Algebras (jabbre and maqabeleh) are geometric facts which are proved by propositions five and six of Book two of [Euclid's] Elements".
Some historians have quoted this passage but have left out all the words appearing after "proved". In my opinion, this omission changes the meaning of the passage. Euclid's propositions that are mentioned by al'Khayyam are the basic ingredients of Euclid's proof of the square root construction and form a basis for the construction of conic sections - see below. Geometric justification when there are negative coefficients is at least very cumbersome if not impossible. (If you doubt this try to modify some of the geometric justifications below.) In any case, Euclid, upon which these mathematicians relied, did not allow negative quantities.

For the geometric justification of (III) and the finding of square roots, al'Khayyam refers to Euclid's construction of the square root in Proposition II 14.
For (IV) we have as geometric justification: 
and thus, by "completing the square" on x + b/2, we have (x + b/2)² = c + (b/2)². Note the similarity between this and Baudhayana's construction of the square root (see Section 2).
For (V), first assume x < (b/2) and draw the equation as:
and note that the square on b/2 is (b/2 - x)² + c.
This leads to x = b/2 - √[(b/2)² - c]. Note that if c > (b/2)² then this geometric solution is impossible. When x > (b/2), use the drawings:
For the solution of (VI) use the drawing:
Do the above solutions find the negative roots? Well, first, the answer is clearly, No, if you mean: Did al'Khowarizmi and al'Khayyam (or the earlier Greeks and Babylonians) mention negative roots? But let us not be too hasty. Suppose -r (r positive) is the negative root of x² + bx = c. Then (-r)² + b(-r) = c, or r² = br + c. Thus r is a positive root of x² = bx + c! The absolute value of the negative root of x² + bx = c is the positive root of x² = bx + c and vice versa. Also, the absolute values of the negative roots of x² + bx + c = 0 are the positive roots of x² + c = bx. So, in this sense, Yes, the above geometric solutions do find all the real roots of all quadratic equations. Thus it is misleading to state, as most historical accounts do, that the geometric methods failed to find negative roots. The users of these methods did not find negative roots because they did not conceive of them. However, the methods can be easily and directly used to find all the negative roots.

4: Conic Sections and Cube Roots
The Greeks noticed that, if a/c = c/d = d/b, then (a/c)² = (c/d)(d/b) = (c/b) and thus c³ = a²b. Now setting a = 1, we see that we can find the cube root of b, if we can find c and d such that c² = d and d² = bc. If we think of c and d as being variables and b a constant, then we see these equations as the equations of two parabolas with perpendicular axes and the same vertex. The Greeks also saw it this way but first they had to develop the concept of a parabola!
To the Greeks, and later al'Khayyam, if AB is a line segment, then the parabola with vertex B and parameter AB is the curve P such that, if C is on P, then the rectangle BDCE (see the drawing) has the property that (BE)² = DB · AB. Since in Cartesian coordinates the coordinates of C are (BE, BD), this last equation becomes a familiar equation for a parabola.

Points of the parabola may be constructed by using the construction for the square root given in Section 2. In particular, E is the intersection of the semicircle on AD with the line perpendicular to AB at B. (The construction can also be done by finding D' such that AB = DD', then the semicircle on BD' intersects P at C.) I encourage you to try this construction yourself; it is very easy to do if you use a compass and graph paper.

Now we can find the cube root. Let b be a positive number or length and let AB = b and construct C so that CB is perpendicular to AB and such that CB = 1.

Construct a parabola with vertex B and parameter AB and construct another parabola with vertex B and parameter CB. Let E be the intersection of the two parabolas. Draw the rectangle BGEF. Then (EF)² = BF·AB and (GE)² = GB·CB. But, setting c = GE = BF and d = GB = EF, we have d² = cb and c² = d. Thus c³ = b. If you use a fine graph paper it is easy to get three digit accuracy in this construction.
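The two-parabola construction can be imitated numerically: look for the point other than the vertex where the curves d = c² and d² = bc meet. The sketch below uses bisection, which is my own choice of method and stands in for reading the intersection off graph paper; the function name and the bracketing interval are assumptions for illustration.

```python
import math

def cube_root_by_parabolas(b, iterations=200):
    """Return (c, d) at the non-vertex intersection of d = c^2 and d^2 = b*c, b > 0."""
    def gap(c):
        # d from the first parabola minus d from the second, at the same c
        return c * c - math.sqrt(b * c)
    lo = min(1.0, b) / 2.0      # here c^3 < b, so gap(lo) < 0
    hi = max(1.0, b) + 1.0      # here c^3 > b, so gap(hi) > 0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    c = (lo + hi) / 2.0
    return c, c * c
```

The returned c satisfies c³ = b up to floating-point error, so `cube_root_by_parabolas(8.0)` gives c close to 2 and d close to 4.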
The Greeks did a thorough study of conic sections and their properties which culminated in Apollonius's book Conics which appeared in 200 BC. You can read this book in English translation, see Heath (1961).

To find roots of cubic equations in the next section we shall also need to know the (rectangular) hyperbola with vertex B and parameter AB. This is the curve H, such that if E is on H and ACED is the determined rectangle (see drawing), then (EC)² = BC·AC.

The point E can be constructed using Section 2. Let F be the bisector of AB. Then the circle with center F and radius FC will intersect at D the line perpendicular to AB at A. From the drawing it is clear how these circles also construct the other branch of the hyperbola (with vertex A.)

Notice how these descriptions and constructions of the parabola and hyperbola look very much like they were done in Cartesian coordinates. The ancestral forms of Cartesian coordinates and analytic geometry are evident here. Also they are evident in the solutions of cubic equations in the next section. The ideas of Cartesian coordinates did not come to Descartes out of nowhere. The underlying concepts were developing in Greek and Muslim mathematics. One of the apparent reasons that full development did not occur until Descartes is that, as we have seen, negative numbers were not accepted. The full use of negative numbers is essential for the realization of Cartesian coordinates.

5: Roots of Cubic Equations
In his Al-Jabr wa'l muqabalah Omar al'Khayyam also gave geometric solutions to cubic equations. We shall see that his methods are sufficient to find geometrically all real (positive or negative) roots of cubic equations; however, in his first chapter al'Khayyam says (see Kasir (1931), page 49):
"When, however, the object of the problem is an absolute number, neither we, nor any of those who are concerned with algebra, have been able to prove this equation - perhaps others who follow us will be able to fill the gap - except when it contains only the three first degrees, namely, the number, the thing and the square."
By "absolute number", al'Khayyam is referring to what we call an algebraic solution, as opposed to a geometric one. This quotation suggests, contrary to what many historical accounts say, that al'Khayyam expected that algebraic solutions would be found.
Al'Khayyam found 19 types of cubic equations (when expressed with only positive coefficients). (See Kasir (1931), page 51). Of these 19, five reduce to quadratic equations (e.g., x³ + ax² = bx reduces to x² + ax = b). The remaining 14 types al'Khayyam solves by using conic sections. His methods find all the positive roots of each type although he failed to mention some of the roots in a few cases; and, of course, he ignores the negative roots. Instead of going through his 14 types, I will show how a simple reduction will reduce all the types to only 3 types in addition to types already solved such as x³ = b. I will then give al'Khayyam's solutions to these types.
In the cubic y³ + py² + gy + r = 0 (where, here, p, g, r are positive, negative, or zero) set y = x - (p/3). Try it! The resulting equation in x will have the form x³ + sx + t = 0 (where, here, s and t are positive, negative or zero). If we rearrange this equation so all the coefficients are positive then we get four types that have not been previously solved:
(I) x³ + ax = b, (II) x³ + b = ax, (III) x³ = ax + b, and (IV) x³ + ax + b = 0,
where a and b are positive, in addition to types previously solved. Now (IV) has no positive roots and the absolute values of its negative roots are the (positive) roots of (I). Also, the absolute values of the negative roots of (II) are the roots of (III) and vice versa. Thus, we need only find the positive roots of types (I), (II), and (III).
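For readers who would rather let a machine "try it", here is a small Python sketch of the substitution y = x - p/3. The closed forms for s and t come from my own expansion of the substitution and are not printed in the article; the function names are hypothetical.

```python
def depress_cubic(p, g, r):
    """Coefficients (s, t) of x^3 + s*x + t after setting y = x - p/3
    in y^3 + p*y^2 + g*y + r."""
    s = g - p * p / 3.0
    t = 2.0 * p ** 3 / 27.0 - p * g / 3.0 + r
    return s, t

def original_cubic(y, p, g, r):
    return y ** 3 + p * y ** 2 + g * y + r

def depressed_cubic(x, s, t):
    return x ** 3 + s * x + t
```

As a quick check, y³ + 3y² + 3y + 1 is (y + 1)³, and the substitution y = x - 1 turns it into x³, so both s and t should vanish for p = 3, g = 3, r = 1.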

Al'Khayyam's solution for type (I): x³ + ax = b.
"A cube and sides are equal to a number. Let the line AB [see figure] be the side of a square equal to the given number of roots, [that is, (AB)² = a, the coefficient.] Construct a solid whose base is equal to the square on AB, equal in volume to the given number, [b]. The construction has been shown previously. Let BC be the height of the solid. [I.e. BC·(AB)² = b.] Let BC be perpendicular to AB ... Construct a parabola whose vertex is the point B ... and parameter AB. Then the position of the conic HBD will be tangent to BC. Describe on BC a semicircle. It necessarily intersects the conic. Let the point of intersection be D; drop from D, whose position is known, two perpendiculars DZ and DE on BZ and BC. Both the position and magnitude of these lines are known."

The root is EB. Al'Khayyam's proof (using a more compact notation) is: From the properties of the parabola (Section 4) and circle (Section 2) we have
(DZ)^2 = (EB)^2 = BZ·AB and (ED)^2 = (BZ)^2 = EC·EB,
thus
EB·(BZ)^2 = (EB)^2·EC = BZ·AB·EC
and therefore
AB·EC = EB·BZ and (EB)^3 = EB·(BZ·AB) = (AB·EC)·AB = (AB)^2·EC.
So
(EB)^3 + a(EB) = (AB)^2·EC + (AB)^2·(EB) = (AB)^2·CB = b.
Thus EB is a root of x^3 + ax = b. Since x^3 + ax increases as x increases, there can be only this one root.
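
The construction can also be imitated numerically. The sketch below is only an illustration under stated assumptions: it places B at the origin with BC along the horizontal axis, writes the parabola as (EB)^2 = BZ·AB and the semicircle as (ED)^2 = EC·EB (the two relations used in the proof above), and locates their intersection D by bisection. The function name khayyam_type1_root and the sample equation are hypothetical choices.

```python
import math

def khayyam_type1_root(a, b, iterations=200):
    """Positive root of x^3 + a*x = b (a, b > 0), found as the horizontal
    coordinate EB of the point D where the parabola meets the semicircle."""
    ab = math.sqrt(a)        # AB, so that (AB)^2 = a
    bc = b / a               # BC, so that BC * (AB)^2 = b

    def parabola(x):         # height BZ for a given EB: (EB)^2 = BZ * AB
        return x * x / ab

    def semicircle(x):       # height ED for a given EB: (ED)^2 = EC * EB
        return math.sqrt(x * (bc - x))

    lo, hi = 0.0, bc         # D lies strictly between B and C
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if parabola(mid) < semicircle(mid):
            lo = mid         # parabola still below the semicircle: move right
        else:
            hi = mid
    return (lo + hi) / 2

# Illustrative example: x^3 + 6x = 20
root = khayyam_type1_root(6, 20)
```

Bisection stands in here for reading the intersection off fine graph paper; any other root-finding method on the difference of the two curves would serve equally well.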

Al'Khayyam's solutions for types (II) and (III): x^3 + b = ax and x^3 = ax + b.
Al'Khayyam treated these equations separately, but by allowing negative horizontal lengths we can combine his two solutions into one solution of x^3 ± b = ax. Let AB be perpendicular to BC and, as before, let (AB)^2 = a and (AB)^2·BC = b. Place BC to the left if the sign in front of b is negative (type (III)) and place BC to the right if the sign in front of b is positive (type (II)). Construct a parabola with vertex B and parameter AB. Construct both branches of the hyperbola with vertices B and C and parameter BC.

Each intersection of the hyperbola and the parabola (except for B) gives a root of the cubic. Suppose they meet at D. Then drop perpendiculars DE and DZ. The root is BE (negative if to the left and positive if to the right). Again, if you use fine graph paper it is easy to get three-digit accuracy here. I leave it for you, the reader, to provide the proof, which is very similar to type (I).
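
Before attempting the proof, it may help to check the claim numerically. The sketch below rests on an assumption that the text does not spell out: a hyperbola with vertices B and C and parameter BC is taken to be the rectangular hyperbola (DE)^2 = BE·CE, with B at the origin and C at signed distance ±b/a along the horizontal axis. The function name conic_residuals is hypothetical. For each known root of a sample cubic, the code measures how far the corresponding point of the parabola is from the hyperbola.

```python
import math

def conic_residuals(x, a, b, sign):
    """For a candidate root x of x^3 + sign*b = a*x (sign = +1 for type (II),
    -1 for type (III)), return two numbers: how far the parabola point
    D = (x, x^2/AB) is from the assumed hyperbola, and the cubic's residual."""
    ab = math.sqrt(a)                  # AB, with (AB)^2 = a
    c = sign * b / a                   # signed position of C: right if +, left if -
    y = x * x / ab                     # parabola: (BE)^2 = AB * DE
    hyperbola = y * y - x * (x - c)    # assumed hyperbola: (DE)^2 = BE * CE
    cubic = x**3 + sign * b - a * x
    return hyperbola, cubic

# x^3 = 15x + 4 is of type (III); its real roots are 4 and -2 ± sqrt(3).
roots = [4.0, -2 + math.sqrt(3), -2 - math.sqrt(3)]
checks = [conic_residuals(x, 15, 4, -1) for x in roots]
```

Under this reading, the positive root lies on the right-hand branch and the two negative roots on the left-hand branch, which is why both branches are needed.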

A little more history: Most historical accounts assert correctly that al'Khayyam did not find the negative roots of cubics. However, they are misleading in that they all fail to mention that his methods are fully sufficient to find the negative roots, as we have seen above. This is in contrast to the common assertion (see, for example, Davis & Hersh (1981)) that Girolamo Cardano (16th century Italian) was the first to publish the general solution of cubic equations when in fact, as we shall see, he himself admitted that his methods are insufficient to find the real roots of many cubics.

Cardano published his algebraic solutions in 1545 in his book Artis Magnae (The Great Art). For a readable English translation and historical summary, see Witmer (1968). Cardano used only positive coefficients and thus divided the cubic equations into the same 13 types (excluding x^3 = c and equations reducible to quadratics) used earlier by al'Khayyam. Cardano also used geometry to prove his solutions for each type. As we did above, we can make a substitution to reduce these to the same types as above:
(I) x^3 + ax = b, (II) x^3 + b = ax, (III) x^3 = ax + b, and (IV) x^3 + ax + b = 0.
If we allow ourselves the convenience of using negative numbers and lengths then we can reduce these to one type: x^3 + ax + b = 0, where now we allow a and b to be either negative or positive.
The main "trick" that Cardano used was to assume that there is a solution of x^3 + ax + b = 0 of the form x = t^(1/3) + u^(1/3). Plugging this into the cubic we get
(t^(1/3) + u^(1/3))^3 + a(t^(1/3) + u^(1/3)) + b = 0.
If you expand and simplify this you get
t + u + b + (3t^(1/3)u^(1/3) + a)(t^(1/3) + u^(1/3)) = 0.
Thus x = t^(1/3) + u^(1/3) is a root if
t + u = -b and tu = -(a/3)^3.
Solving, we find that t and u are the roots of the quadratic equation z^2 + bz - (a/3)^3 = 0, which Cardano solved geometrically (and you can use the quadratic formula) to get
t = -b/2 + √[(b/2)^2 + (a/3)^3] and u = -b/2 - √[(b/2)^2 + (a/3)^3].
Thus the cubic has roots
x = t^(1/3) + u^(1/3) = {-b/2 + √[(b/2)^2 + (a/3)^3]}^(1/3) + {-b/2 - √[(b/2)^2 + (a/3)^3]}^(1/3).
This is Cardano's cubic formula. But a strange thing happened: Cardano noticed that the cubic x^3 = 15x + 4 has a positive real root 4 but, for this equation, a = -15 and b = -4, and if we put these values into his cubic formula we get that the roots of x^3 = 15x + 4 are
x = {2 + √(-121)}^(1/3) + {2 - √(-121)}^(1/3).
In Cardano's time there was no theory of complex numbers and so he reasonably concluded that his method would not work for this equation; Cardano writes (Witmer (1968), page 103):
"When the cube of one-third the coefficient of x is greater than the square of one-half the constant of the equation ... then the solution of this can be found by the aliza problem which is discussed in the book of geometrical problems."
It is not clear what book he is referring to, but the "aliza problem" presumably refers to al'Hazen, an Arab, who lived around 1000 AD and whose works were known in Europe in Cardano's time. Al'Hazen had used intersecting conics to solve specific cubic equations and the problem of describing the image seen in a spherical mirror; this latter problem is in some books called "Alhazen's problem".
In addition, we know today that each complex number has three cube roots and so the formula x = {2 + √(-121)}^(1/3) + {2 - √(-121)}^(1/3) is ambiguous. In fact, some choices for the two cube roots give roots of the cubic and some do not. (Experiment with x^3 = 15x + 4.) Faced with Cardano's Formula and equations like x^3 = 15x + 4, Cardano and other mathematicians of the time started exploring the possible meanings of these complex numbers and thus started the theory of complex numbers. This leads to another interesting path which we may take another day.
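
One way to carry out the suggested experiment is with complex arithmetic on a computer. The sketch below is a modern illustration, not anything available to Cardano; the function names cube_roots and cardano_sums and the tolerance 1e-9 are assumptions made for the example. It forms all nine sums of a cube root of t and a cube root of u and tests each one in the cubic.

```python
import cmath

def cube_roots(z):
    """All three complex cube roots of z."""
    r, phi = cmath.polar(z)
    return [cmath.rect(r ** (1 / 3), (phi + 2 * cmath.pi * k) / 3)
            for k in range(3)]

def cardano_sums(a, b):
    """Every sum t^(1/3) + u^(1/3) for x^3 + a*x + b = 0, paired with a flag
    saying whether that sum really is a root of the cubic."""
    d = cmath.sqrt((b / 2) ** 2 + (a / 3) ** 3)
    t, u = -b / 2 + d, -b / 2 - d
    results = []
    for p in cube_roots(t):
        for q in cube_roots(u):
            x = p + q
            is_root = abs(x**3 + a * x + b) < 1e-9
            results.append((x, is_root))
    return results

# x^3 = 15x + 4, that is, a = -15 and b = -4
sums = cardano_sums(-15, -4)
real_roots = sorted(x.real for x, ok in sums if ok)
```

The sums that pass the test are exactly those whose two cube roots satisfy the condition 3t^(1/3)u^(1/3) + a = 0 from the derivation above; the other choices fail it, which is the ambiguity described in the text.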

6: So What Does This All Point To?
It points to different things for each of us. I conclude that it is worthwhile paying attention to the meaning in mathematics. Often in our haste to get to the modern, powerful, analytic tools we ignore and tread upon the meanings and images that are there. Sometimes it is hard even to get a glimpse that some meaning is missing. One way to get this glimpse and find meaning is to listen to and follow questions of "What does it mean?" that come up in oneself and in one's students. We must listen creatively because we and our students often do not know how to express precisely what is bothering us.
Another way to find meaning is to read the mathematics of old and keep asking "Why did they do that?" or "Why didn't they do this?" Why did the early algebraists (up until at least 1600 and much later I think) insist on geometric proofs? I have suggested some reasons above. Today, we normally pass over geometric proofs in favor of analytic ones based on the 150-year-old notion of Cauchy sequences and the Axiom of Completeness. However, for most students and, I think, most mathematicians, our intuitive understanding of the real numbers is based on the geometric real line. As an example, think about multiplication: What does a × b mean? Compare the geometric images of a × b with the multiplication of two infinite, nonrepeating, decimal fractions. What is √2 × π?
There is another reason for why a geometric solution may be more meaningful: Sometimes we want a geometric result instead of a numerical one. As an example, I shall describe an experience that I had while a friend and I were building a small house using wood. The roof of the house consists of 12 isosceles triangles which together form a 12-sided cone (or pyramid). It was necessary for us to determine the angle between two adjacent triangles in the roof so that we could appropriately cut the log rafters. I immediately started to calculate the angle using (numerical) trigonometry and algebra. But then I ran into a problem. For finding square roots and values of trigonometric functions I had only a slide rule with three-place accuracy. At one point in the calculation I had to subtract two numbers that differed only in the third place (e.g. 5.68 - 5.65) thus my result had little accuracy. As I started to figure out a different computational procedure that would avoid the subtraction, I suddenly realized - I didn't want a number, I wanted a physical angle. In fact, a numerical angle would be essentially useless - imagine taking two rough boards and putting them at a given numerical angle apart using only an ordinary protractor! What I needed was the physical angle, full size. So I constructed the angle on the floor of the house using a rope as a compass. Note the relationship between this and Baudhayana's descriptions of using cords. This geometric solution had the following advantages over a numerical solution:
  • The geometric solution resulted in the desired physical angle, while the numerical solution resulted in a number.
  • The geometric solution was quicker than the numerical solution.
  • The geometric solution was immediately understood and trusted by my friend (and fellow builder), who had almost no mathematical training, while the numerical solution was beyond my friend's understanding because it involved trigonometry (such as the "Law of Cosines").
  • And, since the construction was done full-size, the solution automatically had the degree of accuracy appropriate for the application.

I close with the words written in 1934 by the "father of Formalism", David Hilbert, from the Preface to Geometry and the Imagination (see Hilbert & Cohn-Vossen (1952), page iii):
"In mathematics, as in any scientific research, we find two tendencies present. On the one hand, the tendency toward abstraction seeks to crystallize the logical relations inherent in the maze of material that is being studied, and to correlate the material in a systematic and orderly manner. On the other hand, the tendency toward intuitive understanding fosters a more immediate grasp of the objects one studies, a live rapport with them, so to speak, which stresses the concrete meaning of their relations.
"As to geometry, in particular, the abstract tendency has here led to the magnificent systematic theories of Algebraic Geometry, of Riemannian Geometry, and of Topology; these theories make extensive use of abstract reasoning and symbolic calculation in the sense of algebra. Notwithstanding this, it is still as true today as it ever was that intuitive understanding plays a major role in geometry. And such concrete intuition is of great value not only for the research worker, but also for anyone who wishes to study and appreciate the results of research in geometry.
"In this book, it is our purpose to give a presentation of geometry, as it stands today, in its visual, intuitive aspects. With the aid of visual imagination we can illuminate the manifold facts and problems of geometry, ...
"In this manner, geometry being as many-faceted as it is and being related to the most diverse branches of mathematics, we may even obtain a summarizing survey of mathematics as a whole, and a valid idea of the variety of its problems and the wealth of ideas it contains."
Hilbert is emphasizing the point which I am trying to make in this paper: Meaning is important in mathematics and geometry is an important source of that meaning.


References:
Amir-Moez, A.R. (1963). A Paper of Omar Khayyam, Scripta Mathematica 26, 323-337.
Davis, P.J. & Hersh, R. (1981). The Mathematical Experience. Boston: Birkhäuser.
Eves, H. (1963). A Survey of Geometry, Vol. 1. Boston: Allyn and Bacon.
Heath, T.L. (1956). The Thirteen Books of Euclid's Elements. New York: Dover.
Heath, T.L. (1961). Apollonius of Perga, Treatise on Conic Sections. New York: Dover.
Hilbert, D. & Cohn-Vossen, S. (1952). Geometry and the Imagination. New York: Chelsea.
Karpinski, L.C., editor (1915). Robert of Chester's Latin Translation of the Algebra of al-Khowarizmi. New York: Macmillan. (This is an English translation.)
Kasir, D.S., editor (1931). The Algebra of Omar Khayyam. New York: Columbia Teachers College.
Prakash (1968). Baudhayana-Sulbasutram. Bombay.
Seidenberg, A. (1961). The Ritual Origin of Geometry, Archive for the History of the Exact Sciences 1, 488-527.
Valens, E.G. (1976). The Number of Things: Pythagoras, Geometry and Humming Strings. New York: Dutton.
Witmer, T.R., editor (1968). The Great Art or the Rules of Algebra by Girolamo Cardano. Cambridge: The MIT Press.


1 This paper was written while I was a visiting member of the faculty at Birzeit University, a Palestinian university in the Israeli-occupied West Bank. I appreciate the hospitality and support given me by the students, faculty and staff during my visit.
