Wednesday, April 27, 2016

non-linear systems :Why use gradient descent in machine learning instead of least-squares fitting algorithms on matrices?

submitted by AldousHuxleysDog

I recently wrote a python script that uses cubic, polynomial and linear regression on a set of data to return a best fit line. A professor of mine told me that I should be using a Gradient descent algorithm instead of purely matrix operations but offered no explanation. Why does he suggest this?
Edit: To be more specific if I'm given an MxN matrix, to find a least squares fit to the data, I'm just using the formula: (X'X)-1 * X'y = b to find the matrix b that gives me my line's coefficients.
Perhaps because this method requires functions for taking the transpose, inverse and determinant of the matrix as well, it's too taxing on the system running the program?
all 8 comments

[–][deleted]  (11 children)

[deleted]

    [–][deleted]  (10 children)

    [deleted]

      [–][deleted]  (5 children)

      [deleted]

        [–][deleted]  (2 children)

        [deleted]

          [–]remington_steele 2 points3 points  (0 children)

          I don't believe he said gradient descent is the best. In fact the only way it will beat simple linear regression is if you actually try to invert the cov. matrix and it hits a memory limit doing that. A very good implementation of linear regression is matlab's mldivide function and it uses some special algorithms, based on special conditions the matrix may satisfy, otherwise defaults to Gaussian elimination with partial pivoting:
          http://scicomp.stackexchange.com/questions/1001/how-does-the-matlab-backslash-operator-solve-ax-b-for-square-matrices
          Since you're using Python, I'll bet the numpy.linalg.lstsq implementation is also very good. Try those and see how they compare to grad. descent - they should blow it out of the water.

          [–]pete-the-tea-drinker 1 point2 points  (1 child)

          I think there's approximations which are stable, and efficient.
          The reason the professor said "try least squares instead" probably isn't because it's the best method for a linear system, but because it's good for some non-linear systems too.

          [–]mighty-byte 1 point2 points  (0 children)

          As an addendum to the other comments, it's often happens that the data you're facing yield a singular X'X. So there could be no analytical solution, but a "gradient descent" type solution would still find a "good" solution.

          [–]AldousHuxleysDog[S] 0 points1 point  (2 children)

          Also for the time being, the data is a just a matrix of randomly generated numbers, for testing purposes. This is a sample project I'm doing. The overall goal would be to build a machine that can predict the prices of aviation equipment.

          [–][deleted]  (1 child)

          [deleted]

            [–]AldousHuxleysDog[S] 0 points1 point  (0 children)

            I will keep that in mind. Thanks for the tip!

            [–]carmichael561 2 points3 points  (0 children)

            You shouldn't be literally finding the inverse. Solving the linear system with the particular data set is more efficient.

            [–]mafikpl 2 points3 points  (0 children)

            It's because in AI the best fit is always an overfit. Also you can add regularization to gradient descent more easily.

            [–]JustFinishedBSG 2 points3 points  (0 children)

            Because of the size of the matrix...
            Inverting a matrix takes AT BEST O(n2.373) while an algorithm like Conjugate Gradient takes at most n operations.
            It's a case of solving ALL equations vs solving ONE equation. Of course one is more expensive :D


            Del. ,δ, : Correct enunciation

            I've come across various different symbols being pronounced as "del". What is the internationally accepted del? If not internationally, then what's the English/American(specify which one if they are different) one that most lecturers/&c use?
            • : I have heard x being called "del by del x", and (rarely) "dou by dou x" and "der by der x". can be used without a fraction (einstein notation), in which case it gets confusing.
            • : Called Nabla or del. This has four different uses, which can be easily distinguished while reading out loud, but it gets confusing when the first and last uses (grad and covariant derivative) get mixed up with and δ
              • Gradient/grad: ϕ (phi is a scalar). Read as "nabla phi", or "del phi".
              • Divergence/div: v Pretty clear, can be read as "nabla dot" or "del dot"
              • Curl/rot: ×v Also clear, can be read as "nabla cross" or "del cross"
              • Covariant derivative: uv : Can be read as "del v" or "nabla v" . I've seen it called "del u v" also.
            • δ : Read out as "delta", but I've heard it used as "del" as well.
            This entire thing has confused me. My questions are:
            1. Which one can be correctly called "del"? I'm fine with div/curl being read out as del, as the dot/cross can be read out as well. The confusion is between the convariant derivative, grad, partial derivative, and lowercase delta. Or is it just a matter of context?
            2. Where did this confusing terminology come from in the first place? Why name something del when we already have a bunch of other dels? A timeline of the dels would be appreciated, but not necessary :-).
            share|cite|improve this question
            3  
            The system I'm used to reads x as "partial with respect to x (of...)"; I personally do not like shortening "delta" to "del" as it could be a source of confusion, but that might just be me... – J. M. Feb 18 '12 at 6:28
                
            I agree with J.M., I am used to saying "Partial wrt x" or Div/Grad/Curl and delta. As far as I've seen, people usually associate del to to both and – Inquest Feb 18 '12 at 6:37
            2  
            What's this “internationally” meaning? As you may be aware, people around the world do not all speak of mathematics in English... If you're interested in the English pronunciation of these symbols, please say so and do not use the word “international.” – PseudoNeo Feb 24 '12 at 16:12
            1  
            I'm with J.M. and Nunoxic on this: I've never heard /x called anything like "del" (and I'm in the US). I've heard the operator /x pronounced as "partial x " and 2/x2 as "partial x squared", which are shorter than "partial with respect to x ...". – KCd Feb 24 '12 at 17:07
            3  
            My two cents: 1. you shouldn't be reading out long PDEs in a talk. If you are, your audience is not following you. 2. if you are speaking to an international group, you should avoid verbal shorthand, just as you should avoid slang. To add to your list: it is common to write e.g. Δu=uxx+uyy and read it as "u x x plus u y y". I would say that most talks that discuss PDE don't read out the actual equation, but write out the equation and then explain the relevant terms (e.g. "we consider the metric Laplacian with a lower order quadratic term with such-and-so growth rate") – Sam Lisi Feb 26 '12 at 13:09
            For what it's worth, in the community I hang out with, we generally just say "partial ecks" for x , and when we are feeling even lazier and when the context is clear, we call the same operator "dee-dee ecks", as if it were the ordinary ddx .
            , however, is always "nabla", unless it is used for the gradient of a function, in which case we say "gradient of eff" for f .

            In class, however, if the expression is embedded in prose (say as part of a theorem statement), I would never read the symbol. I would instead say what it means. So while I may write
            Important, we always have xyf=yxf
            I would say,
            Important, we always have that partial derivatives commute.
            Or if I write
            Therefore xf=0
            I would say
            Therefore the partial derivative of eff with respect to ecks is zero.
            Or if I write
            By the Maxwell's equations, E=0
            I would say
            By the Maxwell's equation, ee is divergence free.
            The only time I might read the symbols as symbols is if I am performing a computation on the board and am just copying stuff directly from my notes. In those cases I honestly cannot remember what I would usually say.
            share|cite|improve this answer
                
            Makes sense, thougb you've killed del ;-) . The reason I was asking was that while LaTeXifying someone elses post, I wrot del as . The context was ambiguous (could have been either), and I wasn't familiar with the topic. Later I came back to the question and saw that it had been replaced with nabla. So I got confused. Anyways, you're answer helps a lot, thanks! Ill wait until the bounty period is over to see if any other answers pop up, though. – Manishearth Feb 28 '12 at 1:10
                
            Also, those are the conventions I and those around me adopt. I distinctly remember in an introductory physics class when I was an undergrad, the TA calling "del" in the context of the vector operations , × and . So there may be different cultural norms, with the lines between the community being somewhat fuzzy and indistinct. – Willie Wong Feb 28 '12 at 8:39
                
            I do hope the bounty worked. I remember readind somewhere that CW posts still get bounties. Not sure, though. – Manishearth Mar 2 '12 at 4:19
                
            @Manishearth: it worked alright. I was wondering how I got rep for an answer I remember marking as CW. :) – Willie Wong Mar 2 '12 at 8:39
            Alright, I've sort of realized that there is no correct answer to this question. I'm summing up my thoughts and what I've gleaned from the other answers in this CW answer so that I can accept it.
            It really depends upon context. Usually you won't see used with together in an ambiguous manner. Usually will be in the form of x , so it won't be confused if read out "del by del x". δ is never called del. Covariant derivatives can have their bases declared beforehand. So, here is the list of ways of pronouncing stuff:
            • :
              • Usually: It seems to be generally read as "partial derivative of ... wrt ..." or something similar.
              • Other pronunciations: Can be also read "del", "dou", "die", "derronde", and lots of other things.
              • I guess in unambiguous situations the much faster "del by del x" can be used.
              • If reading out xf , read it out the long way.
            • :
              • Usually: When you hear "del", it's probably this fellow.
              • Other pronunciations: Also can be read out as "nabla" ("nabla dot"/"nabla cross" &c). One can use "div"/"grad"/"curl"/"covariant derivative of" to be specific.
            • δ : This poor fellow is hardly used. I guess it's called del because it looks like and/or is a "shortened" version of Δ --"DELta".
              • Usually: Just read it out as "delta" or "small delta".
              • Other pronunciations: None. Hopefully.
            I personally read as "nabla", and keep the rest as "del". While reading, it's OK to think of all of these as del I guess.
            share|cite|improve this answer
                
            Note that I'm still going to give out the bounty to the most helpful post. I'll wait till tomorrow and choose, there may be some other interesting answers. – Manishearth Mar 1 '12 at 13:27
            I don't know how relevant this is, but in Brazil, none of those symbols is called "del." is Nabla, δ is (lowercase) Delta, and is "Derronde." (A bastardizazion of the French for "round D.")
            share|cite|improve this answer
                
            Helps, but I sort of wanted the international (atleast the English) version. As in what would one generally expect at a conference? – Manishearth Feb 24 '12 at 13:43
            2  
            I'm from Brazil as well, and every professor I've came across at my university call the symbol "del". But I have watched lessons (on YouTube) given by a professor from some other city, and I've indeed heard him call it "derronde". – pagliuca Apr 20 '14 at 16:21
            Mathematicians i know refer in general to the differential operator represented by the symbol (nabla) as del. Like someone refers to the operator of addition, represented by the symbol of + with the word "plus". But when it comes to a specific vector operation like x or ×x they refer to this operation as div x or curl x. But theoretically speaking as the word del describes the operator represented with the symbol , someone could combine the words and refer to the operation ×x as del cross x.
            share|cite|improve this answer
                
            Aah, but how would they distinguish ϕ from ϕ ? Especially in einstein notation, we can confuse covariant derivatives with partials. – Manishearth Feb 27 '12 at 15:32
            1  
            Here,we refer to ϕ as "theta phi" and ϕ as nabla ϕ or grad ϕ . As far as the covariant derivatives are concerned we clearly declare the tangent vector and refer to the whole operation as "differentiation across the tangent vector u " so there is no misunderstanding or misinterpretation. – chemeng Feb 27 '12 at 16:02
            1  
            @chemeng: Why would you call the differential operator "theta" when it is not the Greek letter theta? – KCd Mar 1 '12 at 17:09
            2  
            Actually I come from Greece and is just a caligraphic way that many people use to write θ . – chemeng Mar 1 '12 at 20:31
            This is an interesting discussion. In the context of partial derivatives, I've only ever known ∂ to be pronounced "dar", so that ∂y/∂x is read aloud as "dar y dar x", and there is no mistaking it for dy/dx which is read as "dee y dee x". But I have probably not heard it in speech beyond the Australian universities I studied at, so I'm in no position to judge how international is this pronunciation. (BTW, pronounce "dar" so it rhymes with bar, car, far, Mar(s), etc.)

            No comments:

            Post a Comment