Monday, April 27, 2015



A new Equation for Intelligence F = T ∇ Sτ - a Force that Maximises the Future Freedom of Action

Intelligence is a Force with the Power to Change the World


Describing intelligence as a physical force that maximises the future freedom of action adds a new aspect to intelligence that is often forgotten: the power to change the world. This, I think, was the biggest revelation for me when I started thinking about the new equation for intelligence. The second revelation was that intelligent systems are survival engines that increase their chances of survival by maximising a single quantity: the freedom of action. Both insights may sound trivial or obvious, but I don't think they are.

A few days ago I saw the TED talk "A new equation for intelligence" by Alex Wissner-Gross. He presents an equation he published in April 2013 in a physics journal. It may not be the most impressive talk I have ever seen, and I had to watch it twice to fully understand it. But the message excites me so much that I haven't slept well for a few days. I thought everybody must be excited about this equation, but it seems that this is not the case. Either I am not understanding it correctly or others don't get it. Or maybe it resonates with me because I am a physicist with a strong background in computing who has done research in computational biology. To find out, let me explain my understanding of the equation. Please tell me what you think and what's wrong with my excitement (I need sleep)...

So, why did the equation blow me away? Because this very simple physical equation can guide us in our decisions, and it makes intelligent behaviour measurable and observable. It adds a new, real physical force to the world: the force of intelligence. From the equation we can deduce algorithms to act intelligently, as individuals, as societies and as mankind. And we can build intelligent machines using the equation. Yes, I know, you may ask: "How can the simple equation F = T ∇ Sτ do all of that?"

Intelligence is a Force that Maximises the Future Freedom of Action

Before we look at the equation in more detail, let me describe its essence in everyday terms. Like many physical laws or equations, the idea behind it is simple:
  • Intelligence is a force that maximises the future freedom of action.
  • It is a force that keeps options open.
  • Intelligence doesn't like to be trapped.
But what is necessary to keep options open and not to be trapped? Intelligence has to predict the future and change the world in a direction that leads to the "best possible future". In order to predict the future, an intelligent system has to observe the world and create a model of it. Since the future is not deterministic, the prediction has to be based on heuristics; prediction is a kind of statistical process. In order to change the world, the intelligence has to interact with it. Just thinking about the world without acting is not intelligence, because it produces no measurable force (well, sometimes it is intelligent not to act, because the physical forces are already driving you in the right direction, but that is a way of optimising resources). The better it can predict the future and the better it can change the world in the desired direction, the more intelligent the system is.

The new Equation for Intelligence F = T ∇ Sτ

Note: skip this section, if you are not interested in understanding the mathematics of the equation!


This is the equation:

   F = T ∇ Sτ

Where F is the force, a directed force (therefore it is bold), T is a system temperature, and Sτ is the entropy field of all states reachable within the time horizon τ (tau). Finally, ∇ is the nabla operator. This is the gradient operator, which "points" in the direction of the state with the most freedom of action. If you are not a physicist this might sound like nonsense. Before I try to explain the equation in more detail, let's look at another physical equation for a force.

The intelligence equation is very similar to the equation for the force that belongs to a potential energy, F = −∇ Wpot. Wpot is the potential energy at each point in space. The minus sign means that the force F pulls in the direction of lower energy. This is why gravitation pulls us in the direction of the centre of the earth. Or think of a landscape. At each point the force points downhill. The direction is the direction in which a ball would roll when starting at that point. The strength of the force is determined by the steepness of the slope: the steeper the slope, the stronger the force. Just as the ball is pulled downhill by the gravitational force to reach the state with the lowest energy, an intelligent system is pulled by the force of intelligence into a future with the lowest number of limitations. In physics we use the nabla operator, or gradient, to turn a "landscape" into a directed force (a force field).
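The landscape picture can be tried out numerically. The following is only a minimal sketch: the bowl-shaped `potential` and the helper name `force` are made up for illustration, and the gradient is approximated by central finite differences.

```python
def potential(x, y):
    """Hypothetical landscape: a bowl whose lowest point is the origin."""
    return x * x + 2.0 * y * y


def force(w, x, y, h=1e-5):
    """Approximate F = -grad W at (x, y) with central finite differences."""
    dw_dx = (w(x + h, y) - w(x - h, y)) / (2.0 * h)
    dw_dy = (w(x, y + h) - w(x, y - h)) / (2.0 * h)
    return (-dw_dx, -dw_dy)


fx, fy = force(potential, 1.0, 0.5)
print(fx, fy)  # both components are close to -2: the pull is downhill, towards the origin
```

A steeper bowl (larger coefficients in `potential`) gives a larger force at the same point, which matches "the steeper the slope, the stronger the force".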

Back to our equation F = T ∇ Sτ. What it says is that intelligence is a directed force F that pulls in the direction of states with more freedom of action. T is a kind of temperature that defines the overall strength (available resources) the intelligent system has (heat can do work; think of a steam engine: the more heat, the more power). Sτ is the "freedom of action" of each state that can be reached by the intelligence within a time horizon τ (tau). The time horizon is how far into the future the intelligence can predict. Alex Wissner-Gross uses the notion of entropy S to express the freedom of action in the future. The force of intelligence points in the direction of increasing entropy. As we have seen, in physics the direction of the force at each state is calculated by a gradient operation ∇ (think of the direction in which the ball is pulled). The nabla operator ∇ is used to assign a directional vector (the direction of the force of intelligence) to each state (in our case: all possible future states). The more freedom of action a state provides, the stronger the force pulling in that direction. So, ∇Sτ points in the direction with the most freedom of action. The multiplication by T means that the more power we have to act, the stronger the force can be.

Note: the optimal future state is the optimal state from the viewpoint of the intelligent system. It might not be the optimal state for other systems or for the entire system.
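To make the idea concrete, here is a toy sketch on a small grid. It is a strong simplification and not the method of the paper: as a stand-in for the entropy Sτ it simply takes the logarithm of the number of cells reachable within τ steps, and the "force" towards a neighbouring cell is T times the change of that number. All function names and the grid world are assumptions made for this illustration.

```python
import math

MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def neighbours(cell, width, height):
    """Cells reachable in one step without leaving the grid."""
    x, y = cell
    for dx, dy in MOVES:
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            yield (nx, ny)


def reachable(cell, tau, width, height):
    """Set of cells reachable from `cell` in at most `tau` steps."""
    seen = {cell}
    frontier = {cell}
    for _ in range(tau):
        frontier = {n for c in frontier for n in neighbours(c, width, height)} - seen
        seen |= frontier
    return seen


def entropy_proxy(cell, tau, width, height):
    """Crude stand-in for S_tau: log of the number of reachable cells."""
    return math.log(len(reachable(cell, tau, width, height)))


def entropic_pull(cell, tau, temperature, width, height):
    """T times the discrete change of the entropy proxy, per neighbouring cell."""
    here = entropy_proxy(cell, tau, width, height)
    return {n: temperature * (entropy_proxy(n, tau, width, height) - here)
            for n in neighbours(cell, width, height)}


def best_move(cell, tau, width, height):
    """Neighbouring cell with the most future freedom of action."""
    return max(neighbours(cell, width, height),
               key=lambda n: entropy_proxy(n, tau, width, height))


pull = entropic_pull((0, 3), tau=3, temperature=1.0, width=7, height=7)
print(best_move((0, 3), 3, 7, 7))  # (1, 3): one step away from the wall
print(pull)
```

Starting next to a wall, the step into the open has a positive pull and the steps along the wall have a negative one, so the agent "doesn't like to be trapped". Doubling `temperature` doubles every entry of `pull` without changing the preferred direction.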

If you want to understand the equation in more detail, read the original paper "Causal Entropic Forces" by A. D. Wissner-Gross and C. E. Freer.


Understanding the Laplace operator conceptually

The Laplace operator: those of you who now understand it, how would you explain what it "does" conceptually? How do you wish you had been taught it?
Are there any good essays (combining both history and conceptual understanding) on the Laplace operator and its subsequent variations (e.g. Laplace-Beltrami) that you would highly recommend?
@JohnM You are very right! I feel terrible for asking a duplicate now. Can it be merged? Should it be made into a community wiki? –  user89 Jun 3 '14 at 5:05
    
I just wanted to point out those other answers in case they are of interest to you. –  John M Jun 3 '14 at 15:44

The Laplacian $\Delta f(p)$ is the lowest-order measurement of how $f$ deviates from $f(p)$ "on average" - you can interpret this either probabilistically (expected change in $f$ as you take a random walk) or geometrically (the change in the average of $f$ over balls centred at $p$). To make this second interpretation precise, write the Taylor series

$$f(p+x) = f(p) + f_i(p)\,x_i + \frac{1}{2} f_{ij}(p)\,x_i x_j + \cdots$$

and integrate:

$$\int_{B_r(p)} f = f(p)\,V(B_r) + f_i(p) \int_{B_r(0)} x_i\,dx + \frac{1}{2} f_{ij}(p) \int_{B_r(0)} x_i x_j\,dx + \cdots.$$

The integrals $\int x_i\,dx$ vanish because $x_i$ is an odd function under reflection in the $x_i$ direction, and similarly the integrals $\int x_i x_j\,dx$ vanish whenever $i \neq j$; so this simplifies to

$$\frac{1}{V(B_r)} \int_{B_r(p)} f = f(p) + C\,\Delta f(p)\,r^2 + \cdots$$

where $C$ is a constant depending only on the dimension.
The Laplace-Beltrami operator is essentially the same thing in the more general Riemannian setting - all the nasty curvy terms will be higher order, so the same formula should hold.
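The averaging formula can be checked numerically in two dimensions, where the constant works out to $C = 1/8$ (in $n$ dimensions it is $1/(2(n+2))$, because the average of $x_i^2$ over a ball of radius $r$ is $r^2/(n+2)$). The sketch below is only an illustration: the test function `f` and the helper `disk_average` are made up, and the disk average is approximated by a midpoint rule in polar coordinates.

```python
import math


def disk_average(f, px, py, r, n_r=200, n_theta=64):
    """Average of f over the disk of radius r centred at (px, py),
    approximated by a midpoint rule in polar coordinates."""
    total = 0.0
    weight = 0.0
    for i in range(n_r):
        rho = (i + 0.5) * r / n_r
        for j in range(n_theta):
            theta = (j + 0.5) * 2.0 * math.pi / n_theta
            # the factor rho is the polar area element
            total += f(px + rho * math.cos(theta), py + rho * math.sin(theta)) * rho
            weight += rho
    return total / weight


def f(x, y):
    """Hypothetical test function with Laplacian 2 + 6 = 8 everywhere."""
    return x * x + 3.0 * y * y + 2.0 * x


px, py, r = 0.5, -1.0, 0.1
deviation = disk_average(f, px, py, r) - f(px, py)
predicted = (1.0 / 8.0) * 8.0 * r * r  # C * Laplacian * r^2 with C = 1/8
print(deviation, predicted)  # both close to 0.01
```

The linear term `2.0 * x` changes $f(p)$ but not the deviation, which is the "odd integrals vanish" step of the derivation.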
Dear Anthony, I have accepted your answer, but wonder if you could write a little about why the Laplace operator is sometimes called the "diffusion operator"? Also, do I understand it correctly that in only one "spatial dimension" $x$, it simply reduces to the second derivative? I.e. $u_{xx} = \Delta u$? –  user89 May 28 '14 at 8:35
@user89: Correct on the second question. Diffusion is a process that smooths out some density function by changing the value at each point to be closer to the values at surrounding points, and is modelled by the equation $\partial f/\partial t = \Delta f$. –  Anthony Carapetis May 28 '14 at 10:08
    
+1 for "all the nasty curvy terms will be higher order" –  Neal Jun 12 '14 at 13:41

I think the most important property of the Laplace operator $\Delta$ is that it is invariant under rotations. In fact, if a differential operator on Euclidean space is rotation and translation invariant, then it must be a polynomial in $\Delta$. That is why it is of such prominence in physical problems.
Some good books on the subject:
  1. Rosenberg's The Laplacian on a Riemannian Manifold.
  2. Gurarie's Symmetries and Laplacians.
Is the Laplacian invariant under rotations because it takes into account all points in the neighbourhood of the point in question? –  user89 May 28 '14 at 8:32
Yes, because it takes into account all the points in a symmetric way. –  John M May 28 '14 at 12:59
    
John, if I consider the function $f(x) = x^2$ in 2D, rotating my axes can give me different values for the second derivative at a particular point (the second derivative being what the Laplace operator reduces to in one spatial dimension). This does not seem to be invariant -- why is that? –  user89 May 29 '14 at 0:23
    
Nice thing to try to check! It is invariant: If $f(x,y) = x^2$ in $\mathbb{R}^2$, then $\Delta f = (\partial^2 f/\partial x^2) + (\partial^2 f/\partial y^2) = 2 + 0 = 2$. On the other hand, if you rotate your axes 45 degrees clockwise, you get $f(x,y) = (x+y)^2/2 = x^2/2 + xy + y^2/2$. This also has $\Delta f = 1 + 1 = 2$. Rotate your axes another 45 degrees to get $f(x,y) = y^2$. Again $\Delta f = 0 + 2 = 2$. –  John M May 29 '14 at 1:34
    
Ah. Hmm. I was simply flipping the graph of $f(x) = x^2$, so that in some new axis system, it would be $f'(x') = -(x'^2)$ (i.e. rotating it 180 degrees). Then, the second derivative of that would be $-2$. I know I am making an incredibly silly error here, but I can't seem to be able to catch it. –  user89 May 29 '14 at 5:13
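The check from the comments above can also be done numerically. This is a small sketch with made-up helper names: `laplacian` is the standard five-point finite-difference stencil and `rotated` composes $f$ with a rotation of the plane. It also suggests where the $-2$ may come from: turning the graph upside down replaces $f$ by $-f$, which is not a rotation of the $(x, y)$ axes, and since the Laplacian is linear the result changes sign.

```python
import math


def laplacian(f, x, y, h=1e-3):
    """Five-point finite-difference estimate of the Laplacian of f at (x, y)."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h)
            - 4.0 * f(x, y)) / (h * h)


def rotated(f, angle):
    """f composed with a rotation of the plane by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return lambda x, y: f(c * x - s * y, s * x + c * y)


def f(x, y):
    return x * x


results = {}
for degrees in (0, 45, 90, 180):
    g = rotated(f, math.radians(degrees))
    results[degrees] = laplacian(g, 0.3, -0.7)
    print(degrees, results[degrees])  # close to 2 for every angle

# Flipping the graph upside down negates the function values instead.
flipped = laplacian(lambda x, y: -f(x, y), 0.3, -0.7)
print(flipped)  # close to -2
```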

To gain some (very rough) intuition for the Laplacian, I think it's helpful to think of the Laplacian on $\mathbb{R}$, which is just the second derivative $\frac{d^2}{dx^2}$. (This answer may be more elementary than the OP was looking for, but I wish I had kept some of these things in mind when I first learned about the Laplacian.)
Just as Anthony's answer discusses, the second derivative at $p \in \mathbb{R}$ measures how much $f(p)$ deviates from average values of $f$ on either side of it. If the second derivative is positive, then $f(p)$ is smaller than the average of $f(p+h)$ and $f(p-h)$ for small $h$. (As I would tell my calculus students, the trapezoid rule for Riemann sums is an overestimate when the second derivative is positive.)
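A quick numerical illustration of this one-dimensional statement, as a sketch only: by Taylor expansion the gap between the two-sided average and $f(p)$ is $\frac{1}{2} f''(p)\,h^2$ to leading order. The helper name `midpoint_gap` and the choice $f = \exp$ are just for illustration.

```python
import math


def midpoint_gap(f, p, h):
    """Average of f at p - h and p + h, minus f(p)."""
    return 0.5 * (f(p + h) + f(p - h)) - f(p)


p, h = 0.4, 1e-2
gap = midpoint_gap(math.exp, p, h)
estimate = 0.5 * h * h * math.exp(p)  # f''(p) * h^2 / 2, since exp'' = exp
print(gap, estimate)  # positive and nearly equal

# For a linear (harmonic) function the gap is zero up to rounding.
linear_gap = midpoint_gap(lambda x: 3.0 * x - 1.0, p, h)
print(linear_gap)
```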
Generally, a function is harmonic if and only if it satisfies the mean value property. In $\mathbb{R}$, harmonic functions are simply linear polynomials, which of course are precisely the functions that satisfy the mean value property.
The maximum principle states roughly that if $\Delta u \geq 0$, then local maxima of $u$ do not occur. This is a generalization of the familiar "second derivative test" from calculus, which says that if the second derivative of $u$ is positive, then local maxima of $u$ do not occur (the graph of $u$ is concave up).
Finally, let me go up one dimension and mention some of my intuition for harmonic functions $u(x,y)$ of two variables, in which case $\Delta u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}$. If $u$ is harmonic, then $\frac{\partial^2 u}{\partial x^2} = -\frac{\partial^2 u}{\partial y^2}$. This says that the graph of $u$ must always look like a saddle: if, say, the graph is concave up in the $x$-direction ($\frac{\partial^2 u}{\partial x^2} > 0$), then it must be concave down in the $y$-direction ($\frac{\partial^2 u}{\partial y^2} < 0$). When I picture a saddle-shaped graph in my head, I think I can also see why the maximum principle has to hold for harmonic functions, since a saddle has no local extrema.

Another view along the lines of the answers above:
Suppose you have some region in the plane $\Omega$, and you are given the value of some scalar function $f$ along the boundary $\partial\Omega$. You now want to fill in $f$ on the interior of $\Omega$ "as smoothly as possible." (A common physical interpretation is that $f$ is the heat of the region: you are fixing the temperature of the boundary of $\Omega$ and want to know what the temperature on the interior will be at steady state.)
What does "as smooth as possible" mean? Well, one measure of the smoothness of $f$ is to look at its gradient $\nabla f$ and measure
$$E(f) = \int_\Omega \|\nabla f\|^2 \, dA.$$
Notice that this integral, called the Dirichlet energy of $f$, achieves its lowest possible value of $0$ when $f$ is constant. The less smooth (to first order) that $f$ is, the higher the Dirichlet energy will be. Making $f$ as smooth as possible means finding the $f$ that satisfies the boundary conditions and minimizes $E$.
How do we minimize $E$? We "take the derivative and set it to zero":
$$\nabla_f E = 0.$$
It may look a little weird to differentiate a scalar (the Dirichlet energy) with respect to a function, but the idea is the same as when you work with the ordinary gradient. Recall that for an ordinary scalar function $g(x,y,z): \mathbb{R}^3 \to \mathbb{R}$, the gradient $\nabla g$ at a point is the unique vector that, when you dot it with any direction $v$, tells you the directional derivative of $g$ in that direction:
$$\nabla g(x,y,z) \cdot v = \frac{d}{dt}\, g[(x,y,z) + tv] \,\Big|_{t \to 0}.$$
The gradient of $E$ works the same way: it gives you the unique function over $\Omega$ that, when you take the inner product of $\nabla E(f)$ with any variation $\delta f$ of $f$, gives you the directional derivative of $E$ in that "direction":
$$\int_\Omega \nabla E(f)\, \delta f \, dA = \frac{d}{dt}\, E(f + t\,\delta f) \,\Big|_{t \to 0}.$$
You can do the multivariable calculus, and after some integration by parts (using variations $\delta f$ that vanish on $\partial\Omega$) you will see that $\nabla E(f) = -2\Delta f$, which is $-\Delta f$ up to a positive constant factor. Several takeaways from this:
  • The function $f$ that interpolates the boundary conditions as smoothly as possible (in the sense of minimizing the Dirichlet energy) is the solution to the Laplace equation $\Delta f = 0$.
  • Given some function $f$ that interpolates the boundary conditions but does not minimize the Dirichlet energy, the gradient of $E$, which is proportional to $-\Delta f$, is the "direction of steepest ascent" of $E$: the direction to change $f$ if you want to most quickly increase $E$. The negative of this, $\Delta f$, is the direction that most quickly decreases $E$: if you are trying to smooth $f$, this is then the direction that you want to flow $f$ in. This insight leads to the heat equation
    $$\frac{df}{dt} = \Delta f$$
    which, given initial temperatures on $\Omega$, flows in the direction that best decreases the Dirichlet energy until the heat has diffused as smoothly as possible over the surface.
  • Nowhere in the above discussion was it essential that $\Omega$ was a piece of a plane: as long as you can define functions on $\Omega$ and take gradients of $f$ to get the Dirichlet energy, the above works equally well, and is one way of motivating the Laplace-Beltrami operator on arbitrary manifolds in $\mathbb{R}^3$. The physical picture here is that you have some conductive plate in empty space, and heat up the boundary of the plate, and look at how the heat equalizes over the plate.
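The takeaways above can be watched in a one-dimensional toy version, where $\Omega$ is the interval $[0, 1]$ with the two end values held fixed. This is a minimal sketch under assumptions of my own: an explicit Euler discretisation of the heat equation, a made-up initial bump, and a step size chosen so the scheme is stable ($dt \le dx^2/2$).

```python
def dirichlet_energy(u, dx):
    """Discrete version of the integral of |u'|^2 on a uniform 1D grid."""
    return sum(((u[i + 1] - u[i]) / dx) ** 2 for i in range(len(u) - 1)) * dx


def heat_step(u, dx, dt):
    """One explicit Euler step of du/dt = u_xx; the two end values stay fixed."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = u[i] + dt * (u[i - 1] - 2.0 * u[i] + u[i + 1]) / (dx * dx)
    return new


n = 21
dx = 1.0 / (n - 1)
dt = 0.25 * dx * dx          # within the stability limit dt <= dx^2 / 2
u = [0.0] * n
u[-1] = 1.0                  # boundary values: 0 on the left, 1 on the right
u[n // 2] = 3.0              # a rough bump in the interior

energies = [dirichlet_energy(u, dx)]
for _ in range(2000):
    u = heat_step(u, dx, dt)
    energies.append(dirichlet_energy(u, dx))

print(energies[0], energies[-1])  # about 380 at the start, close to 1 at the end
print(max(abs(u[i] - i * dx) for i in range(n)))  # tiny: u is nearly the straight line
```

The energy goes down at every step, and the flow settles on the straight line between the boundary values, which is the solution of $\Delta f = 0$ in one dimension.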
It's interesting that some approaches to image denoising / deblurring / restoration minimize an objective function that contains a discrete version of the Dirichlet energy of the restored image, in order to encourage the restored image to be smooth. More sophisticated approaches use a penalty term $E(f) = \int_\Omega \|\nabla f\|_1 \, dA$, which allows sharp edges in an image to be preserved. (It seems interesting to work out a differential equation based on this energy, as you did for the Dirichlet energy.) –  littleO Jun 3 '14 at 6:42

Here is some intuition:
I think the most basic thing to know about the Laplacian $\Delta$ is that $\Delta = \operatorname{div} \nabla$, and $-\operatorname{div}$ is the adjoint of $\nabla$. Hence, $-\Delta$ has the familiar form $A^T A$ which recurs throughout linear algebra. We see that $-\Delta$ is a self-adjoint positive semidefinite operator, and so we would expect (or hope) that the familiar properties of positive semidefinite operators in linear algebra hold true for $-\Delta$. Namely, we expect that $-\Delta$ has real nonnegative eigenvalues, and that there should exist (in some sense) an orthonormal basis of eigenfunctions for $-\Delta$. This provides some intuition or motivation for the topic of "eigenfunctions of the Laplacian". (By the way, I think the Laplacian should have been defined to be $-\operatorname{div} \nabla$.)
Notice that the integration by parts formula can be interpreted as telling us that $-\frac{d}{dx}$ is the adjoint of $\frac{d}{dx}$ (in a setting where boundary terms vanish). Fourier series can be discovered by computing the eigenfunctions of the anti-self-adjoint operator $\frac{d}{dx}$ in an appropriate setting. Moreover, a multivariable integration by parts formula can be interpreted as telling us that $-\operatorname{div}$ is the adjoint of $\nabla$. Green's second identity can be interpreted as expressing the self-adjointness of the Laplacian.
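The $A^T A$ structure is easy to see in a discrete one-dimensional setting. The following is a sketch under assumptions of my own: $n$ interior grid values with the function taken to be zero at both end points (so boundary terms vanish), unit grid spacing, and a forward-difference matrix `D` playing the role of the gradient.

```python
def gradient_matrix(n):
    """(n + 1) x n forward-difference matrix D for n interior values,
    assuming the function is zero at both end points."""
    rows = []
    for i in range(n + 1):
        row = [0] * n
        if i < n:
            row[i] = 1
        if i > 0:
            row[i - 1] = -1
        rows.append(row)
    return rows


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


D = gradient_matrix(5)
A = matmul(transpose(D), D)  # discrete analogue of -Laplacian = (-div)(grad)
for row in A:
    print(row)               # tridiagonal: 2 on the diagonal, -1 next to it

u = [1, -2, 0, 3, 1]
quadratic_form = sum(ui * wi for ui, wi in zip(u, matvec(A, u)))
grad_norm_sq = sum(d * d for d in matvec(D, u))
print(quadratic_form, grad_norm_sq)  # equal, and never negative
```

The rows of `A` are the negative of the usual second-difference stencil, `A` is symmetric, and $u^T A u = \|Du\|^2 \ge 0$, which is the discrete counterpart of "self-adjoint positive semidefinite".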
