
http://math.arizona.edu/~tgk/529/section2.pdf

2 Brownian Motion
 
We begin with Brownian motion for two reasons. First, it is an essential ingredient in the definition of the Schramm-Loewner evolution. Second, it is a relatively simple example of several of the key ideas in the course: scaling limits, universality, and conformal invariance.

The article by Kager and Nienhuis has an appendix on probability and stochastic processes (Appendix B). It includes a couple of pages on Brownian motion. Lawler's book and Werner's St. Flour article assume the reader is familiar with Brownian motion. For this chapter, I am following two books: Chapter 7 of Probability: Theory and Examples by Richard Durrett and Chapter 2 of Brownian Motion and Stochastic Calculus by Ioannis Karatzas and Steven Shreve.
2.1 Definition and properties
 
We recall a basic construction from probability theory. Let $(\Omega, \mathcal{F}, P)$ be a probability space, i.e., a measure space with $P(\Omega) = 1$. Let $X_1, X_2, \dots, X_m$ be random variables, i.e., measurable functions. Then we can define a Borel measure $\mu$ on $\mathbb{R}^m$ by
$$\mu(B) = P((X_1, X_2, \dots, X_m) \in B) \qquad (6)$$
where $B$ is a Borel subset of $\mathbb{R}^m$. One can then prove that for a function $f(x_1, x_2, \dots, x_m)$ which is integrable with respect to $\mu$, we have
$$E f(X_1, X_2, \dots, X_m) = \int_{\mathbb{R}^m} f(x_1, x_2, \dots, x_m) \, d\mu \qquad (7)$$


Of course, this measure depends on the random variables; when we need to make this explicit we will write it as $\mu_{X_1, \dots, X_m}$.

The random variables $X_1, X_2, \dots, X_m$ are said to be independent if the measure $\mu_{X_1, \dots, X_m}$ equals the product of the measures $\mu_{X_1}, \mu_{X_2}, \dots, \mu_{X_m}$. Two collections of random variables $(X_1, \dots, X_m)$ and $(Y_1, \dots, Y_m)$ are said to be equal in distribution if $\mu_{X_1, \dots, X_m} = \mu_{Y_1, \dots, Y_m}$.


We now turn to Brownian motion. It is a continuous time stochastic process, meaning that it is a collection of random variables $X_t$ indexed by a real parameter $t$.

Definition 1 A one-dimensional (real valued) Brownian motion is a stochastic process $B_t$, $t \ge 0$, with the following properties.

(i) If $t_0 < t_1 < t_2 < \dots < t_n$, then $B_{t_0}, B_{t_1} - B_{t_0}, B_{t_2} - B_{t_1}, \dots, B_{t_n} - B_{t_{n-1}}$ are independent random variables.

(ii) If $s, t \ge 0$, then $B_{t+s} - B_s$ has a normal distribution with mean zero and variance $t$. So
$$P(B_{t+s} - B_s \in A) = \int_A (2\pi t)^{-1/2} \exp(-x^2/2t) \, dx \qquad (8)$$
where $A$ is a Borel subset of the reals.

(iii) With probability one, $t \to B_t$ is continuous.


In short, Brownian motion is a stochastic process whose increments are independent, stationary and normal, and whose sample paths are continuous. Increments refer to random variables of the form $B_{t+s} - B_s$. Stationary means that the distribution of this random variable is independent of $s$. Independent increments means that increments corresponding to time intervals that do not overlap are independent. Proving that such a process exists is not trivial, but we will not give the proof. The above definition makes no mention of the underlying probability space $\Omega$. One can take it to be the set of continuous functions $\omega(t)$ from $[0,\infty)$ to $\mathbb{R}$ with $\omega(0) = 0$. Then the random variables are given by $B_t(\omega) = \omega(t)$. Unless otherwise stated, we will take $B_0 = 0$. We list some standard consequences of the above properties.
Theorem 1 If $B_t$ is a Brownian motion then

(a) $B_t$ is a Gaussian process, i.e., for any times $t_1, \dots, t_n$, the distribution of $(B_{t_1}, \dots, B_{t_n})$ is multivariate normal.

(b) $EB_t = 0$ and $EB_s B_t = \min\{s, t\}$.

(c) Let $\lambda > 0$. Define a stochastic process $X_t$ by
$$X_t = \lambda B_{\lambda^{-2} t} \qquad (9)$$
for $t \ge 0$. Then $X_t$ is a Brownian motion.


(d) Define
$$p(t, x, y) = (2\pi t)^{-1/2} \exp\left(-\frac{(x-y)^2}{2t}\right) \qquad (10)$$
Then for Borel subsets $A_1, A_2, \dots, A_n$ of $\mathbb{R}$,
$$P(B_{t_1} \in A_1, B_{t_2} \in A_2, \dots, B_{t_n} \in A_n) = \int_{A_1} dx_1 \int_{A_2} dx_2 \cdots \int_{A_n} dx_n \, p(t_1, 0, x_1) \, p(t_2 - t_1, x_1, x_2) \cdots p(t_n - t_{n-1}, x_{n-1}, x_n)$$

Exercise: Prove parts (b) and (c) of the above. Hint for (b): If random variables $X$ and $Y$ are independent, then $E XY = EX \, EY$. For $s > t$, write $B_s$ as $(B_s - B_t) + B_t$. The ambitious reader is welcome to prove parts (a) and (d) as well.
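Property (b) is also easy to check numerically. Here is a minimal numpy sketch (my own illustration, not from the notes): sample $(B_s, B_t)$ via independent Gaussian increments and average $B_s B_t$; the result should be close to $\min\{s, t\}$.

```python
import numpy as np

rng = np.random.default_rng(0)
s, t = 0.7, 1.5
n_paths = 200_000
# sample (B_s, B_t) from independent Gaussian increments over [0,s] and [s,t]
Bs = rng.normal(0.0, np.sqrt(s), size=n_paths)
Bt = Bs + rng.normal(0.0, np.sqrt(t - s), size=n_paths)
print(np.mean(Bs * Bt))  # should be close to min(s, t) = 0.7
```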
The definition of $d$-dimensional Brownian motion is easy. We take $d$ independent copies of one-dimensional Brownian motion, and label them as $B^1_t, B^2_t, \dots, B^d_t$. Then $(B^1_t, B^2_t, \dots, B^d_t)$ is a $d$-dimensional Brownian motion. We can also think of the two-dimensional Brownian motion $(B^1_t, B^2_t)$ as a complex valued Brownian motion by considering $B^1_t + i B^2_t$.


The paths of Brownian motion are continuous functions, but they are rather rough. With probability one, the Brownian path is not differentiable at any point. If $\gamma < 1/2$, then with probability one the path is Hölder continuous with exponent $\gamma$. But if $\gamma > 1/2$, then the path is not Hölder continuous with exponent $\gamma$. For any interval $(a, b)$, with probability one the path is neither increasing nor decreasing on $(a, b)$. With probability one the path does not have bounded variation. This last fact is important because it says that one cannot use the Riemann-Stieltjes integral to define integration with respect to $B_t$.


One of the key tools in the stochastic calculus which we will learn about later is the Itô formula. It is often summarized in the statement $(dB)^2 = dt$. The following proposition is in that spirit.

Proposition 1 Fix $t > 0$. For each $n$ let $P_n$ be a partition of $[0, t]$. Let $\|P_n\|$ be the width of the largest subinterval in the partition, and suppose that $\|P_n\| \to 0$. (For example, we could let $P_n$ consist of $n$ intervals of width $t/n$.) For a partition $P = \{t_0, t_1, \dots, t_m\}$ we define
$$\Delta_i = B_{t_i} - B_{t_{i-1}} \qquad (11)$$
and then define
$$X(P) = \sum_{i=1}^{m} \Delta_i^2 \qquad (12)$$
Then with probability one,
$$\lim_{n \to \infty} X(P_n) = t \qquad (13)$$


We do not give a proof, but we note that a particular case of the theorem follows from the law of large numbers. We leave the details to the reader:

Exercise: Take $P_n$ to be the uniform partition with $n$ subintervals. Use the law of large numbers to prove (13) for this special case.
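As a sanity check on Proposition 1 (my own illustration, not from the notes): for the uniform partition the increments $\Delta_i$ are i.i.d. $N(0, t/n)$, so we can sample them directly and watch $X(P_n)$ approach $t$.

```python
import numpy as np

rng = np.random.default_rng(0)
t = 2.0
for n in [10, 1_000, 100_000]:
    # increments of B over the uniform partition of [0, t]: i.i.d. N(0, t/n)
    delta = rng.normal(0.0, np.sqrt(t / n), size=n)
    print(n, np.sum(delta**2))  # X(P_n); should approach t = 2.0
```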
2.2 Brownian motion as a scaling limit of random walks
 
A sequence of random variables $X_n$ is independent if every finite subset is independent. It is identically distributed if each random variable has the same distribution, i.e., $P(X_n \in A) = P(X_m \in A)$ for any Borel set $A$ and any $n, m$. A sequence of random variables which is both independent and identically distributed is called an i.i.d. sequence. Note that for an identically distributed sequence, the random variables all have the same mean and variance.
A one-dimensional random walk is defined as follows. Let $X_n$ be an i.i.d. sequence of random variables. We assume that the mean, $EX_n$, is zero and the variance is finite. It is no loss of generality to take the variance to be 1, so $EX_i^2 = 1$. Let
$$S_m = \sum_{i=1}^{m} X_i \qquad (14)$$
This is a rather general random walk in the sense that it allows very general steps. The simplest random walk is to just take $X_i$ to be $\pm 1$ with probability $1/2$ each. We can picture this as follows. We start at the origin and flip a fair coin. For heads we take a step forward, for tails a step backwards. We repeat this. Then $S_m$ is our position after $m$ steps. In this simple case $S_m$ is always an integer, so the random walk lives on the lattice $\mathbb{Z}$.

The $S_m$ form a discrete time stochastic process. We make this into a continuous time stochastic process by linear interpolation. More precisely,
$$S_t = \begin{cases} S_m & \text{if } t = m \text{ is an integer} \\ \text{linear} & \text{on } [m, m+1] \end{cases} \qquad (15)$$

Since the variance of a sum of independent random variables is the sum of their individual variances, the variance of $S_m$ is $m$. So the typical size of $S_t$ is $\sqrt{t}$. This motivates the following rescaling. For each positive integer $n$, we let
$$S^n_t = n^{-1/2} S_{nt} \qquad (16)$$
If we picture a graph of $S_t$, then to get $S^n_t$ we shrink the horizontal (time) axis by a factor of $n$ and shrink the vertical (space) axis by a factor of $\sqrt{n}$. Note that for $t$ which are equal to an integer divided by $n$, the variance of $S^n_t$ is $t$.
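The rescaled process in (15) and (16) is easy to build concretely. A short numpy sketch (names are mine, not from the notes) for the $\pm 1$ walk, using linear interpolation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
# simple +/-1 random walk with S_0 = 0
S = np.concatenate(([0.0], np.cumsum(rng.choice([-1.0, 1.0], size=n))))

def S_n(t):
    """S^n_t = n^{-1/2} S_{nt}, with S linearly interpolated, for t in [0, 1]."""
    return np.interp(np.asarray(t, dtype=float) * n, np.arange(n + 1), S) / np.sqrt(n)

print(S_n([0.25, 0.5, 1.0]))  # one sample of the rescaled path at a few times
```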

Now consider times $0 < t_1 < t_2 < \dots < t_m$ where each time is equal to some integer divided by $n$. Consider the random variables $S^n_{t_1}, S^n_{t_2} - S^n_{t_1}, \dots, S^n_{t_m} - S^n_{t_{m-1}}$. Each of them is (up to the factor $n^{-1/2}$) a sum of a subset of the $X_i$, and no $X_i$ appears in more than one of these sums. Thus these random variables are independent. If $n$ is large, each of the random variables is the sum of a large number of i.i.d. random variables and so is approximately normal. So $S^n_t$ is looking like Brownian motion, at least at the times which are multiples of $1/n$. So we can hope that as $n \to \infty$, $S^n_t$ will converge to Brownian motion. This is indeed a theorem, proved by Donsker in 1951 and sometimes called the invariance principle. To state it in its strongest form requires a definition about convergence of measures. We start by stating a weaker form that is a bit easier to digest.
Theorem 2 (invariance principle) Fix times $0 < t_1 < t_2 < \dots < t_m$. We use $E_{rw}$ to denote expectation with respect to the probability measure for the original i.i.d. sequence $X_i$. Let $X_t$ be a Brownian motion. We use $E_{bm}$ to denote expectation with respect to its probability measure. Then for every bounded continuous function $f(x_1, x_2, \dots, x_m)$ on $\mathbb{R}^m$, we have
$$\lim_{n \to \infty} E_{rw} f(S^n_{t_1}, S^n_{t_2}, \dots, S^n_{t_m}) = E_{bm} f(X_{t_1}, X_{t_2}, \dots, X_{t_m}) \qquad (17)$$


This is already a pretty good theorem, and the following somewhat technical discussion serves only to get a stronger statement of the above; it can be skipped without a big loss. The technical stuff ends where we consider how Brownian motion illustrates the ideas of scaling limits, critical phenomena and universality.

Definition 2 Suppose that the sample space $\Omega$ is a metric space. Suppose that $P_n$ is a sequence of probability measures on $\Omega$ defined on the Borel subsets. Let $P$ be another such probability measure. We say that $P_n$ converges weakly to $P$ if
$$\lim_{n \to \infty} \int f \, dP_n = \int f \, dP \qquad (18)$$
for every bounded, continuous real-valued function $f$ on $\Omega$.

Now look at the conclusion of the theorem. For each $n$ let $\mu_n$ be the probability measure on $\mathbb{R}^m$ that comes from the random variables $S^n_{t_1}, S^n_{t_2}, \dots, S^n_{t_m}$. Let $\mu$ be the probability measure on $\mathbb{R}^m$ that comes from $X_{t_1}, X_{t_2}, \dots, X_{t_m}$. Then the conclusion of the above theorem is that $\mu_n$ converges weakly to $\mu$. A probabilist says that the sequence of random vectors $(S^n_{t_1}, S^n_{t_2}, \dots, S^n_{t_m})$ converges in distribution to $(X_{t_1}, X_{t_2}, \dots, X_{t_m})$. And the conclusion of the above theorem is that the finite dimensional distributions of $S^n_t$ converge in distribution to those of Brownian motion.

The stronger form of the theorem does not just look at the process at a finite set of times. Let $C[0,\infty)$ be the space of continuous functions on $[0,\infty)$. We let $P$ denote the probability measure on this space for Brownian motion. For each $n$, $S^n_t$ is a continuous function of $t$. So $S^n_t$ also defines a probability measure on $C[0,\infty)$. We denote it by $P_n$. It is supported on piecewise linear functions.
Theorem 3 (Invariance principle of Donsker) Let $X_i$ be an i.i.d. sequence of random variables defined on the probability space $(\Omega, \mathcal{F}, P)$. Suppose that they have mean zero and variance 1. Define $S^n_t$ by the linear interpolation and scaling defined above, and let $P_n$ be the probability measure on $C[0,\infty)$ induced by the process $S^n_t$. Then $P_n$ converges weakly to a probability measure $P$ for which $B_t(\omega) = \omega(t)$ is standard one-dimensional Brownian motion.
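One consequence of the theorem is easy to test: for the coin-flip walk, $S^n_1$ should be approximately distributed like $B_1$, a standard normal. A rough Monte Carlo sketch (mine, not from the notes), using the fact that $S_n$ for the $\pm 1$ walk can be sampled from a binomial:

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n, n_walks = 1_000, 200_000
Sn = 2.0 * rng.binomial(n, 0.5, size=n_walks) - n   # S_n for the +/-1 walk
Sn1 = Sn / math.sqrt(n)                             # S^n_1 = n^{-1/2} S_n
phi = 0.5 * (1.0 + math.erf(1.0 / math.sqrt(2.0)))  # P(B_1 <= 1), standard normal
print(np.mean(Sn1 <= 1.0), phi)                     # the two numbers should be close
```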
We now consider how Brownian motion illustrates the ideas of scaling limits, critical phenomena and universality. We start with the scaling limit. Usually in statistical physics one starts with a model defined on a lattice and then tries to understand what the scaling limit is. If we take $X_i = \pm 1$ with equal probability, then the random walk stays on the lattice $\mathbb{Z}$. The scaling limit is what we did above when we shrank time by a factor of $n$ and space by a factor of $\sqrt{n}$. For this model we have a candidate for the scaling limit (Brownian motion) and a theorem that says the scaling limit is indeed equal to Brownian motion. This is not the typical situation in statistical physics. There we are lucky if we have an explicit candidate for the scaling limit and extremely lucky if we have a theorem that says the scaling limit does converge to the candidate. What is exciting about SLE is that it defines in a fairly explicit way candidates for scaling limits, and for some models we even have a theorem.

Now consider universality. The invariance principle is a very strong form of universality. It says that we can start with any random walk, subject only to the conditions that the steps have mean zero and variance 1, and the scaling limit will be the same stochastic process, i.e., Brownian motion. We have stated the invariance principle only for one dimension, but it is true in any number of dimensions. For example, we can take a random walk on the lattice $\mathbb{Z}^d$ which at each step moves by $\pm e_i$ with probability $1/2d$, where $e_i$ is the unit vector in the $i$th coordinate direction. We then take a scaling limit as we did above. This will converge to a $d$-dimensional Brownian motion. (I am ignoring a slight rescaling that needs to be done here.)

Finally we consider criticality. In the scaling limit the steps of the random walk are of size $1/\sqrt{n}$. So the random walk is formed by combining infinitely many microscopic random inputs. The result, Brownian motion, is clearly random. So it appears that Brownian motion is a critical phenomenon. This is a bit confusing from the viewpoint of statistical physics. Usually in a statistical physics model one must adjust a parameter, e.g., the temperature, to a particular value to make the model have critical behavior. There appears to be no such parameter in the random walk model. In some sense the condition that the mean of the step $X_i$ must be zero plays the role of adjusting a parameter to make the model critical. Consider a one-dimensional random walk with steps of $\pm 1$, but now take $X_i = 1$ with probability $p$ and $X_i = -1$ with probability $1-p$ with $p \neq 1/2$. Now the typical size of $S_n$ is $n$, not $\sqrt{n}$ as before. So to construct a scaling limit we must define
$$S^n_t = n^{-1} S_{nt} \qquad (19)$$
Now in the scaling limit, $S^n_t$ will converge to a straight line with a slope which depends only on $p$. So the scaling limit has no randomness at all.

Exercise: For $p \neq 1/2$, find the slope $m$ of the line to which (19) converges. Prove that for $t > 0$,
$$\lim_{n \to \infty} S^n_t = mt \qquad (20)$$
with probability one. Hint: law of large numbers.
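A quick simulation (my own, not from the notes) shows the randomness disappearing under this scaling: the printed values settle down to a deterministic slope as $n$ grows, which you can compare with your answer to the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.6
for n in [100, 10_000, 1_000_000]:
    steps = rng.choice([1.0, -1.0], p=[p, 1.0 - p], size=n)  # biased +/-1 walk
    # S^n_1 = n^{-1} S_n; fluctuations around the limit shrink like n^{-1/2}
    print(n, steps.sum() / n)
```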
2.3 Conditional expectation
 
This subsection is not primarily about Brownian motion. We are going to define the conditional expectation of a random variable with respect to a $\sigma$-field. This is needed in the next section on the Markov property of Brownian motion. It is an essential part of the definition of a martingale, a concept that we will use throughout the course.

The material in this subsection is covered in Math 563a. See Prof. Watkins' notes or any graduate level probability book for proofs. The student who has not seen conditional probability or conditional expectation at an undergraduate level should take a look at an undergrad probability book, e.g., The Essentials of Probability by Richard Durrett, Elementary Probability by David Stirzaker or Probability: An Introduction by Geoffrey Grimmett and Dominic Welsh.

To develop some intuition we will first look at the conditional expectation of one random variable given another. We start with the notion of conditional probability. Let $B$ be an event with $P(B) > 0$. The conditional probability of $A$ given $B$ is
$$P(A|B) = \frac{P(A \cap B)}{P(B)} \qquad (21)$$
What does $P(A|B)$ mean? This is the probability of $A$ if we know for certain that the outcome is in $B$. Put another way, if we do the experiment $N$ times and keep only the outcomes that belong to $B$, then the fraction of these outcomes that belong to $A$ converges to $P(A|B)$ as $N \to \infty$.

If we fix the given event $B$ and think of this conditional probability as a function of $A$, $A \to P(A|B)$, then this defines a new probability measure. So given a random variable $X$, we can compute the integral of $X$ with respect to this new probability measure. This is called the conditional expectation of $X$ given $B$ and written as $E(X|B)$. For example, consider a discrete random variable $X$, i.e., a random variable that only takes on a countable set of values. Let $x_1, x_2, \dots$ be the values that $X$ takes on. Then the integral of $X$ with respect to a general measure $\mu$ is
$$\int X \, d\mu = \sum_i x_i \, \mu(E_i) \qquad (22)$$
where $E_i = X^{-1}(x_i)$. So
$$E(X|B) = \sum_i x_i \, P(E_i|B) \qquad (23)$$
Probabilists usually write the event $X^{-1}(x_i)$ as just $X = x_i$. So
$$E(X|B) = \sum_i x_i \, P(X = x_i|B) \qquad (24)$$
If we do the experiment $N$ times, keep only the outcomes that belong to $B$, and average the value of $X$ we get for these outcomes, then this average will converge to $E(X|B)$ as $N \to \infty$.

Now let $Y$ be another discrete random variable whose possible values are $y_1, y_2, \dots$. Then for each $i$, $Y = y_i$ is an event. We can use the above to define $E(X|Y = y_i)$. Now think of this as a function of $y_i$. More precisely, let $\phi(y_i) = E(X|Y = y_i)$ and define $\phi(y) = 0$ when $y$ is not one of the values $y_i$. Now we define a random variable by $\omega \to \phi(Y(\omega))$. This random variable is called the conditional expectation of $X$ given $Y$ and denoted $E(X|Y)$. Note that for an event $B$, $E(X|B)$ is a number, but for a random variable $Y$, $E(X|Y)$ is another random variable.
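The frequency interpretation above translates directly into code. In this small sketch (my own illustration) $X$ is the sum of two dice and $Y$ is the first die; averaging $X$ over the outcomes with $Y = y$ estimates $E(X|Y = y)$, which is $y + 3.5$ here.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000
die1 = rng.integers(1, 7, size=N)   # first die
die2 = rng.integers(1, 7, size=N)   # second die
X = die1 + die2                     # X = sum of the two dice
Y = die1                            # Y = the first die
for y in range(1, 7):
    # average X over outcomes with Y = y: estimates E(X | Y = y) = y + 3.5
    print(y, X[Y == y].mean())
```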

This is a rather convoluted definition, so some intuition is in order. Let $B_i$ denote the event $Y = y_i$. Note that the $B_i$ are disjoint and their union is the full sample space $\Omega$. So they form a partition of $\Omega$. We can write the conditional expectation of $X$ given $Y$ as
$$E(X|Y) = \sum_i E(X|B_i) \, 1_{B_i} \qquad (25)$$
Here $1_B$ denotes the random variable which equals 1 when the outcome is in $B$ and equals 0 when it is not in $B$. Note that with $\phi : \mathbb{R} \to \mathbb{R}$ defined as above, $E(X|Y) = \phi(Y)$. In other words, the random variable $E(X|Y)$ is a function of $Y$. Of course, there are a zillion functions of $Y$. What is special about the function $E(X|Y)$? It is the function of $Y$ which best approximates $X$ in the following sense.

Proposition 2 For any Borel function $f : \mathbb{R} \to \mathbb{R}$,
$$E[E(X|Y) - X]^2 \le E[f(Y) - X]^2 \qquad (26)$$

Exercise: Prove the above proposition.

Let $(\Omega, \mathcal{F}, P)$ be a probability space. Recall that a random variable $X$ is a measurable function, i.e., for every Borel subset $B$ of the reals, $X^{-1}(B)$ must belong to $\mathcal{F}$. In analysis one typically has a single $\sigma$-field. In probability there is often more than one $\sigma$-field. So one cannot just say measurable; one must specify the $\sigma$-field. We will abbreviate "$X$ is measurable with respect to $\mathcal{F}$" by just $X \in \mathcal{F}$.

Given a random variable $X$, the collection of events $\{X \in A\}$, for Borel subsets $A$ of $\mathbb{R}$, is a $\sigma$-field which we denote by $\sigma(X)$ and call "the $\sigma$-field generated by $X$." Obviously, $X$ is measurable with respect to the $\sigma$-field it generates, i.e., $X \in \sigma(X)$.

Exercise: Let $Y$ be a discrete random variable with values $y_1, y_2, \dots$. Let $B_i$ be the event $Y = y_i$. Show that the following are equivalent for a random variable $X$.

(i) $X$ is measurable with respect to $\sigma(Y)$.

(ii) $X$ is constant on each $B_i$, i.e., it is of the form
$$X = \sum_i c_i 1_{B_i} \qquad (27)$$
for some real numbers $c_i$.

(iii) There is a Borel function $\phi : \mathbb{R} \to \mathbb{R}$ such that $X = \phi(Y)$.

Thus for two discrete random variables $X$ and $Y$, the conditional expectation of $X$ given $Y$ is the best approximation to $X$ (in the above $L^2$ sense) using random variables that are measurable with respect to $\sigma(Y)$. The following exercise gives another way to think about how $E(X|Y)$ approximates $X$.

Exercise: Show that for any random variable $Z \in \sigma(Y)$, we have
$$E XZ = E[E(X|Y) Z] \qquad (28)$$


This way of looking at the conditional expectation now generalizes easily.

Definition 3 Let $(\Omega, \mathcal{F}_0, P)$ be a probability space. Let $\mathcal{F}$ be a sub-$\sigma$-field of $\mathcal{F}_0$, i.e., a subset of $\mathcal{F}_0$ that is itself a $\sigma$-field. Let $X \in \mathcal{F}_0$ with $E|X| < \infty$. The conditional expectation of $X$ given $\mathcal{F}$, denoted $E(X|\mathcal{F})$, is the unique random variable $Y$ which satisfies

(i) $Y \in \mathcal{F}$,
(ii) $EXZ = EYZ$ for any random variable $Z \in \mathcal{F}$.

The definition asserts both that such a $Y$ exists and that it is unique. Of course, uniqueness means that any other random variable with these two properties must equal $Y$ with probability one. Uniqueness is not too hard to prove. Existence is harder to prove and requires the Radon-Nikodym theorem.
Theorem 4 (i) $E(E(X|\mathcal{F})) = EX$.

(ii) If $X \in \mathcal{F}$ then $E(X|\mathcal{F}) = X$.

(iii) If $X$ is independent of $\mathcal{F}$, then $E(X|\mathcal{F}) = EX$.

(iv) If $Y \in \mathcal{F}$ and $E|XY| < \infty$, then $E(XY|\mathcal{F}) = Y E(X|\mathcal{F})$.

(v) $E(aX + bY|\mathcal{F}) = a E(X|\mathcal{F}) + b E(Y|\mathcal{F})$.

(vi) If $X \le Y$ a.s., then $E(X|\mathcal{F}) \le E(Y|\mathcal{F})$ a.s.

(vii) If $\mathcal{F}_1 \subset \mathcal{F}_2$ then $E(E(X|\mathcal{F}_1)|\mathcal{F}_2) = E(E(X|\mathcal{F}_2)|\mathcal{F}_1) = E(X|\mathcal{F}_1)$.

Here is some intuition for these properties. The conditional expectation of $X$ given $\mathcal{F}$ is the random variable which best approximates $X$ within the set of $\mathcal{F}$-measurable functions. ("Best" is in the sense of minimizing the $L^2$ distance.) Property (i) says that the mean of the approximation is the same as the mean of the random variable we are approximating. Property (ii) says that if $X$ is already in the set of functions you are using to approximate $X$, then the best approximation is $X$ itself. Property (iii) says that if $X$ is independent of the set of random variables you are using to approximate, then the best you can do is to take the approximating function to be the constant $EX$. Property (iv) says that if $Y$ is in the space of functions we are using to approximate with, then we get the approximation of $XY$ by just multiplying the approximation of $X$ by $Y$. Properties (v) and (vi) say that if we fix $\mathcal{F}$, then $X \to E(X|\mathcal{F})$ acts like the usual expectation, $X \to EX$. I don't know a simple interpretation of property (vii). To remember it, think "the smaller $\sigma$-field wins." Note that if $\mathcal{F}_1$ and $\mathcal{F}_2$ are not related by inclusion, then $E(E(X|\mathcal{F}_1)|\mathcal{F}_2)$ and $E(E(X|\mathcal{F}_2)|\mathcal{F}_1)$ need not be equal. (This is very similar to projections in a Hilbert space, which commute if their ranges are related by inclusion but typically do not commute if their ranges are not so related.)
Property (iii) refers to a random variable being independent of a $\sigma$-field. What does this mean? One way to define it is that the random variable $X$ and the $\sigma$-field $\mathcal{F}$ are independent if for every Borel set $B \subset \mathbb{R}$ and every $A \in \mathcal{F}$, we have $P(A \cap \{X \in B\}) = P(A) P(X \in B)$. The notation is being tortured in the usual way: $A \cap \{X \in B\}$ means the intersection of $A$ and the event $\{\omega : X(\omega) \in B\}$.
2.4 Markov properties
 
Filtrations

The above definition of Brownian motion is missing a key component: a filtration. A $\sigma$-field is a collection of subsets of the sample space satisfying some axioms. If $X$ is a random variable, then the collection of subsets of $\Omega$ of the form $\{X \in B\}$, where $B$ is a Borel subset of $\mathbb{R}$, is a $\sigma$-field. We will denote it by $\sigma(X)$. Given a collection of random variables $X_\alpha$, $\alpha \in A$, we let $\sigma(X_\alpha, \alpha \in A)$ denote the $\sigma$-field generated by the $X_\alpha$, i.e., the smallest $\sigma$-field which contains all sets of the form $\{X_\alpha \in B\}$ where $B$ is a Borel subset of $\mathbb{R}$.

Definition 4 A filtration is a family of $\sigma$-fields $\mathcal{F}_t$ indexed by $t \ge 0$ such that $\mathcal{F}_t \subset \mathcal{F}_s$ for $t < s$. It is right continuous if
$$\bigcap_{s : s > t} \mathcal{F}_s = \mathcal{F}_t \qquad (29)$$
A stochastic process $X_t$ is said to be adapted to the filtration $\mathcal{F}_t$ if for every $t \ge 0$, $X_t$ is measurable with respect to $\mathcal{F}_t$.

Given a stochastic process $X_t$ we can construct a filtration by simply defining $\mathcal{F}_t = \sigma(X_s, s \le t)$. Obviously, $X_t$ is adapted to this filtration. Unfortunately, if we do this with Brownian motion the resulting filtration is not right continuous. This can be fixed. One can construct a filtration which is right continuous and is the same as $\sigma(X_s, s \le t)$ up to sets of measure zero. See Durrett for details. We will denote this right continuous filtration by $\mathcal{F}_t$. I will fudge and just pretend that $\mathcal{F}_t$ is $\sigma(X_s, s \le t)$.


Markov property

In general "Markov" means the future depends on the past only through the present. We make this precise. Start with a random walk $S_n = \sum_{i=1}^n X_i$ (without the interpolation and rescaling). We fix a time $m$ which we think of as the present and assume that we know $S_m$, the location of the walk at time $m$, as well as how the walk got there, i.e., $S_j$ for $j < m$. Consider what the walk does after $m$. Let $n > m$ and write $S_n$ as $S_m + (S_n - S_m)$. The random variable $S_n - S_m$ is independent of the steps the walk took to get to $S_m$. So what the walk does after time $m$ only depends on $S_m$, not on how the walk got to $S_m$. We can give a mathematical formulation using conditional expectation and $\sigma$-fields. We start with a special case in which we can prove the statement. Let $n > m$. Then
$$E(S_n|\sigma(S_1, \dots, S_m)) = E(S_m + (S_n - S_m)|\sigma(S_1, \dots, S_m))$$
$$= E(S_m|\sigma(S_1, \dots, S_m)) + E(S_n - S_m|\sigma(S_1, \dots, S_m)) = S_m + E(S_n - S_m)$$
$$= E(S_m|\sigma(S_m)) + E(S_n - S_m|\sigma(S_m)) = E(S_n|\sigma(S_m)) \qquad (30)$$

In fact a stronger statement is true. For any Borel function $f : \mathbb{R} \to \mathbb{R}$,
$$E(f(S_n)|\sigma(S_1, \dots, S_m)) = E(f(S_n)|\sigma(S_m)) \qquad (31)$$
In words, the distribution of $f(S_n)$ given $S_1, S_2, \dots, S_m$ is the same as the distribution of $f(S_n)$ given only $S_m$.

Brownian motion should have a similar property. For $t > s$, $B_t - B_s$ is independent of what happened before time $s$, i.e., independent of $B_u$ for $u \le s$. So $B_t$ will depend on $\{B_u : 0 \le u \le s\}$ only through $B_s$. Thus we should have for any Borel function $f$ and $t > s$,
$$E(f(B_t)|\mathcal{F}_s) = E(f(B_t)|\sigma(B_s)) \qquad (32)$$
This is true, but it is only a statement about the Brownian motion at a single time $t$ in the future. The entire future should depend on the past only through the present. To state a stronger form of the Markov property we need to revisit the definition of Brownian motion.

Until now we have only considered Brownian motions that start at 0. We now need Brownian motions that start at other points. Let $C[0,\infty)$ be the set of continuous functions on $[0,\infty)$. Let $B_t(\omega) = \omega(t)$. There is a probability measure $P$ which makes these random variables a Brownian motion and for which $B_0 = 0$ with $P$ probability 1. One can prove that there is a family of probability measures $P^x$ for $x \in \mathbb{R}$ on $C[0,\infty)$ such that $B_t$ and $P^x$ form a Brownian motion, and $P^x(\{\omega : \omega(0) = x\}) = 1$. We let $E^x$ denote the expectation with respect to $P^x$.

Define $\theta_s : C[0,\infty) \to C[0,\infty)$ by
$$(\theta_s \omega)(t) = \omega(t + s), \quad t \ge 0 \qquad (33)$$
So $\theta_s$ removes the part of the path for the time interval $[0, s]$ and then shifts time so that the path begins at time 0.
Theorem 5 (Markov property) Let $Y$ be a bounded random variable on $C[0,\infty)$ and $s \ge 0$. Define $\phi(z) = E^z Y$, and let $E^{B_s} Y$ denote $\phi(B_s)$. Then for any $x$,
$$E^x(Y \circ \theta_s|\mathcal{F}_s) = E^{B_s} Y \qquad (34)$$

Note that the right side does not depend on $x$, so one of the assertions of the theorem is that the left side does not depend on $x$. The left side of the above equation is a random variable since it is a conditional expectation. By properties of conditional expectation it is measurable with respect to $\mathcal{F}_s$. Intuitively, this means it is a function of the $B_t$ with $t \le s$. $E^z Y$ is a number. When we replace $z$ by $B_s$, we get a function of $B_s$, and so a random variable which is measurable with respect to $\sigma(B_s)$. There is an even stronger Markov property for Brownian motion involving stopping times. We will return to it at the end of this section.

Martingales

We first define a martingale in the discrete time case. Suppose that our stochastic process comes from some form of gambling. $X_m$ represents the amount of money we have at time $m$. This is often called our "stake." Heuristically, a martingale is a fair game. Think of $m$ as the present. We know what has happened up to time $m$. Let $n > m$ be a time in the future. Then the expected value of $X_n$ given our knowledge of the present and the past should be the current value of the stake, i.e., $X_m$. More formally,
$$E(X_n|\sigma(X_1, X_2, \dots, X_m)) = X_m \qquad (35)$$

For example, consider a random walk $S_m$. This can be thought of as a simple gambling game where we play the same game at each time step. $X_i$ represents the amount of money we win (or lose if $X_i < 0$) on the $i$th play, and $S_m$ is our stake at time $m$. Then as we saw when we considered the Markov property of random walks,
$$E(S_n|\sigma(S_1, S_2, \dots, S_m)) = S_m + E(S_n - S_m) = S_m + (n - m) EX_1 \qquad (36)$$
So the random walk is a martingale if and only if $EX_1 = 0$. In words, it is a fair game if the average amount we win on a single play is zero. We now turn to a continuous time process $X_t$. We assume that $E|X_t| < \infty$ for all $t \ge 0$ so that $E(X_t|\mathcal{F}_s)$ is defined.

Definition 5 A stochastic process $X_t$ is a martingale with respect to the filtration $\mathcal{F}_t$ if for $t > s$,
$$E(X_t|\mathcal{F}_s) = X_s \qquad (37)$$


We construct three examples of martingales involving Brownian motion. As always, $B_t$ denotes a Brownian motion. For the first example, we compute $E(B_t|\mathcal{F}_s)$ for $t > s$. Since $B_t - B_s$ is independent of $\mathcal{F}_s$, $E(B_t - B_s|\mathcal{F}_s) = E(B_t - B_s) = 0$. So
$$E(B_t|\mathcal{F}_s) = E((B_t - B_s) + B_s|\mathcal{F}_s) = E(B_t - B_s|\mathcal{F}_s) + E(B_s|\mathcal{F}_s) = B_s \qquad (38)$$
Thus Brownian motion itself is a martingale.
For the second example we compute $E(B_t^2|\mathcal{F}_s)$ for $t > s$.
$$E(B_t^2|\mathcal{F}_s) = E((B_t - B_s + B_s)^2|\mathcal{F}_s) = E((B_t - B_s)^2|\mathcal{F}_s) + E(B_s^2|\mathcal{F}_s) + 2E((B_t - B_s)B_s|\mathcal{F}_s) \qquad (39)$$
Using independence, $E((B_t - B_s)^2|\mathcal{F}_s) = E(B_t - B_s)^2 = t - s$. Of course, $E(B_s^2|\mathcal{F}_s) = B_s^2$. In the last term we use property (iv) to get
$$E((B_t - B_s)B_s|\mathcal{F}_s) = B_s E(B_t - B_s|\mathcal{F}_s) = B_s E(B_t - B_s) = 0 \qquad (40)$$
Thus
$$E(B_t^2|\mathcal{F}_s) = B_s^2 + t - s \qquad (41)$$
So $B_t^2$ is not a martingale, but we can rewrite the above as
$$E(B_t^2 - t|\mathcal{F}_s) = B_s^2 - s \qquad (42)$$
Thus $X_t = B_t^2 - t$ is a martingale.


We leave the third example for the reader.

Exercise: (a) For $t > s$, compute $E(\exp(B_t)|\mathcal{F}_s)$. Use the result to show that $\exp(B_t - t/2)$ is a martingale.
(b) Find $f(t)$ so that $\exp(iB_t + f(t))$ is a martingale.
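Taking expectations of both sides of (37) and using property (i) of Theorem 4 shows that a martingale has constant mean, $EX_t = EX_0$. That weaker consequence of part (a) is easy to check by Monte Carlo; a sketch (mine, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
for t in [0.5, 1.0, 4.0]:
    Bt = rng.normal(0.0, np.sqrt(t), size=1_000_000)  # B_t ~ N(0, t)
    # E exp(B_t - t/2) should equal exp(B_0) = 1 for every t
    print(t, np.exp(Bt - t / 2.0).mean())
```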


Stopping times

Consider a two-dimensional Brownian motion starting from the origin. Let $T$ be the first time it hits the unit circle. This is a random variable, and it is a special kind of random variable called a stopping time. Let $t > 0$. We can tell if $T \le t$ if we know $B_s$ for $0 \le s \le t$. We do not need to know $B_s$ for $s > t$. This motivates the following definition.

Definition 6 A stopping time is a random variable $T$ taking values in $[0, \infty]$ such that for all $t \ge 0$, the event $\{T < t\}$ belongs to $\mathcal{F}_t$.


It is not hard to show that if the filtration is right continuous then the above definition is equivalent to the definition with $\{T < t\}$ replaced by $\{T \le t\}$. We make the trivial observation that a nonnegative constant is a stopping time. One of the most common ways stopping times arise is by looking at when the Brownian motion enters or leaves some set.
Proposition 3 If $A$ is an open or closed subset of $\mathbb{R}^d$ and $B_t$ is a $d$-dimensional Brownian motion, then
$$T = \inf\{t : B_t \in A\} \qquad (43)$$
is a stopping time.
The stopping time in the above proposition is often called the hitting time of $A$.
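On a simulated path, a hitting time is approximated by the first grid time at which the discretized path is in $A$. A rough sketch (my own; the grid approximation can miss crossings between grid points) for one-dimensional Brownian motion and $A = [1, \infty)$:

```python
import numpy as np

rng = np.random.default_rng(0)
dt, tmax = 1e-4, 10.0
incs = rng.normal(0.0, np.sqrt(dt), size=int(tmax / dt))
B = np.concatenate(([0.0], np.cumsum(incs)))  # one Brownian path on a time grid
hit = np.nonzero(B >= 1.0)[0]                 # grid indices where the path is in A
T = hit[0] * dt if hit.size else np.inf       # approximate hitting time; inf if not hit by tmax
print(T)
```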


There are many ways to combine stopping times to get new stopping times.
Proposition 4 (i) If $S$ and $T$ are stopping times, then $\min\{S, T\}$, $\max\{S, T\}$ and $S + T$ are all stopping times.

(ii) If $T_n$ is a sequence of stopping times, then $\sup_n T_n$, $\inf_n T_n$, $\liminf_n T_n$, and $\limsup_n T_n$ are all stopping times. In particular, if $\lim_n T_n$ exists a.s., then the limit is a stopping time.

Caution: $S - T$ need not be a stopping time.


Strong Markov property

We now return to the Markov property of Brownian motion. Consider a two-dimensional Brownian motion and let $S$ be the first time it hits the unit circle. If we look at the Brownian motion from time $S$ onwards, it should look the same as a 2d Brownian motion started at the point on the circle where the Brownian motion first hit the circle. In other words, when we say the future depends on the past only through the present, we can take the present to be a stopping time.
To make this precise we need to define $\mathcal{F}_S$ for a stopping time $S$.

Definition 7 Let $S$ be a stopping time. $\mathcal{F}_S$ is the set of events $A$ such that for all $t \ge 0$, $A \cap \{S \le t\}$ is in $\mathcal{F}_t$.

In words, the part of $A$ that lies in $\{S \le t\}$ should be measurable with respect to the information available at time $t$. We also need to define the time shift $\theta_S$. We let $(\theta_S \omega)(t) = \omega(S(\omega) + t)$. So the path $\omega$ is shifted backwards in time by $S(\omega)$ and the part of the path between times 0 and $S(\omega)$ is discarded.

Theorem 6 (Strong Markov property) Let $Y$ be a bounded random variable on $C[0,\infty)$. Let $S$ be a stopping time. Define $\phi(z) = E^z Y$, and let $E^{B_S} Y = \phi(B_S)$. Then for any $x$,
$$E^x(Y \circ \theta_S|\mathcal{F}_S) = E^{B_S} Y \qquad (44)$$
on the event $S < \infty$.

To help digest the theorem, consider a $Y$ which only depends on the sample path at one time. So let $t > 0$ and let $f : \mathbb{R} \to \mathbb{R}$. Define $Y(\omega) = f(\omega(t))$. Then the theorem says
$$E^x(f(\omega(t + S(\omega)))|\mathcal{F}_S) = E^{B_S} f(\omega(t)) \qquad (45)$$
We start the Brownian motion at some point $x$ and run the Brownian motion up to the stopping time $S$. Knowing what the path is up to this time, we look at the distribution of $B_{t+S}$. The theorem says it has the same distribution as $B_t$ if we start the Brownian motion at $\omega(S)$.


Optional sampling theorem

If we think of our stochastic process as a gambling game, then a stopping time is a simple kind of system. It is a rule for when to quit playing given complete knowledge of what has happened up to the present, but no knowledge about the future. Suppose we quit playing at the stopping time $T$ and ask what is the average of $X_T$. For a fair game we expect that it is the same as our initial stake, i.e., $X_0$. So we expect $EX_T = EX_0$. The equality is not true without some further conditions. For example, consider a 1d Brownian motion which starts at 0, and let $T$ be the first time it hits $+1$. Then $B_T = 1$. So $EB_T = 1$, but $EB_0 = 0$.

Theorem 7 (Optional sampling theorem) Let $X_t$ be a martingale with respect to the filtration $\mathcal{F}_t$, and let $T$ be a stopping time. Assume that $P(T < \infty) = 1$, $E|X_T| < \infty$ and
$$\lim_{s \to \infty} E[|X_s| \mid T > s] \, P(T > s) = 0 \qquad (46)$$
Then
$$EX_T = EX_0 \qquad (47)$$

In the example before the theorem, $P(T < \infty) = 1$ and $E|X_T| = 1 < \infty$. So it must be that (46) is not satisfied.
Exercise: Let $a < 0 < b$. Consider a 1d Brownian motion started at 0. Let $T$ be the first time it reaches $a$ or $b$, i.e., the hitting time for $\{a, b\}$. The hypotheses of the optional sampling theorem are true in this case. Use the conclusion of the theorem to compute the probability that the Brownian motion reaches $b$ before it reaches $a$.
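A simulation makes a nice check on your answer to the exercise. The sketch below (my own; the Euler discretization of the path makes the hit only approximate) estimates the probability of reaching $b$ before $a$ for $a = -1$, $b = 2$.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, dt = -1.0, 2.0, 1e-3
n_paths, hits_b = 2_000, 0
for _ in range(n_paths):
    x = 0.0
    while a < x < b:
        x += rng.normal(0.0, np.sqrt(dt))  # approximate Brownian increment over dt
    hits_b += x >= b
print(hits_b / n_paths)  # compare with your answer; here -a/(b - a) = 1/3
```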


There are deep connections between Brownian motion and PDEs. Here is a nice example. We consider Brownian motion and think of it as a complex valued Brownian motion. Let $D \subset \mathbb{C}$ be a simply connected domain with continuous boundary. Let $f(z)$ be a bounded harmonic function on $D$ that extends continuously to $\partial D$. ($f$ is harmonic if $\Delta f = 0$.) Let $B_t$ be a complex Brownian motion which starts at $z \in D$. We will show later that $X_t = f(B_t)$ is a martingale. (This requires the stochastic calculus and Itô's formula.) Now let $T$ be the hitting time for $\partial D$. Then the optional sampling theorem says
$$E^z f(B_T) = E^z f(B_0) = f(z) \qquad (48)$$
By the definition of the stopping time, $B_T \in \partial D$. Thus the left side of the above depends only on the boundary values of $f$. So the above tells you how to solve Laplace's equation with given boundary values. To find the value of the solution at $z$, you start a Brownian motion at $z$ and run it until you hit the boundary. Then you average the value of $f$ at this exit point.
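Here is a minimal Monte Carlo sketch of that recipe (my own illustration, with a crude time step and boundary projection). Take $D$ to be the unit disk and boundary data from $f(x + iy) = x^2 - y^2$, which is harmonic; then the exact solution is $f$ itself, so the estimate can be checked against $f(z)$.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, y):
    return x * x - y * y              # harmonic: its Laplacian is 2 - 2 = 0

z = np.array([0.3, 0.4])              # starting point inside the unit disk
dt, n_paths, total = 1e-3, 4_000, 0.0
for _ in range(n_paths):
    w = z.copy()
    while w @ w < 1.0:                        # run until the path leaves the disk
        w += rng.normal(0.0, np.sqrt(dt), 2)  # 2d Brownian increment over dt
    w /= np.sqrt(w @ w)                       # project the small overshoot onto the circle
    total += f(w[0], w[1])
print(total / n_paths, f(z[0], z[1]))         # Monte Carlo estimate vs exact value f(z)
```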
Borel-Cantelli

Definition 8 Let $E_n$ be a sequence of events. The event $\{E_n \text{ i.o.}\}$ is the event that $E_n$ occurs infinitely often, i.e., the outcome belongs to infinitely many of the $E_n$.

A little thought convinces you that
$$\{E_n \text{ i.o.}\} = \bigcap_{m=1}^{\infty} \bigcup_{n : n > m} E_n \qquad (49)$$

Theorem 8 (Borel-Cantelli Lemma) If $E_n$ is a sequence of events with $\sum_n P(E_n) < \infty$, then $P(E_n \text{ i.o.}) = 0$. If $E_n$ is a sequence of independent events with $\sum_n P(E_n) = \infty$, then $P(E_n \text{ i.o.}) = 1$.


Note that for a sequence of independent events, the probability that $E_n$ occurs i.o. can only be 0 or 1; which value depends only on whether the sum converges or diverges.

Exercise: The first half of the Borel-Cantelli lemma is an easy exercise in measure theory using (49). Prove it. Prove the second half. This is harder, so here are some hints. Prove that it suffices to show $P(\bigcup_{n : n > m} E_n) = 1$ for all $m$. Show that $\sum_n P(E_n) = \infty$ implies $\prod_n (1 - P(E_n)) = 0$. Now use $1 - P(E_n) = P(E_n^c)$ and the independence of the $E_n^c$ to see what this says.
