Monday, February 10, 2014

the log-normal distribution of stock prices

http://stat.ethz.ch/~stahel/lognormal/bioscience.pdf

http://www.fintools.com/wp-content/uploads/2012/02/StochasticStockPriceModeling.pdf
Log-normal Distributions across the Sciences: Keys and Clues
ECKHARD LIMPERT, WERNER A. STAHEL, AND MARKUS ABBT
As the need grows for conceptualization, formalization, and abstraction in biology, so too does mathematics' relevance to the field (Fagerström et al. 1996). Mathematics is particularly important for analyzing and characterizing random variation of, for example, size and weight of individuals in populations, their sensitivity to chemicals, and time-to-event cases, such as the amount of time an individual needs to recover from illness.

The frequency distribution of such data is a major factor determining the type of statistical analysis that can be validly carried out on any data set. Many widely used statistical methods, such as ANOVA (analysis of variance) and regression analysis, require that the data be normally distributed, but only rarely is the frequency distribution of data tested when these techniques are used.

The Gaussian (normal) distribution is most often assumed to describe the random variation that occurs in the data from many scientific disciplines; the well-known bell-shaped curve can easily be characterized and described by two values: the arithmetic mean ¯x and the standard deviation s, so that data sets are commonly described by the expression ¯x ± s. A historical example of a normal distribution is that of chest measurements of Scottish soldiers made by Quetelet, the Belgian founder of modern social statistics (Swoboda 1974). In addition, such disparate phenomena as milk production by cows and random deviations from target values in industrial processes fit a normal distribution.

However, many measurements show a more or less skewed distribution. Skewed distributions are particularly common when mean values are low, variances large, and values cannot be negative, as is the case, for example, with species abundance, lengths of latent periods of infectious diseases, and distribution of mineral resources in the Earth's crust. Such skewed distributions often closely fit the log-normal distribution (Aitchison and Brown 1957, Crow and Shimizu 1988, Lee 1992, Johnson et al. 1994, Sachs 1997). Examples fitting the normal distribution, which is symmetrical, and the log-normal distribution, which is skewed, are given in Figure 1. Note that body height fits both distributions.

ON THE CHARMS OF STATISTICS, AND HOW MECHANICAL MODELS RESEMBLING GAMBLING MACHINES OFFER A LINK TO A HANDY WAY TO CHARACTERIZE LOG-NORMAL DISTRIBUTIONS, WHICH CAN PROVIDE DEEPER INSIGHT INTO VARIABILITY AND PROBABILITY. NORMAL OR LOG-NORMAL: THAT IS THE QUESTION

Often, biological mechanisms induce log-normal distributions (Koch 1966), as when, for instance, exponential growth is combined with further symmetrical variation: With a mean concentration of, say, 10^6 bacteria, one cell division more (or less) will lead to 2 × 10^6 (or 5 × 10^5) cells. Thus, the range will be asymmetrical; to be precise, multiplied or divided by 2 around the mean. The skewed size distribution may be why "exceptionally" big fruit are reported in journals year after year in autumn. Such exceptions, however, may well be the rule: Inheritance of fruit and flower size has long been known to fit the log-normal distribution (Groth 1914, Powers 1936, Sinnot 1937).
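The bacterial example can be sketched numerically (a hypothetical simulation for illustration only; the mean of 20 doublings and the spread of about one division are invented values): symmetric variation in the number of doublings is symmetric on the log scale but produces a right-skewed distribution of cell counts.

```python
import numpy as np

rng = np.random.default_rng(42)

# Each culture doubles a random number of times: symmetric (normal)
# variation of about +/- one division around a mean of 20 doublings.
divisions = rng.normal(loc=20.0, scale=1.0, size=100_000)
cells = 2.0 ** divisions  # exponential growth: effects multiply

logs = np.log(cells)
log_skew = np.mean((logs - logs.mean()) ** 3) / logs.std() ** 3
raw_skew = np.mean((cells - cells.mean()) ** 3) / cells.std() ** 3

# Roughly symmetric on the log scale, clearly skewed on the original scale.
print(round(log_skew, 2), round(raw_skew, 2))
```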
Articles
What is the difference between normal and log-normal variability? Both forms of variability are based on a variety of forces acting independently of one another. A major difference, however, is that the effects can be additive or multiplicative, thus leading to normal or log-normal distributions, respectively.
Eckhard Limpert (e-mail: Eckhard.Limpert@ipw.agrl.ethz.ch) is a biologist and senior scientist in the Phytopathology Group of the Institute of Plant Sciences in Zurich, Switzerland. Werner A. Stahel (e-mail: stahel@stat.math.ethz.ch) is a mathematician and head of the Consulting Service at the Statistics Group, Swiss Federal Institute of Technology (ETH), CH-8092 Zürich, Switzerland. Markus Abbt is a mathematician and consultant at FJA Feilmeier & Junker AG, CH-8008 Zürich, Switzerland. © 2001 American Institute of Biological Sciences.
May 2001 / Vol. 51 No. 5 • BioScience 341






Figure 1. Examples of normal and log-normal distributions. While the distribution of the heights of 1052 women (a, in inches; Snedecor and Cochran 1989) fits the normal distribution, with a goodness-of-fit p value of 0.75, that of the content of hydroxymethylfurfurol (HMF, mg·kg⁻¹) in 1573 honey samples (b; Renner 1970) fits the log-normal (p = 0.41) but not the normal (p = 0.0000). Interestingly, the distribution of the heights of women fits the log-normal distribution equally well (p = 0.74).
Some basic principles of additive and multiplicative effects can easily be demonstrated with the help of two ordinary dice with sides numbered from 1 to 6. Adding the two numbers, which is the principle of most games, leads to values from 2 to 12, with a mean of 7, and a symmetrical frequency distribution. The total range can be described as 7 plus or minus 5 (that is, 7 ± 5) where, in this case, 5 is not the standard deviation. Multiplying the two numbers, however, leads to values between 1 and 36 with a highly skewed distribution. The total variability can be described as 6 multiplied or divided by 6 (that is, 6 ×/ 6). In this case, the symmetry has moved to the multiplicative level.
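The dice experiment can be replicated exhaustively (a sketch, not code from the article): enumerate all 36 equally likely outcomes and compare the sum with the product.

```python
from itertools import product

# All 36 equally likely outcomes of two dice.
pairs = list(product(range(1, 7), repeat=2))
sums = [a + b for a, b in pairs]
prods = [a * b for a, b in pairs]

# Additive combination: symmetric range 2..12 around the mean 7.
print(min(sums), max(sums), sum(sums) / 36)        # 2 12 7.0

# Multiplicative combination: skewed range 1..36, but symmetric on the
# multiplicative scale: the extremes are 6 divided by 6 and 6 times 6,
# so the geometric midpoint of the range is 6.
print(min(prods), max(prods), (min(prods) * max(prods)) ** 0.5)   # 1 36 6.0
```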
Although these examples are neither normal nor log-normal distributions, they do clearly indicate that additive and multiplicative effects give rise to different distributions. Thus, we cannot describe both types of distribution in the same way. Unfortunately, however, common belief has it that quantitative variability is generally bell shaped and symmetrical. The current practice in science is to use symmetrical bars in graphs to indicate standard deviations or errors, and the sign ± to summarize data, even though the data or the underlying principles may suggest skewed distributions (Factor et al. 2000, Keesing 2000, Le Naour et al. 2000, Rhew et al. 2000). In a number of cases the variability is clearly asymmetrical because subtracting three standard deviations from the mean produces negative values, as in the example 100 ± 50. Moreover, the example of the dice shows that the established way to characterize symmetrical, additive variability with the sign ± (plus or minus) has its equivalent in the handy sign ×/ (times or divided by), which will be discussed further below.

Log-normal distributions are usually characterized in terms of the log-transformed variable, using as parameters the expected value, or mean, of its distribution, and the standard deviation. This characterization can be advantageous as, by definition, log-normal distributions are symmetrical again at the log level.

Unfortunately, the widespread aversion to statistics becomes even more pronounced as soon as logarithms are involved. This may be the major reason that log-normal distributions are so little understood in general, which leads to frequent misunderstandings and errors. Plotting the data can help, but graphs are difficult to communicate orally. In short, current ways of handling log-normal distributions are often unwieldy.

To get an idea of a sample, most people prefer to think in terms of the original rather than the log-transformed data. This conception is indeed feasible and advisable for log-normal data, too, because the familiar properties of the normal distribution have their analogies in the log-normal distribution. To improve comprehension of log-normal distributions, to encourage their proper use, and to show their importance in life, we present a novel physical model for generating log-normal distributions, thus filling a 100-year-old gap. We also demonstrate the evolution and use of parameters allowing characterization of the data at the original scale. Moreover, we compare log-normal distributions from a variety of branches of science to elucidate patterns of variability, thereby reemphasizing the importance of log-normal distributions in life.
A physical model demonstrating the genesis of log-normal distributions
There was reason for Galton (1889) to complain about colleagues who were interested only in averages and ignored random variability. In his thinking, variability was even part of the "charms of statistics." Consequently, he presented a simple physical model to give a clear visualization of binomial and, finally, normal variability and its derivation.

Figure 2a shows a further development of this "Galton board," in which particles fall down a board and are deviated at decision points (the tips of the triangular obstacles) either left or right with equal probability. (Galton used simple nails instead of the isosceles triangles shown here, so his invention resembles a pinball machine or the Japanese game Pachinko.) The normal distribution created by the board reflects the cumulative additive effects of the sequence of decision points.

A particle leaving the funnel at the top meets the tip of the first obstacle and is deviated to the left or right by a distance c with equal probability. It then meets the corresponding triangle in the second row, and is again deviated in the same manner, and so forth. The deviation of the particle from one row to the next is a realization of a random variable with possible values +c and –c, and with equal probability for both of them. Finally, after passing r rows of triangles, the particle falls into one of the r + 1 receptacles at the bottom. The probabilities of ending up in these receptacles, numbered 0, 1, ..., r, follow a binomial law with parameters r and p = 0.5. When many particles have made their way through the obstacles, the height of the particles piled up in the several receptacles will be approximately proportional to the binomial probabilities.

Figure 2. Physical models demonstrating the genesis of normal and log-normal distributions. Particles fall from a funnel onto tips of triangles, where they are deviated to the left or to the right with equal probability (0.5) and finally fall into receptacles. The medians of the distributions remain below the entry points of the particles. If the tip of a triangle is at distance x from the left edge of the board, triangle tips to the right and to the left below it are placed at x + c and x – c for the normal distribution (panel a), and x · c and x / c for the log-normal distribution (panel b, patent pending), c being a constant. The distributions are generated by many small random effects (according to the central limit theorem) that are additive for the normal distribution and multiplicative for the log-normal. We therefore suggest the alternative name multiplicative normal distribution for the latter.

For a large number of rows, the probabilities approach a normal density function according to the central limit theorem. In its simplest form, this mathematical law states that the sum of many (r) independent, identically distributed random variables is, in the limit as r → ∞, normally distributed. Therefore, a Galton board with many rows of obstacles shows normal density as the expected height of particle piles in the receptacles, and its mechanism captures the idea of a sum of r independent random variables.

Figure 2b shows how Galton's construction was modified to describe the distribution of a product of such variables, which ultimately leads to a log-normal distribution. To this aim, scalene triangles are needed (although they appear to be isosceles in the figure), with the longer side to the right. Let the distance from the left edge of the board to the tip of the first obstacle below the funnel be x_m. The lower corners of the first triangle are at x_m · c and x_m / c (ignoring the gap necessary to allow the particles to pass between the obstacles). Therefore, the particle meets the tip of a triangle in the next row at X = x_m · c or X = x_m / c, with equal probabilities for both values. In the second and following rows, the triangles with the tip at distance x from the left edge have lower corners at x · c and x / c (up to the gap width). Thus, the horizontal position of a particle is multiplied in each row by a random variable with equal probabilities for its two possible values c and 1/c.

Once again, the probabilities of particles falling into any receptacle follow the same binomial law as in Galton's device, but because the receptacles on the right are wider than those on the left, the height of accumulated particles is a "histogram" skewed to the left. For a large number of rows, the heights approach a log-normal distribution. This follows from the multiplicative version of the central limit theorem, which proves that the product of many independent, identically distributed, positive random variables has approximately a log-normal distribution. Computer implementations of the models shown in Figure 2 also are available at the Web site http://stat.ethz.ch/vis/log-normal (Gut et al. 2000).
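Both boards in Figure 2 can be sketched in a few lines (an illustrative simulation, not the authors' implementation; the row count r, step size c, and multiplicative factor are assumed values): the additive board sums random ±c deviations, the multiplicative board multiplies by c or 1/c.

```python
import numpy as np

rng = np.random.default_rng(0)
r, n = 200, 50_000                            # rows of triangles, particles
signs = rng.choice([-1.0, 1.0], size=(n, r))  # left/right with p = 0.5

# Additive board (Figure 2a): deviations of +/- c accumulate by addition,
# giving an approximately normal distribution of final positions.
c = 0.1
additive = (signs * c).sum(axis=1)

# Multiplicative board (Figure 2b): positions are multiplied by c or 1/c
# at each row, i.e. exp of a sum of +/- ln(c) -> approximately log-normal.
c_mult = 1.1
multiplicative = np.exp((signs * np.log(c_mult)).sum(axis=1))

# The additive pile is symmetric around the entry point (0); the
# multiplicative pile keeps its median at the entry point (1) while its
# mean is pulled upward by the long right tail.
print(round(float(np.median(additive)), 2),
      round(float(np.median(multiplicative)), 2),
      round(float(np.mean(multiplicative)), 2))
```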






Figure 3. A log-normal distribution with original scale (a: density with µ* = 100, σ* = 2) and with logarithmic scale (b: density with µ = 2, σ = 0.301). Areas under the curve, from the median to both sides, correspond to one and two standard deviation ranges of the normal distribution.
J. C. Kapteyn designed the direct predecessor of the log-normal machine (Kapteyn 1903, Aitchison and Brown 1957). For that machine, isosceles triangles were used instead of the skewed shape described here. Because the triangles' width is proportional to their horizontal position, this model also leads to a log-normal distribution. However, the isosceles triangles with increasingly wide sides to the right of the entry point have a hidden logical disadvantage: The median of the particle flow shifts to the left. In contrast, there is no such shift and the median remains below the entry point of the particles in the log-normal board presented here (which was designed by author E. L.). Moreover, the isosceles triangles in the Kapteyn board create additive effects at each decision point, in contrast to the multiplicative, log-normal effects apparent in Figure 2b. Consequently, the log-normal board presented here is a physical representation of the multiplicative central limit theorem in probability theory.
Basic properties of log-normal distributions
The basic properties of log-normal distributions were established long ago (Weber 1834, Fechner 1860, 1897, Galton 1879, McAlister 1879, Gibrat 1931, Gaddum 1945), and it is not difficult to characterize log-normal distributions mathematically. A random variable X is said to be log-normally distributed if log(X) is normally distributed (see the box below). Only positive values are possible for the variable, and the distribution is skewed, with a long tail toward high values (Figure 3a).

Two parameters are needed to specify a log-normal distribution. Traditionally, the mean µ and the standard deviation σ (or the variance σ²) of log(X) are used (Figure 3b). However, there are clear advantages to using "back-transformed" values (the values are in terms of x, the measured data):

(1) µ* := e^µ, σ* := e^σ.

We then use X ~ Λ(µ*, σ*) as a mathematical expression meaning that X is distributed according to the log-normal law with median µ* and multiplicative standard deviation σ*.

The median of this log-normal distribution is med(X) = µ* = e^µ, since µ is the median of log(X). Thus, the probability that the value of X is greater than µ* is 0.5, as is the probability that the value is less than µ*. The parameter σ*, which we call multiplicative standard deviation, determines the shape of the distribution. Figure 4 shows density curves for some selected values of σ*. Note that µ* is a scale parameter; hence, if X is expressed in different units (or multiplied by a constant for other reasons), then µ* changes accordingly but σ* remains the same.

Figure 4. Density functions of selected log-normal distributions compared with a normal distribution. Log-normal distributions Λ(µ*, σ*) are shown for five values of the multiplicative standard deviation, s* (1.2, 1.5, 2.0, 4.0, and 8.0), and compared with the normal distribution 100 ± 20 (shaded). The values cover most of the range evident in Table 2. While the median µ* is the same for all densities, the modes approach zero with increasing shape parameter σ*. A change in µ* affects the scaling in horizontal and vertical directions, but the essential shape remains the same.
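A quick simulation illustrates these properties (a sketch; base-10 logs and the Figure 3 values µ = 2, σ = 0.301, hence µ* = 100 and σ* ≈ 2, are used for concreteness):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 2.0, 0.301                       # log10-scale parameters (Figure 3b)
mu_star, sigma_star = 10**mu, 10**sigma      # back-transformed: 100 and ~2

x = 10 ** rng.normal(mu, sigma, size=200_000)  # log-normal sample

def s_star(d):
    """Multiplicative standard deviation: back-transformed SD of the logs."""
    return 10 ** np.std(np.log10(d))

# The median equals mu*, and sigma* is unchanged when the data are
# rescaled (e.g. a change of units); only mu* scales.
y = 2.5 * x
print(round(float(np.median(x))),                 # ~100 = mu*
      round(s_star(x), 2), round(s_star(y), 2))   # both ~2.0 = sigma*
```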

Distributions are commonly characterized by their expected value µ and standard deviation σ. In applications for which the log-normal distribution adequately describes the data, these parameters are usually less easy to interpret than the median µ* (McAlister 1879) and the shape parameter σ*. It is worth noting that σ* is related to the coefficient of variation by a monotonic, increasing transformation (see the box below, eq. 2).

For normally distributed data, the interval µ ± σ covers a probability of 68.3%, while µ ± 2σ covers 95.5% (Table 1). The corresponding statements for log-normal quantities are

[µ*/σ*, µ* · σ*] = µ* ×/ σ* (contains 68.3%) and
[µ*/(σ*)², µ* · (σ*)²] = µ* ×/ (σ*)² (contains 95.5%).

This characterization shows that the operations of multiplying and dividing, which we denote with the sign ×/ (times/divide), help to determine useful intervals for log-normal distributions (Figure 3), in the same way that the operations of adding and subtracting (±, or plus/minus) do for normal distributions. Table 1 summarizes and compares some properties of normal and log-normal distributions.
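These coverage claims can be checked directly against the log-normal distribution function (a sketch using scipy's `lognorm`, with the Figure 3 values µ* = 100, σ* = 2 as example parameters):

```python
import numpy as np
from scipy.stats import lognorm

mu_star, sigma_star = 100.0, 2.0          # example parameters (as in Figure 3)
sigma = np.log(sigma_star)                # shape on the natural-log scale

# scipy's parameterization: s = sigma, scale = median = mu_star
dist = lognorm(s=sigma, scale=mu_star)

# Probability inside [mu*/sigma*, mu* x sigma*] and [mu*/(sigma*)^2, mu* x (sigma*)^2]
p1 = dist.cdf(mu_star * sigma_star) - dist.cdf(mu_star / sigma_star)
p2 = dist.cdf(mu_star * sigma_star**2) - dist.cdf(mu_star / sigma_star**2)
print(round(p1, 3), round(p2, 3))   # 0.683 0.954
```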

The sum of several independent normal variables is itself a normal random variable. For quantities with a log-normal distribution, however, multiplication is the relevant operation for combining them in most applications; for example, the product of concentrations determines the speed of a simple chemical reaction. The product of independent log-normal quantities also follows a log-normal distribution. The median of this product is the product of the medians of its factors. The formula for σ* of the product is given in the box below (eq. 3).
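The multiplication rule can be verified numerically (a sketch with invented medians and σ* values; the shape-parameter formula is eq. 3 in the box below):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400_000
s1_star, s2_star = 1.5, 2.0               # assumed multiplicative SDs
med1, med2 = 10.0, 5.0                    # assumed medians

x1 = np.exp(rng.normal(np.log(med1), np.log(s1_star), n))
x2 = np.exp(rng.normal(np.log(med2), np.log(s2_star), n))
prod = x1 * x2

# Median of the product = product of the medians (10 * 5 = 50), and the
# log-variances add, so s*(product) = exp(sqrt(ln(s1*)^2 + ln(s2*)^2)).
s_star = np.exp(np.std(np.log(prod)))
expected = np.exp(np.sqrt(np.log(s1_star)**2 + np.log(s2_star)**2))
print(round(float(np.median(prod))), round(s_star, 3), round(expected, 3))
```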

For a log-normal distribution, the most precise (i.e., asymptotically most efficient) method for estimating the parameters µ* and σ* relies on log transformation. The mean and empirical standard deviation of the logarithms of the data are calculated and then back-transformed, as in equation 1. These estimators are called ¯x* and s*, where ¯x* is the geometric mean of the data (McAlister 1879; eq. 4 in the box below). More robust but less efficient estimates can be obtained from the median and the quartiles of the data, as described in the box below.
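The estimators can be sketched as follows (simulated data with known parameters; the quartile constant 1/c = 1.349 is the one given in the box below):

```python
import numpy as np

rng = np.random.default_rng(3)
# Simulated sample with known geometric mean 50 and s* = 1.8 (assumed values).
data = np.exp(rng.normal(np.log(50.0), np.log(1.8), size=10_000))

logs = np.log(data)
x_bar_star = np.exp(logs.mean())          # geometric mean of the data
s_star = np.exp(logs.std(ddof=1))         # back-transformed SD of the logs

# More robust alternative from the quartiles: (q3/q1)^c with 1/c = 1.349.
q1, q3 = np.percentile(data, [25, 75])
s_star_robust = (q3 / q1) ** (1 / 1.349)

print(round(x_bar_star, 1), round(s_star, 2), round(s_star_robust, 2))
```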

As noted previously, it is not uncommon for data with a log-normal distribution to be characterized in the literature by the arithmetic mean ¯x and the standard deviation s of a sample, but it is still possible to obtain estimates for µ* and s* (see the box below).
Definition and properties of the log-normal distribution

A random variable X is log-normally distributed if log(X) has a normal distribution. Usually, natural logarithms are used, but other bases would lead to the same family of distributions, with rescaled parameters. The probability density function of such a random variable has the form

f(x) = 1/(√(2π) · σ · x) · exp(−(ln x − µ)² / (2σ²)), for x > 0.

A shift parameter can be included to define a three-parameter family. This may be adequate if the data cannot be smaller than a certain bound different from zero (cf. Aitchison and Brown 1957, page 14). The mean and variance are exp(µ + σ²/2) and (exp(σ²) − 1) · exp(2µ + σ²), respectively, and therefore the coefficient of variation is

(2) cv = √(exp(σ²) − 1),

which is a function of σ only.

The product of two independent log-normally distributed random variables has the shape parameter

(3) σ* = exp(√((ln σ₁*)² + (ln σ₂*)²)),

since the variances of the log-transformed variables add.

Estimation: The asymptotically most efficient (maximum likelihood) estimators are

(4) ¯x* = exp((1/n) Σ ln xᵢ), the geometric mean of the data, and s* = exp(√(Σ (ln xᵢ − ln ¯x*)² / (n − 1))).

The quartiles q₁ and q₂ lead to a more robust estimate (q₂/q₁)^c for s*, where 1/c = 1.349 = 2 · Φ⁻¹(0.75), Φ⁻¹ denoting the inverse standard normal distribution function. If the mean ¯x and the standard deviation s of a sample are available, i.e., the data are summarized in the form ¯x ± s, the parameters µ* and s* can be estimated from them by using

µ̂* = ¯x / √(cv² + 1) and ŝ* = exp(√(ln(cv² + 1))),

respectively, with cv = s/¯x, the coefficient of variation. Thus, this estimate of s* is determined only by the cv (eq. 2).
For example, Stehmann and De Waard (1996) describe their data as log-normal, with the arithmetic mean ¯x and standard deviation s as 4.1 ± 3.7. Taking the log-normal nature of the distribution into account, the probability of the corresponding ¯x ± s interval (0.4 to 7.8) turns out to be 88.4% instead of 68.3%. Moreover, 65% of the population are below the mean and almost exclusively within only one standard deviation. In contrast, the proposed characterization, which uses the geometric mean ¯x* and the multiplicative standard deviation s*, reads 3.0 ×/ 2.2 (1.36 to 6.6). This interval covers approximately 68% of the data and thus is more appropriate than the other interval for the skewed data.

Table 1. A bridge between normal and log-normal distributions.

Property                           Normal distribution            Log-normal distribution
                                   (Gaussian, or additive         (multiplicative
                                   normal, distribution)          normal distribution)
Effects (central limit theorem)    Additive                       Multiplicative
Shape of distribution              Symmetrical                    Skewed
Models
  Triangle shape                   Isosceles                      Scalene
  Effects at each decision point   x ± c                          x ×/ c
Characterization
  Mean                             ¯x, arithmetic                 ¯x*, geometric
  Standard deviation               s, additive                    s*, multiplicative
  Measure of dispersion            cv = s/¯x                      s*
Interval of confidence
  68.3%                            ¯x ± s                         ¯x* ×/ s*
  95.5%                            ¯x ± 2s                        ¯x* ×/ (s*)²
  99.7%                            ¯x ± 3s                        ¯x* ×/ (s*)³
Notes: cv = coefficient of variation; ×/ = times/divide, corresponding to plus/minus for the established sign ±.
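The Stehmann and De Waard conversion can be reproduced from the formulas at the end of the box above (a sketch; scipy is used only for the normal distribution function):

```python
import numpy as np
from scipy.stats import norm

x_bar, s = 4.1, 3.7                 # data reported as 4.1 +/- 3.7
cv = s / x_bar

# Back-transformed parameters (last equations in the box):
#   mu* = x_bar / sqrt(cv^2 + 1),  s* = exp(sqrt(ln(cv^2 + 1)))
mu_star = x_bar / np.sqrt(cv**2 + 1)
s_star = np.exp(np.sqrt(np.log(cv**2 + 1)))
print(round(mu_star, 1), round(s_star, 1))   # 3.0 2.2

# Probability contained in the naive x_bar +/- s interval (0.4 to 7.8)
# under the fitted log-normal distribution:
mu, sigma = np.log(mu_star), np.log(s_star)
p = (norm.cdf((np.log(x_bar + s) - mu) / sigma)
     - norm.cdf((np.log(x_bar - s) - mu) / sigma))
print(round(p, 3))                            # 0.884
```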
Comparing log-normal distributions across the sciences
Examples of log-normal distributions from various branches of science reveal interesting patterns (Table 2). In general, values of s* vary between 1.1 and 33, with most in the range of approximately 1.4 to 3. The shapes of such distributions are apparent by comparison with selected instances shown in Figure 4.
Geology and mining. In the Earth's crust, the concentration of elements and their radioactivity usually follow a log-normal distribution. In geology, values of s* in 27 examples varied from 1.17 to 5.6 (Razumovsky 1940, Ahrens 1954, Malanca et al. 1996); nine other examples are given in Table 2. A closer look at extensive data from different reefs (Krige 1966) indicates that values of s* for gold and uranium increase in concert with the size of the region considered.
Human medicine. A variety of examples from medicine fit the log-normal distribution. Latent periods (time from infection to first symptoms) of infectious diseases have often been shown to be log-normally distributed (Sartwell 1950, 1952, 1966, Kondo 1977); approximately 70% of 86 examples reviewed by Kondo (1977) appear to be log-normal. Sartwell (1950, 1952, 1966) documents 37 cases fitting the log-normal distribution. A particularly impressive one is that of 5914 soldiers inoculated on the same day with the same batch of faulty vaccine, 1005 of whom developed serum hepatitis.

Interestingly, despite considerable differences in the median ¯x* of latency periods of various diseases (ranging from 2.3 hours to several months; Table 2), the majority of s* values were close to 1.5. It might be worth trying to account for the similarities and dissimilarities in s*. For instance, the small s* value of 1.24 in the example of the Scottish soldiers may be due to limited variability within this rather homogeneous group of people. Survival time after diagnosis of four types of cancer is, compared with latent periods of infectious diseases, much more variable, with s* values between 2.5 and 3.2 (Boag 1949, Feinleib and McMahon 1960). It would be interesting to see whether ¯x* and s* values have changed in accord with the changes in diagnosis and treatment of cancer in the last half century. The age of onset of Alzheimer's disease can be characterized with the geometric mean ¯x* of 60 years and s* of 1.16 (Horner 1987).
Environment. The distribution of particles, chemicals, and organisms in the environment is often log-normal. For example, the amounts of rain falling from seeded and unseeded clouds differed significantly (Biondini 1976), and again s* values were similar (seeding itself accounts for the greater variation with seeded clouds). The parameters for the content of hydroxymethylfurfurol in honey (see Figure 1b) show that the distribution of the chemical in 1573 samples can be described adequately with just the two values. Ott (1978) presented data on the Pollutant Standard Index, a measure of air quality. Data were collected for eight US cities; the extremes of ¯x* and s* were found in Los Angeles, Houston, and Seattle, allowing interesting comparisons.
Atmospheric sciences and aerobiology. Another component of air quality is its content of microorganisms, which was, not surprisingly, much higher and less variable in the air of Marseille than in that of an island (Di Giorgio et al. 1996). The atm
