Agent Based Models and RNetLogo
(This article was first published on Revolutions, and kindly contributed to R-bloggers)
by Joseph Rickert
Middle-East Journal of Scientific Research 9 (3): 431-436, 2011
ISSN 1990-9233
© IDOSI Publications, 2011
Estimation of Markov Chains Transition Probabilities Using Conjoint Analysis (Expert Preference)
N. Akhondi, G. Yari, E. Pasha and R. Farnoosh
Department of Statistics, Science and Research Branch, Islamic Azad University, Tehran, Iran
Abstract: This paper proposes a methodology for estimating transition probabilities on the basis of expert judgments, which may be useful in situations where data are absent. The Fractional Factorial Design (FFD) is used to cope with the curse of dimensionality. By means of the Conjoint Analysis (CA) approach we then reconstruct the complete set of Markov Chain transition probabilities. The experimental results show that it is promising to use (CA) in estimating the entropy rate of Markov Chains with a finite state space.
Key words: Markov Chain - Transition probabilities - Conjoint Analysis - Design of experiments - Efficient Design in Conjoint Analysis
INTRODUCTION
The present paper proposes a framework based on expert opinion elicitation, developed to estimate the transition probability matrix of an irreducible, discrete time, homogeneous Markov Chain with a finite state space. In this article we address the question of estimating the transition probability matrix of a Markov Chain in situations where data are absent. In general, the full probability distribution for a given stochastic problem is unknown. When data are available, the most objective estimation is the maximum likelihood estimation of the transition probabilities ($\hat{P}_{ij}$).
The difficulties grow when the aim is to provide scenario analyses involving future states perhaps never observed before. In this situation we need information gathered from experts and we cannot resort to past data [1]. Our methodology introduces the new idea of estimating transition probabilities using conjoint (FFD) methods, which is useful in these conditions.
Conjoint analysis has its roots in the need to solve important academic and industry problems [2]. It is a popular marketing research technique. In order to respond to consumers' needs, producers have to research consumers' preferences for products and services and their selection criteria. Conjoint analysis measures the degree of importance that is given to particular aspects of a product or service [3]. The real genius lies in making appropriate tradeoffs so that real consumers in real market research settings are answering questions from which useful information can be inferred. In the thirty years since the original conjoint analysis articles, researchers in marketing and other disciplines have explored these tradeoffs [2]. In conjoint experiments, each respondent receives a set of profiles to rate (or rank). Designing these experiments involves determining how many and which profiles each respondent has to rate (or rank) and how many respondents are needed [4]. Experimental design is a fundamental component of (CA).
The complexity of experimental design arises from the exponential growth in the number of attributes, i.e., the curse of dimensionality. Use of a full factorial design (all profiles) would place an excessive burden on respondents providing evaluations. Therefore, researchers utilize balanced Fractional Factorial Designs (FFD), i.e., orthogonal designs, a subset of all profiles [5]. The basic conjoint problem is to estimate the partworths that best explain the overall preference judgments made by respondents [2]. (CA) is a technique based on a main-effects analysis-of-variance model that decomposes the judgment data into components, based on the qualitative attributes of the products or services [6]. The methods most commonly used to acquire partworths are the Linear Programming Technique for Multidimensional Analysis of Preference (LINMAP), Hierarchical Bayes (HB) methods, Multivariate Analysis of Variance (MANOVA) and Ordinary Least Squares (OLS) Regression [7].
Corresponding Author: N. Akhondi, Department of Statistics, Science and Research Branch, Islamic Azad University, Tehran, Iran. E-mail: akhondinasrin@gmail.com.
Methods have been developed that use conjoint data to approach optimal or near-optimal product and system designs in tourism, entertainment, health maintenance, gambling, etc. We introduce a new application of (CA) in this article.
Leone and Fucili [1] used (CA) to estimate Markov Chain transition probabilities. They built Fractional Factorial Designs (FFD) on the starting states; in their method the experts are asked to identify the presumable destination states and to quantify the probability of occurrence of the transitions towards each proposed destination scenario (for each state included in a (FFD)). In their method psychological criticisms may be raised because the respondents are not asked according to the well-established procedure of rating or ranking the (FFD) treatments by comparing each one of them. The difficulties may grow if the number of attributes grows. We overcome these difficulties in this paper by building two (FFDs), on the starting and the destination states, and we ask experts to give ratings on the likelihood of various states occurring in the future. We need two sets of states to get transition probabilities. The Fractional Factorial Design (FFD) can tackle the large number of states in an elegant way. We use the (CA) approach and Logistic Regression to reconstruct the complete Markov Chain transition probabilities (under the assumption of independence of the attributes at an individual level for the respondents). The conjoint methods used for ratings data are now essentially dummy-variable regression methods.
The remainder of this paper is organized as follows: First, we provide a review of (CA) and references to related theoretical and empirical work; then we discuss our methodology and simulation data. Finally we summarize the most important strengths and weaknesses of the proposed methodology. We used Minitab (ver. 15) for generating $P_{ij}$, SPSS (ver. 16) for generating the 2 (FFDs) and Multivariate Analysis of Variance (MANOVA) for estimating the parameters.
Conjoint Analysis: The essence of conjoint analysis is to identify and measure a mapping from detailed descriptors of a product or service onto an overall measure of the customer's evaluation of that product. Full-profile analysis remains the most common form of conjoint analysis and has the advantage that the respondent evaluates each profile holistically and in the context of all other profiles. Its weakness is that the respondent's burden grows dramatically with the number of profiles that must be ranked or rated. The respondent can be asked to rank order all stimuli (profiles) or to provide a metric rating of each stimulus [2]. When appropriate, efficient experimental designs (FFD) are used so that the respondent need consider only a small fraction of all possible product profiles [8]. If the number of attributes is large, respondents can often evaluate partial profiles (PP) in which some of the features are explicit and the other features are assumed constant [2]. (FFD) orthogonal arrays are categorized by their resolution. For example, resolution III designs enable the estimation of all main effects free of each other, but some of them are confounded with two-factor interactions. Higher resolution designs require a larger number of profiles. Resolution III designs are the most frequently used in marketing conjoint studies. Orthogonal arrays can be either balanced or unbalanced in terms of the levels of attributes. An unbalanced design gives larger standard errors for the parameter estimates of those attributes that are less frequently administered. The minimum standard error is attained when a full factorial design is used [5]. Various measures of the efficiency of an experimental design can be described as follows for the linear model (Kuhfeld, Tobias and Garratt 1994):
$$Y = X\beta + e \qquad (1)$$

where $\beta$ is a $p \times 1$ vector of parameters, $X$ is an $n \times p$ design matrix, and $e$ is random error. With the usual assumptions on the errors, the least squares estimate of $\beta$ is given by $(X'X)^{-1}X'Y$. The variance-covariance matrix of the partworth (parameter) estimates of the attributes is proportional to $(X'X)^{-1}$. The efficiency of a design is based on the information matrix; an efficient design has a smaller variance matrix. Two famous efficiency measures (both based on the eigenvalues of $(X'X)^{-1}$) are A-efficiency, $1/\left[n \,\mathrm{trace}\!\left((X'X)^{-1}\right)/p\right]$, and D-efficiency, $1/\left[n \left|(X'X)^{-1}\right|^{1/p}\right]$. Orthogonal designs for linear models are generally considered to be efficient because their efficiency measures are close to 1 [5].
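To make these measures concrete, here is a small R sketch (illustrative, not from the paper; the helper design.efficiency and the 2^3 example are our own) that computes both measures from a design matrix:

# Illustrative sketch: A- and D-efficiency of a design matrix X with n runs
# and p parameters; both equal 1 for an orthogonal design (some references,
# e.g. Kuhfeld et al., rescale these to a 0-100 scale)
design.efficiency <- function(X) {
  n <- nrow(X); p <- ncol(X)
  info.inv <- solve(crossprod(X))   # (X'X)^{-1}
  c(A.efficiency = 1 / (n * sum(diag(info.inv)) / p),
    D.efficiency = 1 / (n * det(info.inv)^(1 / p)))
}

# A 2^3 full factorial with +/-1 coding plus an intercept is orthogonal:
X <- cbind(1, as.matrix(expand.grid(x1 = c(-1, 1), x2 = c(-1, 1), x3 = c(-1, 1))))
design.efficiency(X)   # both measures equal 1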
Jeng-Jong Lin [7] successfully presented an integrated product design model applied to clothing product design. His methodology focused not only on the expertise of designers or the demands of consumers but on both of them. He used a relationship matrix to combine the (CA) data from the two individual groups (i.e., designers and consumers) to design the product. Van Houtven et al. [9] applied (CA) to estimate health-related benefit-risk tradeoffs in a non-expected-utility framework. They demonstrate how this method can be used to test for and estimate nonlinear weighting of adverse-event probabilities. Jeremy J. Michalek et al. [10] presented a unified methodology for product line optimization that coordinates positioning and design models to achieve realizable firm-level optima. This method is demonstrated for a line of dial-readout scales, using physical models and conjoint-based consumer choice data. Hiromi Yamada et al. [3] conducted a study to estimate the structure of the variables specifying the quality requirements of a new product using conjoint analysis and the entropy model. As a result, it was understood that conjoint analysis and the entropy model are effective methods for estimating the quality requirements. Ikemoto and Yamaoka [11] proposed a method of analysis using (CA) that makes it possible to use a smaller number of profile cards than that provided by the orthogonal design of experiment, even when a large number of items is being surveyed. An Internet survey of 1,600 consumers using this method indicated that it generated analytical results identical to those produced when the orthogonal design of experiment was used. Byungun Yoon and Yongtae Park [12] applied a new hybrid approach that enhances the performance of morphology analysis (MA) by combining it with conjoint analysis (CA) and citation analysis of patent information. Alternatives for new technology development from among the emerging technologies are presented by combining the valuable levels of each attribute in a morphology matrix predefined by domain experts. The technological competitiveness of a company can be analyzed by a newly suggested index, "technology share," which is analogous to the concept of market share in traditional CA.
Proposed Methodology: Expert opinion is one of the key research areas in Probabilistic Risk Analysis (PRA) in engineering, public health, environment, program management, regulatory policy, finance, etc. The use of expert judgment is critical, and often inevitable, when there are no empirical data or information available on the variables of interest [13]. We illustrate our motivation for resorting to experts in this section.
Assume a dynamic system with components: a set of states $S$, a set of actions $A$, a reward function $S \times A \to \mathbb{R}$, and a transition probability matrix, where the full probability distribution for this stochastic problem is unknown. We restrict attention to time-separable Markovian decision problems for dealing with the curse of dimensionality in dynamic systems [14]. An irreducible discrete time Markov chain with a finite state space has been studied previously by Papangelou [15], who establishes a Large Deviation Principle (LDP) for Markov chains whose order is unknown. Baris Tan and Kamil Yilmaz [16] presented a complete analytical framework for the testing procedure based on the statistical theory of Markov chains. They studied the time dependence and time homogeneity properties of Markov chains.
We assume an irreducible, discrete time, homogeneous Markov Chain with a finite state space, where the starting and the destination states of the system are defined by combinations of $n$ key attributes, each with $L_k$ $(k = 1, \ldots, n)$ levels (continuous variables are discretized). For example, if $n = 3$ the starting state $i$ and the destination state $j$ are given by $(l_{i1}, l_{i2}, l_{i3})$ and $(l_{j1}, l_{j2}, l_{j3})$, respectively. The total number of states is $I = \prod_{k=1}^{n} L_k$ and the number of transition probabilities $P_{ij} = P(l_{j1}, l_{j2}, l_{j3} \mid l_{i1}, l_{i2}, l_{i3})$ is equal to $I^2$. (For instance, with $n = 4$ attributes each at 3 levels, as in our simulation below, $I = 3^4 = 81$ and there are $81^2 = 6561$ transition probabilities.) Under the assumption of independence of the attributes at an individual level for the respondents, $P_{ij}$ is given by

$$P_{ij} = \prod_{k=1}^{n} P(l_{jk} \mid l_{i1}, l_{i2}, \ldots, l_{in}) \qquad (2)$$
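A short R sketch (illustrative, not from the paper) shows how quickly this state space grows:

# Enumerate the state space for n = 4 attributes with 3 levels each
levels.per.attribute <- c(3, 3, 3, 3)
states <- expand.grid(lapply(levels.per.attribute, seq_len))
nrow(states)     # I = 81 states
nrow(states)^2   # I^2 = 6561 transition probabilities P_ij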
If the transition probability matrix is unknown and data are available, we use $\hat{P}_{ij}$, the maximum likelihood estimator of $P_{ij}$:

$$\hat{P}_{ij} = \frac{N_{ij}}{N_i} \qquad (3)$$

where $N_i$ is the number of times that the starting state $i$ has occurred when the process has been observed and $N_{ij}$ is the number of times that the process has been observed to go from the starting state $i$ to the destination state $j$ in one step.
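To see eq. (3) in action, here is a small R sketch (our illustration, not part of the paper) that tallies one-step transitions in an observed state sequence:

# Illustrative sketch of eq. (3): maximum likelihood estimation of a
# transition matrix from an observed state sequence, P.hat[i, j] = N_ij / N_i
estimate.P <- function(states, n.states) {
  N <- matrix(0, n.states, n.states)
  for (t in seq_len(length(states) - 1))
    N[states[t], states[t + 1]] <- N[states[t], states[t + 1]] + 1
  sweep(N, 1, pmax(rowSums(N), 1), "/")   # divide each row i by N_i
}

set.seed(1)
x <- sample(1:3, 1000, replace = TRUE)    # a toy observed sequence on 3 states
round(estimate.P(x, 3), 2)                # rows approach the true (uniform) law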
When data are not available we need information gathered from experts. Our initial purpose of estimating $P_{ij}$ may then be seen as the purpose of estimating $P(l_{jk} \mid l_{i1}, l_{i2}, \ldots, l_{in})$. We build 2 (FFD)s, on the starting states ($I = \prod_{k=1}^{n} L_k$ in total) and on the destination states. By using this solution we can start from $I$ instead of $I^2$. In our method the judges are asked to assign transition probabilities to the destination states in the (FFD) for each state included in the (FFD) of starting states. The experts will be asked about a reduced number of starting states (the (FFD) ones) and, under the assumption of independence of the attributes at an individual level for the respondents, the results will be generalized to the other states not included in the (FFD) by means of Logistic Regression (LR). The (LR) model examines the relationships between the independent variables and the log-odds of the outcome variable:

$$\text{odds} = \frac{p}{1-p} \qquad (4)$$

The model on the log-odds (logit) scale is linear:

$$\text{logit} = \log\!\left(\frac{p}{1-p}\right) = x\beta + e \qquad (5)$$

As mentioned above, we want to estimate $P(l_{jk} \mid l_{i1}, l_{i2}, \ldots, l_{in})$ by means of logistic regression. The estimated parameters are used to reconstruct the probability of arriving in $l_{jk}$ also for starting states not included in the (FFD), and from these conditional probabilities the probabilities of the destination states are obtained.
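As a rough illustration of this generalization step (our own R sketch, not the authors' code), one can regress the logits of the elicited ratings on dummy-coded starting-state attributes, as in eq. (5), and predict for all starting states. Here we reuse the L12 column of Table 1 and assume, as Table 1 suggests, that the 9 starting states form the standard L9 orthogonal array:

# Sketch: generalize expert ratings from the 9 FFD starting states to all
# 81 states via a main-effects dummy regression on the logit scale (eq. 5);
# the 9 starting states and L12 ratings are taken from Table 1
ffd.states <- c("1111", "1222", "1333", "2123", "2231",
                "2312", "3132", "3213", "3321")
L12 <- c(.30, .43, .48, .42, .31, .30, .43, .37, .36)
ffd <- as.data.frame(lapply(1:4, function(k)
  factor(substring(ffd.states, k, k), levels = 1:3)))
names(ffd) <- paste0("a", 1:4)
ffd$logit.L12 <- log(L12 / (1 - L12))
fit <- lm(logit.L12 ~ a1 + a2 + a3 + a4, data = ffd)
all.states <- expand.grid(a1 = factor(1:3), a2 = factor(1:3),
                          a3 = factor(1:3), a4 = factor(1:3))
L12.hat <- plogis(predict(fit, newdata = all.states))  # back to probabilities

With 9 runs and 9 main-effect parameters the fit is saturated, so it reproduces the 9 elicited ratings exactly and extends them, via the main effects, to the remaining 72 starting states.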
Application on Simulation Data: We propose an application on simulation data to illustrate our methodology. We suppose $n = 4$ attributes, each with 3 levels, and the number of experts is 8, in 2 groups (4 experts in each group). We used SPSS (ver. 16) for generating 2 resolution III designs (FFDs) for the starting and the destination states and Minitab (ver. 15) for generating $P_{ij}$. All experts received the same (FFDs). We simulated the transition probability matrix $P$ for each of the respondents as follows. In each of the starting states in the (FFD) we generated 4 independent bivariate normally distributed vectors $z_1, z_2, z_3, z_4$ from $N(\mu, \Sigma)$, with $\mu = (0, 0)$ and $\Sigma = [1, -0.9;\ -0.9, 1]$, where

$$z_j = \left(\log \frac{L_{jk}}{1 - L_{jk}}\right), \quad j = 1, 2, 3, 4, \; k = 1, 2 \qquad (6)$$

$$L_{jk} = P(l_{jk} \mid l_{i1}, l_{i2}, l_{i3}, l_{i4}) \qquad (7)$$

We computed $L_{jk}$ based on $z_j$. Under the assumption of independence of the attributes at an individual level for the respondents we computed $P_{ij} = \prod_{k=1}^{n} P(l_{jk} \mid l_{i1}, l_{i2}, \ldots, l_{in})$, with $n = 4$. We repeated this computation 500 times and finally recorded the mean as $P_{ij} = P(l_{j1}, l_{j2}, l_{j3}, l_{j4} \mid l_{i1}, l_{i2}, l_{i3}, l_{i4})$. Then we updated $L_{jk}$ based on $P_{ij}$. Table 1 shows part of these computations for $L_{12}$ for the first respondent. Finally we applied eq. (5) to $z_j$, $j = 1, 2, 3, 4$.
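A compact R sketch of this simulation step (illustrative only; the computations in the paper were done in Minitab and SPSS), generating the four correlated logit vectors of eqs. (6)-(7) for one starting state and mapping them back to probabilities:

# Simulate z_1,...,z_4 ~ N(mu, Sigma) on the logit scale and invert the
# logit to get the L_jk of eq. (7); illustrative, not the authors' code
library(MASS)
set.seed(3)
mu    <- c(0, 0)
Sigma <- matrix(c(1, -0.9, -0.9, 1), 2, 2)
z <- mvrnorm(4, mu = mu, Sigma = Sigma)  # rows are z_j, j = 1,...,4
L <- plogis(z)                           # L_jk = exp(z)/(1 + exp(z)), k = 1, 2
round(L, 2)                              # a 4 x 2 matrix of simulated L_jk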
The Multivariate Analysis of Variance (MANOVA) is used for estimating the parameters [17]. One of the levels of each factor, regarded as dummy variables, is eliminated. The estimated parameters are used to reconstruct the probability of arriving in $l_{jk}$ also for starting states not included in the (FFD). The Generalized Least Squares (GLS) regression estimates of the model $\log(L_{jk}/(1-L_{jk})) = x\beta + e$ are given below.
The (LR) coefficients (significant at the .05 level) for the first 4 respondents are given in the following, where the estimated coefficient standard errors are in parentheses:

$$\log(L_{12}/(1-L_{12})) = .465\, l_{i1} + .518\, l_{i2} + e \quad (.183)\ (.183) \qquad (9)$$

$$\log(L_{22}/(1-L_{22})) = .497\, l_{i2} + e \quad (.161) \qquad (10)$$

$$\log(L_{11}/(1-L_{11})) = -1.244 - .409\, l_{i2} + e \quad (.169) \qquad (11)$$

$$\log(L_{42}/(1-L_{42})) = -.908 - .314\, l_{i1} + e \quad (.139) \qquad (12)$$

$$\log(L_{31}/(1-L_{31})) = .273\, l_{i2} + e \quad (.132) \qquad (13)$$

Also

$$\log(L_{21}/(1-L_{21})) = -1.207 \qquad (14)$$

$$\log(L_{41}/(1-L_{41})) = 0 \qquad (15)$$

$$\log(L_{32}/(1-L_{32})) = 4.208 \qquad (16)$$

The (LR) coefficients (significant at the .05 level) for the second 4 respondents are given in the following:
Table 1: The FFDs (for starting and destination states) and disclosed transition probabilities ($P_{ij}$) for the first respondent
Destination
Starting 3231 3312 2132 2321 2213 1333 1111 3123 1222  L12  Odds = L12/(1-L12)  Log(Odds)
2231 .20 .12 .11 .08 .11 .21 .06 .01 .10 .31 .44 -.82
3213 .06 .12 .08 .11 .19 .20 .15 .08 .02 .37 .60 -.51
3132 .11 .18 .03 .22 .18 .19 .04 .01 .05 .43 .75 -.28
1111 .15 .14 .10 .16 .04 .14 .10 .10 .07 .30 .42 -.86
1333 .05 .07 .18 .15 .15 .16 .09 .07 .09 .48 .91 -.10
3321 .11 .15 .12 .15 .09 .18 .06 .10 .03 .36 .57 -.56
2123 .06 .14 .14 .14 .15 .00 .10 .13 .14 .42 .74 -.30
2312 .08 .18 .14 .13 .03 .05 .17 .14 .07 .30 .43 -.84
1222 .13 .17 .16 .11 .16 .06 .09 .01 .10 .43 .76 -.27
Note: 2231 in the FFD for starting states means that $i = (l_{12}, l_{22}, l_{33}, l_{41})$ and 3231 in the FFD for destination states means that $j = (l_{13}, l_{22}, l_{33}, l_{41})$.
$P_{ij} = 0.20$ (obtained from simulation); $L_{12} = P(2132 \mid 2231) + P(2321 \mid 2231) + P(2213 \mid 2231)$, etc.
If I had to pick just one application to be the “killer app” for the digital computer I would probably choose Agent Based Modeling (ABM). Imagine creating a world populated with hundreds, or even thousands of agents, interacting with each other and with the environment according to their own simple rules. What kinds of patterns and behaviors would emerge if you just let the simulation run? Could you guess a set of rules that would mimic some part of the real world? This dream is probably much older than the digital computer, but according to Jan Thiele’s brief account of the history of ABMs that begins his recent paper, R Marries NetLogo: Introduction to the RNetLogo Package in the Journal of Statistical Software, academic work with ABMs didn’t really take off until the late 1990s.
Now, people are using ABMs for serious studies in economics, sociology, ecology, socio-psychology, anthropology, marketing and many other fields. No less a complexity scientist than Doyne Farmer (of Dynamic Systems and Prediction Company fame) has argued in Nature for using ABMs to model the complexity of the US economy, and has published on using ABMs to drive investment models. In the following clip from a 2006 interview, Doyne talks about building ABMs to explain the role of subprime mortgages in the Housing Crisis. (Note that when asked how one would calibrate such a model, Doyne explains the need to collect massive amounts of data on individuals.)
Fortunately, the tools for building ABMs seem to be keeping pace with the ambition of the modelers. There are now dozens of platforms for building ABMs, and it is somewhat surprising that NetLogo, a tool with some whimsical terminology (e.g., agents are called turtles) that was designed for teaching children, has apparently become a de facto standard. NetLogo is Java based, has an intuitive GUI, ships with dozens of useful sample models, is easy to program, and is available under the GPL 2 license.
As you might expect, R is a perfect complement to NetLogo. Doing serious simulation work requires a considerable amount of statistics: calibrating models, designing experiments, performing sensitivity analyses, reducing data, exploring the results of simulation runs and much more. The recent JASSS paper, Facilitating Parameter Estimation and Sensitivity Analysis of Agent-Based Models: A Cookbook Using NetLogo and R, by Thiele and his collaborators, describes the R / NetLogo relationship in great detail and points to a decade's worth of reading. But the real fun is that Thiele's RNetLogo package lets you jump in and start analyzing NetLogo models in a matter of minutes.
Here is part of an extended example from Thiele's JSS paper that shows R interacting with the Fire model that ships with NetLogo. Using some very simple logic, Fire models the progress of a forest fire. Snippet of NetLogo code that drives the Fire model:
to go
  if not any? turtles  ;; either fires or embers
    [ stop ]
  ask fires
    [ ask neighbors4 with [pcolor = green]
        [ ignite ]
      set breed embers ]
  fade-embers
  tick
end

;; creates the fire turtles
to ignite  ;; patch procedure
  sprout-fires 1
    [ set color red ]
  set pcolor black
  set burned-trees burned-trees + 1
end

The general idea is that turtles represent the frontier of the fire as it runs through a grid of randomly placed trees. Not shown in the snippet above is the logic showing that the entire model is controlled by a single parameter representing the density of the trees.
This next bit of R code shows how to launch the Fire model from R, set the density parameter, and run the model.
# Launch RNetLogo and control an initial run of the
# NetLogo Fire model
library(RNetLogo)
nlDir <- "C:/Program Files (x86)/NetLogo 5.0.5"
setwd(nlDir)
nl.path <- getwd()
NLStart(nl.path)
model.path <- file.path("models", "Sample Models",
                        "Earth Science", "Fire.nlogo")
NLLoadModel(file.path(nl.path, model.path))
NLCommand("set density 70")   # set density value
NLCommand("setup")            # call the setup routine
NLCommand("go")               # launch the model from R

Here we see the Fire model running in the NetLogo GUI after it was launched from RStudio.
This next bit of code tracks the progression of the fire as a function of time (model "ticks"), returns results to R and plots them. The plot shows the non-linear behavior of the system.
# Investigate the percentage of forest burned as the
# simulation proceeds, and plot the result
library(ggplot2)
NLCommand("set density 60")
NLCommand("setup")
burned <- NLDoReportWhile("any? turtles", "go",
                          c("ticks", "(burned-trees / initial-trees) * 100"),
                          as.data.frame = TRUE,
                          df.col.names = c("tick", "percent.burned"))
# Plot with ggplot2
p <- ggplot(burned, aes(x = tick, y = percent.burned))
p + geom_line() +
  ggtitle("Non-linear forest fire progression with density = 60")
As with many dynamical systems, the Fire model displays a phase transition. Setting the density lower than 55 will not result in the complete destruction of the forest, while setting density above 75 will very likely result in complete destruction. The following plot shows this behavior.
RNetLogo makes it very easy to programmatically run multiple simulations and capture the results for analysis in R. The following two lines of code run the Fire model twenty times for each value of density between 55 and 65, the region surrounding the phase transition.
d <- seq(55, 65, 1)    # vector of densities to examine
res <- rep.sim(d, 20)  # run the simulation
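The helper function rep.sim comes from the extended example in Thiele's paper and is not shown in this post. Here is a minimal sketch of what such a helper might look like (an assumption, not Thiele's actual code), given that the Fire model is already loaded:

# A minimal sketch of a rep.sim-style helper (the real one is in Thiele's
# JSS paper); assumes the Fire model is already loaded via NLLoadModel()
rep.sim <- function(densities, reps) {
  do.call(rbind, lapply(densities, function(d) {
    percent.burned <- replicate(reps, {
      NLCommand("set density", d)
      NLCommand("setup")
      NLDoCommandWhile("any? turtles", "go")   # step until the fire dies out
      NLReport("(burned-trees / initial-trees) * 100")
    })
    data.frame(density = d, percent.burned = percent.burned)
  }))
}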
The plot below shows the variability of the percent of trees burned as a function of density in the transition region.
My code to generate the plots is available in the file: Download NetLogo_blog, while all of the code from Thiele's JSS paper is available from the journal website.
Finally, here are a few more interesting links related to ABMs.
- On validating ABMs
- ABMs and