Wednesday, April 13, 2016

Estimation of Markov Chains Transition Probabilities Using Conjoint Analysis (Expert Preference)

1.
Choosing Between Multinomial Logit and Multinomial Probit Models for Analysis of Unordered Choice Data. Jonathan Kropko. A thesis submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Master of Arts in the Department of Political Science. Chapel Hill, 2008. Approved by: George Rabinowitz (Advisor), Georg Vanberg (Member), John Aldrich (Member). (M.A. thesis, under the direction of George Rabinowitz.)

2.

Source: http://www.idosi.org/mejsr/mejsr9(3)11/22.pdf
Middle-East Journal of Scientific Research 9 (3): 431-436, 2011. ISSN 1990-9233. © IDOSI Publications, 2011.

Estimation of Markov Chains Transition Probabilities Using Conjoint Analysis (Expert Preference)

N. Akhondi, G. Yari, E. Pasha and R. Farnoosh
Department of Statistics, Science and Research Branch, Islamic Azad University, Tehran, Iran

Abstract: This paper proposes a methodology to estimate transition probabilities on the basis of judgments by experts, which may be useful in situations of data absence. The Fractional Factorial Design (FFD) is used to cope with the curse of dimensionality. By means of the Conjoint Analysis (CA) approach we finally reconstruct the complete Markov Chain transition probabilities. The experimental results show that it is promising to use CA in estimating the entropy rate of Markov Chains with a finite state space.

Key words: Markov Chain - Transition probabilities - Conjoint Analysis - Design of experiments - Efficient design in Conjoint Analysis

INTRODUCTION

The present paper proposes a framework based on expert opinion elicitation, developed to estimate the transition probability matrix of an irreducible, discrete-time, homogeneous Markov Chain with a finite state space. In this article we address the question of estimating the transition probability matrix of a Markov Chain in situations of data absence. In general, the full probability distribution for a given stochastic problem is unknown. When data are available, the most objective estimation is the maximum-likelihood estimation of the transition probabilities P_ij. The difficulties grow when the aim is providing scenario analyses involving future states perhaps never observed before. In this situation we need information gathered from experts and we cannot resort to past data [1]. Our methodology has the new idea of estimating transition probabilities using conjoint (FFD) methods, which is useful in these conditions.

Conjoint analysis has as its roots the need to solve important academic and industry problems [2]. It is a popular marketing research technique. In order to respond to consumers' needs, makers have to research consumers' preferences for products and services and their selection criteria. Conjoint analysis measures the degree of importance given to particular aspects of a product or service [3]. The real genius is making appropriate tradeoffs so that real consumers in real market research settings are answering questions from which useful information can be inferred. In the thirty years since the original conjoint analysis articles, researchers in marketing and other disciplines have explored these tradeoffs [2].

In conjoint experiments, each respondent receives a set of profiles to rate (or rank). Designing these experiments involves determining how many and which profiles each respondent has to rate (or rank) and how many respondents are needed [4]. Experimental design is a fundamental component of CA. The complexity of experimental design arises from the exponential growth in the number of attributes, i.e. the curse of dimensionality. Use of a full factorial design (all profiles) would place an excessive burden on respondents, so researchers utilize an FFD, i.e. an orthogonal design, a subset of all profiles [5]. The basic conjoint problem is to estimate the partworths that best explain the overall preference judgments made by respondents [2]. CA is a technique based on a main-effects analysis-of-variance model that decomposes the judgment data into components, based on qualitative attributes of the products or services [6]. The most commonly used methods to acquire partworths are the Linear Programming Technique for Multidimensional Analysis of Preference (LINMAP), Hierarchical Bayes (HB) methods, Multivariate Analysis of Variance (MANOVA) and Ordinary Least Squares (OLS) Regression [7].

Corresponding Author: N. Akhondi, Department of Statistics, Science and Research Branch, Islamic Azad University, Tehran, Iran. E-mail: akhondinasrin@gmail.com.
Methods have been developed to take conjoint data toward optimal or near-optimal product and system designs in tourism, entertainment, health maintenance, gambling, etc. We introduce a new application of CA in this article. Leone and Fucili [1] used CA to estimate Markov Chains transition probabilities. They built Fractional Factorial Designs (FFD) on the starting states; in their method the experts are asked to identify the presumable destination states and quantify the probability of occurrence of the transitions towards each destination in the proposed scenarios (for each state included in an FFD). Psychological criticisms may be raised against their method because the respondents are not asked to follow the usual procedure of rating or ranking the FFD treatments by comparing each one of them, and the difficulties may grow as the number of attributes grows. We overcome these difficulties in this paper by building two FFDs, on the starting and destination states, and asking experts to give ratings on the likelihood of various states occurring in the future; we need two sets of states to get transition probabilities. The FFD can tackle the large number of states in an elegant way. We used the CA approach and Logistic Regression to construct the complete Markov Chain transition probabilities (under the assumption of independence of the attributes at an individual level for the respondents). The conjoint methods used for ratings data are now essentially dummy-variable regression methods.

The remainder of this paper is organized as follows: first, we provide a review of CA and references to related substantial theoretical and empirical work; then we discuss our methodology and simulation data; finally, we conclude with the most important strengths and weaknesses of the proposed methodology. We used Minitab (ver. 15) for generating P_ij, SPSS (ver. 16) for generating the two FFDs and Multivariate Analysis of Variance (MANOVA) for estimating parameters.

Conjoint Analysis: The essence of conjoint analysis is to identify and measure a mapping from detailed descriptors of a product or service onto an overall measure of the customer's evaluation of that product. Full-profile analysis remains the most common form of conjoint analysis and has the advantage that the respondent evaluates each profile holistically and in the context of all other profiles. Its weakness is that the respondent's burden grows dramatically with the number of profiles that must be ranked or rated. The respondent can be asked to rank order all stimuli (profiles) or to provide a metric rating of each stimulus [2]. When appropriate, efficient experimental designs (FFD) are used so that the respondent need consider only a small fraction of all possible product profiles [8]. If the number of attributes is large, respondents can often evaluate partial profiles (PP) in which some of the features are explicit and the other features are assumed constant [2].

FFD orthogonal arrays are categorized by their resolution. For example, resolution III designs enable the estimation of all main effects free of each other, but some of them are confounded with two-factor interactions. Higher-resolution designs require a larger number of profiles. Resolution III designs are most frequently used in marketing conjoint studies. Orthogonal arrays can be either balanced or unbalanced in terms of levels of attributes. An unbalanced design gives larger standard errors for the parameter estimates of those attributes that are less frequently administered. The minimum standard error is attained when a full factorial design is used [5].

Measures of the efficiency of an experimental design can be described as follows for the linear model (Kuhfeld, Tobias and Garratt, 1994):

Y = X\beta + e   (1)

where \beta is a p x 1 vector of parameters, X is an n x p design matrix and e is random error. With the usual assumptions on errors, the least squares estimate of \beta is given by (X'X)^{-1} X'Y. The variance-covariance matrix of the partworth (parameter) estimates of the attributes is proportional to (X'X)^{-1}. The efficiency of a design is based on the information matrix; an efficient design will have a smaller variance matrix. Well-known efficiency measures, such as A-efficiency and D-efficiency (all based on the eigenvalues of (X'X)^{-1}), are in use. Orthogonal designs for linear models are generally considered to be efficient because their efficiency measure is close to 1 [5].

Jeng-Jong Lin [7] successfully presented an integrated product design model applied to clothing product design. His methodology focused not only on the expertise of designers or the demands of consumers but on both of them. He used a relationship matrix to combine the CA data from the two individual groups (i.e., designer and consumer) to design the product.
Van Houtven et al. [9] applied CA to health-related benefit-risk tradeoffs in a non-expected-utility framework; they demonstrate how this method can be used to test for and estimate nonlinear weighting of adverse-event probabilities. Jeremy J. Michalek et al. [10] presented a unified methodology for product line optimization that coordinates positioning and design models to achieve realizable firm-level optima; the method is demonstrated for a line of dial-readout scales, using physical models and conjoint-based consumer choice data. Hiromi Yamada et al. [3] administered a study to estimate the structure of the variables that specify the quality requirements of a new product using conjoint analysis and the entropy model; as a result, it was understood that conjoint analysis and the entropy model are effective methods for estimating quality requirements. Ikemoto and Yamaoka [11] proposed a method of analysis using CA that makes it possible to use a lower number of profile cards than that provided by the orthogonal design of experiment, even when a large number of items is being surveyed; an Internet survey of 1,600 consumers using this method indicated that it generated analytical results identical to those produced when the orthogonal design of experiment was used. Byungun Yoon and Yongtae Park [12] applied a new hybrid approach that enhances the performance of morphology analysis (MA) by combining it with conjoint analysis (CA) and citation analysis of patent information. Alternatives for new technology development from among the emerging technologies are presented by combining the valuable levels of each attribute in a morphology matrix predefined by domain experts. The technological competitiveness of a company can be analyzed by a newly suggested index, "technology share," which is analogous to the concept of market share in traditional CA.

Proposed Methodology: Expert opinion is one of the key research areas in Probabilistic Risk Analysis (PRA) in engineering, public health, environment, program management, regulatory policy, finance, etc. The use of expert judgment is critical, and often inevitable, when there are no empirical data or information available on the variables of interest [13]. We illustrate our motivation for resorting to experts in this section. Assume a dynamic system with components: a set of states S, a set of actions A, a reward function R: S x A -> R and a transition probability matrix, where the full probability distribution for this stochastic problem is unknown. We restrict attention to time-separable Markovian decision problems for dealing with the curse of dimensionality in dynamic systems [14]. An irreducible discrete-time Markov chain with a finite state space has been studied previously by Papangelou [15], who establishes a Large Deviation Principle (LDP) for Markov chains whose order is unknown. Baris Tan and Kamil Yilmaz [16] presented a complete analytical framework for a testing procedure based on the statistical theory of Markov chains; they studied the time dependence and time homogeneity properties of Markov chains.

We assume an irreducible, discrete-time, homogeneous Markov Chain with a finite state space, in which the starting and destination states of the system are defined by combinations of n key attributes, each with L_k (k = 1, ..., n) levels (continuous variables are discretized). For example, if n = 3 the starting state i and the destination state j are given by (l_i1, l_i2, l_i3) and (l_j1, l_j2, l_j3), respectively. The total number of states is I = \prod_{k=1}^{n} L_k and the number of transition probabilities P_ij = P(l_j1, l_j2, l_j3 | l_i1, l_i2, l_i3) is equal to I^2. Under the assumption of independence of the attributes at an individual level for the respondents, P_ij is given by

P_{ij} = \prod_{k=1}^{n} P(l_{jk} | l_{i1}, l_{i2}, ..., l_{in})   (2)

If the transition probability matrix is unknown and data are available, we use the maximum-likelihood estimator of P_ij:

\hat{p}_{ij} = N_{ij} / N_i   (3)

where N_i is the number of times the starting state i has occurred when the process has been observed and N_ij is the number of times the process has been observed to go from starting state i to destination state j in one step.
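As an aside, eq. (3) is a simple count-and-normalize rule. A minimal R sketch (my own illustration, not from the paper) that computes p_hat_ij = N_ij / N_i from an observed state sequence:

estimate_P <- function(states, n_states) {
  # Count one-step transitions N_ij from the observed sequence
  N <- matrix(0, n_states, n_states)
  for (t in seq_len(length(states) - 1)) {
    N[states[t], states[t + 1]] <- N[states[t], states[t + 1]] + 1
  }
  # Divide each row by N_i (rows never visited are left as zeros)
  sweep(N, 1, pmax(rowSums(N), 1), "/")
}
estimate_P(c(1, 2, 2, 3, 1, 2, 3, 3, 1), n_states = 3)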
When data are not available we need information gathered from experts. Our initial purpose of estimating P_ij may be seen as the purpose of estimating P(l_jk | l_i1, l_i2, ..., l_in). We build two FFDs, on the starting and the destination states; by using this solution we can start from I = \prod_{k=1}^{n} L_k states instead of I^2. In our method the judges are asked to assign transition probabilities to the destination states in the FFDs, for each state included in the FFD of starting states. The experts are asked about a reduced number of starting states (the FFD ones) and, under the assumption of independence of the attributes at an individual level for the respondents, the results are generalized to the states not included in the FFD by means of Logistic Regression (LR). The LR model examines relationships between the independent variables and the log-odds of the outcome variable:

odds = p / (1 - p)   (4)

The model on the log-odds (logit) scale is linear:

logit = log(p / (1 - p)) = X\beta + e   (5)

As mentioned above, we want to estimate P(l_jk | l_i1, l_i2, ..., l_in) by means of logistic regression. The estimated parameters are used to reconstruct the probability of arriving in l_jk also for starting states not included in the FFD. Finally, the probabilities of the destination states are obtained by combining these estimates as in eq. (2).

Application on Simulation Data: We propose an application on simulation data to illustrate our methodology. We suppose n = 4 attributes, each with 3 levels, and 8 experts in 2 groups (4 experts in each group). We used SPSS (ver. 16) for generating two resolution III designs (FFDs) for the starting and the destination states and Minitab (ver. 15) for generating P_ij. All experts received the same FFDs. We simulated a transition probability matrix P for each of the respondents as mentioned below. For each starting state in the FFD we generated 4 independent bivariate normally distributed vectors z_1, z_2, z_3, z_4 from N(\mu, \Sigma), \mu = (0, 0), \Sigma = [1, -.9; -.9, 1], where:

z_j = (log(L_jk / (1 - L_jk))), j = 1, 2, 3, 4, k = 1, 2   (6)

L_jk = P(l_jk | l_i1, l_i2, l_i3, l_i4)   (7)

We computed L_jk based on z_j. Under the assumption of independence of the attributes at an individual level for the respondents we computed P_ij = \prod_{k=1}^{4} P(l_jk | l_i1, l_i2, ..., l_i4). We repeated this computation 500 times and finally recorded their mean as P_ij = P(l_j1, l_j2, l_j3, l_j4 | l_i1, l_i2, l_i3, l_i4). Then we updated L_jk based on P_ij. Table 1 shows part of these computations, for L_12, for the first respondent. Finally, we applied eq. (5) to z_j, j = 1, 2, 3, 4. The Multivariate Analysis of Variance (MANOVA) is used for estimating the parameters [17]. One of the levels of each factor, regarded as dummy variables, is eliminated. The estimated parameters are used to reconstruct the probability of arriving in l_jk also for starting states not included in the FFD. The LR coefficients (significant at the .05 level), estimated by Generalized Least Squares (GLS) regression, for the first 4 respondents are given in the following:

log(L_12 / (1 - L_12)) = .465 l_i1 + .518 l_i2 + e   (.183) (.183)   (9)
log(L_22 / (1 - L_22)) = .497 l_i2 + e   (.161)   (10)
log(L_11 / (1 - L_11)) = -1.244 - .409 l_i2 + e   (.169)   (11)
log(L_42 / (1 - L_42)) = -.908 - .314 l_i1 + e   (.139)   (12)
log(L_31 / (1 - L_31)) = .273 l_i2 + e   (.132)   (13)

where the estimated coefficient standard errors are in parentheses. Also:

log(L_21 / (1 - L_21)) = -1.207   (14)
log(L_41 / (1 - L_41)) = 0   (15)
log(L_32 / (1 - L_32)) = -1.208   (16)
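To make the reconstruction step of eqs. (2) and (5) concrete, here is a minimal R sketch. All numbers and object names are illustrative assumptions, not the paper's actual fitted models; it shows fitted log-odds for each destination-attribute level being turned into one row of the transition matrix under the independence assumption:

# Assumed fitted log-odds eta[k, j] that destination attribute k takes level j,
# for one fixed starting state (here: 2 attributes with 3 levels each)
eta <- matrix(c(-0.8, 0.2, 0.5,
                -1.2, 0.1, 0.9), nrow = 2, byrow = TRUE)
p_lvl <- plogis(eta)                      # inverse logit of eq. (5)
p_lvl <- p_lvl / rowSums(p_lvl)           # normalize levels within each attribute
# P(destination state) = product over attributes of its level probabilities, eq. (2)
dest  <- expand.grid(a1 = 1:3, a2 = 1:3)  # all 3^2 destination states
P_row <- p_lvl[1, dest$a1] * p_lvl[2, dest$a2]
P_row <- P_row / sum(P_row)               # the row of the transition matrix sums to 1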
The LR coefficients (significant at the .05 level) for the second 4 respondents are given in the following:

log(L_21 / (1 - L_21)) = -1.409 - .541 l_i3 + e   (17)
log(L_32 / (1 - L_32)) = -.748   (18)
log(L_41 / (1 - L_41)) = -1.138   (19)

Table 1: The FFDs (for starting and destination states) and the disclosed transition probabilities P_ij for the first respondent

Starting | 3231 | 3312 | 2132 | 2321 | 2213 | 1333 | 1111 | 3123 | 1222 | L12 | Odds = L12/(1-L12) | Log(Odds)
2231 | .20 | .12 | .11 | .08 | .11 | .21 | .06 | .01 | .10 | .31 | .44 | -.82
3213 | .06 | .12 | .08 | .11 | .19 | .20 | .15 | .08 | .02 | .37 | .60 | -.51
3132 | .11 | .18 | .03 | .22 | .18 | .19 | .04 | .01 | .05 | .43 | .75 | -.28
1111 | .15 | .14 | .10 | .16 | .04 | .14 | .10 | .10 | .07 | .30 | .42 | -.86
1333 | .05 | .07 | .18 | .15 | .15 | .16 | .09 | .07 | .09 | .48 | .91 | -.10
3321 | .11 | .15 | .12 | .15 | .09 | .18 | .06 | .10 | .03 | .36 | .57 | -.56
2123 | .06 | .14 | .14 | .14 | .15 | .00 | .10 | .13 | .14 | .42 | .74 | -.30
2312 | .08 | .18 | .14 | .13 | .03 | .05 | .17 | .14 | .07 | .30 | .43 | -.84
1222 | .13 | .17 | .16 | .11 | .16 | .06 | .09 | .01 | .10 | .43 | .76 | -.27

Note: 2231 in the FFD for starting states means that i = (l_12, l_22, l_33, l_41) and 3231 in the FFD for destination states means that j = (l_13, l_22, l_33, l_41); P_ij = 0.20 (obtained from simulation); L_12 = P(2132 | 2231) + P(2321 | 2231) + P(2213 | 2231), etc.

CONCLUSION

Strengths of the proposed methodology are: the use of expert opinion in situations of data absence; overcoming the curse of dimensionality by use of the FFD; reconstruction of the transition probabilities not included in the orthogonal design definition; and the use of CA in estimating the entropy rate of an irreducible, discrete-time, homogeneous Markov chain with a finite state space.

Weaknesses of the proposed methodology are: the logistic regression techniques encounter problems when the number of attributes or the number of levels for each attribute grows; this makes the problem of using conjoint data more difficult, especially when higher-resolution designs, which require a larger number of profiles, are applied.
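Since the conclusion highlights estimating the entropy rate of the chain, here is a minimal R sketch (my own illustration, not from the paper) of the entropy rate H = -sum_i pi_i sum_j P_ij log(P_ij), given a transition matrix:

entropy_rate <- function(P) {
  # Stationary distribution: left eigenvector of P for eigenvalue 1
  ev <- eigen(t(P))
  pi_vec <- Re(ev$vectors[, which.max(Re(ev$values))])
  pi_vec <- pi_vec / sum(pi_vec)
  # H = -sum_i pi_i sum_j P_ij log P_ij, with 0 * log(0) treated as 0
  -sum(pi_vec * rowSums(ifelse(P > 0, P * log(P), 0)))
}
P <- matrix(c(.9, .1,
              .4, .6), nrow = 2, byrow = TRUE)
entropy_rate(P)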
REFERENCES

1. Leone, D. and M. Fucili, 2008. Estimation of Markov chains transition probabilities by means of the conjoint analysis approach. Working paper.
2. John R. Hauser and Vithala R. Rao, 2002. Conjoint Analysis, Related Modeling and Applications. Chapter prepared for Advances in Marketing Research: Progress and Prospects, September 23.
3. Hiromi Yamada, et al., 2004. The Entropy Model Using Attribute of Conjoint Analysis. Proceedings of the Fifth Asia Pacific Industrial Engineering and Management Systems Conference.
4. Roselinde, et al., 2008. Optimal designs for conjoint experiments. Computational Statistics & Data Analysis, 52(5): 2369-2387.
5. Vithala R. Rao, 2007. Developments in Conjoint Analysis. Cornell University, revised.
6. Warren F. Kuhfeld. Conjoint Analysis. Copies of this chapter (TS-722H) and all of the macros are available on the web at http://support.sas.com.
7. Jeng-Jong Lin, 2008. An Optimal Design Search with Conjoint Analysis Using Genetic Algorithm. Tamkang J. Science and Engineering, 11(1): 73-84.
8. Warren F. Kuhfeld, Randall D. Tobias and Mark Garratt, 1994. Efficient Experimental Design with Marketing Research Applications. J. Marketing Res., pp: 545-557.
9. George Van Houtven, et al., 2011. Eliciting Benefit-Risk Preferences and Probability-Weighted Utility Using Choice-Format Conjoint Analysis. Med Decis Making, 31(3): 469-480.
10. Jeremy J. Michalek, et al., 2010. Enhancing Marketing with Engineering: Optimal Product Line Design for Heterogeneous Markets. Forthcoming in International J. Research in Marketing.
11. Hiroyuki Ikemoto and Toshiki Yamaoka, 2011. Conjoint Analysis Method That Minimizes the Number of Profile Cards, 173(1): 23-28.
12. Byungun Yoon and Yongtae Park, 2007. Development of New Technology Forecasting Algorithm: Hybrid Approach for Morphology Analysis and Conjoint Analysis of Patent Information. IEEE Transactions on Engineering Management, 54(3).
13. Fumika Ouchi, 2004. A Literature Review on the Use of Expert Opinion in Probabilistic Risk Analysis. World Bank Policy Research Working Paper 3201.
14. John Rust, 2006. Dynamic Programming. Entry for consideration by the New Palgrave Dictionary of Economics. University of Maryland.


In my M.A. thesis I use one binary dependent variable and one unordered categorical dependent variable in separate estimations. I am therefore trying to get an overview of the debate over whether to use linear or logistic regression when handling discrete dependent variables.
The debate over whether linear regression (the Linear Probability Model) is suitable for estimating a binary outcome seems to be still ongoing. Some authors argue that the LPM is inferior to logistic regression, while others argue that it functions just as well. I therefore report results using both models when handling the binary dependent variable, along the lines of the sketch below.
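A minimal R sketch of the two specifications on simulated data (all variable names are illustrative):

set.seed(42)
dat <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
dat$y <- rbinom(200, 1, plogis(0.5 * dat$x1 - 0.3 * dat$x2))

lpm   <- lm(y ~ x1 + x2, data = dat)                      # Linear Probability Model: OLS on 0/1 outcome
logit <- glm(y ~ x1 + x2, data = dat, family = binomial)  # logistic regression
summary(lpm); summary(logit)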
I am, however, struggling to find any similar debate with regard to unordered categorical variables with more than two choices. The sources I've read simply recommend using a multinomial logistic model while referring to the fact that discrete dependent variables violate the normality and homoskedasticity assumptions. The LPM, however, violates these assumptions as well.
Are there any viable alternatives using linear regression when estimating unordered categorical variables?



I have a dataframe with many observations and many variables. Some of them are categorical (unordered) and the others are numerical.
I'm looking for associations between these variables. I've been able to compute correlations for the numerical variables (Spearman's correlation), but:
· I don't know how to measure correlation between unordered categorical variables.
· I don't know how to measure correlation between unordered categorical variables and numerical variables.
Does anyone know how this could be done? If so, are there R functions implementing these methods?
asked Jul 15 '14 at 12:18
4 Answers
Answer (score 33)
It depends on what sense of a correlation you want. When you run the prototypical Pearson's product-moment correlation, you get a measure of the strength of association and a test of the significance of that association. More typically, however, the significance test and the measure of effect size differ.
Significance tests:
· Continuous vs. Nominal: run an ANOVA. In R, you can use ?aov.
· Nominal vs. Nominal: run a chi-squared test. In R, you use ?chisq.test.
Effect size (strength of association):
· Continuous vs. Nominal: calculate the intraclass correlation. In R, you can use ?ICC in the psych package; there is also an ICC package.
· Nominal vs. Nominal: calculate Cramer's V. In R, you can use ?assocstats in the vcd package.
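A minimal runnable sketch of these recommendations on a built-in dataset (the data choice is just for illustration, and the vcd package is assumed installed):

data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)   # treat as nominal
mtcars$am  <- factor(mtcars$am)    # treat as nominal

summary(aov(mpg ~ cyl, data = mtcars))    # continuous vs. nominal: ANOVA
chisq.test(table(mtcars$cyl, mtcars$am))  # nominal vs. nominal: chi-squared
                                          # (small table, so expect an approximation warning)
library(vcd)
assocstats(table(mtcars$cyl, mtcars$am))  # effect size: includes Cramer's V
# For the intraclass correlation, see psych::ICC, which expects a
# subjects-by-judges ratings matrix.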
answered Aug 20 '14 at 19:40
A very thorough explanation of the continuous vs. nominal case can be found here: Correlation between a nominal (IV) and a continuous (DV) variable. – gung Dec 23 '14 at 16:35
In the binary vs interval case there's the point-biserial correlation. – Glen_b Mar 12 '15 at 22:54
What would be a better alternative to the chi-squared test for large samples? – Waldir Leoncio Jul 19 '15 at 22:33
@WaldirLeoncio, "better" in what sense? What is wrong with the chi-squared if you want a test of independence? What constitutes a "large sample" for you? – gung Jul 19 '15 at 23:32
@WaldirLeoncio, yes, but if the null is true, p will be < .05 only 5% of the time. That is the way it is supposed to work. If you want to know the magnitude of the effect as well as a test of the null, you may want to calculate Cramer's V along with the chi-squared test. – gung Jul 20 '15 at 12:58
Answer (score 4)
I've seen the following cheatsheet linked before:
It may be useful to you. It even has links to specific R libraries.
answered Jul 15 '14 at 16:01
The issue with this cheatsheet is that it only covers categorical / ordinal / interval variables. What I'm looking for is a method allowing me to use both numerical and categorical independent variables. – Clément F Jul 17 '14 at 14:01
While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From Review – gung Nov 13 '15 at 3:45
Answer (score 2)
It depends on what you want to achieve. Let X be the continuous, numerical variable and K the (unordered) categorical variable. One possible approach is to assign numerical scores t_i to each of the possible values of K, i = 1, ..., p. One possible criterion is to maximize the correlation between X and the scores t_i. With only one continuous and one categorical variable this might not be very helpful, since the maximum correlation will always be one (to show that, and to find some such scores, is an exercise in using Lagrange multipliers!). With multiple variables, we try to find compromise scores for the categorical variables, maybe trying to maximize the multiple correlation R². Then the individual correlations will no longer (except in very special cases!) equal one.
Such an analysis can be seen as a generalization of multiple correspondence analysis and is known under many names, such as canonical correlation analysis, homogeneity analysis, and many others. An implementation in R is in the homals package (on CRAN). Googling some of these names will give a wealth of information, and there is a complete book: Albert Gifi, "Nonlinear Multivariate Analysis". Good luck!
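A minimal sketch of the scoring idea in R (simulated data; in the single-X, single-K case, scoring each category by the within-category mean of X attains the maximal correlation, which equals the square root of the R² from a one-way ANOVA):

set.seed(1)
K <- factor(sample(c("a", "b", "c"), 100, replace = TRUE))
X <- rnorm(100, mean = c(0, 1, 3)[as.integer(K)])

scores <- ave(X, K)                  # numeric scores t_i: within-category means of X
cor(X, scores)                       # maximal correlation attainable by scoring K
sqrt(summary(lm(X ~ K))$r.squared)   # same value, via the one-way ANOVA R²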
answered Jul 15 '14 at 14:20
(+1) Why use Lagrange multipliers? Just use the values of the continuous variable to score the categorical one. This also reveals why the max correlation is not necessarily 1, which is attainable only when each category is paired with an unvarying set of values of the continuous variable. – whuber Nov 17 '14 at 10:04

Answer (score 1)
I had a similar problem and tried the chi-squared test as suggested, but I got very confused assessing the p-values against the null hypothesis.
I will explain how I interpreted categorical variables; I am not sure how relevant it is in your case. I had a response variable Y and two predictor variables X1 and X2, where X2 is a categorical variable with two levels, say 1 and 2. I was trying to fit a linear model
ols = lm(Y ~ X1 + X2, data=mydata)
But I wanted to understand how different levels of X2 fit the above equation. I came across the R function by():
by(mydata, mydata$X2, function(x) summary(lm(Y ~ X1, data = x)))
This fits a linear model for each level of X2 separately, giving the p-values, R-squared and residual standard error for each fit, which I understand and can interpret.
Again I am not sure if this is what you want. I sort of compared different values of X2 in predicting Y.


Algorithms (1). Adaptive gradient methods
• Gradient methods for one-by-one estimation are straightforward.
• Stochastic gradient ascent for the likelihood (Bell-Sejnowski, 1995):

\Delta W \propto (W^\top)^{-1} + g(Wx) x^\top   (17)

with g = (\log p_s)'. Problem: needs matrix inversion!
• Better: natural/relative gradient ascent of the likelihood (Amari et al., 1996; Cardoso and Laheld, 1994):

\Delta W \propto [I + g(y) y^\top] W   (18)

with y = Wx. Obtained by multiplying the gradient by W^\top W.
Slide from Hyvärinen
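As a concrete illustration, a one-step R implementation of the relative-gradient update (18) for a single observation; the learning rate and the choice g(y) = -2*tanh(y) (a common pick for super-Gaussian sources) are illustrative assumptions, not from the slide:

natural_gradient_step <- function(W, x, eta = 0.01) {
  y <- as.vector(W %*% x)       # y = W x, current source estimates
  g <- -2 * tanh(y)             # assumed score function for super-Gaussian sources
  W + eta * (diag(length(y)) + g %*% t(y)) %*% W   # eq. (18): [I + g(y) y^T] W
}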
Algorithms (2). The FastICA fixed-point algorithm
(Hyvärinen 1997, 1999)
• An approximate Newton method in block (batch) mode.
• No matrix inversion, but still quadratic (or cubic) convergence.
• No parameters to be tuned.
• For a single IC (whitened data):

w \leftarrow E\{x\, g(w^\top x)\} - E\{g'(w^\top x)\}\, w,  then normalize w

where g is the derivative of G.
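A minimal R sketch of this one-unit fixed-point iteration with g = tanh (so G = log cosh); the data are assumed already whitened, and all names are my own illustration:

fastica_one_unit <- function(X, max_iter = 200, tol = 1e-8) {
  # X: n x p matrix of whitened observations (rows = samples)
  w <- rnorm(ncol(X)); w <- w / sqrt(sum(w^2))
  for (i in seq_len(max_iter)) {
    wx <- as.vector(X %*% w)
    # E{x g(w'x)} - E{g'(w'x)} w, with g = tanh, g' = 1 - tanh^2
    w_new <- colMeans(X * tanh(wx)) - mean(1 - tanh(wx)^2) * w
    w_new <- w_new / sqrt(sum(w_new^2))                  # normalize
    if (abs(abs(sum(w_new * w)) - 1) < tol) return(w_new)  # converged (up to sign)
    w <- w_new
  }
  w
}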



Structural Equation Modeling (SEM)

Structural Equation Modeling (SEM)
faculty.cas.usf.edu/mbrannick/regression/SEM.html
What is a latent variable? What is an observed (manifest) variable? How does SEM handle measurement errors? Why does SEM have an advantage over ...

[PDF] Basics of SEM (University of California, Santa Cruz)
people.ucsc.edu/.../09SEM3a.pdf
Latent vs. observed variables. Exogenous vs. endogenous variables. Multiple regression as a SEM model. Steps in SEM analysis. Interpreting output ...

Structural equation modeling - Wikipedia
https://en.wikipedia.org/wiki/Structural_equation_modeling
An example structural equation model. Latent variables are drawn as circles. Manifest or measured variables are shown as squares. Residuals and variances ...

SEM with Latent Variables (David A. Kenny)
davidakenny.net/cm/sem.htm
August 29, 2011. Structural Models with Latent Variables ... simultaneously estimated by a structural equation modeling program such as AMOS, LISREL, or EQS.

SEM: Terminology and Basics (David A. Kenny)
davidakenny.net/cm/basics.htm
Sep 6, 2011. The disturbance is treated as a latent variable. Structural ... A diagram that pictorially represents a structural equation model. Curved lines ...

[PDF] Latent Variable Structural Equation Modeling (B. Muthén, 1983)
https://www.statmodel.com/download/Article_0091.pdf
Structural equation modeling with latent variables is overviewed for situations ... latent variable structural equation models, with particular emphasis on the ...

[PDF] Structural Equation Models: Path Analysis with Latent Variables
jonathantemplin.com/.../sem/sem13psyc948/sem13psyc948_lecture11.pd...
Apr 3, 2013. Model-predicted covariance matrices for path analysis with observed and latent variables. Examples of SEM uses. PSYC 948: Lecture 11.

Introduction to Structural Equation Modeling with Latent Variables (SAS Institute)
https://support.sas.com/.../statug_introcalis_sect001.htm
Overview of Structural Equation Modeling with Latent Variables. Structural equation modeling includes analysis of covariance structures and mean structures, ...

[DOC] Structural Equation Modeling (Purdue University)
www.stat.purdue.edu/.../Structural%20Equation%20M...
Latent variables increase the complexity of a structural equation model because one needs to take into account all of the questionnaire items and measured ...

[DOC] An Introduction to Structural Equation Modeling (SEM) - East Carolina University
core.ecu.edu/psyc/wuenschk/MV/SEM/SEM-Intro.doc
Special cases of SEM include confirmatory factor analysis and path analysis. You are already familiar with path analysis, which is SEM with no latent variables.




Simplex factor models for multivariate unordered categorical data

Anirban Bhattacharya, David B. Dunson
Department of Statistical Science, Duke University, NC 27708
email: ab179@stat.duke.edu, dunson@stat.duke.edu
May 9, 2011

Abstract: Gaussian latent factor models are routinely used for modeling of dependence in continuous, binary and ordered categorical data. For unordered categorical variables, Gaussian latent factor models lead to challenging computation and complex modeling structures. As an alternative, we propose a novel class of simplex factor models. In the single-factor case, the model treats the different categorical outcomes as independent with unknown marginals. The model can characterize flexible dependence structures parsimoniously with few factors and, as factors are added, any multivariate categorical data distribution can be accurately approximated. Using a Bayesian approach for computation and inference, an MCMC algorithm is proposed that scales well with increasing dimension, with the number of factors treated as unknown. We develop an efficient proposal for updating the base probability vector in hierarchical Dirichlet models. Theoretical properties are described and we evaluate the approach through simulation examples. Applications are described for modeling dependence in nucleotide sequences and prediction from high-dimensional categorical features.

KEYWORDS: Classification; Contingency table; Factor analysis; Latent variable; Nonparametric Bayes; Non-negative tensor factorization; Mutual information; Polytomous regression.

1. INTRODUCTION

Multivariate unordered categorical data are routinely encountered in a variety of application areas, with interest often in inferring dependencies among the variables. For example, the categorical variables may correspond to a sequence of A, C, G, T nucleotides or responses to questionnaire data on race, religion and political affiliation for an individual. We shall use y_i = (y_i1, ..., y_ip)^T to denote the multivariate observation for the i-th subject, with y_ij in {1, ..., d_j}. Complicated dependence can potentially be expressed in terms of simpler conditional independence relationships via graphical models (Dawid & Lauritzen, 1993). Such models have been used for continuous (Lauritzen, 1996; Dobra et al., 2004), categorical (Whittaker, 1990; Madigan & York, 1995) and mixed-scale variables (Dobra & Lenkoski, 2011; Pitt et al., 2006). Although graphical models are popular due to their flexibility and interpretability, computation is daunting since the size of the model space grows exponentially with p. Even with highly efficient search algorithms (Jones et al. (2005); Carvalho & Scott (2009); Lenkoski & Dobra (2010); Dobra & Massam (2010), among others), it is only feasible to visit a tiny subset of the model space even for moderate p. Accurate model selection in this context is difficult when p is moderate to large and the number of samples is not enormous, because in such cases even the highest posterior probability models receive very small weight and there will typically be a large number of models having essentially identical performance according to any given model selection criterion (AIC, BIC, etc.). Dobra & Lenkoski (2011) advocate model averaging to avoid having the inferences depend explicitly on the choice of the underlying graph. In parallel to the development of graphical models, factor models (West, 2003; Carvalho et al., 2008) have been widely used for modeling of high-dimensional variables and dimension reduction.
