Tuesday, April 12, 2016

SAS vs. R, MATLAB; SAS started on mainframe; site license fee; stanfordphd.com/Statistical_Software.html


http://www.theanalysisfactor.com/

SPSS, SAS, R, Stata, JMP? Choosing a Statistical Software Package or Two.

by KAREN


STATISTICAL SOFTWARE
Almost all serious statistical analysis is done in one of the following packages: R (S-PLUS), Matlab, SAS, SPSS and Stata. I have expertise in each of those packages but it does not mean that each of those packages is good for a specific type of analysis. In fact, for most advanced areas only 2-3 packages will be suitable, providing enough functionality or enough tools to implement this functionality easily. For example, a very important area of Markov Chain Monte Carlo is doable in R, Matlab and SAS only, unless you want to rely on convoluted macros written by random users on the web. The table at the end of this page compares the five packages in great detail.


R & MATLAB
R and Matlab are the richest systems by far. They contain an impressive collection of libraries, which is growing every day. Even if a desired specific model is not part of the standard functionality you can implement the model yourself, because R and Matlab are really programming languages with relatively simple syntaxes. As "languages" they allow you to express any idea. The question is whether you are a good writer or not. In terms of modern applied statistics tools, R libraries are somewhat richer than those of Matlab. Also R is free. On the flip side, Matlab has much better graphics, which you will not be ashamed to put in a paper or a presentation.
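To make "implement the model yourself" concrete, here is a minimal sketch of one-predictor ridge regression written from scratch. It is in Python rather than R or Matlab, purely as an illustration of the idea; the function name `ridge_fit`, the penalty parameter `lam`, and the sample data are all hypothetical and come from neither package.

```python
from statistics import fmean


def ridge_fit(x, y, lam=0.0):
    """One-predictor ridge regression in closed form.

    The slope is shrunk toward zero by the penalty `lam`; the intercept
    is left unpenalized. With lam=0 this is ordinary least squares.
    """
    x_bar, y_bar = fmean(x), fmean(y)
    # Centered cross-product and sum of squares.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / (sxx + lam)
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data, roughly y = 1 + 2x with noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

ols = ridge_fit(x, y)              # no penalty
shrunk = ridge_fit(x, y, lam=5.0)  # penalized slope
print("OLS (intercept, slope):  ", ols)
print("Ridge (intercept, slope):", shrunk)
```

The point is only that a dozen lines in a general-purpose language are enough for a simple estimator; a real analysis would normally rely on a tested library routine.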


SPSS
On the other end of the spectrum is a package like SPSS. SPSS is quite narrow in its capabilities and allows you to do only about half of the mainstream statistics. It is quite useless for ambitious modeling and estimation procedures which are part of kernel smoothing, pattern recognition or signal processing. Nonetheless, SPSS is very popular among practitioners because it requires almost no training. All you have to do is hit several buttons and SPSS does all the calculations for you. In those cases when you need something standard, SPSS may have it implemented fully. The SPSS output will be quite detailed and visually pleasing. It will contain all the major tests and diagnostic tools associated with the method and will allow you to write an informative statistics section of your empirical analysis. In short, when the method is there, it is faster to run than similar functionality in R or Matlab. So I often use SPSS for standard requests from my clients, like linear regression, ANOVA or principal components analysis. SPSS gives you the ability to program macros but that feature is quite inflexible.


SAS & STATA
Somewhere in between R, Matlab and SPSS lie SAS and Stata. SAS offers more extensive analytics than Stata. It is composed of dozens of procedures with massive, massive output, often covering more than ten pages. The idea of SAS is not to listen to you that much. It is like an old grandfather, whom you approach with a simple question but instead he tells you the story of his life. Many procedures contain three times more than what you need to know about that segment. So some time has to be spent filtering the output for the relevant parts. SAS procedures are invoked using simple scripts. Stata procedures can be invoked by clicking buttons in the menu or by running simple scripts. In the menu part, Stata resembles SPSS. Both SAS and Stata are programming languages, so they allow you to build analytics around standard procedures. Stata is somewhat more flexible than SAS. Still, in terms of programming flexibility, Stata and SAS do not come even close to R or Matlab. Selected strengths of SAS compared to all other packages: large data sets, speed, beautiful graphics, flexibility in formatting the output, time series procedures, counting processes. Selected strengths of Stata compared to all other packages: manipulation of survey data (stratified samples, clustering), robust estimation and tests, longitudinal data methods, multivariate time series.


THE TABLE
The following table compares the standard procedures of the five packages in detail. By "standard" I mean built-in or readily available from the official or widely known and reliable public web-sites.
TYPE OF STATISTICAL ANALYSIS | R | MATLAB | SAS | STATA | SPSS
Nonparametric Tests | Yes | Yes | Yes | Yes | Yes
T-test | Yes | Yes | Yes | Yes | Yes
ANOVA & MANOVA | Yes | Yes | Yes | Yes | Yes
ANCOVA & MANCOVA | Yes | Yes | Yes | Yes | Yes
Linear Regression | Yes | Yes | Yes | Yes | Yes
Generalized Least Squares | Yes | Yes | Yes | Yes | Yes
Ridge Regression | Yes | Yes | Yes | Limited | Limited
Lasso | Yes | Yes | Yes | Limited |
Generalized Linear Models | Yes | Yes | Yes | Yes | Yes
Logistic Regression | Yes | Yes | Yes | Yes | Yes
Mixed Effects Models | Yes | Yes | Yes | Yes | Yes
Nonlinear Regression | Yes | Yes | Yes | Limited | Limited
Discriminant Analysis | Yes | Yes | Yes | Yes | Yes
Nearest Neighbor | Yes | Yes | Yes |  | Yes
Naive Bayes | Yes | Yes |  |  | Limited
Factor & Principal Components Analysis | Yes | Yes | Yes | Yes | Yes
Canonical Correlation Analysis | Yes | Yes | Yes | Yes | Yes
Copula Models | Yes | Yes | Experimental |  |
Path Analysis | Yes | Yes | Yes | Yes | Yes
Structural Equation Modeling (Latent Factors) | Yes | Yes | Yes | Yes | AMOS
Extreme Value Theory | Yes | Yes |  |  |
Variance Stabilization | Yes | Yes |  |  |
Bayesian Statistics | Yes | Yes | Limited |  |
Monte Carlo, Classic Methods | Yes | Yes | Yes | Yes | Limited
Markov Chain Monte Carlo | Yes | Yes | Yes |  |
Bootstrap & Jackknife | Yes | Yes | Yes | Yes | Yes
EM Algorithm | Yes | Yes | Yes |  |
Missing Data Imputation | Yes | Yes | Yes | Yes | Yes
Outlier Diagnostics | Yes | Yes | Yes | Yes | Yes
Robust Estimation | Yes | Yes | Yes | Yes |
Cross-Validation | Yes | Yes | Yes |  |
Longitudinal (Panel) Data | Yes | Yes | Yes | Yes | Limited
Survival Analysis | Yes | Yes | Yes | Yes | Yes
Propensity Score Matching | Yes | Yes | Limited | Limited |
Stratified Samples (Survey Data) | Yes | Yes | Yes | Yes | Yes
Experimental Design | Yes | Yes | Limited |  |
Quality Control | Yes | Yes | Yes | Yes | Yes
Reliability Theory | Yes | Yes | Yes | Yes | Yes
Univariate Time Series | Yes | Yes | Yes | Yes | Limited
Multivariate Time Series | Yes | Yes | Yes | Yes |
Stochastic Volatility Models, Discrete Case | Yes | Yes | Yes | Yes | Limited
Stochastic Volatility Models, Continuous Case | Yes | Yes | Limited | Limited |
Diffusions | Yes | Yes |  |  |
Markov Chains | Yes | Yes |  |  |
Hidden Markov Models | Yes | Yes |  |  |
Counting Processes | Yes | Yes | Yes |  |
Filtering | Yes | Yes | Limited | Limited |
Instrumental Variables | Yes | Yes | Yes | Yes | Yes
Simultaneous Equations | Yes | Yes | Yes | Yes | AMOS
Splines | Yes | Yes | Yes | Yes |
Nonparametric Smoothing Methods | Yes | Yes | Yes | Yes |
Cluster Analysis | Yes | Yes | Yes | Yes | Yes
Neural Networks | Yes | Yes | Yes |  | Limited
Classification & Regression Trees | Yes | Yes | Yes |  | Limited
Boosting Classification & Regression Trees | Yes | Yes | Limited |  |
Random Forests | Yes | Yes | Limited |  |
Support Vector Machines | Yes | Yes | Yes |  |
Signal Processing | Yes | Yes |  |  |
Wavelet Analysis | Yes | Yes | Yes |  |
Bagging | Yes | Yes | Yes |  |
ROC Curves | Yes | Yes | Yes | Yes | Yes
Deterministic Optimization | Yes | Yes | Yes | Limited |
Stochastic Optimization | Yes | Yes | Limited |  |

In addition to the five listed in this title, there are quite a few other options, so how do you choose which statistical software to use?
The default is to use whatever software they used in your statistics class–at least you know the basics.
And this might turn out pretty well, but chances are it will fail you at some point. Many times the stat package used in a class is chosen for its shallow learning curve, not its ability to handle the advanced analyses that are encountered in research.
I think I’ve used at least a dozen different statistics packages since my first stats class. And here are my observations:
1. The first one you learn is the hardest to learn. There are many similarities in the logic and wording they use, even if the interface is different. So once you’ve learned one, it will be easier to learn the next one.
2. You will have to learn another one. Just accept it. If you have the self-discipline to do it, I suggest learning two at the beginning. This will come in handy for a number of reasons:
– My favorite stat package for a while was BMDP. Until the company was bought up by SPSS. I’m not sure if they stopped producing or updating it, but my university cancelled their site license.
– Many schools offer a site license for only one package, and it may not be the one you’re used to. When I was at Cornell, they offered site licenses for 5 packages. But when a new stats professor decided to use JMP instead of Minitab, guess what happened to the Minitab site license? Unless you’re sure you’ll never leave your current university, you may have to start over.
– In case you decide to outwit the powers-that-be in IT who control the site licenses and buy your own (or use R, which is free), no software package does every type of analysis. There is huge overlap, to be sure, and the major ones are much more comprehensive than they were even 5 years ago. Even so, the gaps are in the most complicated analyses: some mixed models, GEE, complex sampling, etc. And the moment when you’re trying to learn a new, highly complicated statistical method is not the time to learn a new, highly complicated stats package.
For these reasons, I recommend that everyone who plans to do research for the foreseeable future learn two packages.
I know, it’s hard enough to find the time to start over and learn one, much less the self-discipline. But if you can, it will save you grief later on. There are many great books, online tutorials, and workshops for learning all the major stats packages.
But I also recommend you choose one as your primary package and learn it really, really well. The defaults and assumptions and wording are not the same across packages. Knowing how yours handles dummy coding or missing data is imperative to doing correct statistics.
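As a small illustration of why missing-data defaults matter, the sketch below computes the same mean under two common conventions: listwise deletion (drop any record with a missing value anywhere) and available-case analysis (use every record where the variable itself is present). It is written in Python with made-up records and hypothetical function names, and it does not reproduce the default of any particular package.

```python
from statistics import fmean

# Hypothetical records; None marks a missing value.
rows = [
    {"age": 23, "income": 41.0},
    {"age": 35, "income": None},
    {"age": None, "income": 58.0},
    {"age": 51, "income": 72.0},
]


def mean_listwise(rows, var):
    """Mean of `var` using only rows with no missing values at all."""
    complete = [r for r in rows if all(v is not None for v in r.values())]
    return fmean(r[var] for r in complete)


def mean_available(rows, var):
    """Mean of `var` using every row where `var` itself is present."""
    return fmean(r[var] for r in rows if r[var] is not None)


listwise_age = mean_listwise(rows, "age")    # uses 2 of the 4 rows
available_age = mean_available(rows, "age")  # uses 3 of the 4 rows
print("listwise:", listwise_age, "available-case:", available_age)
```

The two numbers differ even on four records, which is why it pays to know which convention your primary package applies before comparing results across packages.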
Which one? Mainly it depends on the field you’re in. Social scientists should generally learn SPSS as their main package, mainly because that is what their colleagues are using. You can then choose something else as a backup–either SAS, R, or Stata, based on availability and which makes most sense to you logically.
24 comments
Paul May 25, 2015 at 9:03 am
Could anyone suggest a site that has some good projects (I am looking for beginner to intermediate level) that use Stata as a tool?
Thanks
Brandon December 22, 2014 at 10:19 am
I definitely prefer NCSS, though it is not mentioned in the article. The newest version is now pre-released, even in the cloud.
https://www.apponfly.com/en/application/ncss10
 stanfordphd August 27, 2014 at 1:53 am
For a comparison of SPSS, SAS, R, Stata and Matlab for each type of statistical analysis, see
jim May 31, 2014 at 9:57 pm
I like to use Java since it has good graphics. Therefore, my choice is SCaVis (http://jwork.org/scavis). It integrates Java and Python with superb graphics.
Kule Sana December 7, 2013 at 12:08 pm
I am used to SPSS and Stata for my data analysis; however, today I tried adding “analyse-it” to my Excel package. It really worked for me. Can I really go ahead with it?
 Karen December 9, 2013 at 10:48 am
Hi Kule,
I don’t know much about the Excel plug-ins (or whatever the correct software term is). As a general rule, I avoid Excel for data analysis, but this add-on may be just fine.
Ragnar August 2, 2013 at 1:58 pm
Hi Karen, nice suggestions backed with arguments!
On a different note, I wish to hear your opinion on free software… Have you, for example, had an experience with EasyReg? It seems to have much of the econometrics methods covered — by far more than I would ever imagine to use –, it’s easy to operate and is supported with PDF-files about relevant theory. What do you think? (I have currently no access to commercial software, unfortunately.)
 Karen August 7, 2013 at 3:28 pm
Hi Ragnar,
Thanks! I haven’t used that software before, but I can tell you there are many good stat software packages out there. If you like using it and you’re confident that it’s accurate, go with it.
Karen
moRteza May 26, 2013 at 9:47 am
I often use R, and sometimes work with SPSS and Excel, but overall I prefer to use R because I love programming and R is a wonderful language. Also, R isn’t limited! My goal is to create packages that cover the shortcomings of other software and link software together. Indeed, I like to ferret around in software. So my first software is R, but I haven’t thought about a primary software yet! So I researched statistical software and decided to use Stata inside R!
 Karen June 6, 2013 at 5:18 pm
Hi Morteza, I agree: R is awesome if you love programming. But do check out Stata too. :)
Willie M. Clifton September 19, 2014 at 6:29 am
Hi Karen, do you really think that R is more efficient than Stata? I think that you are right, because in programming most of my fellows use R rather than Stata. So I agree with you…….. 😀
Joe Trubisz May 23, 2013 at 8:34 pm
I use R and Stata regularly. Dollar for dollar, I personally think that Stata is the most comprehensive stats package you can buy. Excellent documentation and a great user community. R is excellent as well, but suffers from absolutely terrible online documentation, which (for me) requires third party sources (read: books).
If somebody is buying you a license, then you don’t care what it costs. If someone like me has to buy a license, then to me, Stata is a no-brainer, given all the stats you can do with it.
My college eliminated both SAS and SPSS for that reason and uses R for most classes. Rumor has it SAS is offering a new “college” licensing fee, but I’m not privy to that information.
Small sidebar: SAS started on the mainframe and it annoys me that it still “looks” that way. JMP is probably better (and again, expensive) but doesn’t have anywhere near the capabilities of base SAS, the last time I looked.
Just my opinions.
 Karen May 24, 2013 at 1:54 pm
Hi Joe,
I actually agree with you about Stata. If I were to start over, that’s what I would use, especially, as you’ve said, if you’re buying your own license.
And Stata has the *best* manuals, IMHO.
Karen
jeremy February 5, 2013 at 5:43 pm
Depends on which social scientists you are talking about. I doubt you will find many economists, for example, who do most (if any) of their analyses in SPSS. If you absolutely must have a GUI, JMP is clearly the superior platform, since its scripting language can interface with R, and you can do whatever you please. Try searching for quantile regression in the SPSS documentation: it says the math is too hard, and SPSS cannot compute.
 Karen February 6, 2013 at 2:25 pm
Hi Jeremy,
Agreed, most economists I’ve talked to use either Stata or Eviews.
SPSS also interfaces with R.
Sure, there are examples of specific analyses that can’t be done in any software. That’s one reason why it’s good to be able to use at least two.
Karen
Christos November 15, 2012 at 12:30 pm
I am an SPSS and R lover… in my university they use JMP software… how should I convince them that SPSS is better than JMP… or, first of all, can I convince them???
Cheers
Christos
 Karen November 16, 2012 at 11:53 am
Well, I’m sure they’ll cite budget issues. But there are some statistical options in SPSS that are not available in JMP. I don’t know of any where the reverse is true, although that may just be my lack of knowledge of JMP. For example, to the best of my knowledge, JMP doesn’t have a Linear Mixed Model procedure.
Karen
Dave December 3, 2012 at 1:00 pm
When you add random effects to a linear model in JMP the default is REML. In fact the manual goes so far as to say REML for repeated measures data is the modern default, and JMP provides EMS solutions for univariate RM ANOVA only for historical reasons. JMP doesn’t do multilevel models (more than 1 level of random effects), and I don’t believe it does generalized linear mixed effects models (count or binary outcomes). I usually use Stata and R, but I keep an eye on JMP because it is a fun program sometimes. I have used it for repeated measures data by mixed model when a colleague wanted help doing it himself, where the post hoc tests were flexible and accessible, compared to his version of Stata or R.
 Karen December 3, 2012 at 5:08 pm
Thanks, Dave. That’s great to know. The last time I used JMP (which was a few years ago), REML wasn’t an option.
Yes, I agree. JMP is very straightforward and for 95% of analyses that most researchers use, entirely sufficient.
Karen
Dennis October 11, 2011 at 11:58 pm
Good advice, all around. But… if you choose SPSS as your primary package, SAS has little to offer you, and vice versa. The overlap is just too great to make either a good complement to the other.
A factor to consider in choosing between the Big Two is your preferred user interface. If you don’t want to program (much) and you adore point-and-shoot interfaces, go with SPSS. If you don’t mind programming explicitly, and despise point-and-shoot interfaces, SAS will make you happier.
Another factor in choosing among the Big Two is your use of structural equation models (SEMs). If you don’t use them it’s a non-issue. If you use them extensively, you should choose between EQS-like syntax (in SAS PROC CALIS) and SPSS’s AMOS. SEMs are confusing enough without worrying about converting from your preferred expression of the models into the expression your software wants.
Much better choices as a complement to one of the Big Two are Stata and some dialect of S (R, S, S-plus). Stata users say it has some very slick programming facilities. (I’m not among them, so I can’t say from experience.) The S dialects are killers for simulation studies. I benchmarked R against SAS/IML (in version 9.1) and found R was an order of magnitude faster. R is built entirely around an object-oriented programming interface. Language extensions are a snap. In my opinion bootstrap estimation is easier in R than in other languages. High resolution graphics are native to R, and (despite a lot of improvement from versions 6 to 7 to 9.1 and 9.2) not native to SAS.
Ryan March 10, 2015 at 1:47 pm
I think SAS becomes an asset over SPSS when the focus is on data preparation: merging multiple tables, accessing SQL databases, using API functions, creating canned reports, etc.
peng January 29, 2010 at 10:08 am
hi friends,
I am new to R. I would like to learn R-PLUS. Does anyone know where I can get free training for R-PLUS?
Regards,
Peng.
Dave May 23, 2013 at 12:59 pm
I believe the above comment is spam. I am not aware of the existence of R-plus; googling revealed a word-for-word comment on another site: http://www.talkstats.com/showthread.php/10761-free-training-for-R-PLUS.
Apologies to the commenter if this is a genuine enquiry.
 Karen May 23, 2013 at 2:56 pm
Thanks, Dave.
I suspect it was real, only because there was no link back to another site (you wouldn’t believe the strange links I get). I figured it was a language difficulty, and they meant S-Plus, on which R was based.
Karen
