http://www.theanalysisfactor.com/
SPSS, SAS, R, Stata, JMP? Choosing a Statistical Software Package or Two.
Almost all serious statistical analysis is done in one of the following packages: R (S-PLUS), Matlab, SAS, SPSS and Stata. I have expertise in each of those packages, but that does not mean each of them is good for every type of analysis. In fact, for most advanced areas only 2-3 packages will be suitable, providing either enough functionality or enough tools to implement this functionality easily. For example, the very important area of Markov Chain Monte Carlo is doable in R, Matlab and SAS only, unless you want to rely on convoluted macros written by random users on the web. The table at the end of this page compares the five packages in great detail.
R & MATLAB
R and Matlab are the richest systems by far. They contain an impressive collection of libraries, which is growing every day. Even if a desired specific model is not part of the standard functionality you can implement the model yourself, because R and Matlab are really programming languages with relatively simple syntaxes. As "languages" they allow you to express any idea. The question is whether you are a good writer or not. In terms of modern applied statistics tools, R libraries are somewhat richer than those of Matlab. Also R is free. On the flip side, Matlab has much better graphics, which you will not be ashamed to put in a paper or a presentation.
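To make "implement the model yourself" concrete, here is a minimal sketch of a hand-written random-walk Metropolis sampler, the simplest Markov Chain Monte Carlo method. It is written in Python rather than R or Matlab purely for illustration. The function names, the standard normal target, the step size and the seed are all assumptions chosen for the example, not anything prescribed by a particular package.

```python
import math
import random


def log_std_normal(x):
    """Log-density of a standard normal, up to an additive constant (assumed target)."""
    return -0.5 * x * x


def metropolis(log_density, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target.

    log_density: function returning the log of the (unnormalized) target density
    start:       initial state of the chain
    n_samples:   number of draws to return
    step:        standard deviation of the Gaussian proposal (illustrative default)
    seed:        seed for reproducibility (illustrative default)
    """
    rng = random.Random(seed)
    x = start
    log_p = log_density(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p_new / p); otherwise keep the current state.
        if log_p_new >= log_p or rng.random() < math.exp(log_p_new - log_p):
            x, log_p = proposal, log_p_new
        samples.append(x)
    return samples


samples = metropolis(log_std_normal, start=0.0, n_samples=20000)
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
print(f"mean ~ {sample_mean:.3f}, variance ~ {sample_var:.3f}")
```

The point is not the sampler itself but that a general-purpose language lets you write the whole procedure in a couple of dozen lines and swap in any log-density you like.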
SPSS
On the other end of the spectrum is a package like SPSS. SPSS is quite narrow in its capabilities and allows you to do only about half of the mainstream statistics. It is quite useless for ambitious modeling and estimation procedures, such as those used in kernel smoothing, pattern recognition or signal processing. Nonetheless, SPSS is very popular among practitioners because it requires almost no training. All you have to do is hit several buttons and SPSS does all the calculations for you. In those cases when you need something standard, SPSS may have it implemented fully. The SPSS output will be quite detailed and visually pleasing. It will contain all the major tests and diagnostic tools associated with the method and will allow you to write an informative statistics section of your empirical analysis. In short, when the method is there, it is faster to run than the equivalent functionality in R or Matlab. So I often use SPSS for standard requests from my clients, like linear regression, ANOVA or principal components analysis. SPSS gives you the ability to program macros, but that feature is quite inflexible.
SAS & STATA
Somewhere in between R, Matlab and SPSS lie SAS and Stata. SAS offers more extensive analytics than Stata. It is composed of dozens of procedures with massive output, often covering more than ten pages. The idea of SAS is not to listen to you that much. It is like an old grandfather whom you approach with a simple question, but instead he tells you the story of his life. Many procedures report three times more than you need to know about the topic, so some time has to be spent filtering out the relevant output. SAS procedures are invoked using simple scripts. Stata procedures can be invoked by clicking buttons in the menu or by running simple scripts. In the menu part, Stata resembles SPSS. Both SAS and Stata are programming languages, so they allow you to build analytics around standard procedures. Stata is somewhat more flexible than SAS. Still, in terms of programming flexibility, Stata and SAS do not come even close to R or Matlab. Selected strengths of SAS compared to all other packages: large data sets, speed, beautiful graphics, flexibility in formatting the output, time series procedures, counting processes. Selected strengths of Stata compared to all other packages: manipulation of survey data (stratified samples, clustering), robust estimation and tests, longitudinal data methods, multivariate time series.
THE TABLE
The following table compares the standard procedures of the five packages in detail. By "standard" I mean built-in or readily available from official or widely known and reliable public websites.
TYPE OF STATISTICAL ANALYSIS | R | MATLAB | SAS | STATA | SPSS |
Nonparametric Tests | Yes | Yes | Yes | Yes | Yes |
T-test | Yes | Yes | Yes | Yes | Yes |
ANOVA & MANOVA | Yes | Yes | Yes | Yes | Yes |
ANCOVA & MANCOVA | Yes | Yes | Yes | Yes | Yes |
Linear Regression | Yes | Yes | Yes | Yes | Yes |
Generalized Least Squares | Yes | Yes | Yes | Yes | Yes |
Ridge Regression | Yes | Yes | Yes | Limited | Limited |
Lasso | Yes | Yes | Yes | Limited | |
Generalized Linear Models | Yes | Yes | Yes | Yes | Yes |
Logistic Regression | Yes | Yes | Yes | Yes | Yes |
Mixed Effects Models | Yes | Yes | Yes | Yes | Yes |
Nonlinear Regression | Yes | Yes | Yes | Limited | Limited |
Discriminant Analysis | Yes | Yes | Yes | Yes | Yes |
Nearest Neighbor | Yes | Yes | Yes | Yes | |
Naive Bayes | Yes | Yes | Limited | ||
Factor & Principal Components Analysis | Yes | Yes | Yes | Yes | Yes |
Canonical Correlation Analysis | Yes | Yes | Yes | Yes | Yes |
Copula Models | Yes | Yes | Experimental | ||
Path Analysis | Yes | Yes | Yes | Yes | Yes |
Structural Equation Modeling (Latent Factors) | Yes | Yes | Yes | Yes | AMOS |
Extreme Value Theory | Yes | Yes | |||
Variance Stabilization | Yes | Yes | |||
Bayesian Statistics | Yes | Yes | Limited | ||
Monte Carlo, Classic Methods | Yes | Yes | Yes | Yes | Limited |
Markov Chain Monte Carlo | Yes | Yes | Yes | ||
Bootstrap & Jackknife | Yes | Yes | Yes | Yes | Yes |
EM Algorithm | Yes | Yes | Yes | ||
Missing Data Imputation | Yes | Yes | Yes | Yes | Yes |
Outlier Diagnostics | Yes | Yes | Yes | Yes | Yes |
Robust Estimation | Yes | Yes | Yes | Yes | |
Cross-Validation | Yes | Yes | Yes | ||
Longitudinal (Panel) Data | Yes | Yes | Yes | Yes | Limited |
Survival Analysis | Yes | Yes | Yes | Yes | Yes |
Propensity Score Matching | Yes | Yes | Limited | Limited | |
Stratified Samples (Survey Data) | Yes | Yes | Yes | Yes | Yes |
Experimental Design | Yes | Yes | Limited | ||
Quality Control | Yes | Yes | Yes | Yes | Yes |
Reliability Theory | Yes | Yes | Yes | Yes | Yes |
Univariate Time Series | Yes | Yes | Yes | Yes | Limited |
Multivariate Time Series | Yes | Yes | Yes | Yes | |
Stochastic Volatility Models, Discrete Case | Yes | Yes | Yes | Yes | Limited |
Stochastic Volatility Models, Continuous Case | Yes | Yes | Limited | Limited | |
Diffusions | Yes | Yes | |||
Markov Chains | Yes | Yes | |||
Hidden Markov Models | Yes | Yes | |||
Counting Processes | Yes | Yes | Yes | ||
Filtering | Yes | Yes | Limited | Limited | |
Instrumental Variables | Yes | Yes | Yes | Yes | Yes |
Simultaneous Equations | Yes | Yes | Yes | Yes | AMOS |
Splines | Yes | Yes | Yes | Yes | |
Nonparametric Smoothing Methods | Yes | Yes | Yes | Yes | |
Cluster Analysis | Yes | Yes | Yes | Yes | Yes |
Neural Networks | Yes | Yes | Yes | Limited | |
Classification & Regression Trees | Yes | Yes | Yes | Limited | |
Boosting Classification & Regression Trees | Yes | Yes | Limited | ||
Random Forests | Yes | Yes | Limited | ||
Support Vector Machines | Yes | Yes | Yes | ||
Signal Processing | Yes | Yes | |||
Wavelet Analysis | Yes | Yes | Yes | ||
Bagging | Yes | Yes | Yes | ||
ROC Curves | Yes | Yes | Yes | Yes | Yes |
Deterministic Optimization | Yes | Yes | Yes | Limited | |
Stochastic Optimization | Yes | Yes | Limited | ||
In addition to the five listed in this title, there are quite a few other options, so how do you choose which statistical software to use?
The default is to use whatever software they used in your statistics class–at least you know the basics.
And this might turn out pretty well, but chances are it will fail you at some point. Many times the stat package used in a class is chosen for its shallow learning curve, not its ability to handle the advanced analyses that are encountered in research.
I think I’ve used at least a dozen different statistics packages since my first stats class. And here are my observations:
1. The first one you learn is the hardest to learn. There are many similarities in the logic and wording they use, even if the interface is different. So once you’ve learned one, it will be easier to learn the next one.
2. You will have to learn another one. Just accept it. If you have the self-discipline to do it, I suggest learning two at the beginning. This will come in handy for a number of reasons:
– My favorite stat package for a while was BMDP. Until the company was bought up by SPSS. I’m not sure if they stopped producing or updating it, but my university cancelled their site license.
– Many schools offer a site license for only one package, and it may not be the one you’re used to. When I was at Cornell, they offered site licenses for 5 packages. But when a new stats professor decided to use JMP instead of Minitab, guess what happened to the Minitab site license? Unless you’re sure you’ll never leave your current university, you may have to start over.
– In case you decide to outwit the powers-that-be in IT who control the site licenses and buy your own (or use R, which is free), no software package does every type of analysis. There is huge overlap, to be sure, and the major ones are much more comprehensive than they were even 5 years ago. Even so, the gaps are in the most complicated analyses: some mixed models, GEE, complex sampling, etc. And the moment you’re trying to learn a new, highly complicated statistical method is not the time to learn a new, highly complicated stats package.
For these reasons, I recommend that everyone who plans to do research for the foreseeable future learn two packages.
I know, it’s hard enough to find the time to start over and learn one, much less the self-discipline. But if you can, it will save you grief later on. There are many great books, online tutorials, and workshops for learning all the major stats packages.
But I also recommend you choose one as your primary package and learn it really, really well. The defaults and assumptions and wording are not the same across packages. Knowing how yours handles dummy coding or missing data is imperative to doing correct statistics.
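To see why defaults matter, here is a minimal Python sketch of two common conventions for missing data applied to the same toy data set. The data values and function names are made up for illustration, and the sketch does not claim that any particular package uses one rule or the other by default.

```python
from statistics import mean

# Hypothetical toy data: (x, y) pairs, where None marks a missing y value.
rows = [(1.0, 2.0), (2.0, None), (3.0, 5.0), (10.0, None), (4.0, 9.0)]


def mean_x_available_case(data):
    """Mean of x using every row where x itself is present."""
    return mean(x for x, _ in data if x is not None)


def mean_x_listwise(data):
    """Mean of x using only rows where both x and y are present."""
    return mean(x for x, y in data if x is not None and y is not None)


available = mean_x_available_case(rows)
listwise = mean_x_listwise(rows)
print("available-case mean of x:", available)
print("listwise-deletion mean of x:", listwise)
```

Same variable, same data, two different answers, depending only on which rows the procedure silently drops. That is exactly the kind of behavior you want to know cold in your primary package.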
Which one? Mainly it depends on the field you’re in. Social scientists should generally learn SPSS as their main package, mainly because that is what their colleagues are using. You can then choose something else as a backup–either SAS, R, or Stata, based on availability and which makes most sense to you logically.
to use R because I love programming and R is a wonderful language. Also, R isn’t limited! My goal is to create packages that cover the shortcomings of other software and to link software together. Indeed, I like to ferret around in software. So my first software is R, but I haven’t thought about a primary software yet…! So I researched statistical software and decided to use Stata inside R!
I am new to R. I would like to learn R-PLUS. Does anyone know where I can get free training for R-PLUS?
Peng.