
Python statistics package: difference between statsmodels and scipy.stats [closed]

I need some advice on selecting a statistics package for Python. I've done quite a bit of searching, but I'm not sure I've got everything right, specifically the differences between statsmodels and scipy.stats.

One thing I do know is that packages in the scikits namespace are specific "branches" of scipy, and that what used to be scikits.statsmodels is now called statsmodels. On the other hand, there is also scipy.stats. What are the differences between the two, and which one is the statistics package for Python?

Thanks.

--EDIT--

I changed the title because some answers are not really related to the question, and I suppose that's because the title is not clear enough.

herrfz asked Jan 29 '13 00:01



2 Answers

Statsmodels has scipy.stats as a dependency. scipy.stats has all of the probability distributions and some statistical tests; it's more like library code in the vein of numpy and scipy. Statsmodels, on the other hand, provides statistical models with a formula framework similar to R, and it works with pandas DataFrames. There are also statistical tests, plotting, and plenty of helper functions in statsmodels. Really, it depends on what you need, but you definitely don't have to choose one. They have different aims and strengths.
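To make the split concrete, here is a minimal sketch using both packages side by side. The data is synthetic and the column names `x` and `y` are made up for illustration; it assumes numpy, pandas, scipy and statsmodels are all installed.

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

# Synthetic data for illustration only: y depends linearly on x plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)

# scipy.stats: probability distributions and standalone tests on plain arrays.
p_upper = stats.norm.sf(1.96)  # upper-tail probability of a standard normal
t_stat, p_value = stats.ttest_1samp(x, popmean=0.0)

# statsmodels: a model specified with an R-style formula on a pandas DataFrame.
df = pd.DataFrame({"x": x, "y": y})
fit = smf.ols("y ~ x", data=df).fit()

print(p_upper, p_value)
print(fit.params)  # intercept and slope estimates
```

The scipy.stats calls take arrays and return numbers, while the statsmodels call returns a fitted results object (with `params`, `summary()` and so on), which is roughly the "library code" versus "statistical models" distinction described above.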

jseabold answered Oct 09 '22 23:10


I try to use pandas/statsmodels/scipy for my work on a day-to-day basis, but sometimes those packages come up a bit short (LOESS, anybody?). The problem with the RPy module is (last I checked, at least) that it wants a specific version of R that isn't current: my R installation is 2.16 (I think) and RPy wanted 2.14. So either you have to have two parallel installations of R, or you have to downgrade. (If you don't have R installed, then you can just install the correct version of R and use RPy.)

So when I need something that isn't in pandas/statsmodels/scipy, I write R scripts and run them with the subprocess module. This lets me interact with R (which I really don't like programming in) as little as possible, while still leveraging all the stuff that R has that the Python packages don't.
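A rough sketch of that workflow, assuming `Rscript` is on the PATH. The helper names `build_rscript_command` and `run_r_script`, and the file names in the usage comment, are hypothetical and only for illustration.

```python
import shutil
import subprocess


def build_rscript_command(script_path, *args):
    """Return the argument list for running an R script via Rscript."""
    # --vanilla skips saved workspaces and user/site profile files.
    return ["Rscript", "--vanilla", script_path, *map(str, args)]


def run_r_script(script_path, *args):
    """Run an R script and return whatever it printed to standard output."""
    if shutil.which("Rscript") is None:
        raise RuntimeError("Rscript not found on PATH")
    result = subprocess.run(
        build_rscript_command(script_path, *args),
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode output as str rather than bytes
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


# Hypothetical usage: the R script reads a CSV path and a span from its
# command-line arguments and prints the smoothed values.
# output = run_r_script("loess_fit.R", "data.csv", 0.3)
```

Passing data through files or standard output keeps the R side small; the Python side only has to parse whatever text the script prints.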

The lesson is that there is rarely a single solution to a problem: you have to assemble a whole bunch of parts that are all useful to you (and maybe write some of your own), in a way that you understand, to solve problems. (R aficionados will disagree, of course!)

BenDundee answered Oct 09 '22 23:10