I have mostly programmed in Python, but I am now learning the statistical programming language R. I have noticed some differences between the languages that tend to trip me up.
Suppose v is a vector/array with the integers from 1 to 5 inclusive.
v[3] # in R: gives me the 3rd element of the vector: 3
# in Python: is zero-based, gives me the integer 4
v[-1] # in R: removes the element with that index
# in Python: gives me the last element in the array
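The Python half of this comparison is easy to check directly. A minimal sketch, assuming v is a plain Python list (the names third, fourth, last and without_first are only for illustration):

```python
# The vector from the question, as a plain Python list.
v = [1, 2, 3, 4, 5]

third = v[2]           # what R's v[3] returns: Python counts from 0
fourth = v[3]          # Python's v[3] is the 4th element
last = v[-1]           # a negative index counts from the end
without_first = v[1:]  # closest analogue of R's v[-1], which drops element 1

print(third, fourth, last, without_first)
```

Note that the slice builds a new list and leaves v unchanged, just as R's v[-1] returns a new vector rather than modifying v.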
Are there any other pitfalls I have to watch out for?
Having written tens of thousands of lines of code in both languages, I find R a lot more idiosyncratic and less consistent than Python. It's really nice for quick plots and investigation on a small to medium-sized dataset, mainly because its built-in data frame object is nicer than the numpy/scipy equivalent, but you'll find all kinds of weirdness as you do things more complicated than one-liners. My advice is to use rpy2 (which unfortunately has a much worse UI than its predecessor, rpy) and do as little as possible in R, with the rest in Python.
For example, consider the following matrix code:
> u = matrix(1:9,nrow=3,ncol=3)
> v = u[,1:2]
> v[1,1]
[1] 1
> w = u[,1]
> w[1,1]
Error in w[1, 1] : incorrect number of dimensions
How did that fail? The reason is that if you select a submatrix that has only one row or column, R "helpfully" drops that dimension and changes the type of the variable. So w is a vector of integers rather than a matrix:
> class(v)
[1] "matrix"
> class(u)
[1] "matrix"
> class(w)
[1] "integer"
To avoid this, you need to actually pass an obscure keyword parameter:
> w2 = u[,1,drop=FALSE]
> w2[1,1]
[1] 1
> class(w2)
[1] "matrix"
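For readers coming from Python, the dropping behaviour can be modelled with nested lists. This is only a toy sketch: select_columns is a hypothetical helper, not an R or Python library function, it uses 0-based column indices, and it models only the single-column case (R also drops a dimension when a single row is selected).

```python
def select_columns(matrix, cols, drop=True):
    """Toy model of R's u[, cols, drop=...] for a list-of-rows matrix."""
    picked = [[row[c] for c in cols] for row in matrix]
    if drop and len(cols) == 1:
        # Mimic R's default: a one-column result collapses to a flat vector.
        return [row[0] for row in picked]
    return picked

# matrix(1:9, nrow=3, ncol=3) fills column by column, so the rows are:
u = [[1, 4, 7],
     [2, 5, 8],
     [3, 6, 9]]

v = select_columns(u, [0, 1])            # two columns: still 2-D
w = select_columns(u, [0])               # one column: collapsed to 1-D
w2 = select_columns(u, [0], drop=False)  # one column, kept 2-D

print(v[0][0])   # two-index access works
print(w)         # a flat list, so w[0][0] would raise TypeError
print(w2[0][0])  # two-index access works again
```

The point of the sketch is that the shape of the result depends on how many columns were selected, which is exactly what makes code that indexes with a variable number of columns fragile unless drop=FALSE is passed.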
There are a lot of nooks and crannies like that. Your best friends at the beginning will be introspection and online help tools like str, class, example, and of course help. Also, make sure to look at the example code on the R Graph Gallery and in Ripley's Modern Applied Statistics with S-Plus book.
EDIT: Here's another great example with factors.
> xx = factor(c(3,2,3,4))
> xx
[1] 3 2 3 4
Levels: 2 3 4
> yy = as.numeric(xx)
> yy
[1] 2 1 2 3
Holy cow! Converting something from a factor back to a numeric didn't actually do the conversion you thought it would. Instead it returns the factor's internal integer codes, that is, each value's position in the sorted list of levels. This is a source of hard-to-find bugs for people who aren't aware of it, because it still returns integers and will in fact work some of the time (when the values happen to be the consecutive integers starting at 1, so that each value equals its own code).
This is what you actually need to do:
> as.numeric(levels(xx))[xx]
[1] 3 2 3 4
Yeah, sure, that fact is on the factor help page, but you only end up there when you've lost a few hours to this bug. This is another example of how R does not do what you intend. Be very, very careful with anything involving type conversions or accessing elements of arrays and lists.
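If it helps to see the mechanics in Python terms, here is a rough model of what a factor stores. It is a sketch under the assumption that a factor is a sorted list of level labels plus 1-based integer codes; the variable names are made up for illustration.

```python
values = [3, 2, 3, 4]

# Levels are the sorted unique values, stored as text labels.
levels = [str(x) for x in sorted(set(values))]

# Each element is stored as a 1-based position in the levels list.
codes = [levels.index(str(x)) + 1 for x in values]

# Model of as.numeric(xx): returns the codes, not the original values.
wrong = [float(c) for c in codes]

# Model of as.numeric(levels(xx))[xx]: convert the labels, then index by code.
right = [float(levels[c - 1]) for c in codes]

print(levels)
print(wrong)
print(right)
```

Here wrong reproduces the surprising 2 1 2 3 from the transcript above, while right recovers 3 2 3 4.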
This isn't specifically addressing the Python vs. R background, but the R inferno is a great resource for programmers coming to R.
The accepted answer to this post is possibly a bit outdated. The Pandas Python library now provides amazing R-like DataFrame support.
There may be... but before you embark on that have you tried some of the available Python extensions? Scipy has a list.