
Python & Pandas - pd.Series difference between int32 and int64

I'm starting to learn Python, NumPy and pandas, and I have a really basic question about integer sizes.

Please see the following code blocks:

1. Length: 6, dtype: int64

# create a Series from a dict
pd.Series({key: value for key, value in zip('abcdef', range(6))})

vs.

2. Length: 6, dtype: int32

# but why does this generate a smaller integer size???
pd.Series(range(6), index=list('abcdef'))

Question: It seems that when you pass a list, NumPy array, dictionary, etc. to pd.Series you get int64, but when you pass just range(6) you get int32. Can someone please explain why?

Sorry for the very basic question.

Edit: I'm using pandas version 0.20.1 and NumPy 1.12.1.

Mike Evers asked Sep 15 '17 13:09




1 Answer

They're semantically different: in the first version you pass a dict of scalar values, and pandas infers the dtype as int64. In the second you pass a range, which can be trivially converted to a NumPy array, and here that array is int32:

In[57]:
np.array(range(6)).dtype

Out[57]: dtype('int32')

So constructing the pandas Series involves dtype inference by pandas in the first case and none in the second, because a range is directly convertible to a NumPy array, and NumPy has determined that int32 is the preferred integer type in this case.
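To check what each constructor produces on a given machine, and to avoid depending on inference at all, something like the following sketch may help. The variable names are only illustrative; the printed dtypes will differ between platforms and library versions, so no expected output is shown for them.

```python
import numpy as np
import pandas as pd

# The two constructions from the question.
from_dict = pd.Series({key: value for key, value in zip('abcdef', range(6))})
from_range = pd.Series(range(6), index=list('abcdef'))

# These may or may not match, depending on platform and versions.
print(from_dict.dtype, from_range.dtype)

# Passing dtype explicitly removes the dependence on inference.
explicit = pd.Series(range(6), index=list('abcdef'), dtype='int64')
print(explicit.dtype)

# The stored values are the same either way; only the integer width differs.
same_values = bool((from_dict.values == from_range.values).all())
print(same_values)
```

If the width matters (for example when writing binary files or interfacing with C code), setting dtype explicitly or calling .astype('int64') afterwards is probably the safest option.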

update

It looks like this depends on your NumPy version and maybe your pandas version. I'm running Python 3.6, NumPy 1.12.1 and pandas 0.20.3 on 64-bit Windows 7, and I get the above result.

@jeremycg is running pandas 0.19.2 and numpy 1.11.2 and observes the same result whilst @coldspeed is running numpy 1.13.1 and observes int64.

The takeaway from this is that the dtype will largely be determined by what NumPy does.
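Since the result varies between setups, it can be useful to report the environment alongside the observed dtype when comparing notes. A minimal sketch (the env dictionary is just an illustrative name):

```python
import platform
import sys

import numpy as np
import pandas as pd

# Collect the details that appear to influence the default integer dtype.
env = {
    "python": sys.version.split()[0],
    "numpy": np.__version__,
    "pandas": pd.__version__,
    "platform": platform.platform(),
    "default_int": str(np.dtype(int)),  # NumPy's default integer type here
}

for name, value in env.items():
    print(f"{name}: {value}")
```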

I believe that this line is what is called when we pass range in this case.

subarr = np.array(arr, dtype=object, copy=copy)

The returned type is determined by NumPy and the OS: in my case, Windows defines a C long as 32 bits, even on a 64-bit machine. See related: "numpy array dtype is coming as int32 by default in a windows 10 64 bit machine".
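The platform dependence can be inspected directly. The sketch below compares the size of a C long (via ctypes) with the dtype NumPy picks for a range. On the NumPy versions discussed here the two are expected to agree; as far as I know, NumPy 2.0 changed the default integer on 64-bit Windows to int64, so they may differ on newer installs.

```python
import ctypes

import numpy as np

# Width of a C long on this platform: 32 bits on Windows,
# typically 64 bits on 64-bit Linux and macOS.
c_long_bits = ctypes.sizeof(ctypes.c_long) * 8

# The dtype NumPy chooses when no dtype is given.
default_int = np.array(range(6)).dtype

print("C long:", c_long_bits, "bits")
print("NumPy default integer:", default_int)
```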

EdChum answered Sep 21 '22 13:09