I'm starting to learn Python, NumPy and pandas, and I have a really basic question about integer sizes.
Please see the following code blocks:
1. Result: `Length: 6, dtype: int64`

```python
import pandas as pd

# create a Series from a dict
pd.Series({key: value for key, value in zip('abcdef', range(6))})
```

vs.

2. Result: `Length: 6, dtype: int32`

```python
# but why does this generate a smaller integer size???
pd.Series(range(6), index=list('abcdef'))
```
Question: my impression is that when you pass a list, NumPy array, dictionary, etc. to `pd.Series` you get `int64`, but when you pass just `range(6)` you get `int32`. Can someone please make this a little clearer to me?
Sorry for the very basic question.
Edit: I'm using pandas version 0.20.1 and NumPy 1.12.1.
They're semantically different. In the first version you pass a dict of scalar values, so pandas does its own dtype matching on those values and the dtype becomes `int64`. In the second you pass a `range`, which can be trivially converted to a NumPy array, and that array is `int32`:

```
In [57]: np.array(range(6)).dtype
Out[57]: dtype('int32')
```

So constructing the pandas Series involves dtype matching in the first case and none in the second, because the `range` is convertible to a NumPy array and NumPy has determined that `int32` is preferred here.
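A quick way to check this on your own machine is to compare the dtype NumPy infers for a `range` (and for an equivalent list) with NumPy's default integer type `np.int_`. This is a minimal sketch; the exact dtype printed depends on your platform and NumPy version (NumPy 2.0 changed the default integer on 64-bit Windows to 64 bits).

```python
import numpy as np

# Let NumPy infer the dtype from a range and from an equivalent list.
from_range = np.array(range(6))
from_list = np.array(list(range(6)))

# np.int_ is NumPy's default integer type on this platform.
default_int = np.dtype(np.int_)

print(from_range.dtype, from_list.dtype, default_int)
# Check whether both inferred arrays use the platform default integer type.
print(from_range.dtype == default_int and from_list.dtype == default_int)
```

On a typical 64-bit Linux or macOS build all three print as `int64`; on 64-bit Windows with NumPy older than 2.0 they print as `int32`.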
Update

It looks like this depends on your NumPy version and maybe your pandas version.

I'm running Python 3.6, NumPy 1.12.1 and pandas 0.20.3, and I get the above result. I'm also running Windows 7 64-bit. @jeremycg is running pandas 0.19.2 and NumPy 1.11.2 and observes the same result, whilst @coldspeed is running NumPy 1.13.1 and observes `int64`.
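Since the result differs between setups, it helps to report the relevant details alongside the dtype. The helper below, `report_environment`, is a hypothetical name used only for illustration; pandas also ships `pd.show_versions()`, which prints a much fuller report.

```python
import platform
import sys

import numpy as np
import pandas as pd


def report_environment():
    """Hypothetical helper: collect details that influence the default integer dtype."""
    return {
        "python": sys.version.split()[0],
        "os": platform.system(),
        "machine": platform.machine(),
        "numpy": np.__version__,
        "pandas": pd.__version__,
        # NumPy's default integer type on this platform.
        "numpy_default_int": str(np.dtype(np.int_)),
        # What pandas actually produces for a Series built from a range.
        "range_series_dtype": str(pd.Series(range(6)).dtype),
    }


for key, value in report_environment().items():
    print(f"{key}: {value}")
```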
The takeaway from this is that the dtype will largely be determined by what NumPy does.
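If you need a specific integer size regardless of platform or library version, you can request it explicitly. A minimal sketch using the `dtype` argument of the `pd.Series` constructor, or `astype` on an existing Series:

```python
import pandas as pd

# Passing dtype explicitly removes the dependence on platform or library defaults.
s1 = pd.Series({k: v for k, v in zip('abcdef', range(6))}, dtype='int64')
s2 = pd.Series(range(6), index=list('abcdef'), dtype='int64')

# Alternatively, convert an existing Series after construction.
s3 = pd.Series(range(6), index=list('abcdef')).astype('int64')

print(s1.dtype, s2.dtype, s3.dtype)
print(s1.equals(s2), s2.equals(s3))
```

All three Series end up as `int64` with the same index and values, so both `equals` checks are true.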
I believe that this line is what is called when we pass a `range` in this case:

```python
subarr = np.array(arr, dtype=object, copy=copy)
```
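Note that the `dtype` argument matters for a call like the one quoted above: `dtype=object` keeps the values as Python objects, while `dtype=None` lets NumPy infer an integer type itself. This sketch only contrasts the two NumPy calls; it is not pandas' internal code.

```python
import numpy as np

data = range(6)

# Forcing dtype=object stores the values as Python int objects.
as_object = np.array(data, dtype=object)

# With dtype=None, NumPy infers the platform's default integer type.
inferred = np.array(data, dtype=None)

print(as_object.dtype)  # object
print(inferred.dtype)   # platform dependent, e.g. int32 or int64
```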
The returned type is determined by NumPy and the OS; in my case, Windows defines a C `long` as 32 bits. See related: numpy array dtype is coming as int32 by default in a windows 10 64 bit machine
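You can compare the size of a C `long` with NumPy's default integer on your own machine. A minimal sketch using `ctypes` from the standard library: before NumPy 2.0, `np.int_` corresponded to the C `long`, so both numbers matched (32 on 64-bit Windows, 64 on most 64-bit Linux/macOS builds); NumPy 2.0 and later use a 64-bit default integer on 64-bit Windows, so the two can differ there.

```python
import ctypes

import numpy as np

# Size of the C `long` type for this Python build, in bits.
c_long_bits = ctypes.sizeof(ctypes.c_long) * 8

# Size of NumPy's default integer type, in bits.
default_int_bits = np.dtype(np.int_).itemsize * 8

print(f"C long: {c_long_bits} bits, NumPy default int: {default_int_bits} bits")
```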