
Pandas DataFrame, default data type for 1, 2, 3, and NaN values

import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df['one'])

Output:

a    1.0
b    2.0
c    3.0
d    NaN
Name: one, dtype: float64

Here the values are stored as float64.

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3], index=['a', 'b', 'c'])}

df = pd.DataFrame(d)
print(df['one'])

Output:

a    1
b    2
c    3
Name: one, dtype: int64

But now the values are stored as int64.

The difference is that in the first example the column contains a NaN.

What rule determines the data types in the examples above?

Thanks!

searain asked Jan 02 '23 03:01


1 Answer

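NaN is a float, and a plain int64 column has no way to represent a missing value, so pandas upcasts every integer in that column to float64. In the first example, 'one' has no entry for index 'd', so aligning it with 'two' puts a NaN there; in the second example both Series share the same index, nothing is missing, and the column stays int64.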

This is easy to check:

>>> import numpy as np
>>> type(np.nan)
<class 'float'>

I would recommend this interesting read
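The upcast can be reproduced in a few lines. This is a minimal sketch, not a full account of pandas' type inference: it rebuilds the DataFrame from the question and prints the dtypes before and after alignment. The last step is an addition that is not in the original answer, and it assumes a pandas version with the nullable 'Int64' extension dtype (1.0 or later).

```python
import numpy as np
import pandas as pd

# Two Series with different indexes, as in the question.
one = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
two = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# On its own, 'one' holds only integers, so it has an integer dtype.
print(one.dtype)

# Building the DataFrame aligns both columns on the union of the indexes.
# 'one' has no value for 'd', so it gets NaN there and is upcast to float64.
df = pd.DataFrame({'one': one, 'two': two})
print(df.dtypes)

# Assumption: pandas >= 1.0. The nullable 'Int64' dtype (capital I) can hold
# integers together with a missing-value marker, so no float upcast is needed.
nullable = df['one'].astype('Int64')
print(nullable.dtype)
```

If the column must stay integer and the missing row is not needed, dropping it first with df['one'].dropna().astype('int64') is another option.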

rafaelc answered Jan 05 '23 17:01