Adding row to pandas DataFrame changes dtype

Tags: python, pandas

The problem I have is that adding a row to a DataFrame changes the dtype of its columns:

>>> from pandas import DataFrame
>>> df = DataFrame({'a' : range(10)}, dtype='i4')
>>> df
   a
0  0
1  1
2  2
3  3
4  4
5  5
6  6
7  7
8  8
9  9

[10 rows x 1 columns]

I explicitly specified the dtype as int32 (i.e., 'i4'), as can be seen:

>>> df.dtypes
a    int32
dtype: object

However, adding a row changes the dtype to float64:

>>> df.loc[10] = 99

>>> df
     a
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
10  99

[11 rows x 1 columns]

>>> df.dtypes
a    float64
dtype: object

I've tried specifying the dtype of the value that I add:

>>> import numpy as np
>>> df = DataFrame({'a' : np.arange(10, dtype=np.int32)})

>>> df.dtypes
a    int32
dtype: object

>>> df.loc[10] = np.int32(0)

>>> df.dtypes
a    float64
dtype: object

That does not work either. Is there a solution that does not rely on functions returning new objects?

asked Feb 26 '14 at 14:02 by Ben





1 Answer

Enlargement is done in two stages: a NaN is first placed in that column, and then the value is assigned. Because an integer column cannot hold NaN, the column is coerced to float64 in the first stage. I'll put it on the bug/enhancement list; it's a bit non-trivial.
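The two-stage behaviour can be reproduced on a plain Series. This is a minimal sketch that approximates the first stage with `reindex`, which is an assumption for illustration and not necessarily the code path `.loc` uses internally: the new label is filled with NaN, which forces an int32 Series to float64, and assigning the real value afterwards does not restore the integer dtype. For contrast, the nullable `"Int32"` extension dtype (available in newer pandas, not in the version discussed here) marks the gap with `<NA>` and keeps its integer dtype.

```python
import numpy as np
import pandas as pd

# An int32 Series, like the question's column 'a'
s = pd.Series(np.arange(3, dtype=np.int32), name="a")

# Stage 1 (approximated with reindex): the new label 3 gets NaN
enlarged = s.reindex([0, 1, 2, 3])
print(enlarged.dtype)  # float64, because int32 cannot represent NaN

# Stage 2: assigning the real value does not undo the coercion
enlarged.loc[3] = 99
print(enlarged.dtype)  # still float64

# Nullable integer dtype: the missing slot is <NA> and the dtype stays Int32
nullable = s.astype("Int32").reindex([0, 1, 2, 3])
print(nullable.dtype)  # Int32
```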

Here's a workaround using append:

In [14]: df.append(Series(99,[10],dtype='i4').to_frame('a'))
Out[14]: 
     a
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
10  99

[11 rows x 1 columns]

In [15]: df.append(Series(99,[10],dtype='i4').to_frame('a')).dtypes
Out[15]: 
a    int32
dtype: object
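`DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions the same workaround can be written with `pd.concat`. This is a sketch under that assumption: the new row is built as a one-row frame that already has the int32 dtype, so concatenation has nothing to coerce. Like `append`, `concat` returns a new DataFrame and leaves `df` unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.arange(10, dtype=np.int32)})

# One-row frame with label 10 that already carries the target dtype
new_row = pd.Series(99, index=[10], dtype="i4").to_frame("a")

# Concatenating two int32 columns keeps int32; the result is a new object
result = pd.concat([df, new_row])
print(result.dtypes)  # a    int32
```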

An issue for the bug/enhancement to do this automagically: https://github.com/pydata/pandas/issues/6485
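The workaround above returns a new object. If the original DataFrame object must be kept, as the question asks, one option (an editorial suggestion, not part of the answer above) is to enlarge with `.loc` and then cast the column back. `astype` itself builds a new Series, but assigning it to `df["a"]` modifies the existing DataFrame rather than rebinding `df`. The dtype right after enlargement depends on the pandas version, so the sketch does not rely on it; the cast only succeeds if the column contains no missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.arange(10, dtype=np.int32)})
original_id = id(df)

# Enlarge in place; the column's dtype after this step depends on the
# pandas version (float64 in the version from the question)
df.loc[10] = 99

# Cast back to int32; assigning to df["a"] updates the same DataFrame object
df["a"] = df["a"].astype("int32")

print(df.dtypes)  # a    int32
print(id(df) == original_id)  # True
```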

answered Sep 16 '22 at 12:09 by Jeff
