The problem I have is that adding a row to a DataFrame changes the dtype of its columns:
>>> from pandas import DataFrame
>>> df = DataFrame({'a' : range(10)}, dtype='i4')
>>> df
   a
0  0
1  1
2  2
3  3
4  4
5  5
6  6
7  7
8  8
9  9
[10 rows x 1 columns]
I explicitly specified the dtype to be int32 (i.e., 'i4'), as can be seen:
>>> df.dtypes
a int32
dtype: object
However, adding a row changes dtype to float64:
>>> df.loc[10] = 99
>>> df
     a
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
10  99
[11 rows x 1 columns]
>>> df.dtypes
a float64
dtype: object
I've tried specifying the dtype of the value that I add:
>>> import numpy as np
>>> df = DataFrame({'a' : np.arange(10, dtype=np.int32)})
>>> df.dtypes
a int32
dtype: object
>>> df.loc[10] = np.int32(0)
>>> df.dtypes
a float64
dtype: object
But that does not work either. Is there any solution, without using functions that return new objects?
Enlargement is done in two stages: a NaN is placed in that column first, then the value is assigned, which is why the dtype is coerced. I'll put it on the bug/enhancement list; it's a bit non-trivial.
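One way to work around the coercion after the fact is to cast the column back to its original dtype. This is a sketch, not the library's official fix: `astype` returns a new Series, but the DataFrame object itself is reused, only the column is replaced. The exact coercion behaviour during enlargement may differ between pandas versions.

```python
import numpy as np
from pandas import DataFrame

df = DataFrame({'a': np.arange(10, dtype=np.int32)})

# Enlarging via .loc may coerce the column to float64 (a NaN is
# placed in the new row first, then the value is assigned).
df.loc[10] = np.int32(99)

# Cast the column back; the DataFrame object stays the same,
# only the column's backing Series is replaced.
df['a'] = df['a'].astype('i4')
print(df.dtypes)
```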
Here's a workaround, using append (note the extra Series import):
In [13]: from pandas import Series

In [14]: df.append(Series(99, [10], dtype='i4').to_frame('a'))
Out[14]:
     a
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
10  99
[11 rows x 1 columns]
In [15]: df.append(Series(99,[10],dtype='i4').to_frame('a')).dtypes
Out[15]:
a int32
dtype: object
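In recent pandas versions, where DataFrame.append has been removed (it was deprecated in 1.4 and dropped in 2.0), the same workaround can be written with pd.concat. This is a sketch of the equivalent: because the new row is built with the matching dtype, no NaN is ever introduced and int32 is preserved.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.arange(10, dtype=np.int32)})

# Build the new row with the matching dtype, then concatenate;
# concat preserves int32 because no NaN is ever introduced.
new_row = pd.DataFrame({'a': np.array([99], dtype=np.int32)}, index=[10])
df = pd.concat([df, new_row])
print(df.dtypes)   # a    int32
```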
Here's the issue tracking the bug/enhancement to do this automagically: https://github.com/pydata/pandas/issues/6485