Is there a general, efficient way to assign values to a subset of a DataFrame in pandas? I've got hundreds of rows and columns that I can access directly but I haven't managed to figure out how to edit their values without iterating through each row,col pair. For example:
In [1]: import pandas, numpy
In [2]: array = numpy.arange(30).reshape(3,10)
In [3]: df = pandas.DataFrame(array, index=list("ABC"))
In [4]: df
Out[4]:
    0   1   2   3   4   5   6   7   8   9
A   0   1   2   3   4   5   6   7   8   9
B  10  11  12  13  14  15  16  17  18  19
C  20  21  22  23  24  25  26  27  28  29
In [5]: rows = ['A','C']
In [6]: columns = [1,4,7]
In [7]: df[columns].ix[rows]
Out[7]:
    1   4   7
A   1   4   7
C  21  24  27
In [8]: df[columns].ix[rows] = 900
In [9]: df
Out[9]:
    0   1   2   3   4   5   6   7   8   9
A   0   1   2   3   4   5   6   7   8   9
B  10  11  12  13  14  15  16  17  18  19
C  20  21  22  23  24  25  26  27  28  29
I believe what is happening here is that I'm getting a copy rather than a view, meaning I can't assign to the original DataFrame. Is that my problem? What's the most efficient way to edit those rows x columns (preferably in-place, as the DataFrame may take up a lot of memory)?
Also, what if I want to replace those values with a correctly shaped DataFrame?
Use .loc in an assignment expression (the = means it's not relevant whether it is a view or a copy!):
In [11]: df.loc[rows, columns] = 99
In [12]: df
Out[12]:
    0   1   2   3   4   5   6   7   8   9
A   0  99   2   3  99   5   6  99   8   9
B  10  11  12  13  14  15  16  17  18  19
C  20  99  22  23  99  25  26  99  28  29
If you're using a version prior to 0.11 you can use .ix.
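For the second part of the question (replacing the selection with correctly shaped data), the same .loc assignment appears to work. The following is only a sketch: the `replacement` frame and the numbers in it are made up for illustration, and it assumes a reasonably recent pandas. When the right-hand side is a DataFrame, pandas aligns on its row and column labels; a plain NumPy array is assigned by position.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(30).reshape(3, 10), index=list("ABC"))
rows = ["A", "C"]
columns = [1, 4, 7]

# Hypothetical replacement data with the same shape (2 x 3) and the same
# labels as the selection, so pandas can align it on index and columns.
replacement = pd.DataFrame(
    [[-1, -2, -3], [-4, -5, -6]], index=rows, columns=columns
)
df.loc[rows, columns] = replacement

# A raw array carries no labels, so it is written by position; its shape
# (1 x 3 here) has to match the selection.
df.loc[["B"], columns] = np.array([[70, 80, 90]])

print(df)
```

If the replacement DataFrame has different labels from the target selection, passing `replacement.to_numpy()` instead should sidestep label alignment.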
As @Jeff comments:
This is an assignment expression (see 'advanced indexing with ix' section of the docs) and doesn't return anything (although there are assignment expressions which do return things, e.g. .at and .iat).
df.loc[rows,columns] can return a view, but usually it's a copy. Confusing, but done for efficiency.
Bottom line: use ix, loc, iloc to set (as above), and don't modify copies.
See 'view versus copy' section of the docs.
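To make the "don't modify copies" advice concrete, here is a small sketch contrasting the two approaches. The names `subset`, `unchanged`, and `changed` are introduced purely for illustration. Depending on the pandas version, the first assignment may emit a chained-assignment warning, which is silenced here so the example runs quietly.

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(30).reshape(3, 10), index=list("ABC"))
rows = ["A", "C"]
columns = [1, 4, 7]

# Selecting columns with a list produces a new object, so the assignment
# below lands on that object rather than on df.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    subset = df[columns]
    subset.loc[rows] = 900

# df still holds its original values for the selected cells.
unchanged = df.loc[rows, columns].to_numpy().tolist()

# A single .loc call on df itself writes into the original frame.
df.loc[rows, columns] = 900
changed = df.loc[rows, columns].to_numpy().tolist()

print(unchanged)
print(changed)
```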