
Setting values with pandas.DataFrame

Tags: python, pandas

Having this DataFrame:

import pandas

dates = pandas.date_range('2016-01-01', periods=5, freq='H')
s = pandas.Series([0, 1, 2, 3, 4], index=dates)
df = pandas.DataFrame([(1, 2, s, 8)], columns=['a', 'b', 'foo', 'bar'])
df.set_index(['a', 'b'], inplace=True)

df


I would like to replace the Series in there with a new one that is simply the old one, but resampled to a day period (i.e. x.resample('D').sum().dropna()).
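To make the intended transformation concrete, here is a minimal sketch of what the resampling does to the example Series on its own. The name `daily` is only illustrative, and the hourly frequency is passed as a `pandas.Timedelta` (an assumption made so the snippet does not depend on a particular frequency alias).

```python
import pandas

# Same hourly Series as in the question.
dates = pandas.date_range('2016-01-01', periods=5,
                          freq=pandas.Timedelta(hours=1))
s = pandas.Series([0, 1, 2, 3, 4], index=dates)

# All five hourly values fall on 2016-01-01, so the daily
# resample has a single bin whose sum is 0 + 1 + 2 + 3 + 4 = 10.
daily = s.resample('D').sum().dropna()
print(daily)
```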

When I try:

df['foo'][0] = df['foo'][0].resample('D').sum().dropna()

That seems to work well.

However, I get a warning:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

The question is, how should I do this instead?

Notes

Things I have tried that do not work (with or without resampling, the assignment raises an exception):

df.iloc[0].loc['foo'] = df.iloc[0].loc['foo']
df.loc[(1, 2), 'foo'] = df.loc[(1, 2), 'foo']
df.loc[df.index[0], 'foo'] = df.loc[df.index[0], 'foo']

A bit more information about the data (in case it is relevant):

  • The real DataFrame has more columns in the multi-index. They are not all necessarily integers; more generally they are numerical and categorical. The index is unique (i.e. there is only one row with a given index value).
  • The real DataFrame has, of course, many more rows (thousands).
  • The DataFrame does not necessarily have only two columns, and more than one column may contain Series. Columns usually contain Series, categorical data and numerical data. Any single column is always single-typed (either numerical, or categorical, or Series).
  • The Series contained in each cell usually have a variable length (i.e. two Series/cells in the DataFrame do not have the same length, except by pure coincidence, and will probably never have the same index anyway, as the dates also vary between Series).

Using Python 3.5.1 and Pandas 0.18.1.

Peque asked Jun 01 '16 13:06



2 Answers

This should work:

df.iat[0, df.columns.get_loc('foo')] = df['foo'][0].resample('D').sum().dropna()

pandas is complaining about chained indexing, but when you avoid chained indexing it has trouble assigning a whole Series to a single cell. With iat you can force that kind of assignment. I don't think it is the preferable thing to do, but it seems to be a working solution.
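For reference, a self-contained sketch of the iat approach applied to the question's example. The helper names `col`, `old` and `result` are illustrative only, the cell is read back with iat rather than `df['foo'][0]`, and the hourly frequency is given as a `pandas.Timedelta`; those choices are assumptions made to keep the snippet independent of version-specific aliases and integer-label fallbacks, not part of the original answer.

```python
import pandas

# Rebuild the DataFrame from the question: one row, with a Series
# stored inside the 'foo' cell.
dates = pandas.date_range('2016-01-01', periods=5,
                          freq=pandas.Timedelta(hours=1))
s = pandas.Series([0, 1, 2, 3, 4], index=dates)
df = pandas.DataFrame([(1, 2, s, 8)], columns=['a', 'b', 'foo', 'bar'])
df = df.set_index(['a', 'b'])

# Positional location of the 'foo' column (after set_index only
# 'foo' and 'bar' remain as columns).
col = df.columns.get_loc('foo')

# Read the stored Series, resample it to daily sums, and write it
# back to the same cell in one indexing operation (no chained indexing).
old = df.iat[0, col]
df.iat[0, col] = old.resample('D').sum().dropna()

result = df.iat[0, col]
print(result)
```

Because the read and the write each go through a single indexer on df itself, there is no intermediate object for pandas to flag as a possible copy.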

ayhan answered Oct 09 '22 07:10


Simply set df.is_copy = False before assigning the new value.

Dark Matter answered Oct 09 '22 08:10