I'm having problems getting the daily average in a Pandas DataFrame. I've checked the question "Calculating daily average from irregular time series using pandas" here, and it doesn't help. My CSV files look like this:
Date/Time,Value
12/08/13 12:00:01,5.553
12/08/13 12:30:01,2.604
12/08/13 13:00:01,2.604
12/08/13 13:30:01,2.604
12/08/13 14:00:01,2.101
12/08/13 14:30:01,2.666
and so on. My code looks like this:
import os
import pandas as pd

# Import iButton temperatures
flistloc = '../data/iButtons/Readings/edit'
flist = os.listdir(flistloc)
# Create empty dictionary to store a DataFrame for each file
pdib = {}
for file in flist:
    file = os.path.join(flistloc, file)
    # Calls my own helper function (not shown) to return only the name
    fname, _, _, _ = namer(file)
    # Read each file into a DataFrame indexed by Date/Time
    pdib[fname] = pd.read_csv(file, parse_dates=[0], dayfirst=True, index_col=0)
pdibkeys = sorted(pdib.keys())
#
# Calculate daily average for each iButton
for name in pdibkeys:
    pdib[name]['daily'] = pdib[name].resample('D').mean()
The DataFrames seem OK, but the averaging doesn't work. Here is what one of them looks like in IPython:
'2B5DE4': <class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1601 entries, 2013-08-12 12:00:01 to 2013-09-14 20:00:01
Data columns (total 2 columns):
Value 1601 non-null values
daily 0 non-null values
dtypes: float64(2)}
Anyone know what's going on?
The question is somewhat old, but I want to contribute anyway, since I have had to deal with this over and over again (and I think my solution is not really Pythonic...).
The best solution I have come up with so far is to use the original index to create a new DataFrame that is mostly NaN and fill it in at the end:
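davg = df.resample('D').mean()
# Add the original timestamps to the daily index; only the midnight rows hold values
davg_NA = davg.reindex(df.index.union(davg.index))
# Carry each day's mean forward over that day's readings, then keep only the original rows
davg_daily = davg_NA.ffill().loc[df.index]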
One can even cram this into one line:
df.resample('D').mean().reindex(df.index, method='ffill')
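A self-contained sketch of the forward-fill idea, assuming pandas 1.x or later. It uses the sample readings from the question plus two made-up readings on 13/08/13 (added only so there is a second day to fill), and it relies on `reindex(..., method='ffill')`, which matches each reading to the latest daily label at or before it, i.e. the midnight of its own day:

```python
from io import StringIO

import pandas as pd

# Sample readings from the question; the two 13/08/13 rows are made up for illustration
csv_text = """Date/Time,Value
12/08/13 12:00:01,5.553
12/08/13 12:30:01,2.604
12/08/13 13:00:01,2.604
12/08/13 13:30:01,2.604
12/08/13 14:00:01,2.101
12/08/13 14:30:01,2.666
13/08/13 09:00:01,1.000
13/08/13 09:30:01,3.000
"""

df = pd.read_csv(StringIO(csv_text), parse_dates=[0], dayfirst=True, index_col=0)

# One row per calendar day, labelled with that day's midnight timestamp
davg = df.resample("D").mean()

# Each original timestamp takes the last daily label at or before it,
# which is the midnight of its own day
df["daily"] = davg["Value"].reindex(df.index, method="ffill")
print(df)
```

Because the result is aligned to `df.index`, it can be assigned straight back as a column, which is what the question's loop was trying to do.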
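You can't resample at a lower frequency and then assign the resampled DataFrame or Series back into the one you resampled from, because the indices don't match: the daily index contains only midnight timestamps, so none of the original rows line up with it and every assigned value becomes NaN: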
In [49]: df = pd.read_csv(StringIO("""Date/Time,Value
12/08/13 12:00:01,5.553
12/08/13 12:30:01,2.604
12/08/13 13:00:01,2.604
12/08/13 13:30:01,2.604
12/08/13 14:00:01,2.101
12/08/13 14:30:01,2.666"""), parse_dates=0, dayfirst=True, index_col=0)
In [50]: df.resample('D')
Out[50]:
Value
Date/Time
2013-08-12 3.022
[1 rows x 1 columns]
In [51]: df['daily'] = df.resample('D')
In [52]: df
Out[52]:
Value daily
Date/Time
2013-08-12 12:00:01 5.553 NaN
2013-08-12 12:30:01 2.604 NaN
2013-08-12 13:00:01 2.604 NaN
2013-08-12 13:30:01 2.604 NaN
2013-08-12 14:00:01 2.101 NaN
2013-08-12 14:30:01 2.666 NaN
[6 rows x 2 columns]
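To see the mismatch directly, the two indexes can be compared. This is a minimal sketch assuming a recent pandas version (where the aggregation is called as `.mean()` after `resample`), reusing the question's sample readings:

```python
from io import StringIO

import pandas as pd

csv_text = """Date/Time,Value
12/08/13 12:00:01,5.553
12/08/13 12:30:01,2.604
12/08/13 13:00:01,2.604
12/08/13 13:30:01,2.604
12/08/13 14:00:01,2.101
12/08/13 14:30:01,2.666
"""
df = pd.read_csv(StringIO(csv_text), parse_dates=[0], dayfirst=True, index_col=0)
daily = df["Value"].resample("D").mean()

print(daily.index)  # one label per day, at midnight

# No original timestamp coincides with a daily label
overlap = df.index.intersection(daily.index)
print(len(overlap))

# Column assignment aligns on the index, so every row gets NaN
df["daily"] = daily
print(df["daily"].isna().all())
```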
One option is to take advantage of partial string indexing on the DatetimeIndex rows:
davg = df['Value'].resample('D').mean()
df.loc[str(davg.index.date[0]), 'daily'] = davg.iloc[0]
which looks like this when you expand the str(davg.index.date[0]) expression:
df.loc['2013-08-12', 'daily'] = davg.iloc[0]
This is a bit of a hack, and each assignment fills only one day; there might be a better way to do it.
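One alternative that avoids both the reindexing and the per-day string, offered as a sketch rather than either answerer's method: group each reading by its calendar day using `DatetimeIndex.normalize()` and broadcast the group mean back with `transform('mean')`, which returns a result aligned with the original rows. The helper `add_daily_mean` and the dict `frames` (standing in for the question's `pdib`) are hypothetical, and the readings in it are made up for illustration:

```python
import pandas as pd


def add_daily_mean(df: pd.DataFrame, column: str = "Value") -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with a 'daily' column holding each row's calendar-day mean."""
    out = df.copy()
    # normalize() truncates each timestamp to midnight, so rows from the same day share a key
    day = out.index.normalize()
    # transform('mean') computes the mean per group and repeats it for every row in that group
    out["daily"] = out[column].groupby(day).transform("mean")
    return out


# Hypothetical stand-in for the question's dict of per-file DataFrames (pdib)
frames = {
    "sensorA": pd.DataFrame(
        {"Value": [5.553, 2.604, 1.0, 3.0]},
        index=pd.to_datetime(["2013-08-12 12:00:01", "2013-08-12 12:30:01",
                              "2013-08-13 09:00:01", "2013-08-13 09:30:01"]),
    ),
    "sensorB": pd.DataFrame(
        {"Value": [2.0, 4.0]},
        index=pd.to_datetime(["2013-08-12 08:00:01", "2013-08-12 20:00:01"]),
    ),
}

for name in sorted(frames):
    frames[name] = add_daily_mean(frames[name])
    print(name)
    print(frames[name])
```

In the question's loop this would become `pdib[name] = add_daily_mean(pdib[name])`; unlike the partial-indexing approach, it handles every day in one call.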