
Resampling pandas dataframe is deleting column

Tags: python, pandas

                    Val         ts  year  doy     interpolat  region_id
2000-02-18          NaN  950832000  2000   49           NaN      19987
2000-03-05          NaN  952214400  2000   65           NaN      19987
2000-03-21          NaN  953596800  2000   81           NaN      19987
2000-04-06  0.402539365  954979200  2000   97           NaN      19987
2000-04-22   0.54021746  956361600  2000  113           NaN      19987

The above dataframe has a datetime index. I resample it like so:

df = df.resample('D')

However, this resampling results in this dataframe:

                    ts  year  doy    interpolat  region_id
2000-01-01  1199180160  2008    1             1      19990
2000-01-02         NaN   NaN  NaN           NaN        NaN
2000-01-03         NaN   NaN  NaN           NaN        NaN
2000-01-04         NaN   NaN  NaN           NaN        NaN
2000-01-05         NaN   NaN  NaN           NaN        NaN

Why did the 'Val' column disappear? All the other columns seem messed up too. See "Linearly interpolate missing rows in pandas dataframe" for an explanation of where the dataframe comes from.

--EDIT Based on @unutbu's questions:

df.reset_index().to_dict('list')

{'index': [Timestamp('2000-02-18 00:00:00'), Timestamp('2000-03-05 00:00:00'), Timestamp('2000-03-21 00:00:00'), ... '0.670709965', '0.631584375', '0.562112815', '0.50740686', '0.4447712', '0.47880806', nan, nan]}

-- EDIT: The csv file for the above data frame in its entirety is here:

https://www.dropbox.com/s/dp76hk6yfs6c1og/test.csv?dl=0

user308827 asked Dec 13 '15 22:12


1 Answer

The Val column probably does not have a numeric dtype, and resample drops all non-numeric columns (e.g. object dtype). The dictionary output in the question supports this: the values are quoted strings such as '0.670709965', not floats.

To check, look at df.info().
To convert it to a numeric column, you can use astype(float) or convert_objects (pd.to_numeric starting from v0.17).
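As a rough illustration of the fix, here is a minimal sketch. It assumes a small made-up frame that mimics the question's data (the rows and the two columns shown are illustrative, not the full CSV), with 'Val' stored as strings. Current pandas versions return a Resampler object from resample(), so the sketch calls .mean() explicitly.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the question: 'Val' holds numbers stored as strings.
idx = pd.to_datetime(
    ["2000-02-18", "2000-03-05", "2000-03-21", "2000-04-06", "2000-04-22"]
)
df = pd.DataFrame(
    {
        "Val": [np.nan, np.nan, np.nan, "0.402539365", "0.54021746"],
        "region_id": [19987] * 5,
    },
    index=idx,
)

# Inspect the dtypes: 'Val' is not a numeric dtype at this point.
print(df.dtypes)

# Convert the strings to floats; errors="coerce" turns unparseable entries into NaN.
df["Val"] = pd.to_numeric(df["Val"], errors="coerce")

# Upsample to daily frequency; days without an observation become NaN.
daily = df.resample("D").mean()
print(daily.head())
```

Once 'Val' is a float column, it is kept in the resampled result, and the gaps between observations can then be filled with something like daily.interpolate() if that is the goal.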

joris answered Oct 02 '22 10:10