Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In python pandas, how can I re-sample and interpolate a DataFrame?

I have a pd DataFrame, typically on this format:

   1       2          3          4  
0.1100 0.0000E+00 1.0000E+00 5.0000E+00  
0.1323 7.7444E-05 8.7935E-01 1.0452E+00  
0.1545 4.3548E-04 7.7209E-01 4.5432E-01  
0.1768 1.2130E-03 6.7193E-01 2.6896E-01  
0.1990 2.5349E-03 5.7904E-01 1.8439E-01  
0.2213 4.5260E-03 4.9407E-01 1.3771E-01 

What I would like to do is re-sample the column 1 (index) values from a list, for example represented by:

indexList = numpy.linspace(0.11, 0.25, 8)

Then I need the values for columns 2, 3 and 4 to be linearly interpolated from the input DataFrame (it is always only my column 1 that I re-sample/reindex) - and if necessary extrapolated, as the min/max values for my list is not necessarily within my existing column 1 (index). However the key point is the interpolation part. I am quite new to python, but I was thinking using an approach like this:

  1. output_df = DataFrame.reindex(index=indexList) - this will give me mainly NaN's for columns 2-4.
  2. for index, row in output_df.iterrows()
    "function that calculates interpolated/extrapolated values from DataFrame and inserts them at correct row/column"

Somehow it feels like I should be able to use the .interpolate functionality, but I cannot figure out how. I cannot use it straightforward - it will be too inaccurate since most of my entries after re-indexing as mentioned in columns 2-4 will be NaN's; the interpolation should be done within the two closest values of my initial DataFrame. Any good tips anyone? (and if my format/intension is unclear, please let me know...)

like image 873
Marius Avatar asked Jan 05 '17 19:01

Marius


People also ask

How do I resample data in pandas?

Pandas Series: resample() functionThe resample() function is used to resample time-series data. Convenience method for frequency conversion and resampling of time series. Object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or pass datetime-like values to the on or level keyword.

What is resampling in Python?

Resampling is used in time series data. This is a convenience method for frequency conversion and resampling of time series data. Although it works on the condition that objects must have a datetime-like index for example, DatetimeIndex, PeriodIndex, or TimedeltaIndex.

How do pandas interpolate missing values?

You can interpolate missing values ( NaN ) in pandas. DataFrame and Series with interpolate() . This article describes the following contents. Use dropna() and fillna() to remove missing values NaN or to fill them with a specific value.


1 Answers

Assuming column 1 is in the index, you can reindex your dataframe with the original values along with the list you created and then use interpolate to fill in the nan's.

df1 = df.reindex(df.index.union(np.linspace(.11,.25,8)))
df1.interpolate('index')

               2         3         4
0.1100  0.000000  1.000000  5.000000
0.1300  0.000069  0.891794  1.453094
0.1323  0.000077  0.879350  1.045200
0.1500  0.000363  0.793832  0.574093
0.1545  0.000435  0.772090  0.454320
0.1700  0.000976  0.702472  0.325482
0.1768  0.001213  0.671930  0.268960
0.1900  0.001999  0.616698  0.218675
0.1990  0.002535  0.579040  0.184390
0.2100  0.003517  0.537127  0.161364
0.2213  0.004526  0.494070  0.137710
0.2300  0.004526  0.494070  0.137710
0.2500  0.004526  0.494070  0.137710
like image 193
Ted Petrou Avatar answered Oct 16 '22 00:10

Ted Petrou