
set new index for pandas DataFrame (interpolating?)

I have a DataFrame where the index is NOT time. I need to re-scale all of the values from an old index which is not equi-spaced, to a new index which has different limits and is equi-spaced.

The first and last values in the columns should stay as they are (although they will have the new, stretched index values assigned to them).

Example code is:

import numpy as np
import pandas as pd
%matplotlib inline

index = np.asarray((2, 2.5, 3, 6, 7, 12, 15, 18, 20, 27))
x = np.sin(index / 10)

df = pd.DataFrame(x, index=index)
df.plot();

newindex = np.linspace(0, 29, 100)

How do I create a DataFrame where the index is newindex and the new x values are interpolated from the old x values?

The first new x value should be the same as the first old x value. Ditto for the last x value. That is, there should not be NaNs at the beginning and copies of the last old x repeated at the end.

The others should be interpolated to fit the new equi-spaced index.

I tried df.interpolate() but couldn't work out how to interpolate against the newindex.

Thanks in advance for any help.

blokeley asked Jan 02 '18 22:01


3 Answers

This works well:

import numpy as np
import pandas as pd

def interp(df, new_index):
    """Return a new DataFrame with all columns values interpolated
    to the new_index values."""
    df_out = pd.DataFrame(index=new_index)
    df_out.index.name = df.index.name

    # DataFrame.iteritems() was removed in pandas 2.0; items() is the equivalent
    for colname, col in df.items():
        df_out[colname] = np.interp(new_index, df.index, col)

    return df_out
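A usage sketch for the function above, assuming the goal from the question is to stretch the old index linearly onto the new limits before interpolating. np.interp holds the end values constant outside the original range, so without the stretch the first and last values would simply be repeated at the edges. The column name x and the stretched variable are my own choices for illustration, not part of the answer:

```python
import numpy as np
import pandas as pd

def interp(df, new_index):
    """Interpolate every column of df onto new_index with np.interp."""
    df_out = pd.DataFrame(index=new_index)
    df_out.index.name = df.index.name
    for colname, col in df.items():
        df_out[colname] = np.interp(new_index, df.index, col)
    return df_out

index = np.asarray((2, 2.5, 3, 6, 7, 12, 15, 18, 20, 27))
df = pd.DataFrame({"x": np.sin(index / 10)}, index=index)
newindex = np.linspace(0, 29, 100)

# Assumption: map the old index limits linearly onto the new limits,
# so the first and last samples land exactly on newindex[0] and newindex[-1].
old_min, old_max = df.index.min(), df.index.max()
scale = (newindex[-1] - newindex[0]) / (old_max - old_min)
stretched = (df.index - old_min) * scale + newindex[0]

df_stretched = df.copy()
df_stretched.index = stretched

result = interp(df_stretched, newindex)
```

Here result has the 100 equi-spaced labels of newindex, and its first and last x values should equal the first and last x values of the original df.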
blokeley answered Oct 23 '22 14:10


I have adopted the following solution:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def reindex_and_interpolate(df, new_index):
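    # Union the old and new labels, interpolate on index values, then keep only the new labels.
    # Index.union replaces the deprecated `df.index | new_index` set operation.
    return (
        df.reindex(df.index.union(new_index))
        .interpolate(method='index', limit_direction='both')
        .loc[new_index]
    )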

index = np.asarray((2, 2.5, 3, 6, 7, 12, 15, 18, 20, 27))
x = np.sin(index / 10)

df = pd.DataFrame(x, index=index)

# pd.Float64Index was removed in pandas 2.0; a plain float Index works instead
newindex = pd.Index(np.linspace(min(index)-5, max(index)+5, 50), dtype=float)

df_reindexed = reindex_and_interpolate(df, newindex)

plt.figure()
plt.scatter(df.index, df.values, color='red', alpha=0.5)
plt.scatter(df_reindexed.index, df_reindexed.values,  color='green', alpha=0.5)
plt.show()
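As a quick sanity check on this approach (a sketch, assuming a pandas version that provides Index.union; the column name x and the variable names are illustrative): with method='index' and limit_direction='both', the result should agree with np.interp, including holding the edge values constant outside the original range.

```python
import numpy as np
import pandas as pd

def reindex_and_interpolate(df, new_index):
    # Add the new labels, interpolate using the index values, keep only the new labels
    combined = df.index.union(new_index)
    return (
        df.reindex(combined)
        .interpolate(method="index", limit_direction="both")
        .loc[new_index]
    )

index = np.asarray((2, 2.5, 3, 6, 7, 12, 15, 18, 20, 27))
df = pd.DataFrame({"x": np.sin(index / 10)}, index=index)

# New labels extend 5 units past both ends of the original index
new_index = pd.Index(np.linspace(index.min() - 5, index.max() + 5, 50))

via_pandas = reindex_and_interpolate(df, new_index)["x"].to_numpy()
via_numpy = np.interp(new_index.to_numpy(), index, df["x"].to_numpy())

# True if both routes give the same interpolated values
matches = np.allclose(via_pandas, via_numpy)
```

If matches is True, the choice between the two answers is mostly a matter of taste: np.interp is shorter, while the reindex route stays entirely within pandas.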

João Abrantes answered Oct 23 '22 13:10


I wonder if you're up against one of pandas' limitations; it seems like you have limited choices for aligning your df to an arbitrary set of numbers (your newindex).

For example, a 100-point newindex running from min(index) to max(index) shares only its first and last values with index. After reindexing, every other row is NaN, so linear interpolation (rightly) draws a straight line between the start (2) and the end (27) of your index.

import numpy as np
import pandas as pd
%matplotlib inline

index = np.asarray((2, 2.5, 3, 6, 7, 12, 15, 18, 20, 27))
x = np.sin(index / 10)

df = pd.DataFrame(x, index=index)

newindex = np.linspace(min(index), max(index), 100)

df_reindexed = df.reindex(index = newindex)
df_reindexed.interpolate(method = 'linear', inplace = True)

df.plot()
df_reindexed.plot()


If you change newindex to provide more overlapping points with your original data set, interpolation works in a more expected manner:

newindex = np.linspace(min(index), max(index), 26)

df_reindexed = df.reindex(index = newindex)
df_reindexed.interpolate(method = 'linear', inplace = True)

df.plot()
df_reindexed.plot()
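One way to see why the two plots differ is to count how many of the new labels coincide exactly with the original ones, since reindex only keeps values at exact label matches. A minimal sketch, using a column name x and a helper surviving_points that are my own inventions:

```python
import numpy as np
import pandas as pd

index = np.asarray((2, 2.5, 3, 6, 7, 12, 15, 18, 20, 27))
df = pd.DataFrame({"x": np.sin(index / 10)}, index=index)

def surviving_points(df, n_points):
    """Count original samples that survive reindexing onto n_points equi-spaced labels."""
    new_index = np.linspace(df.index.min(), df.index.max(), n_points)
    return int(df.reindex(new_index)["x"].notna().sum())

sparse_overlap = surviving_points(df, 100)  # only the two end points match
dense_overlap = surviving_points(df, 26)    # integer labels 2, 3, ..., 27
```

With this data, 100 points should leave 2 original samples and 26 points should leave 9 (every original label except 2.5), which is why the second curve follows the sine shape much more closely.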


There are other methods that do not require one to manually align the indices, such as method='ffill', which gives each new label the last preceding original value. The resulting step-shaped curve (while technically correct) is probably not what one wants:

newindex = np.linspace(min(index), max(index), 1000)

df_reindexed = df.reindex(index = newindex, method = 'ffill')

df.plot()
df_reindexed.plot()


I looked at the pandas docs but I couldn't identify an easy solution.

https://pandas.pydata.org/pandas-docs/stable/basics.html#basics-reindexing

Evan answered Oct 23 '22 13:10