
Pandas Relative Time Pivot

I have the last eight months of data for each of my customers. These are not the same calendar months for every customer, just the last eight months each one happened to be with us. Monthly fees and penalties are stored in rows, but I want each of the last eight months to be a column.

What I have:

Customer Amount Penalties Month
123      500    200       1/7/2017
123      400    100       1/6/2017
   ...
213      300    150       1/4/2015
213      200    400       1/3/2015

What I want:

Customer Month-8-Amount Month-7-Amount ... Month-1-Amount Month-1-Penalties ...
123      500            400                450            300
213      900            250                300            200
...

What I've tried:

df = df.pivot(index=num, columns=[amount,penalties])

I got this error:

ValueError: all arrays must be same length

Is there some ideal way to do this?

asked May 02 '18 14:05 by Memduh



1 Answer

You can do this with set_index and unstack, after numbering each customer's months relative to their most recent one:

# Assumes rows are already sorted newest-first within each customer.
# Replace the date with a relative month number: 1 = most recent.
df['Month'] = df.groupby('Customer').cumcount() + 1

# Keep only the eight most recent months per customer.
df = df.loc[df['Month'] <= 8]

# Reshape: one row per customer, one column per (value, month) pair.
s = df.set_index(['Customer', 'Month']).unstack().sort_index(level=1, axis=1)

# Flatten the two-level column index into single names such as 'Amount-1'.
s.columns = s.columns.map('{0[0]}-{0[1]}'.format)
s.add_prefix("Month-")
Out[189]: 
          Month-Amount-1  Month-Penalties-1  Month-Amount-2  Month-Penalties-2
Customer                                                                      
123                  500                200             400                100
213                  300                150             200                400
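
The snippet above relies on the rows already being sorted newest-first. As a minimal, self-contained sketch of the same set_index/unstack approach that does not rely on that, the version below parses the dates and sorts explicitly before numbering the months. The helper name `relative_month_pivot`, the `Rank` column, and the day/month/year reading of the dates are assumptions made for illustration, not part of the original answer; the sample rows are the four shown in the question.

```python
import pandas as pd

# Sample rows copied from the question (the elided rows are left out).
df = pd.DataFrame({
    "Customer": [123, 123, 213, 213],
    "Amount": [500, 400, 300, 200],
    "Penalties": [200, 100, 150, 400],
    "Month": ["1/7/2017", "1/6/2017", "1/4/2015", "1/3/2015"],
})


def relative_month_pivot(df, n_months=8):
    """Hypothetical helper: one row per customer, one column per relative month."""
    out = df.copy()
    # Assumption: dates are day/month/year strings.
    out["Month"] = pd.to_datetime(out["Month"], format="%d/%m/%Y")
    # Newest month first within each customer, so rank 1 is the most recent.
    out = out.sort_values(["Customer", "Month"], ascending=[True, False])
    out["Rank"] = out.groupby("Customer").cumcount() + 1
    out = out[out["Rank"] <= n_months]
    # Move Rank into the columns: the result has (value name, rank) column pairs.
    wide = out.set_index(["Customer", "Rank"])[["Amount", "Penalties"]].unstack()
    wide = wide.sort_index(axis=1, level=1)
    # Flatten to names in the style the question asks for, e.g. 'Month-1-Amount'.
    wide.columns = [f"Month-{rank}-{name}" for name, rank in wide.columns]
    return wide


wide = relative_month_pivot(df)
print(wide)
```

Customers with fewer than `n_months` rows end up with NaN in the columns for the months they lack, which is usually what you want when tenure differs between customers.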
answered Oct 11 '22 13:10 by BENY