Take the difference of all elements of a series with the previous ones in python pandas

I have a dataframe with sorted values labeled by ids, and I want to take the difference between the value of the first element of an id and the value of the last element of each of the previous ids. The code below does what I want:

import pandas as pd

a = 'a'; b = 'b'; c = 'c'
df = pd.DataFrame(data=[*zip([a, a, a, b, b, c, a], [1, 2, 3, 5, 6, 7, 8])],
                  columns=['id', 'value'])
print(df)
# # take the last value for a particular id
# last_value_for_id = df.loc[df.id.shift(-1) != df.id, :]
# print(last_value_for_id)
current_id = ''
prev_values = {}  # last value seen so far for each id
diffs = {}
for t in df.itertuples(index=False):
    prev_values[t.id] = t.value
    if current_id == t.id:
        continue  # still inside the same run of ids
    current_id = t.id
    # first row of a new run: diff against the last value of every other id
    for k, v in prev_values.items():
        if k == current_id:
            continue
        diffs[(k, current_id)] = t.value - v
print(pd.DataFrame(data=diffs.values(), columns=['diff'], index=diffs.keys()))

prints:

  id  value
0  a      1
1  a      2
2  a      3
3  b      5
4  b      6
5  c      7
6  a      8
     diff
a b     2
  c     4
b c     1
  a     2
c a     1

However, I want to do this in a vectorized manner. I have found a way of getting the last element of each run of ids:

# take the last value for a particular id
last_value_for_id = df.loc[df.id.shift(-1) != df.id, :]
print(last_value_for_id)

which gives me:

  id  value
2  a      3
4  b      6
5  c      7

but I can't find a way of using this to take the diffs in a vectorized manner.

asked May 14 '19 13:05

Mr_and_Mrs_D

1 Answer

Depending on how many ids you have, this works for up to a few thousand ids (it builds a num_ids x num_ids matrix):

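import numpy as np

# enumerate ids; the order must match the row order of f below
# (groupby sorts its keys, so here that is alphabetical)
ids = [a, b, c]
num_ids = len(ids)

# compute first and last value per id
f = df.groupby('id').value.agg(['first', 'last'])

# lower triangle mask (diagonal included)
mask = np.array([[i >= j for j in range(num_ids)] for i in range(num_ids)])

# compute diff of first and last, then mask
# .to_numpy() is used because indexing a Series with [None, :] is not
# supported in recent pandas versions
diff = np.where(mask, None,
                f['first'].to_numpy()[None, :] - f['last'].to_numpy()[:, None])
diff = pd.DataFrame(diff,
                    index=ids,
                    columns=ids)
# stack
diff.stack()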

output:

a  b    2
   c    4
b  c    1
dtype: object
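
The same first-minus-last broadcast can be written without hard-coding the list of ids. What follows is only a sketch, assuming each id forms a single contiguous run (as in the data before the update) and that `groupby(..., sort=False)` keeps ids in order of first appearance. The helper name `pairwise_first_minus_last` and the index names `prev` and `next` are made up for illustration; they are not part of pandas.

```python
import numpy as np
import pandas as pd


def pairwise_first_minus_last(df):
    """For every pair (earlier id, later id): first value of later - last value of earlier.

    Assumes each id appears as one contiguous run in df.
    """
    # sort=False keeps the ids in order of first appearance
    f = df.groupby('id', sort=False)['value'].agg(['first', 'last'])
    ids = f.index.to_numpy()
    first = f['first'].to_numpy()
    last = f['last'].to_numpy()
    # out[i, j] = first[j] - last[i], built by broadcasting
    out = first[None, :] - last[:, None]
    # strictly upper triangle: pairs where id j comes after id i
    i, j = np.triu_indices(len(ids), k=1)
    index = pd.MultiIndex.from_arrays([ids[i], ids[j]], names=['prev', 'next'])
    return pd.Series(out[i, j], index=index, name='diff')


df = pd.DataFrame({'id': list('aaabbc'), 'value': [1, 2, 3, 5, 6, 7]})
result = pairwise_first_minus_last(df)
print(result)
```

Selecting the upper triangle by index avoids the `None` mask, so the result keeps an integer dtype instead of `object`. The full matrix is still built, so memory grows with the square of the number of ids.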

Edit for updated data:

For the updated data, where an id can reappear later, the approach is similar once we build the f table per block of consecutive ids:

# create blocks of consecutive id
blocks = df['id'].ne(df['id'].shift()).cumsum()

# groupby
groups = df.groupby(blocks)

# create first and last values
df['fv'] = groups.value.transform('first')
df['lv'] = groups.value.transform('last')

# rebuild f and ids as above, with one row per block
# note the column name change: use f['fv'] and f['lv'] in place of f['first'] and f['last']
f = df[['id','fv', 'lv']].drop_duplicates()
ids = f['id'].values
num_ids = len(ids)

Output:

a   b     2
    c     4
    a     5
b   c     1
    a     2
c   a     1
dtype: object

If you want to go further and drop the index (a,a), well, I'm so lazy :D.
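
For completeness, here is one way the remaining steps might look for the updated data, including dropping same-id pairs such as (a, a). It is a sketch rather than the code from the answer: it aggregates per block with named aggregation instead of `transform` plus `drop_duplicates`, and the names `block_table`, `prev` and `next` are chosen only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': list('aaabbca'), 'value': [1, 2, 3, 5, 6, 7, 8]})

# label each run of consecutive equal ids: 1, 1, 1, 2, 2, 3, 4
blocks = df['id'].ne(df['id'].shift()).cumsum().rename('block')

# one row per block: its id, first value and last value
block_table = df.groupby(blocks).agg(
    id=('id', 'first'), fv=('value', 'first'), lv=('value', 'last')
)
ids = block_table['id'].to_numpy()
fv = block_table['fv'].to_numpy()
lv = block_table['lv'].to_numpy()

# all pairs (i, j) of blocks with block i strictly before block j
i, j = np.triu_indices(len(ids), k=1)

# drop pairs of blocks that share an id, e.g. (a, a)
keep = ids[i] != ids[j]
i, j = i[keep], j[keep]

# first value of the later block minus last value of the earlier block
index = pd.MultiIndex.from_arrays([ids[i], ids[j]], names=['prev', 'next'])
diff = pd.Series(fv[j] - lv[i], index=index, name='diff')
print(diff)
```

For this data the result matches the loop in the question. If an id recurs and other ids follow it, this version keeps one pair per earlier block, whereas the loop only keeps the most recent value for each id, so the two can differ.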

answered Oct 19 '22 07:10

Quang Hoang

Tags: python, pandas, vectorization
