 

iterrows pandas: get the next row's value

I have a DataFrame in pandas:

```python
import pandas as pd

df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])
```

I want to iterate over the rows in df. For each row, I want that row's value and the next row's value. Something like this (it does not work):

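```python
for i, row in df.iterrows():
    print(row['value'])
    # next(df.iterrows()) builds a brand-new iterator on every pass,
    # so this always returns the first row rather than the next one.
    i1, row1 = next(df.iterrows())
    print(row1['value'])
```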

As a result, I want:

```
'AA'
'BB'
'BB'
'CC'
'CC'
*wrong index error here (CC has no next row)
```

At the moment I have a messy way to solve this:

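```python
for i in range(0, df.shape[0]):
    # df.irow() no longer exists in current pandas; df.iloc[i] does the same positional lookup.
    print(df.iloc[i]['value'])
    # On the last row, i + 1 is out of bounds and raises IndexError.
    print(df.iloc[i + 1]['value'])
```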

Is there a more efficient way to do this?

Ayrat asked Apr 18 '14 09:04


People also ask

What does iterrows() return in pandas?

iterrows() iterates over the rows of a pandas DataFrame as (index, Series) pairs: for each row it yields a tuple containing the row's index label and a Series holding that row's data, indexed by column name.
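A minimal illustration of what iterrows() yields, using the DataFrame from the question:

```python
import pandas as pd

df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])

# Each item yielded by iterrows() is an (index label, row) pair,
# where the row is a pandas Series indexed by the column names.
for index, row in df.iterrows():
    print(index, type(row).__name__, row['value'])
# prints:
# 0 Series AA
# 1 Series BB
# 2 Series CC
```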

Is iterrows() faster than apply()?

DataFrame.apply() also loops over the rows under the hood, but it typically has less per-row overhead than iterrows(), which usually results in faster runtimes.
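A small sketch contrasting the two on a per-row computation that does not need the next row. Lowercasing the value is an arbitrary stand-in chosen for illustration; timings are not shown because they depend on the machine and data size:

```python
import pandas as pd

df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])

# iterrows(): build the result by hand, one row Series at a time.
labels_loop = [row['value'].lower() for _, row in df.iterrows()]

# apply(axis=1): pandas calls the function once per row (passed as a Series)
# and collects the return values into a Series aligned with df's index.
labels_apply = df.apply(lambda row: row['value'].lower(), axis=1)

print(labels_loop)            # ['aa', 'bb', 'cc']
print(labels_apply.tolist())  # ['aa', 'bb', 'cc']
```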

Is itertuples() faster than iterrows()?

itertuples() iterates over the DataFrame by converting each row into a namedtuple rather than a Series. In the benchmark this answer cites, itertuples() took 16 seconds to iterate through a DataFrame with 10 million records, around 50 times faster than iterrows().
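A sketch of applying itertuples() to the original question, assuming the rows fit comfortably in memory (the list() call materializes them all):

```python
import pandas as pd

df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])

# itertuples() yields one namedtuple per row; index=False drops the index field,
# so each tuple has a single field named after the column.
rows = list(df.itertuples(index=False))

# Pair each row with its successor by zipping the list against itself offset by one.
for current, following in zip(rows, rows[1:]):
    print(current.value, following.value)
# prints:
# AA BB
# BB CC
```

zip() stops at the shorter input, so the last row has no partner and simply drops out instead of raising an index error.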


2 Answers

Firstly, your "messy way" is OK: there's nothing wrong with indexing into the DataFrame by position, and it will not be too slow. iterrows() itself isn't terribly fast either.

A working version of your first idea would be:

```python
row_iterator = df.iterrows()
_, last = next(row_iterator)  # take the first item from row_iterator
for i, row in row_iterator:
    # Print the previous row's value first, then the current row's value.
    print(last['value'])
    print(row['value'])
    last = row
```
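The same "remember the last row" idea can be packaged as a generator that also emits the final row, which the question's desired output includes. `with_next` is a hypothetical helper name, not a pandas function; it yields `None` as the successor of the last item instead of raising an index error:

```python
import pandas as pd


def with_next(iterable):
    """Yield (item, next_item) pairs; next_item is None for the last item.

    Hypothetical helper built on the "remember the previous item" idea.
    """
    iterator = iter(iterable)
    try:
        last = next(iterator)
    except StopIteration:
        return  # empty input: nothing to yield
    for item in iterator:
        yield last, item
        last = item
    yield last, None


df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])

# Each item from df.iterrows() is an (index, Series) pair, so nxt is either
# such a pair or None for the final row.
for (_, row), nxt in with_next(df.iterrows()):
    print(row['value'], nxt[1]['value'] if nxt is not None else None)
# prints:
# AA BB
# BB CC
# CC None
```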

Your second method can do something similar, saving one lookup into the DataFrame per iteration:

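```python
last = df.iloc[0]  # df.irow() no longer exists; iloc is the positional equivalent
for i in range(1, df.shape[0]):
    row = df.iloc[i]  # look the row up once and reuse it below
    print(last)
    print(row)
    last = row
```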

When speed is critical, you can always try both and time them.
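A sketch of how such a timing comparison might look, using the standard-library `timeit` module. The 2,000-row frame and the function names are assumptions for illustration; the two functions mirror the two approaches above but collect the pairs instead of printing them:

```python
import timeit

import pandas as pd

# Assumption: a larger, made-up frame so the timings are measurable.
df = pd.DataFrame({'value': [f'v{i}' for i in range(2_000)]})


def iterrows_version():
    # The "remember the last row" approach using iterrows().
    pairs = []
    row_iterator = df.iterrows()
    _, last = next(row_iterator)
    for _, row in row_iterator:
        pairs.append((last['value'], row['value']))
        last = row
    return pairs


def iloc_version():
    # The positional-index approach using iloc.
    pairs = []
    last = df.iloc[0]
    for i in range(1, df.shape[0]):
        row = df.iloc[i]
        pairs.append((last['value'], row['value']))
        last = row
    return pairs


# Check both produce identical results before comparing speed.
assert iterrows_version() == iloc_version()

for fn in (iterrows_version, iloc_version):
    seconds = timeit.timeit(fn, number=3)
    print(f'{fn.__name__}: {seconds:.3f} s for 3 runs')
```

The absolute numbers depend on the machine, pandas version, and column dtypes, so it is worth re-running on data shaped like your own.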

alisdt answered Sep 19 '22 11:09


There is a pairwise() recipe in the itertools documentation:

```python
from itertools import tee

import pandas as pd


def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)  # Python 3's zip is lazy, like Python 2's izip


df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])

# On Python 3.10+, itertools.pairwise provides the same behavior built in.
for (i1, row1), (i2, row2) in pairwise(df.iterrows()):
    print(i1, i2, row1['value'], row2['value'])
```

Here is the output:

```
0 1 AA BB
1 2 BB CC
```

But I think iterating over the rows of a DataFrame is slow. If you can explain what problem you want to solve, maybe I can suggest a better method.
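One example of the kind of alternative this hints at (a different technique, not part of the answer above): `Series.shift(-1)` lines each row up with its successor without any Python-level loop. The column name `next_value` is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])

# shift(-1) moves every value up one row, so each row sits next to its successor.
# The last row has no successor, so it gets NaN instead of raising an error.
df['next_value'] = df['value'].shift(-1)
print(df)
```

Because the work happens inside pandas rather than in a Python for-loop, this generally scales much better than iterrows() on large frames; the trade-off is that the "next row" logic is expressed as a column operation rather than a loop body.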

HYRY answered Sep 17 '22 11:09