I have a df in pandas
    import pandas as pd
    df = pd.DataFrame(['AA', 'BB', 'CC'], columns = ['value'])
I want to iterate over the rows in df. For each row I want that row's value and the next row's value. Something like this (it does not work):
    for i, row in df.iterrows():
        print(row['value'])
        i1, row1 = next(df.iterrows())
        print(row1['value'])
As a result I want
    'AA'
    'BB'
    'BB'
    'CC'
    'CC'
    *Wrong index error here
At this point I have a messy way to solve this:
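    # iloc is used here because irow no longer exists in current pandas
    for i in range(0, df.shape[0]):
        print(df.iloc[i]['value'])
        print(df.iloc[i + 1]['value'])  # fails on the last row: i + 1 is out of range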
Is there a more efficient way to solve this?
iterrows() iterates over the rows of a pandas DataFrame, yielding one (index, Series) pair per row. It should not be confused with items(), which iterates over the columns and yields a tuple of the column name and that column's content as a Series.
apply also loops over the rows to get the job done, but it is typically better optimized than iterrows(), which results in faster runtimes.
itertuples(): itertuples() iterates through the data frame by yielding each row as a namedtuple. In one reported benchmark, itertuples() took 16 seconds to iterate through a data frame with 10 million records, around 50x faster than iterrows().
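To make the difference between these iterators concrete, here is a minimal sketch using the three-row frame from the question. The variable names (row_pairs, column_names, tuple_values) are illustrative only, and items() is included purely as the column-wise counterpart to iterrows().

```python
import pandas as pd

df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])

# iterrows() yields one (index, Series) pair per row.
row_pairs = [(idx, row['value']) for idx, row in df.iterrows()]

# items() yields one (column name, Series) pair per column.
column_names = [name for name, _ in df.items()]

# itertuples() yields one namedtuple per row; the index is in .Index
# and each column is available as an attribute.
tuple_values = [(t.Index, t.value) for t in df.itertuples()]

print(row_pairs)
print(column_names)
print(tuple_values)
```

Attribute access on the namedtuple only works for column names that are valid Python identifiers, which is one reason to prefer iterrows() when column names are arbitrary strings.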
Firstly, your "messy way" is OK: there's nothing wrong with using indices into the dataframe, and this will not be too slow. iterrows() itself isn't terribly fast.
A version of your first idea that would work would be:
    row_iterator = df.iterrows()
    _, last = next(row_iterator)  # take first item from row_iterator
    for i, row in row_iterator:
        print(last['value'])  # current row
        print(row['value'])   # next row
        last = row
The second method could do something similar, to save one index into the dataframe:
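    # iloc is used here because irow no longer exists in current pandas
    last = df.iloc[0]
    for i in range(1, df.shape[0]):
        row = df.iloc[i]  # index into the dataframe once per iteration
        print(last)
        print(row)
        last = row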
When speed is critical you can always try both and time the code.
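As a rough illustration of timing both approaches, the sketch below wraps each one in a function and measures it with the standard timeit module. The function names, the 1000-row test frame, and the repeat count are all assumptions made for this example, and iloc stands in for irow, which is not available in current pandas. Actual timings depend on the machine and pandas version, so none are shown.

```python
import timeit

import pandas as pd

# Hypothetical test data: 1000 rows of string values.
df = pd.DataFrame({'value': [str(n) for n in range(1000)]})


def pairs_by_position(frame):
    """Collect (current, next) values using positional indexing."""
    out = []
    last = frame.iloc[0]
    for i in range(1, frame.shape[0]):
        row = frame.iloc[i]
        out.append((last['value'], row['value']))
        last = row
    return out


def pairs_by_iterrows(frame):
    """Collect (current, next) values using a single iterrows() pass."""
    out = []
    rows = frame.iterrows()
    _, last = next(rows)  # take the first row from the iterator
    for _, row in rows:
        out.append((last['value'], row['value']))
        last = row
    return out


for func in (pairs_by_position, pairs_by_iterrows):
    seconds = timeit.timeit(lambda: func(df), number=5)
    print(f"{func.__name__}: {seconds:.4f} s for 5 runs")
```

Both functions return the same list of pairs, so the comparison measures only the iteration strategy.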
There is a pairwise() function example in the itertools documentation:
    from itertools import tee

    def pairwise(iterable):
        "s -> (s0,s1), (s1,s2), (s2, s3), ..."
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)  # Python 3: the built-in zip replaces itertools.izip

    import pandas as pd
    df = pd.DataFrame(['AA', 'BB', 'CC'], columns = ['value'])

    for (i1, row1), (i2, row2) in pairwise(df.iterrows()):
        print(i1, i2, row1["value"], row2["value"])
Here is the output:
    0 1 AA BB
    1 2 BB CC
However, I think iterating over rows in a DataFrame is slow. If you can explain what problem you are trying to solve, maybe I can suggest a better method.
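One possible "better method", assuming the goal is simply to have each value next to its successor, is to avoid row iteration and shift the column instead. This is only a sketch: the column name next_value and the variable pairs are made up for the example, and whether it fits depends on what is done with the pairs afterwards.

```python
import pandas as pd

df = pd.DataFrame(['AA', 'BB', 'CC'], columns=['value'])

# shift(-1) moves each value up one row, so row i sees the value from row i + 1.
pairs = df.assign(next_value=df['value'].shift(-1))

# The last row has no successor, so its next_value is missing; drop that row.
pairs = pairs.dropna(subset=['next_value'])

for current, following in zip(pairs['value'], pairs['next_value']):
    print(current, following)
```

Because the pairing is done on whole columns, the only Python-level loop left is the one that prints the result.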