
Iterate over a pandas DataFrame using itertuples

Tags: python, pandas

I am iterating over a pandas dataframe using itertuples. I also want to capture the row number while iterating:

for row in df.itertuples():
    print(row.name)

Expected output:

1 larry
2 barry
3 michael

Here 1, 2, 3 are the row numbers. I want to get the row number without maintaining a manual counter. Is there an easy way to achieve this using pandas?

Sun asked Apr 05 '17 03:04



3 Answers

When using itertuples you get a named tuple for every row. By default, you can access the index value for that row with row.Index.

If the index value isn't what you're looking for (for example, the index isn't a 1-based row count), use enumerate with a start value of 1:

for i, row in enumerate(df.itertuples(), 1):
    print(i, row.name)

enumerate takes the place of an ugly manual counter.
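
To make the difference between row.Index and the enumerate counter concrete, here is a minimal sketch. The sample frame and its index labels (10, 20, 30) are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data: the index labels are deliberately not 1, 2, 3.
df = pd.DataFrame({"name": ["larry", "barry", "michael"]}, index=[10, 20, 30])

results = []
for i, row in enumerate(df.itertuples(), 1):
    # i is the 1-based position supplied by enumerate;
    # row.Index is the row's index label, whatever it happens to be.
    results.append((i, row.Index, row.name))
    print(i, row.Index, row.name)
```

With a default RangeIndex, row.Index would start at 0, so enumerate(..., 1) is still what gives a 1-based row number.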

piRSquared answered Oct 17 '22 11:10


for row in df.itertuples():
    # getattr looks up each field by name; Index holds the row's index value
    print(getattr(row, 'Index'), getattr(row, 'name'))

Ashok Kumar Pant answered Oct 17 '22 12:10


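For column names that aren't valid Python identifiers, attribute access won't work, so look the value up by position instead:

for i, row in enumerate(df.itertuples(index=False)):
    # get_loc returns the column's integer position in df.columns
    print(i, row[df.columns.get_loc('My nasty - column / name')])

If you don't specify index=False, the index occupies position 0 of each tuple, so every column shifts right by one and the lookup reads the column before the one you named.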
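
The off-by-one effect can be checked with a small sketch. The two-column frame below is hypothetical sample data, chosen only so that the neighbouring column is easy to recognise.

```python
import pandas as pd

# Hypothetical sample data with one column name that is not a valid identifier.
df = pd.DataFrame({
    "name": ["larry", "barry"],
    "My nasty - column / name": [1.5, 2.5],
})

pos = df.columns.get_loc("My nasty - column / name")  # position within df.columns

# index=False: tuple positions line up with df.columns.
values = [row[pos] for row in df.itertuples(index=False)]

# Default index=True: position 0 is the index, so pos lands on the previous column.
shifted = [row[pos] for row in df.itertuples()]

# Adding 1 compensates when the index is kept in the tuple.
corrected = [row[pos + 1] for row in df.itertuples()]

print(values, shifted, corrected)
```

Here shifted picks up the values of the name column, while values and corrected both read the intended column.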

Chris answered Oct 17 '22 13:10