I have a DataFrame
from Pandas:
import pandas as pd inp = [{'c1':10, 'c2':100}, {'c1':11,'c2':110}, {'c1':12,'c2':120}] df = pd.DataFrame(inp) print df
Output:
c1 c2 0 10 100 1 11 110 2 12 120
Now I want to iterate over the rows of this frame. For every row I want to be able to access its elements (values in cells) by the name of the columns. For example:
for row in df.rows: print row['c1'], row['c2']
Is it possible to do that in Pandas?
I found this similar question. But it does not give me the answer I need. For example, it is suggested there to use:
for date, row in df.T.iteritems():
or
for row in df.iterrows():
But I do not understand what the row
object is and how I can work with it.
DataFrame. iterrows() method is used to iterate over DataFrame rows as (index, Series) pairs. Note that this method does not preserve the dtypes across rows due to the fact that this method will convert each row into a Series .
Vectorization is always the first and best choice. You can convert the data frame to NumPy array or into dictionary format to speed up the iteration workflow. Iterating through the key-value pair of dictionaries comes out to be the fastest way with around 280x times speed up for 20 million records.
One simple way to iterate over columns of pandas DataFrame is by using for loop. You can use column-labels to run the for loop over the pandas DataFrame using the get item syntax ([]) . Yields below output. The values() function is used to extract the object elements as a list.
DataFrame.iterrows
is a generator which yields both the index and row (as a Series):
import pandas as pd df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]}) df = df.reset_index() # make sure indexes pair with number of rows for index, row in df.iterrows(): print(row['c1'], row['c2'])
10 100 11 110 12 120
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With