I am trying to iterate over the rows of a Python Pandas dataframe. Within each row of the dataframe, I am trying to refer to each value along the row by its column name.
Here is what I have:
    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.rand(10,4),columns=list('ABCD'))
    print(df)

              A         B         C         D
    0  0.351741  0.186022  0.238705  0.081457
    1  0.950817  0.665594  0.671151  0.730102
    2  0.727996  0.442725  0.658816  0.003515
    3  0.155604  0.567044  0.943466  0.666576
    4  0.056922  0.751562  0.135624  0.597252
    5  0.577770  0.995546  0.984923  0.123392
    6  0.121061  0.490894  0.134702  0.358296
    7  0.895856  0.617628  0.722529  0.794110
    8  0.611006  0.328815  0.395859  0.507364
    9  0.616169  0.527488  0.186614  0.278792
I used this approach to iterate, but it is only giving me part of the solution - after selecting a row in each iteration, how do I access row elements by their column name?
Here is what I am trying to do:
    for row in df.iterrows():
        print(row.loc[0,'A'])
        print(row.A)
        print(row.index())
My understanding is that the row is a Pandas series. But I have no way to index into the Series.
Is it possible to use column names while simultaneously iterating over rows?
The DataFrame.iterrows() method iterates over DataFrame rows as (index, Series) pairs. Note that it does not preserve dtypes across rows, because each row is converted into a Series.
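To make the dtype remark concrete, here is a minimal sketch. The frame and its column names (`n`, `ratio`) are made up for illustration and are not from the question. It assumes a mixed integer/float frame, where the integer value is upcast once the row becomes a Series.

```python
import pandas as pd

# Hypothetical frame: one integer column and one float column.
df = pd.DataFrame({"n": [1, 2], "ratio": [0.5, 1.5]})

# iterrows() packs each row into a single Series, which has one dtype,
# so the integer from column "n" is upcast to float in the row.
first_index, first_row = next(df.iterrows())
print(df["n"].dtype)     # integer dtype of the column
print(first_row.dtype)   # float dtype of the row Series

# itertuples() does not build a Series per row, so the value is not upcast.
first_tuple = next(df.itertuples())
print(type(first_tuple.n))
```

If the original dtypes matter inside the loop, itertuples() is usually the safer choice.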
One simple way to iterate over the columns of a pandas DataFrame is a for loop. You can loop over the column labels and select each column with the get-item syntax ([]). The values attribute of a column returns its underlying data as a NumPy array.
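A short sketch of looping over columns by label. The small frame here is invented for the example, and `column_sums` is just an illustrative name.

```python
import pandas as pd

# Hypothetical two-column frame, for illustration only.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

column_sums = {}
for label in df:             # iterating a DataFrame yields its column labels
    column = df[label]       # get-item syntax returns the column as a Series
    column_sums[label] = int(column.sum())

# .values gives the column's data as a NumPy array; .tolist() makes a plain list.
a_values = df["A"].values.tolist()

print(column_sums)
print(a_values)
```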
Using iterrows() to iterate over rows: pandas DataFrame.iterrows() yields (index, Series) pairs, where index is the index label of the row and the Series holds the data of that row.
You can get the column names of a pandas DataFrame with df.columns.values and pass the result to Python's list() function to get them as a list. Once you have the list, you can print it with print().
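As a quick sketch (the one-row frame is made up for the example), both of the following give the column names as a plain list:

```python
import pandas as pd

# Hypothetical frame, for illustration only.
df = pd.DataFrame({"A": [1], "B": [2], "C": [3]})

names_from_values = list(df.columns.values)  # via the underlying array
names_direct = list(df.columns)              # iterating the Index directly

print(names_from_values)
print(names_direct)
```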
I also like itertuples()
    for row in df.itertuples():
        print(row.A)
        print(row.Index)
Since each row is a named tuple, this should be MUCH faster if you only need to access the values in each row.
Speed run:
    import time  # needed for the timings below

    df = pd.DataFrame([x for x in range(1000*1000)], columns=['A'])

    # time iterrows()
    st = time.time()
    for index, row in df.iterrows():
        row.A
    print(time.time() - st)
    # 45.05799984931946

    # time itertuples()
    st = time.time()
    for row in df.itertuples():
        row.A
    print(time.time() - st)
    # 0.48400020599365234
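If the column name is held in a variable rather than typed out, attribute access on the named tuple can be done with getattr. This is only a sketch: the frame is invented, `collected` is an illustrative name, and it assumes the column names are valid Python identifiers (otherwise itertuples() renames the fields positionally).

```python
import pandas as pd

# Hypothetical frame, for illustration only.
df = pd.DataFrame({"A": [0.1, 0.2], "B": [0.3, 0.4]})

collected = []
for row in df.itertuples():
    for name in df.columns:
        # getattr looks up the named-tuple field matching the column name
        collected.append((row.Index, name, getattr(row, name)))

print(collected)
```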
The item from iterrows() is not a Series, but a tuple of (index, Series), so you can unpack the tuple in the for loop like so:
    for (idx, row) in df.iterrows():
        print(row.loc['A'])
        print(row.A)
        print(row.index)

    #0.890618586836
    #0.890618586836
    #Index(['A', 'B', 'C', 'D'], dtype='object')
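Once the tuple is unpacked, plain bracket access on the row also works, and it handles column names that are not valid attribute names. A minimal sketch, with an invented frame and a made-up "unit price" column:

```python
import pandas as pd

# Hypothetical frame; "unit price" is chosen to show a label with a space.
df = pd.DataFrame({"A": [1.0, 2.0], "unit price": [3.0, 4.0]})

totals = []
for idx, row in df.iterrows():
    # row is a Series indexed by column name, so row["..."] works for any label
    totals.append(row["A"] * row["unit price"])

print(totals)
```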