
Python Pandas iterate over rows and access column names

I am trying to iterate over the rows of a Python Pandas dataframe. Within each row of the dataframe, I am trying to refer to each value along a row by its column name.

Here is what I have:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.rand(10,4),columns=list('ABCD'))
    print(df)

              A         B         C         D
    0  0.351741  0.186022  0.238705  0.081457
    1  0.950817  0.665594  0.671151  0.730102
    2  0.727996  0.442725  0.658816  0.003515
    3  0.155604  0.567044  0.943466  0.666576
    4  0.056922  0.751562  0.135624  0.597252
    5  0.577770  0.995546  0.984923  0.123392
    6  0.121061  0.490894  0.134702  0.358296
    7  0.895856  0.617628  0.722529  0.794110
    8  0.611006  0.328815  0.395859  0.507364
    9  0.616169  0.527488  0.186614  0.278792

I used this approach to iterate, but it only gives me part of the solution: after selecting a row in each iteration, how do I access row elements by their column name?

Here is what I am trying to do:

    for row in df.iterrows():
        print(row.loc[0,'A'])
        print(row.A)
        print(row.index())

My understanding is that the row is a Pandas series. But I have no way to index into the Series.

Is it possible to use column names while simultaneously iterating over rows?

edesz asked Apr 25 '17 19:04




2 Answers

I also like itertuples()

    for row in df.itertuples():
        print(row.A)
        print(row.Index)

Since each row is a namedtuple, this should be MUCH faster if you only need to access the values in each row.

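Speed test:

    import time
    import pandas as pd

    df = pd.DataFrame([x for x in range(1000*1000)], columns=['A'])

    # Time iterrows(): each row is built as a Series
    st = time.time()
    for index, row in df.iterrows():
        row.A
    print(time.time() - st)
    # 45.05799984931946

    # Time itertuples(): each row is a namedtuple
    st = time.time()
    for row in df.itertuples():
        row.A
    print(time.time() - st)
    # 0.48400020599365234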
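If the column names are only known at run time, the namedtuple rows from itertuples() can still be read by name with getattr(). This is a rough sketch on a small made-up frame (the data and the `records` list are illustrative assumptions, not part of the question):

```python
import pandas as pd

# Illustrative data, not the frame from the question.
df = pd.DataFrame({"A": [1, 2], "B": [3.5, 4.5]})

records = []
for row in df.itertuples(index=False):
    # getattr() reads a namedtuple field whose name is held in a variable.
    values = {col: getattr(row, col) for col in df.columns}
    records.append(values)

print(records)
```

One caveat: itertuples() renames columns whose names are not valid Python identifiers to positional names, so looking fields up by column name like this only works for identifier-like names such as A to D.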
Steven G answered Sep 18 '22 17:09


The item from iterrows() is not a Series, but a tuple of (index, Series), so you can unpack the tuple in the for loop like so:

    for (idx, row) in df.iterrows():
        print(row.loc['A'])
        print(row.A)
        print(row.index)

    #0.890618586836
    #0.890618586836
    #Index(['A', 'B', 'C', 'D'], dtype='object')
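Because each unpacked row is a Series indexed by the column names, you can also walk every column name and value inside the row loop. Here is a minimal sketch, assuming a smaller 3x4 frame and a fixed seed (both added here only so the example is short and repeatable):

```python
import numpy as np
import pandas as pd

# Smaller stand-in for the frame in the question; the seed is an assumption.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((3, 4)), columns=list("ABCD"))

seen = []
for idx, row in df.iterrows():
    # row.items() yields (column name, value) pairs for this row.
    for col, value in row.items():
        seen.append((idx, col))
        print(f"row {idx}, column {col}: {value:.3f}")
```

Each (idx, col) pair collected in `seen` shows that both the row label and the column name are available on every iteration.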
Psidom answered Sep 21 '22 17:09