I need to iterate over a pandas DataFrame in order to pass each row as the arguments of a function (actually, a class constructor) that takes **kwargs. This means that each row should behave as a dictionary whose keys are the column names and whose values are the corresponding values for that row.
This works, but it performs very badly:
import pandas as pd

def myfunc(**kwargs):
    try:
        area = kwargs.get('length', 0) * kwargs.get('width', 0)
        return area
    except TypeError:
        return 'Error : length and width should be int or float'

df = pd.DataFrame({'length': [1, 2, 3], 'width': [10, 20, 30]})
for i in range(len(df)):
    # Parenthesized print works on Python 2.7 as well as Python 3
    print(myfunc(**df.iloc[i]))
Any suggestions on how to make this faster? I have tried iterating with df.iterrows(), but I get the following error:
TypeError: myfunc() argument after ** must be a mapping, not tuple
I have also tried df.itertuples() and df.values, but either I am missing something, or I have to convert each tuple / np.array to a pd.Series or dict, which will also be slow.
My constraint is that the script has to work with Python 2.7 and pandas 0.14.1.
The DataFrame.iterrows() method iterates over DataFrame rows as (index, Series) pairs. Note that it does not preserve dtypes across rows, because each row is converted into a Series.
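To make the dtype point concrete, here is a minimal sketch. The frame and its column names (`count`, `price`) are made up for illustration and are not from the question:

```python
import pandas as pd

# Hypothetical frame: one integer column, one float column.
df_mixed = pd.DataFrame({'count': [1, 2], 'price': [1.5, 2.5]})

# Inside the frame, 'count' keeps its integer dtype.
column_dtype = df_mixed['count'].dtype

# iterrows() yields (index, Series). A Series holds a single dtype,
# so the integer value is upcast to float to sit next to 'price'.
index, row = next(df_mixed.iterrows())
row_dtype = row.dtype

print(column_dtype, row_dtype)
```

So a value that was an integer in the frame reaches your function as a float when it comes from iterrows().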
To convert a pandas DataFrame to a dictionary, use the DataFrame.to_dict() method. Its orient parameter defaults to 'dict', which returns the DataFrame in the format {column -> {index -> value}}.
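As a rough illustration of the two layouts, using the frame from the question. Note that this assumes a pandas version where the keyword is called `orient`; as far as I know older releases spelled it `outtype`, so check the signature if you are stuck on 0.14.1:

```python
import pandas as pd

df = pd.DataFrame({'length': [1, 2, 3], 'width': [10, 20, 30]})

# Default orient: {column -> {index -> value}}
by_column = df.to_dict()

# orient='records': one {column -> value} dict per row
records = df.to_dict(orient='records')

print(by_column)
print(records)
```

The 'records' layout is the one that matches "each row as a dict of column names to values".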
You can loop through a dictionary with a for loop. Looping over a dictionary directly yields its keys, but there are methods that return the values as well.
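A small sketch of the three ways to loop, on a hand-written dict standing in for one row (the values are just an example):

```python
# Hypothetical row dict, shaped like one record of the frame in the question.
row_dict = {'length': 2, 'width': 20}

# Plain iteration yields the keys.
keys = [key for key in row_dict]

# values() yields the values, items() yields (key, value) pairs.
values = list(row_dict.values())
pairs = list(row_dict.items())

print(keys, values, pairs)
```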
You can use a for loop to iterate over the columns of a DataFrame. Several methods help with iterating over a pandas DataFrame, such as iteritems(), indexing with [] (__getitem__), transpose(), iterrows(), enumerate() and the NumPy asarray() function.
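For the column case, something like this should do. It is only a sketch on the question's frame, and it relies on the fact that iterating a DataFrame directly yields its column labels:

```python
import pandas as pd

df = pd.DataFrame({'length': [1, 2, 3], 'width': [10, 20, 30]})

column_names = []
column_values = {}
for name in df:  # iterating a DataFrame yields column labels, not rows
    column_names.append(name)
    column_values[name] = df[name].tolist()  # df[name] is the column as a Series

print(column_names)
print(column_values)
```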
Therefore, by specifying the integer position of the row and column, you can iterate over the rows of a pandas DataFrame:
# Pass the integer positions of the row and column to iloc
for i in range(len(df)):
    print(df.iloc[i, 0], df.iloc[i, 1])
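According to the official documentation, iterrows() iterates "over the rows of a Pandas DataFrame as (index, Series) pairs". It converts each row into a Series object, which causes two problems: the dtypes of the values are not preserved across rows, and the per-row conversion makes iteration slow.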
iteritems() iterates over columns, not rows. Thus, to make it iterate over rows, you have to transpose the DataFrame (the "T"), which means you swap rows and columns (reflect over the diagonal). As a result, df.T.iteritems() effectively iterates over the rows of the original DataFrame.
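Here is a hedged sketch of the transpose trick with the function from the question (simplified, without the try/except). One caveat: iteritems() was removed in pandas 2.0, so the code below uses items(), which is the spelling that works on current pandas; on an old install you would call iteritems() instead:

```python
import pandas as pd

def myfunc(**kwargs):
    # Simplified version of the function from the question.
    return kwargs.get('length', 0) * kwargs.get('width', 0)

df = pd.DataFrame({'length': [1, 2, 3], 'width': [10, 20, 30]})

areas = []
# After transposing, each "column" is one original row, delivered as a
# Series indexed by the original column names.
for index, row in df.T.items():
    areas.append(myfunc(**row))

print(areas)
```

Keep in mind that transposing copies the data and each row is still a Series, so this is not expected to be faster than iterrows().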
Answer: DON'T! Iteration in pandas is an anti-pattern and is something you should only do when you have exhausted every other option. You should not use any function with "iter" in its name for more than a few thousand rows, or you will have to get used to a lot of waiting.
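For the toy function in the question, the non-iterating version would look roughly like this. It only applies when the per-row work can be written as column arithmetic, which may not be the case for a real class constructor; the `area` column name is my own choice:

```python
import pandas as pd

df = pd.DataFrame({'length': [1, 2, 3], 'width': [10, 20, 30]})

# Column-wise multiplication computes every row's area in one call,
# with no Python-level loop over rows.
df['area'] = df['length'] * df['width']

print(df)
```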
One clean option is this one:
for row_dict in df.to_dict(orient="records"):
    print(row_dict['column_name'])
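Applied to the function from the question, that option could be sketched as follows. Each record is a plain dict, so it can be unpacked with ** directly; this assumes all column names are strings and that your pandas version accepts the `orient` keyword:

```python
import pandas as pd

def myfunc(**kwargs):
    try:
        return kwargs.get('length', 0) * kwargs.get('width', 0)
    except TypeError:
        return 'Error : length and width should be int or float'

df = pd.DataFrame({'length': [1, 2, 3], 'width': [10, 20, 30]})

# The frame is converted to a list of dicts once, up front, instead of
# building a Series for every row inside the loop.
results = [myfunc(**row_dict) for row_dict in df.to_dict(orient='records')]

print(results)
```

The trade-off is memory: the whole list of dicts exists at once, which may matter for very large frames.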
You can try:
for k, row in df.iterrows():
    myfunc(**row)
Here k is the DataFrame index and row is a Series, which behaves like a dict, so you can access any column with row["my_column_name"].