Iteration over the rows of a Pandas DataFrame as dictionaries

I need to iterate over a pandas DataFrame and pass each row as keyword arguments (**kwargs) to a function (actually, a class constructor). This means that each row should behave as a dictionary whose keys are the column names and whose values are that row's entries.

This works, but it performs very badly:

import pandas as pd


def myfunc(**kwargs):
    try:
        # Missing keys default to 0, so the area is 0 if either is absent.
        area = kwargs.get('length', 0) * kwargs.get('width', 0)
        return area
    except TypeError:
        return 'Error: length and width should be int or float'


df = pd.DataFrame({'length': [1, 2, 3], 'width': [10, 20, 30]})

# Each df.iloc[i] is a Series, which ** can unpack like a mapping.
for i in range(len(df)):
    print(myfunc(**df.iloc[i]))

Any suggestions on how to make this faster? I have tried iterating with df.iterrows(), but I get the following error:

TypeError: myfunc() argument after ** must be a mapping, not tuple

I have also tried df.itertuples() and df.values, but either I am missing something, or I would have to convert each tuple / np.array to a pd.Series or dict, which will also be slow. My constraint is that the script has to work with Python 2.7 and pandas 0.14.1.

asked Nov 14 '18 09:11 by Matina G



2 Answers

One clean option is:

for row_dict in df.to_dict(orient="records"):
    print(row_dict['column_name'])
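
Applied to the question's example, this could look like the sketch below. It assumes a simplified version of the question's myfunc (without the error handling) and that your pandas version accepts the orient keyword of to_dict; it is an illustration, not a benchmark.

```python
import pandas as pd


def myfunc(**kwargs):
    # Simplified version of the question's function: area from keyword args.
    return kwargs.get('length', 0) * kwargs.get('width', 0)


df = pd.DataFrame({'length': [1, 2, 3], 'width': [10, 20, 30]})

# Convert the whole frame to a list of {column: value} dicts once,
# then unpack each dict as keyword arguments.
areas = [myfunc(**row_dict) for row_dict in df.to_dict(orient="records")]
print(areas)
```

Here areas compares equal to [10, 40, 90]. The conversion to dicts happens in a single call rather than once per row, which is the point of this approach.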

answered Oct 16 '22 05:10 by avloss


You can try:

for k, row in df.iterrows():
    myfunc(**row)

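Here k is the DataFrame index label and row is a pandas Series (not a dict) keyed by column name. A Series supports dict-style access, so you can read any column with row["my_column_name"] and unpack it with **. The TypeError in the question most likely comes from applying ** to the whole (index, row) tuple instead of to the row alone.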
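
As a minimal sketch of that difference, again assuming a simplified version of the question's myfunc, the first loop below unpacks the (index, row) pair and works, while the second passes the tuple itself to ** and raises the TypeError:

```python
import pandas as pd


def myfunc(**kwargs):
    # Simplified version of the question's function: area from keyword args.
    return kwargs.get('length', 0) * kwargs.get('width', 0)


df = pd.DataFrame({'length': [1, 2, 3], 'width': [10, 20, 30]})

results = []
for index, row in df.iterrows():
    # row is a Series keyed by column name; ** unpacks it like a mapping.
    results.append(myfunc(**row))

# Without unpacking the pair, ** receives a tuple and fails.
error_message = None
try:
    for pair in df.iterrows():
        myfunc(**pair)
except TypeError as exc:
    error_message = str(exc)

print(results)
print(error_message)
```

The values in results are NumPy integers rather than plain Python ints, because iterrows builds a Series for each row.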

answered Oct 16 '22 04:10 by stellasia