Update a dataframe in pandas while iterating row by row

People also ask

How do I iterate over a row in a DataFrame in Python?

The DataFrame.iterrows() method iterates over DataFrame rows as (index, Series) pairs. Note that it does not preserve dtypes across rows, because each row is converted into a single Series.
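
The dtype note can be seen in a small sketch. The frame and its column names (`count`, `ratio`) are made up for illustration; the point is only that an integer column comes back as floats once each row is packed into one Series.

```python
import pandas as pd

# Hypothetical frame with one integer column and one float column
df = pd.DataFrame({"count": [1, 2], "ratio": [0.5, 1.5]})

# Grab the first row produced by iterrows()
for _, row in df.iterrows():
    first_row = row
    break

column_dtype = df["count"].dtype  # integer dtype in the DataFrame
row_dtype = first_row.dtype       # one dtype for the whole row Series

print(column_dtype, row_dtype)
```

Because a Series has a single dtype, the integer value is upcast to a float to sit next to `ratio`.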

Is iterrows faster than apply?

By using apply with axis=1, we can run a function on every row of a DataFrame. This solution still loops internally, but apply is better optimized than iterrows, which usually results in faster runtimes.

You can assign values in the loop using df.set_value:

for i, row in df.iterrows():
    ifor_val = something
    if <condition>:
        ifor_val = something_else
    df.set_value(i,'ifor',ifor_val)

If you don't need the row values you could simply iterate over the indices of df, but I kept the original for-loop in case you need the row value for something not shown here.

update

df.set_value() has been deprecated since version 0.21.0 (and was removed in pandas 1.0). Use the df.at[] accessor instead:

for i, row in df.iterrows():
    ifor_val = something
    if <condition>:
        ifor_val = something_else
    df.at[i,'ifor'] = ifor_val
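
As a runnable sketch of the same pattern: the data, the condition (`row["a"] > 2`) and the values 10 and 20 are placeholders chosen for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical data; the column name 'ifor' follows the snippet above
df = pd.DataFrame({"a": [1, 5, 3], "ifor": [0, 0, 0]})

for i, row in df.iterrows():
    ifor_val = 10              # default value
    if row["a"] > 2:           # example condition on the current row
        ifor_val = 20
    df.at[i, "ifor"] = ifor_val  # write back to the DataFrame by label

print(df)
```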

A pandas DataFrame object should be thought of as a Series of Series. In other words, you should think of it in terms of columns. This matters because when you use pd.DataFrame.iterrows you are iterating through rows as Series, but these are not the Series that the DataFrame stores. They are new Series created for you while you iterate. That means that when you attempt to assign to them, those edits won't end up reflected in the original DataFrame.
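
A minimal sketch of that pitfall, assuming a small made-up frame with mixed column types (so each row has to be built as a fresh object Series):

```python
import pandas as pd

# Hypothetical frame; mixed dtypes force iterrows() to build new row objects
df = pd.DataFrame({"name": ["a", "b"], "ifor": [1, 2]})

for i, row in df.iterrows():
    row["ifor"] = 99  # changes only the temporary row Series

unchanged = df["ifor"].tolist()
print(unchanged)  # the DataFrame still holds the original values
```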

Ok, now that that is out of the way: What do we do?

Suggestions prior to this post include:

pd.DataFrame.set_value is deprecated as of pandas version 0.21 (and removed in 1.0)
pd.DataFrame.ix is deprecated (and removed in 1.0)
pd.DataFrame.loc is fine, but it is built to handle array indexers as well, and for setting a single value you can do better

My recommendation
Use pd.DataFrame.at

for i in df.index:
    if <something>:
        df.at[i, 'ifor'] = x
    else:
        df.at[i, 'ifor'] = y

You can even change this to:

for i in df.index:
    df.at[i, 'ifor'] = x if <something> else y

Response to comment

and what if I need to use the value of the previous row for the if condition?

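j = df.columns.get_loc('ifor')  # integer position of the column, looked up once

for i in range(1, len(df) + 1):
    # iat takes integer positions; this writes to row position i - 1
    if <something>:
        df.iat[i - 1, j] = x
    else:
        df.iat[i - 1, j] = y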
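
One way to make the "previous row" condition concrete is sketched below. The `value` column and the "up"/"down" labels are assumptions for illustration; the loop starts at position 1 because the first row has no predecessor.

```python
import pandas as pd

# Hypothetical data: label each row by comparing it with the row before it
df = pd.DataFrame({"value": [1, 3, 2, 5], "ifor": ["", "", "", ""]})

j = df.columns.get_loc("ifor")   # integer position of the target column
v = df.columns.get_loc("value")  # integer position of the compared column

for i in range(1, len(df)):
    if df.iat[i, v] > df.iat[i - 1, v]:  # current row vs previous row
        df.iat[i, j] = "up"
    else:
        df.iat[i, j] = "down"

print(df)
```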

Another method you can use is itertuples(). It iterates over DataFrame rows as namedtuples, with the index value as the first element of each tuple, and it is much faster than iterrows(). Each row carries its index label as row.Index, which you can pass to at (or loc) to set the value.

for row in df.itertuples():
    if <something>:
        df.at[row.Index, 'ifor'] = x
    else:
        df.at[row.Index, 'ifor'] = y

In most cases, reading row values through itertuples() is faster than looking them up one at a time with iat or at.

Thanks @SantiStSupery, using .at is much faster than loc.
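
A runnable sketch of the itertuples() approach, with a made-up frame, a string index, and an example condition on a hypothetical column `a`:

```python
import pandas as pd

# Hypothetical data with non-default index labels
df = pd.DataFrame(
    {"a": [1, 5, 3], "ifor": ["", "", ""]},
    index=["r1", "r2", "r3"],
)

for row in df.itertuples():
    # row.Index is the index label, row.a is the value of column 'a'
    df.at[row.Index, "ifor"] = "big" if row.a > 2 else "small"

print(df)
```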

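You should assign the value with df.loc[i, 'exp'] = X (or df.ix[i, 'exp'] = X on old pandas versions that still have .ix) instead of the chained form df.ix[i]['ifor'] = x.

Otherwise you may be working on a copy rather than on the original DataFrame, and you should get a warning: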

-c:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_index,col_indexer] = value instead

That said, the loop should probably be replaced by a vectorized algorithm to make full use of the DataFrame, as @Phillip Cloud suggested.
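
As a sketch of what a vectorized replacement might look like, here is the same kind of conditional assignment written with numpy.where. The data and the condition are assumptions for illustration, and this only applies when the condition can be expressed on whole columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the condition is evaluated for all rows at once
df = pd.DataFrame({"a": [1, 5, 3]})

# np.where(condition, value_if_true, value_if_false)
df["ifor"] = np.where(df["a"] > 2, "big", "small")

print(df)
```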

It's better to use df.apply() with a lambda function:

df["ifor"] = df.apply(lambda x: {value} if {condition} else x["ifor"], axis=1)
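
Filling in the placeholders with made-up values gives a runnable version. The condition (`r["a"] > 2`) and the replacement value 100 are illustrative assumptions.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"a": [1, 5, 3], "ifor": [0, 0, 0]})

# Replace 'ifor' where the condition holds, otherwise keep the old value
df["ifor"] = df.apply(lambda r: 100 if r["a"] > 2 else r["ifor"], axis=1)

print(df)
```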

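Well, if you are going to iterate anyhow, why not use the simplest method of all, df['Column'].values[i]? Note that this relies on .values returning a writable view of the column's data, which pandas does not guarantee (under Copy-on-Write the array is read-only).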

df['Column'] = ''

for i in range(len(df)):
    df['Column'].values[i] = something/update/new_value

Or, if you want to compare the new values with the old ones or anything like that, why not store them in a list and assign the list to the column at the end?

mylist, df['Column'] = [], ''

for <condition>:
    mylist.append(something/update/new_value)

df['Column'] = mylist
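
A runnable sketch of the list approach, with a made-up column `a` and an arbitrary update rule. The list must end up with exactly one entry per row for the final assignment to work.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"a": [1, 5, 3]})

new_values = []
for a in df["a"]:
    # Example rule: double values greater than 2, keep the rest
    new_values.append(a * 2 if a > 2 else a)

# One assignment at the end instead of one write per row
df["Column"] = new_values

print(df)
```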

for i, row in df.iterrows():
    if <something>:
        df.at[i, 'ifor'] = x
    else:
        df.at[i, 'ifor'] = y
