
Modify pandas dataframe within iterrows loop

I'm new to Python.

I am trying to add a prefix to the serial numbers in a data frame column using a for loop, as part of data cleaning and preparation before analysis.

The code is

a=pd.read_excel('C:/Users/HP/Desktop/WFH/PowerBI/CMM data.xlsx','CMM_unclean')
a['Serial Number'] = a['Serial Number'].apply(str)
print(a.iloc[72,1])

for index,row in a.iterrows():
    if len(row['Serial Number']) == 6:
        row['Serial Number'] = 'SR0' + row['Serial Number']
        print(row['Serial Number'])

print(a.iloc[72,1])

The output is

C:\Users\HP\anaconda3\envs\test\python.exe C:/Users/HP/PycharmProjects/test/first.py
101306
SR0101306
101306

I don't understand why this is happening: inside the for loop the value changes, but outside the loop it is the same as before.

Saurabh Arya asked Oct 17 '25 16:10


2 Answers

The loop in the question will never change the actual dataframe named a.

TL;DR: The rows you get back from iterrows are copies that are no longer connected to the original data frame, so edits don't change your dataframe. However, you can use the index to access and edit the relevant row of the dataframe.


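EXPLANATION

Why?

iterrows yields (index, Series) pairs, and each Series here is a copy of the row's values rather than a window onto the dataframe. Assigning to row['Serial Number'] changes only that temporary Series, which is why the print inside the loop shows the prefixed value while a stays as it was. Writing through a.loc[index, 'Serial Number'] targets the dataframe directly.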


The solution is this:

import pandas as pd

a = pd.read_excel("Book1.xlsx")
a['Serial Number'] = a['Serial Number'].apply(str)

a.head()
#    ID    Serial Number
# 0   1        101306
# 1   2       1101306

print(a.iloc[0,1])
#101306

for index, row in a.iterrows():
    # row is only read here, so no extra copy is needed
    if len(row['Serial Number']) == 6:
        # use the index and .loc method to alter the dataframe
        a.loc[index, 'Serial Number'] = 'SR0' + row['Serial Number']

print(a.iloc[0,1])
#SR0101306
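
For larger sheets, the same prefixing can probably be done without a Python-level loop. The following is a minimal sketch, not part of the original answer: it swaps the loop for a boolean mask, and it uses a small hand-made dataframe as a stand-in for the Excel file (the third row is an invented value for illustration).

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet used in the question
a = pd.DataFrame({"ID": [1, 2, 3], "Serial Number": [101306, 1101306, 205417]})
a["Serial Number"] = a["Serial Number"].astype(str)

# Boolean mask: True where the serial number has exactly six characters
needs_prefix = a["Serial Number"].str.len() == 6

# Assign through .loc so the change lands in the dataframe itself
a.loc[needs_prefix, "Serial Number"] = "SR0" + a.loc[needs_prefix, "Serial Number"]

print(a["Serial Number"].tolist())
# ['SR0101306', '1101306', 'SR0205417']
```

Only the six-character values get the prefix; the seven-character one is left alone, matching the condition in the loop.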
seralouk answered Oct 20 '25 06:10


In the documentation for iterrows, I read (emphasis from there)

You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.

In your case this probably means that each row is a copy rather than a reference into the data frame. The change is applied to that temporary copy, but not to the data in the data frame.
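
A small sketch of that behaviour, assuming a two-column frame with mixed types similar to the one in the question (the frame and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical frame: an integer column next to a string column
df = pd.DataFrame({"ID": [1, 2], "Serial Number": ["101306", "1101306"]})

for index, row in df.iterrows():
    # row is a separate Series built for this iteration
    row["Serial Number"] = "SR0" + row["Serial Number"]

# The dataframe still holds the original strings
unchanged = df["Serial Number"].tolist()
print(unchanged)
# ['101306', '1101306']
```

The assignment inside the loop succeeds without any error, which is what makes the problem easy to miss.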

Wolf answered Oct 20 '25 06:10


