Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Replace cell values in each row of pandas column using for loop

Please, help me understand my error. I'm trying to change one column in my .csv file. I have .csv file as following:

sku,name,code  
k1,aaa,886  
k2,bbb,898  
k3,ccc,342  
k4,ddd,503  
k5,eee,401  

I want to replace "k" symbol with the "_" symbol in the "sku" column.
I wrote the code:

import sys  
import pandas as pd  
import numpy as np  
import datetime  

df = pd.read_csv('cat0.csv')  

for r in df['sku']:  
    r1 = r.replace('k', '_')  
    df['sku'] = r1  

print (df) 

But the code inserts the last value in every row of the "sku" column. So I get:

  sku name  code
0  _5  aaa   886
1  _5  bbb   898
2  _5  ccc   342
3  _5  ddd   503
4  _5  eee   401

I want to get as following:

  sku name  code
0  _1  aaa   886
1  _2  bbb   898
2  _3  ccc   342
3  _4  ddd   503
4  _5  eee   401
like image 224
j.stalin Avatar asked Jun 23 '18 12:06

j.stalin


2 Answers

You can use str.replace on the whole column:

from io import StringIO
import pandas as pd

data = """sku,name,code  
k1,aaa,886  
k2,bbb,898  
k3,ccc,342  
k4,ddd,503  
k5,eee,401"""

file = StringIO(data)

df = pd.read_csv(file)
df['sku'] = df['sku'].str.replace('k', '_')

print(df)

This yields

  sku name  code  
0  _1  aaa     886
1  _2  bbb     898
2  _3  ccc     342
3  _4  ddd     503
4  _5  eee     401
like image 160
Jan Avatar answered Oct 28 '22 11:10

Jan


As @Jan mentioned, doing it by using df['sku'] = df['sku'].str.replace('k', '_') is the best/quickest way to do this.

However, to understand why you are getting the results you are and to present a way as close to how you were doing it as possible, you'd do:

import pandas as pd

df = pd.DataFrame(
    {
        'sku':["k1", "k2", "k3", "k4", "k5"], 
        'name': ["aaa", "bbb", "ccc", "ddd", "eee"], 
        'code':[886, 898,342,503,401]
    }, columns =["sku", "name", "code"]
)

for i, r in enumerate(df['sku']):  
    r1 = r.replace('k', '_')
    df.at[i, 'sku'] = r1  

Which gives:

  sku name  code
0  _1  aaa   886
1  _2  bbb   898
2  _3  ccc   342
3  _4  ddd   503
4  _5  eee   401

In your code...

for r in df['sku']:  
    r1 = r.replace('k', '_')  

...the issue is here:

    df['sku'] = r1  

You are broadcasting your results to the entire column rather than just the row you are working on.

like image 45
Zev Avatar answered Oct 28 '22 12:10

Zev