Please, help me understand my error.
I'm trying to change one column in my .csv
file.
I have .csv
file as following:
sku,name,code
k1,aaa,886
k2,bbb,898
k3,ccc,342
k4,ddd,503
k5,eee,401
I want to replace "k" symbol with the "_" symbol in the "sku" column.
I wrote the code:
import sys
import pandas as pd
import numpy as np
import datetime
df = pd.read_csv('cat0.csv')
for r in df['sku']:
r1 = r.replace('k', '_')
df['sku'] = r1
print (df)
But the code inserts the last value in every row of the "sku" column. So I get:
sku name code
0 _5 aaa 886
1 _5 bbb 898
2 _5 ccc 342
3 _5 ddd 503
4 _5 eee 401
I want to get as following:
sku name code
0 _1 aaa 886
1 _2 bbb 898
2 _3 ccc 342
3 _4 ddd 503
4 _5 eee 401
You can use str.replace
on the whole column:
from io import StringIO
import pandas as pd
data = """sku,name,code
k1,aaa,886
k2,bbb,898
k3,ccc,342
k4,ddd,503
k5,eee,401"""
file = StringIO(data)
df = pd.read_csv(file)
df['sku'] = df['sku'].str.replace('k', '_')
print(df)
This yields
sku name code
0 _1 aaa 886
1 _2 bbb 898
2 _3 ccc 342
3 _4 ddd 503
4 _5 eee 401
As @Jan mentioned, doing it by using df['sku'] = df['sku'].str.replace('k', '_')
is the best/quickest way to do this.
However, to understand why you are getting the results you are and to present a way as close to how you were doing it as possible, you'd do:
import pandas as pd
df = pd.DataFrame(
{
'sku':["k1", "k2", "k3", "k4", "k5"],
'name': ["aaa", "bbb", "ccc", "ddd", "eee"],
'code':[886, 898,342,503,401]
}, columns =["sku", "name", "code"]
)
for i, r in enumerate(df['sku']):
r1 = r.replace('k', '_')
df.at[i, 'sku'] = r1
Which gives:
sku name code
0 _1 aaa 886
1 _2 bbb 898
2 _3 ccc 342
3 _4 ddd 503
4 _5 eee 401
In your code...
for r in df['sku']:
r1 = r.replace('k', '_')
...the issue is here:
df['sku'] = r1
You are broadcasting your results to the entire column rather than just the row you are working on.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With