I have a pandas DataFrame with a text column.
I'd like to create a new column whose values depend on how the string in the text column starts.
So, based on the first 30 characters of the text column:
if they == 'xxx...xxx', return 1
if they == 'yyy...yyy', return 2
if they == 'zzz...zzz', return 3
if none of the above match, return 0
It is possible to use nested numpy.where, but if there are more conditions, use apply.
To select the start of each string, use indexing with str (for example str[:3]):
import numpy as np
import pandas as pd

df = pd.DataFrame({'A':['xxxss','yyyee','zzzswee','sss'],
                   'B':[4,5,6,8]})
print (df)
A B
0 xxxss 4
1 yyyee 5
2 zzzswee 6
3 sss 8
# check the first 3 characters of each string
a = df.A.str[:3]
df['new'] = np.where(a == 'xxx', 1,
            np.where(a == 'yyy', 2,
            np.where(a == 'zzz', 3, 0)))
print (df)
A B new
0 xxxss 4 1
1 yyyee 5 2
2 zzzswee 6 3
3 sss 8 0
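As an alternative to nesting numpy.where (a swapped-in technique, not part of the original answer), numpy.select takes a list of conditions and a matching list of choices, which stays flat as more prefixes are added. This is a minimal sketch using the same sample data; the conditions are evaluated in order and the first True one wins:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['xxxss', 'yyyee', 'zzzswee', 'sss'],
                   'B': [4, 5, 6, 8]})

# first 3 characters of each string
a = df.A.str[:3]

# conditions are checked in order; the first True one picks the choice
conditions = [a == 'xxx', a == 'yyy', a == 'zzz']
choices = [1, 2, 3]

# rows that match no condition get the default value 0
df['new'] = np.select(conditions, choices, default=0)
print(df)
```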
def f(x):
    # x is the first 3 characters of one value of column A
    if x == 'xxx':
        return 1
    elif x == 'yyy':
        return 2
    elif x == 'zzz':
        return 3
    else:
        return 0

df['new'] = df.A.str[:3].apply(f)
print (df)
A B new
0 xxxss 4 1
1 yyyee 5 2
2 zzzswee 6 3
3 sss 8 0
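When all prefixes have the same length, the if/elif chain in f can also be expressed as a lookup table passed to Series.map (again a substitute technique, not the original answer's). In this sketch the dict name mapping is hypothetical; prefixes missing from the dict map to NaN, so they are filled with 0 and the column is cast back to int:

```python
import pandas as pd

df = pd.DataFrame({'A': ['xxxss', 'yyyee', 'zzzswee', 'sss'],
                   'B': [4, 5, 6, 8]})

# hypothetical lookup table: prefix -> value
mapping = {'xxx': 1, 'yyy': 2, 'zzz': 3}

# map gives NaN for unknown prefixes; replace them with 0 and restore int dtype
df['new'] = df.A.str[:3].map(mapping).fillna(0).astype(int)
print(df)
```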
EDIT:
If the prefixes have different lengths, slice each comparison to the length of its own prefix:
df['new'] = np.where(df.A.str[:3] == 'xxx', 1,
            np.where(df.A.str[:2] == 'yy', 2,
            np.where(df.A.str[:1] == 'z', 3, 0)))
print (df)
A B new
0 xxxss 4 1
1 yyyee 5 2
2 zzzswee 6 3
3 sss 8 0
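The apply approach can handle prefixes of different lengths too, if the function checks the whole string with Python's built-in str.startswith instead of comparing a fixed-length slice. A minimal sketch, assuming column A contains no missing values (NaN has no startswith method); the rules list is a hypothetical name, and its order decides which prefix wins when several match:

```python
import pandas as pd

df = pd.DataFrame({'A': ['xxxss', 'yyyee', 'zzzswee', 'sss'],
                   'B': [4, 5, 6, 8]})

# hypothetical ordered rules: (prefix, value); prefixes may differ in length
rules = [('xxx', 1), ('yy', 2), ('z', 3)]

def f(x):
    # return the value of the first rule whose prefix matches, else 0
    for prefix, value in rules:
        if x.startswith(prefix):
            return value
    return 0

df['new'] = df.A.apply(f)
print(df)
```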
EDIT1:
Thanks to Quickbeam2k1 for the idea to use str.startswith to check the start of each string:
df['new'] = np.where(df.A.str.startswith('xxx'), 1,
            np.where(df.A.str.startswith('yy'), 2,
            np.where(df.A.str.startswith('z'), 3, 0)))
print (df)
A B new
0 xxxss 4 1
1 yyyee 5 2
2 zzzswee 6 3
3 sss 8 0
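With many prefixes, the str.startswith conditions can be built from a list and passed to numpy.select (a substitute for the nested np.where, not the original answer's code). This sketch adds a hypothetical missing value (None) to column A to show why na=False matters: without it, str.startswith returns NaN for missing entries instead of False. The rules list is likewise a hypothetical name:

```python
import numpy as np
import pandas as pd

# sample data with one hypothetical missing value
df = pd.DataFrame({'A': ['xxxss', 'yyyee', None, 'sss'],
                   'B': [4, 5, 6, 8]})

# hypothetical ordered rules: (prefix, value); earlier rules take precedence
rules = [('xxx', 1), ('yy', 2), ('z', 3)]

# na=False turns missing strings into False, so every condition is boolean
conditions = [df.A.str.startswith(prefix, na=False) for prefix, _ in rules]
choices = [value for _, value in rules]

df['new'] = np.select(conditions, choices, default=0)
print(df)
```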