I have a pandas dataframe with a text column.
I'd like to create a new column in which values are conditional on the start of the text string from the text column.
So if the 30 first characters of the text column:
== 'xxx...xxx' then return value 1 
== 'yyy...yyy' then return value 2
== 'zzz...zzz' then return value 3
if none of the above return 0
                There is possible use multiple numpy.where but if more conditions use apply:
For select strings from strats use indexing with str.
df = pd.DataFrame({'A':['xxxss','yyyee','zzzswee','sss'],
                   'B':[4,5,6,8]})
print (df)
         A  B
0    xxxss  4
1    yyyee  5
2  zzzswee  6
3      sss  8
#check first 3 values
a = df.A.str[:3]
df['new'] = np.where(a == 'xxx', 1, 
            np.where(a == 'yyy', 2, 
            np.where(a == 'zzz', 3, 0)))
print (df)
         A  B  new
0    xxxss  4    1
1    yyyee  5    2
2  zzzswee  6    3
3      sss  8    0
def f(x):
    #print (x)
    if x == 'xxx':
        return 1
    elif x == 'yyy':
        return 2
    elif x == 'zzz':
        return 3
    else:
        return 0
df['new'] = df.A.str[:3].apply(f)
print (df)
         A  B  new
0    xxxss  4    1
1    yyyee  5    2
2  zzzswee  6    3
3      sss  8    0
EDIT:
If length is different, only need:
df['new'] = np.where(df.A.str[:3] == 'xxx', 1, 
            np.where(df.A.str[:2] == 'yy', 2, 
            np.where(df.A.str[:1] == 'z', 3, 0)))
print (df)
         A  B  new
0    xxxss  4    1
1    yyyee  5    2
2  zzzswee  6    3
3      sss  8    0
EDIT1:
Thanks for idea to Quickbeam2k1 use str.startswith for check starts of each string:
df['new'] = np.where(df.A.str.startswith('xxx'), 1, 
            np.where(df.A.str.startswith('yy'), 2, 
            np.where(df.A.str.startswith('z'), 3, 0)))
print (df)
         A  B  new
0    xxxss  4    1
1    yyyee  5    2
2  zzzswee  6    3
3      sss  8    0
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With