Are there alternative ways to do this? Chaining str.contains() calls is not very elegant when there are many keys to match.
import pandas as pd
df = pd.DataFrame({'A':['Cat had a nap','Dog had puppies','Did you see a Donkey','kitten got angry','puppy was cute']})
dic = {'Cat':'Cat','kitten':'Cat','Dog':'Dog','puppy':'Dog'}
                      A
0         Cat had a nap
1       Dog had puppies
2  Did you see a Donkey
3      kitten got angry
4        puppy was cute
df['Cat'] = (df['A'].astype(str).str.contains('Cat')|df['A'].astype(str).str.contains('kitten')).replace({False:0, True:1})
df['Dog'] = (df['A'].astype(str).str.contains('Dog')|df['A'].astype(str).str.contains('puppy')).replace({False:0, True:1})
df
                      A  Cat  Dog
0         Cat had a nap    1    0
1       Dog had puppies    0    1
2  Did you see a Donkey    0    0
3      kitten got angry    1    0
4        puppy was cute    0    1
Use | (regex alternation, i.e. OR) inside the pattern passed to str.contains, then cast the boolean result to integer with astype(int):
df['Cat'] = df['A'].astype(str).str.contains('Cat|kitten').astype(int)
df['Dog'] = df['A'].astype(str).str.contains('Dog|puppy').astype(int)
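To confirm that the single alternation pattern is a drop-in replacement for the question's two str.contains calls joined with |, here is a small check on the same sample data (a sketch; the names `combined` and `separate` are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': ['Cat had a nap', 'Dog had puppies', 'Did you see a Donkey',
                         'kitten got angry', 'puppy was cute']})
a = df['A'].astype(str)

# One regex with alternation: a row matches if 'Cat' OR 'kitten' occurs anywhere in it
combined = a.str.contains('Cat|kitten')

# The question's approach: two separate tests combined with a boolean OR
separate = a.str.contains('Cat') | a.str.contains('kitten')

# Both produce the same boolean mask; astype(int) then maps False/True to 0/1
print(combined.equals(separate))
print(combined.astype(int).tolist())
```

This should print True followed by [1, 0, 0, 1, 0].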
Similarly, but converting the column to strings only once:
a = df['A'].astype(str)
df['Cat'] = a.str.contains('Cat|kitten').astype(int)
df['Dog'] = a.str.contains('Dog|puppy').astype(int)
print (df)
                      A  Cat  Dog
0         Cat had a nap    1    0
1       Dog had puppies    0    1
2  Did you see a Donkey    0    0
3      kitten got angry    1    0
4        puppy was cute    0    1
A more dynamic solution uses a dictionary of lists, with one list of keywords per new column:
dic = {'Cat':['Cat','kitten'],'Dog':['Dog','puppy']}
for k, v in dic.items():
    df[k] = df['A'].astype(str).str.contains('|'.join(v)).astype(int)
print (df)
                      A  Cat  Dog
0         Cat had a nap    1    0
1       Dog had puppies    0    1
2  Did you see a Donkey    0    0
3      kitten got angry    1    0
4        puppy was cute    0    1
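The question's original dic maps each keyword to a column name rather than each column to a list of keywords. A minimal sketch, assuming that flat mapping is what you start from, inverts it into the dictionary-of-lists shape and then applies the same loop. Escaping each keyword with re.escape is an addition not in the answer above; it makes characters like . or + match literally instead of being treated as regex syntax:

```python
import re
import pandas as pd

df = pd.DataFrame({'A': ['Cat had a nap', 'Dog had puppies', 'Did you see a Donkey',
                         'kitten got angry', 'puppy was cute']})

# The question's flat mapping: keyword -> output column
dic = {'Cat': 'Cat', 'kitten': 'Cat', 'Dog': 'Dog', 'puppy': 'Dog'}

# Invert it into output column -> list of keywords
groups = {}
for keyword, column in dic.items():
    groups.setdefault(column, []).append(keyword)
# groups is now {'Cat': ['Cat', 'kitten'], 'Dog': ['Dog', 'puppy']}

a = df['A'].astype(str)
for column, keywords in groups.items():
    # Escape each keyword so regex metacharacters are matched literally
    pattern = '|'.join(map(re.escape, keywords))
    df[column] = a.str.contains(pattern).astype(int)

print(df)
```

With this sample data the result is the same Cat/Dog table as above; adding a new keyword only requires a new entry in dic.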