I have a dataframe like -
   FileName      PageNo    LineNo  EntityName
1  17743633 - 1  TM000002  69      Ambuja Cement Limited
2  17743633 - 1  TM000003  14      Vessel Name
3  17743633 - 1  TM000003  12      tyre Chips (Shredded Tyres)
4  17743633 - 1  TM000006  22      ambuja Cement Limited
5  17743633 - 1  TM000006  28      Binani Cement Limited
I have to remove the rows from the dataframe in which the first letter of the EntityName column is lowercase, i.e. I have to retain only the values that start with an uppercase letter.
I have tried the following so far:
df['EntityName'] = map(lambda x: x[0].isupper(), df['EntityName'])
but it gives NaN values.
Another thing I tried was a regex:
df['EntityName'] = df['EntityName'].str.replace('^[a-z]+$','')
but it has no effect.
Another attempt was:
qw = df.EntityName.str[0]
df = df[qw.isupper()]
but it raises an error:
'Series' object has no attribute 'isupper'
Can someone suggest the correct code snippet, or give me a hint?
First select the first letter by indexing, then check it with isupper or islower, and filter by boolean indexing:
df = df[df['EntityName'].str[0].str.isupper()]
#for working with NaN and None
#df = df[df['EntityName'].str[0].str.isupper().fillna(False)]
Or:
df = df[~df['EntityName'].str[0].str.islower()]
#for working with NaN and None
#df = df[~df['EntityName'].str[0].str.islower().fillna(False)]
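As a rough end-to-end sketch of the isupper variant, the snippet below builds a small frame modelled on the question and applies the NaN-safe mask. The extra NaN and empty-string rows are assumptions added for illustration, not part of the original data; the trailing astype(bool) is only there to make sure the mask is a plain boolean Series.

```python
import numpy as np
import pandas as pd

# Sample data modelled on the question; the NaN and "" rows are
# hypothetical additions to show the missing-value handling.
df = pd.DataFrame({
    "EntityName": [
        "Ambuja Cement Limited",
        "Vessel Name",
        "tyre Chips (Shredded Tyres)",
        "ambuja Cement Limited",
        "Binani Cement Limited",
        np.nan,
        "",
    ]
})

# .str[0] yields NaN for missing values and for empty strings,
# so the isupper() result can contain NaN as well.
first = df["EntityName"].str[0]

# Treat "unknown" as "not uppercase" and force a boolean mask.
mask = first.str.isupper().fillna(False).astype(bool)

kept = df[mask]
print(kept)
```

With this mask, rows whose EntityName is missing or empty are dropped together with the lowercase ones.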
Or use str.contains with a regex, where ^ matches the start of the string:
df = df[df['EntityName'].str.contains('^[A-Z]+')]
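One thing to keep in mind with the regex route: [A-Z] only covers ASCII letters, while isupper follows Unicode casing. A small sketch of the difference, using made-up names (the accented entry is an assumption for illustration) and na=False so that missing values count as non-matches:

```python
import numpy as np
import pandas as pd

# Hypothetical values; "\u00c9mile Ltd" starts with a non-ASCII uppercase letter.
names = pd.Series([
    "Ambuja Cement Limited",
    "ambuja Cement Limited",
    "\u00c9mile Ltd",
    np.nan,
    "",
])

# Regex check: ASCII uppercase only; na=False maps NaN to False.
ascii_mask = names.str.contains(r"^[A-Z]", na=False)

# isupper check on the first character: Unicode-aware.
unicode_mask = names.str[0].str.isupper().fillna(False).astype(bool)

print(pd.DataFrame({"name": names, "regex": ascii_mask, "isupper": unicode_mask}))
```

If the data is known to be plain ASCII, both masks should agree; otherwise the isupper version is probably the safer choice.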
If there are no NaNs in the data, a list comprehension also works:
df = df[[x[0].isupper() for x in df['EntityName']]]
A more general solution that also works with empty strings and NaNs is to add an if-else:
mask = [x[0].isupper() if isinstance(x,str) and len(x)>0 else False for x in df['EntityName']]
df = df[mask]
print (df)
FileName ... EntityName
1 17743633 - 1 ... Ambuja Cement Limited
2 17743633 - 1 ... Vessel Name
5 17743633 - 1 ... Binani Cement Limited
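If the if-else version of the list comprehension gets hard to read, the same check can be pulled into a small helper. This is only a sketch; starts_upper is a hypothetical name, and the mixed-type sample column is made up to exercise the edge cases.

```python
import numpy as np
import pandas as pd

def starts_upper(value):
    """Return True only for non-empty strings whose first character is uppercase."""
    return isinstance(value, str) and len(value) > 0 and value[0].isupper()

# Hypothetical column mixing strings, an empty string, None, NaN and a number.
df = pd.DataFrame(
    {"EntityName": ["Ambuja Cement Limited", "vessel name", "", None, np.nan, 42]}
)

# Plain Python list of booleans, one per row, used for boolean indexing.
mask = [starts_upper(x) for x in df["EntityName"]]
result = df[mask]
print(result)
```

Anything that is not a non-empty string simply evaluates to False, so no fillna step is needed.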
Looking at the data, I think istitle would do the job, i.e.:
df[df['EntityName'].str.istitle()]
   FileName      PageNo    LineNo  EntityName
1  17743633 - 1  TM000002  69      Ambuja Cement Limited
2  17743633 - 1  TM000003  14      Vessel Name
5  17743633 - 1  TM000006  28      Binani Cement Limited
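Note that istitle is stricter than a first-letter check: it requires every word to be title-cased, so it happens to give the same result on the sample data but can differ elsewhere. A quick sketch with invented names (not from the question) that shows where the two diverge:

```python
import pandas as pd

# Hypothetical names chosen to show where the two checks disagree.
names = pd.Series([
    "Ambuja Cement Limited",   # title case
    "ACC Limited",             # all-caps word
    "Binani cement Limited",   # lowercase word in the middle
    "ambuja Cement Limited",   # lowercase first letter
])

title_mask = names.str.istitle()
first_upper_mask = names.str[0].str.isupper()

print(pd.DataFrame({"name": names, "istitle": title_mask, "first_upper": first_upper_mask}))
```

So istitle is fine if all names are expected to be in title case, but it would also drop entries like the second and third ones above.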
You can use:
df[df.EntityName.str[0].str.isupper()]