Removing rows from dataframe whose first letter is in lowercase

I have a dataframe like this:

    FileName        PageNo     LineNo   EntityName  
1   17743633 - 1    TM000002    69      Ambuja Cement Limited
2   17743633 - 1    TM000003    14      Vessel Name
3   17743633 - 1    TM000003    12      tyre Chips (Shredded Tyres)
4   17743633 - 1    TM000006    22      ambuja Cement Limited
5   17743633 - 1    TM000006    28      Binani Cement Limited

I have to remove those rows from the dataframe in which the first letter of the EntityName column is lowercase, i.e. I have to retain only the values that start with an uppercase letter.

I have tried a few methods so far. The first was:

df['EntityName'] = map(lambda x: x[0].isupper(), df['EntityName'])

but it gives NaN values.

Another thing I tried was a regex:

df['EntityName'] = df['EntityName'].str.replace('^[a-z]+$','')

but it has no effect.

A third attempt was:

qw = df.EntityName.str[0]
df = df[qw.isupper()]

but it raises this error:

'Series' object has no attribute 'isupper'

Can someone suggest the correct code snippet or give me a hint?

Madhur Yadav asked May 30 '18 08:05


3 Answers

First select the first letter of each value by indexing with str[0], then test it with str.isupper or str.islower, and filter the rows by boolean indexing:

df = df[df['EntityName'].str[0].str.isupper()]
# to also handle NaN and None, fill the missing results with False:
#df = df[df['EntityName'].str[0].str.isupper().fillna(False)]
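As a self-contained sketch of this first approach: the DataFrame below is rebuilt by hand from the sample in the question, and the names `mask` and `kept` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question (index 1..5 as shown there)
df = pd.DataFrame({
    'FileName': ['17743633 - 1'] * 5,
    'PageNo': ['TM000002', 'TM000003', 'TM000003', 'TM000006', 'TM000006'],
    'LineNo': [69, 14, 12, 22, 28],
    'EntityName': ['Ambuja Cement Limited', 'Vessel Name',
                   'tyre Chips (Shredded Tyres)', 'ambuja Cement Limited',
                   'Binani Cement Limited'],
}, index=range(1, 6))

# .str[0] takes the first character of each string;
# .str.isupper() then tests that character element-wise
mask = df['EntityName'].str[0].str.isupper()

# Boolean indexing keeps only the rows where the mask is True
kept = df[mask]
print(kept['EntityName'].tolist())
```

The third attempt in the question failed because `isupper` was called on the Series itself instead of through the `.str` accessor.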

Or:

df = df[~df['EntityName'].str[0].str.islower()]
# to also handle NaN and None:
#df = df[~df['EntityName'].str[0].str.islower().fillna(False)]

Or use str.contains with a regex, where ^ anchors the match to the start of the string:

df = df[df['EntityName'].str.contains('^[A-Z]+')]
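If the column may contain missing values, the result of str.contains can itself contain missing entries, which is awkward for filtering. The `na` parameter of str.contains sets the value used for those rows. A minimal sketch on a made-up Series `s` (not the asker's data):

```python
import pandas as pd

# Hypothetical values, including a missing entry and an empty string
s = pd.Series(['Ambuja Cement', 'tyre Chips', None, '', 'Vessel Name'])

# ^[A-Z] matches only when the first character is an ASCII uppercase letter;
# na=False makes missing values count as "no match"
mask = s.str.contains(r'^[A-Z]', na=False)

print(s[mask].tolist())
```

Note that `[A-Z]` covers ASCII letters only, whereas `str.isupper` also accepts non-ASCII uppercase letters.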

If there are no NaNs in the data, a list comprehension also works:

df = df[[x[0].isupper() for x in df['EntityName']]]

A more general solution, which also handles empty strings and NaNs, adds an if-else to the comprehension:

mask = [x[0].isupper() if isinstance(x,str) and len(x)>0 else False for x in df['EntityName']]
df = df[mask]

print (df)
              FileName          ...                       EntityName
1 17743633 -         1          ...            Ambuja Cement Limited
2 17743633 -         1          ...                      Vessel Name
5 17743633 -         1          ...            Binani Cement Limited
jezrael answered Oct 24 '22 03:10


Looking at the data, I think str.istitle would do the job, since every name you want to keep is in title case:

df[df['EntityName'].str.istitle()]

      FileName    PageNo  LineNo             EntityName
1  17743633 - 1  TM000002      69  Ambuja Cement Limited
2  17743633 - 1  TM000003      14            Vessel Name
5  17743633 - 1  TM000006      28  Binani Cement Limited
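One caveat worth checking against your real data: istitle tests every word, not just the first letter, so the two checks can disagree. A small sketch with made-up names (the Series `names` is hypothetical):

```python
import pandas as pd

# Hypothetical names chosen to show where the two checks differ
names = pd.Series(['Ambuja Cement Limited', 'Ambuja cement limited',
                   'ABC Ltd', 'tyre Chips'])

# True only when every word is title-cased
title_mask = names.str.istitle()

# True whenever the first character is uppercase
first_upper_mask = names.str[0].str.isupper()

print(title_mask.tolist())
print(first_upper_mask.tolist())
```

Here 'Ambuja cement limited' and 'ABC Ltd' start with an uppercase letter but are not title case, so istitle would drop them.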
Bharath answered Oct 24 '22 01:10


You can use:

df[df.EntityName.str[0].str.isupper()]
llllllllll answered Oct 24 '22 01:10