
How do I delete rows not starting with 'x' in Pandas, or keep rows starting with 'x'?

Tags: python, pandas

I have been at this all morning and have slowly pieced things together, but for the life of me I cannot figure out how to use the .str.startswith() function in Pandas.

My XLSX spreadsheet is as follows

1 Name, Registration Date, Phone number
2 John Doe, 2015-11-20T19:54:45Z, 1.1112223333
3 Jane Doe, 2015-11-20T20:44:26Z, 65.1112223333
etc...

I import it as a data frame and clean the header so that it contains no spaces. Then I want to delete every row whose phone number does not start with '1.' (in other words, keep only the rows that start with '1.'). In this short example, that means deleting the entire 'Jane Doe' entry, since her phone number starts with '65.'

import pandas as pd

# sheet_name replaces the older sheetname keyword in current pandas
df = pd.read_excel('testingpanda.xlsx', sheet_name='Export 1')

def colHeaderCleaner():
    # Replace spaces with underscores and lowercase every string header
    df.columns = [x.replace(' ', '_').lower() if isinstance(x, str) else x
                  for x in df.columns]

colHeaderCleaner()

# By default pandas reads the values in 'registrant_phone' as float64, so this changes the dtype to object
df['registrant_phone'] = df['registrant_phone'].astype('object')

The closest I have gotten, and by that I mean the only line I have been able to execute without tracebacks or other errors, is:

df['registrant_phone'] = df['registrant_phone'].str.startswith('1')

But all that does is convert every phone value to 'NaN'. It keeps all of the rows, as shown below:

print(df)
[output] name, registration_date, phone_number
[output] John Doe, 2015-11-20T19:54:45Z, NaN
[output] Jane Doe, 2015-11-20T20:44:26Z, NaN
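
One likely reason for the NaN values, sketched here under assumptions: astype('object') only changes the column's dtype, so each element is still a Python float, and .str methods have no strings to test. Depending on the pandas version, that may surface as NaN (as above) or as an error. The two-row DataFrame below is a hypothetical stand-in for the spreadsheet, not the real file.

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet; the phone column holds floats
df = pd.DataFrame({
    'name': ['John Doe', 'Jane Doe'],
    'registrant_phone': [1.1112223333, 65.1112223333],
})

# astype('object') changes the column dtype, but each element stays a float
as_object = df['registrant_phone'].astype('object')
types_as_object = [type(v).__name__ for v in as_object]

# astype(str) converts each element to text, which is what .str methods need
as_text = df['registrant_phone'].astype(str)
types_as_text = [type(v).__name__ for v in as_text]

print(types_as_object)
print(types_as_text)
```

The first list should show floats and the second strings, which suggests converting with astype(str) before calling .str.startswith().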

I have searched far too many places to list, and I have tried different versions of df.drop, but I just can't figure anything out. Where do I go from here?

Mxracer888 asked Feb 03 '16 19:02




1 Answer

I am a bit confused by your question. In any case, if you have a DataFrame df with a column 'c', and you would like to remove the items starting with 1, then the safest way would be to use something like:

df = df[~df['c'].astype(str).str.startswith('1')]
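
The line above removes the matching rows. Since the question asks to keep the rows starting with '1.', the same mask can be used without the ~. Below is a minimal sketch of both directions, assuming the column is named 'registrant_phone' as in the question's code and using a hypothetical two-row DataFrame in place of the spreadsheet. Note that the mask indexes the DataFrame; assigning it back to the column would overwrite the phone values instead of filtering rows.

```python
import pandas as pd

# Hypothetical data mirroring the question's example rows
df = pd.DataFrame({
    'name': ['John Doe', 'Jane Doe'],
    'registrant_phone': [1.1112223333, 65.1112223333],
})

# Boolean mask: True where the phone number, read as text, begins with '1.'
mask = df['registrant_phone'].astype(str).str.startswith('1.')

kept = df[mask]       # keep rows whose phone starts with '1.'
dropped = df[~mask]   # the inverse: remove rows whose phone starts with '1.'

print(kept['name'].tolist())
print(dropped['name'].tolist())
```

Matching on '1.' rather than '1' avoids also selecting numbers such as '10.' or '123.', which matters if other country codes begin with the digit 1.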
Ami Tavory answered Nov 01 '22 09:11