 

pandas select from Dataframe using startswith

This works (using pandas 0.12 dev):

table2 = table[table['SUBDIVISION'] == 'INVERNESS']

Then I realized I needed to select on the field using "starts with", since I was missing a bunch of rows. So, following the pandas docs as closely as I could, I tried:

criteria = table['SUBDIVISION'].map(lambda x: x.startswith('INVERNESS'))
table2 = table[criteria]

And got AttributeError: 'float' object has no attribute 'startswith'

So I tried an alternative syntax, with the same result:

table[[x.startswith('INVERNESS') for x in table['SUBDIVISION']]] 
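A minimal reproduction, assuming (the real data isn't shown) that SUBDIVISION contains at least one missing value. pandas stores a missing value in an object column as NaN, which is a Python float, so both the map form and the list comprehension end up calling startswith on a float. The sample rows are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one SUBDIVISION value is missing (NaN is a float)
table = pd.DataFrame(
    {'SUBDIVISION': ['INVERNESS', 'INVERNESS HIGHLANDS', np.nan, 'OTHER']},
    dtype=object,
)
print(type(table['SUBDIVISION'].iloc[2]))  # <class 'float'>

errors = []
try:
    # map() calls the lambda on every value, including the NaN
    table['SUBDIVISION'].map(lambda x: x.startswith('INVERNESS'))
except AttributeError as exc:
    errors.append(str(exc))

try:
    # The list comprehension hits the same NaN before any indexing happens
    table[[x.startswith('INVERNESS') for x in table['SUBDIVISION']]]
except AttributeError as exc:
    errors.append(str(exc))

# Both attempts fail with the same AttributeError message
print(errors)
```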

Reference: http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing, section 4: "List comprehensions and map method of Series can also be used to produce more complex criteria."

What am I missing?

asked Jul 30 '13 at 21:07 by dartdog



2 Answers

You can use the Series str.startswith method (via the .str accessor) to get more consistent results:

In [11]: s = pd.Series(['a', 'ab', 'c', 11, np.nan])

In [12]: s
Out[12]:
0      a
1     ab
2      c
3     11
4    NaN
dtype: object

In [13]: s.str.startswith('a', na=False)
Out[13]:
0     True
1     True
2    False
3    False
4    False
dtype: bool

and the boolean indexing will work just fine (I prefer to use loc, but it works just the same without):

In [14]: s.loc[s.str.startswith('a', na=False)]
Out[14]:
0     a
1    ab
dtype: object
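Applied to a DataFrame shaped like the one in the question (hypothetical stand-in data, including a missing value and a number), the same mask selects identical rows through .loc or plain []. One practical extra with .loc is that a column can be chosen in the same step:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's table
table = pd.DataFrame({
    'SUBDIVISION': ['INVERNESS', np.nan, 'INVERNESS WEST', 'OTHER', 11],
    'LOT': [1, 2, 3, 4, 5],
})

# na=False turns missing and non-string values into False, so the mask is purely boolean
mask = table['SUBDIVISION'].str.startswith('INVERNESS', na=False)

via_loc = table.loc[mask]   # boolean selection through .loc
via_getitem = table[mask]   # plain [] accepts the same boolean Series

print(via_loc.equals(via_getitem))  # True

# .loc can also select a column at the same time
lots = table.loc[mask, 'LOT']
print(lots.tolist())  # [1, 3]
```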


It looks like at least one of the elements in your Series/column is a float (most likely a missing value, which pandas represents as NaN). Floats don't have a startswith method, hence the AttributeError; the list comprehension raises the same error for the same reason.
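If you'd rather keep the map or list-comprehension form, one option is to check isinstance(x, str) before calling startswith. This is a sketch that assumes non-string values (NaN, numbers) should simply count as non-matches; the helper name starts_with_inverness and the data are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical data mixing strings, a missing value and a number
table = pd.DataFrame(
    {'SUBDIVISION': ['INVERNESS', 'INVERNESS HIGHLANDS', np.nan, 'OTHER', 12.0]},
    dtype=object,
)

def starts_with_inverness(x):
    # Non-strings are treated as "no match" instead of raising AttributeError
    return isinstance(x, str) and x.startswith('INVERNESS')

table2 = table[table['SUBDIVISION'].map(starts_with_inverness)]          # map form
table3 = table[[starts_with_inverness(x) for x in table['SUBDIVISION']]]  # list comprehension

print(table2)
```

The .str.startswith(..., na=False) approach does the same job without a custom function; the guarded lambda is mainly useful when the test is more complex than a prefix check.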

answered Oct 07 '22 at 01:10 by Andy Hayden


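To retrieve all the rows that start with the required string (str.match tests a regular expression against the start of each value; na=False treats missing values as non-matches, so the mask stays boolean):

dataFrameOut = dataFrame[dataFrame['column name'].str.match('string', na=False)]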
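Because str.match works with regular expressions, characters such as '.' in the pattern are special. A small sketch with hypothetical values compares it with str.startswith; re.escape turns a literal prefix into a safe pattern:

```python
import re

import numpy as np
import pandas as pd

s = pd.Series(['A.B LOT 1', 'AXB LOT 2', np.nan], dtype=object)

regex_hits = s.str.match('A.B', na=False)               # '.' matches any character
literal_hits = s.str.match(re.escape('A.B'), na=False)  # escaped: literal "A.B" prefix only
prefix_hits = s.str.startswith('A.B', na=False)         # plain string prefix, no regex

print(regex_hits.tolist())    # [True, True, False]
print(literal_hits.tolist())  # [True, False, False]
print(prefix_hits.tolist())   # [True, False, False]
```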

To retrieve all the rows that contain the required string:

dataFrameOut = dataFrame[dataFrame['column name'].str.contains('string', na=False)]
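str.contains also treats its pattern as a regular expression by default. A sketch with hypothetical values shows two commonly useful parameters: case=False for a case-insensitive search, and regex=False for a plain substring test when the text contains regex metacharacters:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'column name': ['Inverness Park', 'WEST INVERNESS', 'Oak (North)', np.nan]},
    dtype=object,
)

# Case-insensitive substring search; missing values become False
has_inverness = df['column name'].str.contains('inverness', case=False, na=False)

# Literal search: without regex=False the unbalanced '(' would be an invalid regular expression
has_paren = df['column name'].str.contains('(North', regex=False, na=False)

print(df[has_inverness])
print(df[has_paren])
```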
answered Oct 07 '22 at 03:10 by Vinoj John Hosan