 

Applying wildcard to Pandas isin filter [duplicate]

Total noob here, sorry for the beginner question. I've been racking my brain in pandas trying to filter a column (a Series) of a DataFrame to find the rows that contain any one of a list of strings.

import pandas as pd
streets = ['CONGRESS', 'GUADALUPE', 'BEN WHITE', 'LAMAR', 'MANCHACA', 'BURNET', 'ANDERSON', 'BRAKER' ]
# the actual list of street names is much longer than this

strs = pd.read_csv('short_term_rental_locations.csv')

# the following returns no rows: the isin() mask is all False
strs[strs['PROP_ADDRESS'].isin(streets)]

# but if I use .str.contains, I can find rows that contain part of a
# street name; however, .contains has a limit of six positional arguments.
strs[strs['PROP_ADDRESS'].str.contains('CONGRESS')]

I've tried using the wildcard * with .isin, to no avail. I feel so dumb for struggling with this. Any help much appreciated. Thanks!
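For reference, isin() tests each value for exact, whole-value equality against the list and does not treat * as a wildcard, which is why full addresses never match bare street names. A minimal illustration, using a few hypothetical addresses in place of the CSV data:

```python
import pandas as pd

# Hypothetical sample addresses standing in for the PROP_ADDRESS column
addresses = pd.Series(['100 CONGRESS AVE', '2500 GUADALUPE ST', 'CONGRESS'])

# isin tests whole-value equality: only an element exactly equal to 'CONGRESS' matches
exact = addresses.isin(['CONGRESS'])
print(exact.tolist())     # [False, False, True]

# '*' has no special meaning to isin; it is compared as a literal character
wildcard = addresses.isin(['*CONGRESS*'])
print(wildcard.tolist())  # [False, False, False]

# str.contains checks whether the substring occurs anywhere in each element
partial = addresses.str.contains('CONGRESS')
print(partial.tolist())   # [True, False, True]
```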

asked Oct 13 '18 at 14:10 by 24hourbreakfast



1 Answer

.contains has a limit of six positional arguments.

There's some misunderstanding here: it's not clear what "six positional arguments" refers to. Strictly speaking, pd.Series.str.contains accepts at most 5 arguments (pat, case, flags, na and regex), and only one of them, pat, holds the strings you are searching for.
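To make that concrete, here is a sketch that passes all five arguments by keyword on a small, made-up Series (the addresses are hypothetical). Only pat says what to search for; the other four control how the match is performed.

```python
import pandas as pd

# Hypothetical addresses, including a missing value
addresses = pd.Series(['100 Congress Ave', '2500 GUADALUPE ST', None])

mask = addresses.str.contains(
    'congress',  # pat: the substring or regex pattern to look for
    case=False,  # ignore upper/lower case when matching
    flags=0,     # flags from the re module (none here)
    na=False,    # report missing values as non-matches
    regex=True,  # interpret pat as a regular expression (the default)
)
print(mask.tolist())  # [True, False, False]
```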

Instead, you can use a regular expression (regex matching is enabled by default) and join the street names with | into a single pattern to pass to pd.Series.str.contains:

streets = ['CONGRESS', 'GUADALUPE', 'BEN WHITE', 'LAMAR',
           'MANCHACA', 'BURNET', 'ANDERSON', 'BRAKER' ]

searchstr = '|'.join(streets)
strs[strs['PROP_ADDRESS'].str.contains(searchstr)]
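A self-contained sketch of the same approach, assuming a small hypothetical DataFrame in place of short_term_rental_locations.csv. It adds two optional safeguards that are not in the answer's snippet: re.escape, so a name containing regex metacharacters is matched literally, and na=False, so rows with a missing address are simply left out.

```python
import re
import pandas as pd

streets = ['CONGRESS', 'GUADALUPE', 'BEN WHITE', 'LAMAR',
           'MANCHACA', 'BURNET', 'ANDERSON', 'BRAKER']

# Hypothetical stand-in for pd.read_csv('short_term_rental_locations.csv')
strs = pd.DataFrame({'PROP_ADDRESS': [
    '100 CONGRESS AVE',
    '4500 BEN WHITE BLVD',
    '1200 RED RIVER ST',
    None,
]})

# Escape each name, then join into one alternation: 'CONGRESS|GUADALUPE|...'
searchstr = '|'.join(map(re.escape, streets))

# na=False keeps rows with a missing address out of the boolean mask
matches = strs[strs['PROP_ADDRESS'].str.contains(searchstr, na=False)]
print(matches['PROP_ADDRESS'].tolist())  # ['100 CONGRESS AVE', '4500 BEN WHITE BLVD']
```

Depending on the pandas version and column dtype, leaving out na=False can put NaN into the mask, which pandas will not use for boolean indexing. Also note that plain alternation matches inside longer words too (LAMAR would match LAMARQUE); wrapping the pattern as r'\b(?:' + searchstr + r')\b' restricts it to whole words.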
answered Oct 31 '22 at 06:10 by jpp