
Searching Multiple Strings in pandas without predefining number of strings to use

Tags:

python

pandas

Is there a more general way to do the below? Specifically, can I write the search function (st) so that it accepts any number of search strings rather than a predefined number?

For instance, I'd like a generalized st function that I can call as st('Governor', 'Virginia', 'Google').

Here's my current function, but it hard-codes how many words you can search for. (df is a pandas DataFrame.)

def search(word1, word2, word3, df):
    """
    allows you to search an intersection of three terms
    """
    return df[df.Name.str.contains(word1) & df.Name.str.contains(word2) & df.Name.str.contains(word3)]

search('Governor', 'Virginia', 'Google', newauthdf)
asked Mar 25 '14 at 01:03 by user3314418


1 Answer

You could use np.logical_and.reduce:

import pandas as pd
import numpy as np
def search(df, *words):  #1
    """
    Return a sub-DataFrame of those rows whose Name column match all the words.
    """
    return df[np.logical_and.reduce([df['Name'].str.contains(word) for word in words])]   # 2


df = pd.DataFrame({'Name':['Virginia Google Governor',
                           'Governor Virginia',
                           'Governor Virginia Google']})
print(search(df, 'Governor', 'Virginia', 'Google'))

prints

                       Name
0  Virginia Google Governor
2  Governor Virginia Google
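If you'd rather not pull in NumPy, a minimal alternative sketch is to chain the `&` with `functools.reduce` and `operator.and_`. This version also passes `regex=False`, because `str.contains` treats its pattern as a regular expression by default (so words containing characters like `.` or `(` would otherwise be misread), and `na=False` so missing names count as non-matches. The helper name `search_all` and its `column` parameter are hypothetical, not part of the answer above.

```python
import operator
from functools import reduce

import pandas as pd


def search_all(df, *words, column="Name"):
    """Hypothetical helper: return rows whose `column` contains every word.

    Words are matched literally (regex=False); missing values never match (na=False).
    """
    # Start from an all-True mask so that calling with no words returns every row.
    start = pd.Series(True, index=df.index)
    masks = (df[column].str.contains(word, regex=False, na=False) for word in words)
    # reduce(operator.and_, [m1, m2, m3], start) computes start & m1 & m2 & m3
    return df[reduce(operator.and_, masks, start)]


df = pd.DataFrame({"Name": ["Virginia Google Governor",
                            "Governor Virginia",
                            "Governor Virginia Google",
                            None]})
print(search_all(df, "Governor", "Virginia", "Google"))
```

Supplying `start` as the initial value is a design choice: it keeps `search_all(df)` with no words well-defined (it returns the whole frame) instead of having to special-case an empty list.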

  1. The * in def search(df, *words) allows search to accept any number of positional arguments after df. It collects them into a tuple called words.
  2. np.logical_and.reduce([X, Y, Z]) is equivalent to X & Y & Z, but unlike a hand-written chain of &, it works for a list of any length.
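A quick way to check both notes yourself, as a small sketch: the function `show_words` and the arrays `X`, `Y`, `Z` below are made up purely for illustration. It shows that `*words` arrives as a tuple, and that `np.logical_and.reduce` over a list gives the same element-wise result as chaining `&`.

```python
import numpy as np


def show_words(df, *words):
    # Hypothetical helper: *words gathers every positional argument after df into a tuple
    return words


print(show_words(None, "Governor", "Virginia", "Google"))
# ('Governor', 'Virginia', 'Google')

# Made-up boolean masks standing in for the str.contains results
X = np.array([True, True, False, True])
Y = np.array([True, False, True, True])
Z = np.array([True, True, True, False])

# Element-wise AND across all arrays in the list, folded left to right
combined = np.logical_and.reduce([X, Y, Z])
print(combined)                             # [ True False False False]
print(np.array_equal(combined, X & Y & Z))  # True
```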
answered Oct 04 '22 at 01:10 by unutbu