
Filtering a pandas data frame with a variable

Tags: python, pandas

I'm stuck on something simple but I can't see it despite reading the docs and relevant SO questions. This involves filtering records from data that comes out of a WordPress database.

Create some data:

import pandas as pd

data = {'number': [1, 2, 3],
        'field': ['billing_last_name', 'shipping_last_name', 'field_1435'],
        'name': ['jones', 'smith', 'jones']}
dframe = pd.DataFrame(data)
print(dframe)

                field   name  number
0   billing_last_name  jones       1
1  shipping_last_name  smith       2
2          field_1435  jones       3

Select a subset of the data:

field_filter = 'billing_last_name'
number_filter = 1
choice = dframe[(dframe['field'] == field_filter) & (dframe['number'] == number_filter)]
print(choice)
print(choice['name'])

               field   name  number
0  billing_last_name  jones       1
0    jones
Name: name, dtype: object

Use this result to set a variable for further filtering:

match = str(choice['name'])

Here's where the problem starts. If I filter with the variable it returns nothing:

print(dframe[dframe['name'] == match])

Empty DataFrame
Columns: [field, name, number]
Index: []    

If I run the same filter with the string the variable holds it returns the correct result:

print(dframe[dframe['name'] == 'jones'])

               field   name  number
0  billing_last_name  jones       1
2         field_1435  jones       3

Yet both the variable and its contents are strings, apparently:

print(type('jones'))
print(type(match))

<type 'str'>
<type 'str'>

Why doesn't the filter with the variable work?

Pete Kelly asked May 03 '26 17:05


1 Answer

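match is a string, but it is not "jones". Calling str() on choice['name'] converts the whole pandas Series to its printed representation (the index, the value, then the "Name: name, dtype: object" line, exactly as in the output above), so comparing the name column against it never matches. Instead, take the values out of the Series and filter on those: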

field_filter = 'billing_last_name'
number_filter = 1
choice = dframe[(dframe['field'] == field_filter) & (dframe['number'] == number_filter)]

matches = choice['name'].values
dframe[dframe['name'].isin(matches)]

Using isin means the filter still works if choice has more than one row, since matches can then hold several names. This may or may not be the desired effect (I can update the answer).
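If you only ever expect a single match, a minimal sketch along these lines may be closer to what the question was after. It rebuilds the question's sample data, shows what str() actually produces, and then pulls the first value out with .iloc[0]. The name as_text is only illustrative, and the sketch assumes choice has at least one row.

```python
import pandas as pd

# Same sample data as in the question.
dframe = pd.DataFrame({
    'number': [1, 2, 3],
    'field': ['billing_last_name', 'shipping_last_name', 'field_1435'],
    'name': ['jones', 'smith', 'jones'],
})

choice = dframe[(dframe['field'] == 'billing_last_name') & (dframe['number'] == 1)]

# str() on a Series returns its printed representation, not the cell value.
# as_text is a hypothetical name used only to inspect that string.
as_text = str(choice['name'])
print(repr(as_text))

# Take the first matching value as a plain string instead.
# Assumption: choice is not empty; .iloc[0] raises IndexError otherwise.
match = choice['name'].iloc[0]

# Filtering with the extracted value now behaves like filtering with 'jones'.
result = dframe[dframe['name'] == match]
print(result)
```

Printing repr(as_text) makes the embedded newline and the trailing "Name: ..." text visible, which is why type(match) reported a string in the question while the comparison still failed.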

NickBraunagel answered May 06 '26 08:05


