I have a DataFrame df that has columns type and subtype and about 100k rows. I'm trying to classify what kind of data df contains by checking type / subtype combinations. While df can contain many different combinations, there are particular combinations that only appear in certain data types. To check if my object contains any of these combinations I'm currently doing:
typeA = (((df.type == 0) & ((df.subtype == 2) | (df.subtype == 3) |
                            (df.subtype == 5) | (df.subtype == 6))) |
         ((df.type == 5) & ((df.subtype == 3) | (df.subtype == 4) |
                            (df.subtype == 7) | (df.subtype == 8))))
A = typeA.sum()
Here typeA is a long Series of Falses that might contain some Trues; if A > 0 then I know the data contained at least one such combination. The problem with this scheme is that even if the first row of df produces a True, it still has to check everything else. Checking the whole DataFrame is faster than using a for loop with a break, but I'm wondering if there is a better way to do it.
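For reference, the same mask can be written more compactly with isin() and reduced with .any(); this is only a sketch of the check above, not a faster method, since the whole mask is still evaluated before the reduction:
# equivalent, more compact form of the mask above (sketch, not a speed-up)
typeA = ((df.type == 0) & df.subtype.isin([2, 3, 5, 6])) | \
        ((df.type == 5) & df.subtype.isin([3, 4, 7, 8]))
contains_typeA = typeA.any()  # same information as A > 0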
Thanks for any suggestions.
Use pandas crosstab:
import numpy as np
import pandas as pd

# example data: random type/subtype values
df = pd.DataFrame(np.random.randint(0, 10, size=(100, 2)), columns=["type", "subtype"])

# contingency table of how often each type/subtype combination occurs
counts = pd.crosstab(df.type, df.subtype)
print(counts.loc[0, [2, 3, 5, 6]].sum() + counts.loc[5, [3, 4, 7, 8]].sum())
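Since the question only needs to know whether any such combination exists, the combined count can simply be compared with zero; a small follow-up sketch using the counts table built above (it assumes, like the example data, that those type/subtype labels actually appear in the table):
# True if at least one of the listed type/subtype combinations occurs
has_typeA = (counts.loc[0, [2, 3, 5, 6]].sum() +
             counts.loc[5, [3, 4, 7, 8]].sum()) > 0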
The printed count is the same as:
a = (((df.type == 0) & ((df.subtype == 2) | (df.subtype == 3) |
                        (df.subtype == 5) | (df.subtype == 6))) |
     ((df.type == 5) & ((df.subtype == 3) | (df.subtype == 4) |
                        (df.subtype == 7) | (df.subtype == 8))))
a.sum()
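A quick way to convince yourself of the equivalence (assuming the same df as above) is to compare the two totals directly:
# sanity check: both approaches count the same rows
assert a.sum() == counts.loc[0, [2, 3, 5, 6]].sum() + counts.loc[5, [3, 4, 7, 8]].sum()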
In pandas 0.13 (soon to be released) you can pass this as a query, which will use numexpr and should be more efficient for your use case:
df.query("((df.type == 0) & ((df.subtype == 2) | (df.subtype == 3) |
(df.subtype == 5) | (df.subtype == 6))) |
((df.type == 5) & ((df.subtype == 3) | (df.subtype == 4) | (df.subtype == 7) |
(df.subtype == 8)))")
Note: I would probably clean up the indentation to make this more readable (you can also replace df.type with type in most cases):
df.query("((type == 0) & ((subtype == 2)"
"|(subtype == 3)"
"|(subtype == 5)"
"|(subtype == 6)))"
"|((type == 5) & ((subtype == 3)"
"|(subtype == 4)"
"|(subtype == 7)"
"|(subtype == 8)))")
Update: it may be possible to do this more efficiently, and certainly more concisely, using the "in" syntax:
df.query("(type == 0) & (subtype in [2, 3, 5, 6])"
         "|(type == 5) & (subtype in [3, 4, 7, 8])")