Column- and row-wise logical operations on Polars DataFrame

In pandas, one can reduce a boolean DataFrame along either axis with the all and any methods by passing an axis argument. Given this frame:

import pandas as pd

data = dict(A=["a","b","?"], B=["d","?","f"])
pd_df = pd.DataFrame(data)

To get a boolean mask of the columns that contain the element "?":

(pd_df == "?").any(axis=0)

and to get a mask on rows:

(pd_df == "?").any(axis=1)

Also, to get a single boolean:

(pd_df == "?").any().any()
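For reference, the three pandas reductions above can be collected into one self-contained sketch. The variable names (mask, per_column, per_row, anywhere) are chosen here for illustration only:

```python
import pandas as pd

data = dict(A=["a", "b", "?"], B=["d", "?", "f"])
pd_df = pd.DataFrame(data)

# Element-wise comparison gives a boolean frame of the same shape
mask = pd_df == "?"

per_column = mask.any(axis=0)       # one bool per column
per_row = mask.any(axis=1)          # one bool per row
anywhere = bool(mask.any().any())   # single Python bool

print(per_column.to_dict())  # {'A': True, 'B': True}
print(per_row.tolist())      # [False, True, True]
print(anywhere)              # True
```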

By comparison, the best I could come up with in Polars is the following:

import polars as pl
pl_df = pl.DataFrame(data)

To get a mask on columns:

(pl_df == "?").select(pl.all().any())

To get a mask on rows:

pl_df.select(
    pl.concat_list(pl.all() == "?").alias("mask")
).select(
    pl.col("mask").list.eval(pl.element().any()).list.first()
)

And to get a single boolean value:

pl_df.select(
    pl.concat_list(pl.all() == "?").alias("mask")
).select(
    pl.col("mask").list.eval(pl.element().any()).list.first()
)["mask"].any()

The last two cases seem particularly verbose and convoluted for such a straightforward task. Are there shorter or faster equivalents?

AAriam asked Sep 16 '25 22:09


2 Answers

Polars added dedicated horizontal methods in version 0.18.7 for "row-wise" operations.

The two relevant to these examples are:

  • pl.all_horizontal()
  • pl.any_horizontal()

If we start with your sample frame:

df = pl.DataFrame(dict(A=["a","b","?"], B=["d","?","f"]))

boolean mask:

df.select(pl.all() == "?") 
shape: (3, 2)
┌───────┬───────┐
│ A     ┆ B     │
│ ---   ┆ ---   │
│ bool  ┆ bool  │
╞═══════╪═══════╡
│ false ┆ false │
│ false ┆ true  │
│ true  ┆ false │
└───────┴───────┘

mask on columns:

df.select((pl.all() == "?").any())
shape: (1, 2)
┌──────┬──────┐
│ A    ┆ B    │
│ ---  ┆ ---  │
│ bool ┆ bool │
╞══════╪══════╡
│ true ┆ true │
└──────┴──────┘

horizontal mask / mask on rows:

df.select(pl.any_horizontal(pl.all() == "?"))
shape: (3, 1)
┌───────┐
│ any   │
│ ---   │
│ bool  │
╞═══════╡
│ false │
│ true  │
│ true  │
└───────┘

The .list namespace also received any/all methods in 0.18.5, so the concat_list approach from your example can be shortened to:

df.select(pl.concat_list(pl.all() == "?").list.any())

single boolean for horizontal mask:

df.select(pl.any_horizontal(pl.all() == "?").any())
shape: (1, 1)
┌──────┐
│ any  │
│ ---  │
│ bool │
╞══════╡
│ true │
└──────┘

If you want to extract it as a single Python value, you can use .item():

df.select(pl.any_horizontal(pl.all() == "?").any()).item()
# True
jqurious answered Sep 19 '25 07:09


One thing that may be making this more confusing is that you're not doing everything inside the select context. In other words, avoid comparing the DataFrame directly, as in (pl_df == "?").

The first thing we can do is

pl_df.select(pl.all()=="?")
shape: (3, 2)
┌───────┬───────┐
│ A     ┆ B     │
│ ---   ┆ ---   │
│ bool  ┆ bool  │
╞═══════╪═══════╡
│ false ┆ false │
│ false ┆ true  │
│ true  ┆ false │
└───────┴───────┘

Calling pl.all() selects all of the columns. For each column, we convert every original value into a bool indicating whether or not it is equal to "?".

Now let's do this:

pl_df.select((pl.all()=="?").any())

shape: (1, 2)
┌──────┬──────┐
│ A    ┆ B    │
│ ---  ┆ ---  │
│ bool ┆ bool │
╞══════╪══════╡
│ true ┆ true │
└──────┴──────┘

This gives you the per-column result. All we did was add .any(), which returns true if any value produced by the parenthesized expression before it is true.

Now let's do

pl_df.select(pl.any_horizontal(pl.all()=="?"))

shape: (3, 1)
┌───────┐
│ any   │
│ ---   │
│ bool  │
╞═══════╡
│ false │
│ true  │
│ true  │
└───────┘

When we call pl.any_horizontal(...), the any is evaluated row-wise across whatever columns ... expands to.

Lastly, if we put them together...

pl_df.select(pl.any_horizontal(pl.all()=="?").any())

shape: (1, 1)
┌──────┐
│ any  │
│ ---  │
│ bool │
╞══════╡
│ true │
└──────┘

Then we get a single value indicating that somewhere in the DataFrame there is an item equal to "?".

Dean MacGregor answered Sep 19 '25 06:09