
Polars - Select columns that may not exist without raising an error

Is it possible to select a potentially non-existent column from a polars dataframe without exceptions (return a column with default values or null/None)?

The behavior I really want can be shown in the example as follows:

import polars as pl

df1 = pl.DataFrame({"id": [1, 2, 3], "bar": ["sugar", "ham", "spam"]})
df2 = pl.DataFrame({"id": [4, 5, 6], "other": ["a", "b", "b"]})

df1.write_csv("df1.csv")
df2.write_csv("df2.csv")

df = pl.scan_csv("df*.csv").select(["id", "bar"])
res = df.collect()

If I run the code above, I get an error because df2.csv does not contain the column "bar". The result I want is for res to hold just the contents of df1.csv; in other words, df2.csv should be skipped because it has no "bar" column.

lebesgue asked Mar 26 '26 08:03


2 Answers

As already mentioned in the comments, this functionality doesn't exist in Polars, but we can write a function that fulfills your needs:

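import glob

import polars as pl

def scan_csv_with_columns(file: str, needed_colnames: list[str]) -> pl.LazyFrame:
    file_collector = []
    for filename in sorted(glob.glob(file)):
        df_scan = pl.scan_csv(filename)
        # Only the schema is read here, not the data
        # (Polars >= 1.0; older versions use df_scan.columns instead)
        available = df_scan.collect_schema().names()
        # Keep the file only if it has every needed column, in any order
        if set(needed_colnames) <= set(available):
            file_collector.append(df_scan.select(needed_colnames))
    # pl.concat raises on an empty list, so at least one file must match
    return pl.concat(file_collector, how="vertical")

file = "df*.csv"
needed_colnames = ["id", "bar"]
df = scan_csv_with_columns(file, needed_colnames)
df.collect()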

shape: (3, 2)
┌─────┬───────┐
│ id  ┆ bar   │
│ --- ┆ ---   │
│ i64 ┆ str   │
╞═════╪═══════╡
│ 1   ┆ sugar │
│ 2   ┆ ham   │
│ 3   ┆ spam  │
└─────┴───────┘
alexp answered Mar 28 '26 22:03


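Within a single DataFrame, you can do that using pl.selectors.matches and a regex pattern. A selector that matches no column raises no error, so names that don't exist are simply skipped: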

import polars as pl

df = pl.DataFrame({"col1": [1,2], "col2": [3,4], "col3": [5,6]})
# "col4" does not exist, so only col1 and col3 are returned
print(
    df
    .select(
        pl.selectors.matches("^col1$|^col3$|^col4$")
    )
)
Michael W. answered Mar 28 '26 21:03