Polars - Select columns that may not exist without raising an error

Question

Is it possible to select a potentially non-existent column from a polars dataframe without exceptions (return a column with default values or null/None)?

The behavior I really want can be shown in the example as follows:

import polars as pl

df1 = pl.DataFrame({"id": [1, 2, 3], "bar": ["sugar", "ham", "spam"]})
df2 = pl.DataFrame({"id": [4, 5, 6], "other": ["a", "b", "b"]})

df1.write_csv("df1.csv")
df2.write_csv("df2.csv")

df = pl.scan_csv("df*.csv").select(["id", "bar"])
res = df.collect()

Running the code above raises an error because df2.csv does not contain the column "bar". The result I want is for res to hold only the contents of df1.csv; that is, df2.csv should be skipped because it has no "bar" column.

alexp · Accepted Answer

As already mentioned in the comments above, this functionality doesn't exist in Polars, but you can write a function that fulfils your needs:

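import glob

import polars as pl

def scan_csv_with_columns(file: str, needed_colnames: list[str]) -> pl.LazyFrame:
    file_collector = []
    for filename in sorted(glob.glob(file)):
        df_scan = pl.scan_csv(filename)
        # Keep only files whose header contains every needed column
        # (collect_schema() requires Polars 1.x; older versions expose .columns).
        if set(needed_colnames) <= set(df_scan.collect_schema().names()):
            file_collector.append(df_scan.select(needed_colnames))
    if not file_collector:
        # pl.concat fails on an empty list, so report the real problem instead
        raise ValueError(f"no file matching {file!r} has columns {needed_colnames}")
    return pl.concat(file_collector, how="vertical")

file = "df*.csv"
needed_colnames = ["id", "bar"]
df = scan_csv_with_columns(file, needed_colnames)
df.collect()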

shape: (3, 2)
┌─────┬───────┐
│ id  ┆ bar   │
│ --- ┆ ---   │
│ i64 ┆ str   │
╞═════╪═══════╡
│ 1   ┆ sugar │
│ 2   ┆ ham   │
│ 3   ┆ spam  │
└─────┴───────┘

Michael W. · Answer

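You can do that with pl.selectors.matches and a regex pattern. Names in the pattern that match no column (col4 here) are skipped instead of raising an error:

import polars as pl

df = pl.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6]})
print(
    df
    .select(
        # col4 does not exist, so only col1 and col3 are returned
        pl.selectors.matches("^col1$|^col3$|^col4$")
    )
)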

Tags: python, dataframe, csv, python-polars

lebesgue
