Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to create a new column within a polars DataFrame that is equal to a list?

I am currently trying to create a new column within a polars dataframe (df). Within my df, there are many many rows, and within this new column I only want my existing list to populate wherever certain conditions are met.

I know that in pandas, I could do something like the following (where 'output' is my new column and assuming list is the same length as the filtered df) :

df = pd.DataFrame({
    'cond1': [-1, 0, 1, 2, 2, 3, 4, 5],
    'cond2': ['No', 'No', 'No', 'Yes','Yes', 'Yes', 'Yes', 'Yes']
})
conditions = ((df['cond1']>=1) & (df['cond2'] == 'Yes'))
list_new = [1,2,3,4,5]

df.loc[conditions, 'output'] = list_new

How would I do this same thing but within polars? I want the total number of rows to stay the same, I just want to update the new column wherever the conditions are met

I tried a few variations of the following but nothing has seemed to work:

df_polars = df_polars.with_columns(pl.when(conditions).then(list_new).alias('output')
like image 946
Datawiz Avatar asked Oct 19 '25 14:10

Datawiz


2 Answers

  • .with_row index() to get row indexes.
  • .filter() to get rows based on conditions.

Now we get DataFrame with proper row indexes and new data:

conditions = [pl.col('cond1') >= 1, pl.col('cond2') == 'Yes']

df_output = (
    df.with_row_index()
    .filter(conditions).select('index', output=pl.Series(list_new)
)

┌───────┬────────┐
│ index ┆ output │
│ ---   ┆ ---    │
│ u32   ┆ i64    │
╞═══════╪════════╡
│ 3     ┆ 1      │
│ 4     ┆ 2      │
│ 5     ┆ 3      │
│ 6     ┆ 4      │
│ 7     ┆ 5      │
└───────┴────────┘
  • .join() to add column to original DataFrame:
(
    df.with_row_index()
    .join(df_output, on='index', how='left')
    .drop('index')
)

┌───────┬───────┬────────┐
│ cond1 ┆ cond2 ┆ output │
│ ---   ┆ ---   ┆ ---    │
│ i64   ┆ str   ┆ i64    │
╞═══════╪═══════╪════════╡
│ -1    ┆ No    ┆ null   │
│ 0     ┆ No    ┆ null   │
│ 1     ┆ No    ┆ null   │
│ 2     ┆ Yes   ┆ 1      │
│ 2     ┆ Yes   ┆ 2      │
│ 3     ┆ Yes   ┆ 3      │
│ 4     ┆ Yes   ┆ 4      │
│ 5     ┆ Yes   ┆ 5      │
└───────┴───────┴────────┘
like image 186
Roman Pekar Avatar answered Oct 21 '25 02:10

Roman Pekar


Can't avoid the use of with_row_index but one can also use DataFrame.update here

import polars as pl
from polars import col

df = pl.DataFrame({
    'cond1': [-1, 0, 1, 2, 2, 3, 4, 5],
    'cond2': ['No', 'No', 'No', 'Yes','Yes', 'Yes', 'Yes', 'Yes']
})
list_new = [1, 2, 3, 4, 5]

df = df.with_row_index()
subset = df.select(
   col('index').filter(col('cond1') >= 1, col('cond2') == 'Yes'),
   pl.Series('output', list_new)
)

print(
    df.with_columns(output=None)
    .update(subset, on='index', how='outer')
    .drop('index')
)
# shape: (8, 3)
# ┌───────┬───────┬────────┐
# │ cond1 ┆ cond2 ┆ output │
# │ ---   ┆ ---   ┆ ---    │
# │ i64   ┆ str   ┆ i64    │
# ╞═══════╪═══════╪════════╡
# │ -1    ┆ No    ┆ null   │
# │ 0     ┆ No    ┆ null   │
# │ 1     ┆ No    ┆ null   │
# │ 2     ┆ Yes   ┆ 1      │
# │ 2     ┆ Yes   ┆ 2      │
# │ 3     ┆ Yes   ┆ 3      │
# │ 4     ┆ Yes   ┆ 4      │
# │ 5     ┆ Yes   ┆ 5      │
# └───────┴───────┴────────┘
like image 27
Cameron Riddell Avatar answered Oct 21 '25 04:10

Cameron Riddell



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!