replace a Polars column with a 1D array

Sample df:

import polars as pl
import numpy as np
df = pl.DataFrame(
    {
        "nrs": [1, 2, 3, None, 5],
        "names": ["foo", "ham", "spam", "egg", None],
        "random": np.random.rand(5),
        "A": [True, True, False, False, False],
    }
)

I want to replace the random column. So far, I've been doing:

new = np.arange(5)
df.replace('random', pl.Series(new))

Note that replace is one of the few Polars methods that work in place!

But now I'm getting

C:\Users\...\AppData\Local\Temp\ipykernel_18244\1406681700.py:2: DeprecationWarning: `replace` is deprecated. DataFrame.replace is deprecated and will be removed in a future version. Please use
    df = df.with_columns(new_column.alias(column_name))
instead.
  df = df.replace('random', pl.Series(new)) 

So, should I do

df = df.with_columns(pl.Series(new).alias('random'))

It seems more verbose, and the in-place modification is gone. Am I doing things right?

DeltaIV asked Sep 19 '25 11:09


2 Answers

Disclaimer: I think the Polars developers want to nudge users away from in-place updates. Also, pl.DataFrame.with_columns is a cheap operation: it is heavily optimized and does not copy the underlying data of the columns you leave unchanged. Hence, using

df = df.with_columns(pl.Series("random", new))

seems like the best approach. See this answer for more information.


Still, if you need in-place updates (e.g. because you implemented a library function whose interface depends on it), you can use pl.DataFrame.replace_column.

new_col = pl.Series("random", np.arange(5))
df.replace_column(df.columns.index(new_col.name), new_col)
Hericks answered Sep 23 '25 06:09


Yes, you are doing it right. Use with_columns in the following way:

import polars as pl
import numpy as np

df = pl.DataFrame({
    "nrs": [1, 2, 3, None, 5],
    "names": ["foo", "ham", "spam", "egg", None],
    "random": np.random.rand(5), 
    "A": [True, True, False, False, False],
})

print(df)
new = np.arange(5)

new_series = pl.Series('random', new)

df_new = df.with_columns(new_series)

print(df_new)

Here is the original df:

shape: (5, 4)
┌──────┬───────┬──────────┬───────┐
│ nrs  ┆ names ┆ random   ┆ A     │
│ ---  ┆ ---   ┆ ---      ┆ ---   │
│ i64  ┆ str   ┆ f64      ┆ bool  │
╞══════╪═══════╪══════════╪═══════╡
│ 1    ┆ foo   ┆ 0.736232 ┆ true  │
│ 2    ┆ ham   ┆ 0.017485 ┆ true  │
│ 3    ┆ spam  ┆ 0.940966 ┆ false │
│ null ┆ egg   ┆ 0.157872 ┆ false │
│ 5    ┆ null  ┆ 0.003914 ┆ false │
└──────┴───────┴──────────┴───────┘

and here is the new one

shape: (5, 4)
┌──────┬───────┬────────┬───────┐
│ nrs  ┆ names ┆ random ┆ A     │
│ ---  ┆ ---   ┆ ---    ┆ ---   │
│ i64  ┆ str   ┆ i64    ┆ bool  │
╞══════╪═══════╪════════╪═══════╡
│ 1    ┆ foo   ┆ 0      ┆ true  │
│ 2    ┆ ham   ┆ 1      ┆ true  │
│ 3    ┆ spam  ┆ 2      ┆ false │
│ null ┆ egg   ┆ 3      ┆ false │
│ 5    ┆ null  ┆ 4      ┆ false │
└──────┴───────┴────────┴───────┘
Serge de Gosson de Varennes answered Sep 23 '25 05:09