
Manipulating data in Polars

A basic question: how do I manipulate columns in Polars?

Specifically, I have a table with 3 columns: N, SURVIVORS, DEATHS

I want to replace DEATHS with DEATHS * N and SURVIVORS with SURVIVORS * N

The following code does not work:

table["SURVIVORS"] = table["SURVIVORS"]*table["N"]

I have this error:

TypeError: 'DataFrame' object does not support 'Series' assignment by index. Use 'DataFrame.with_columns'

Thank you.

Soufiane Fadili asked Jun 03 '26 14:06


1 Answer

Polars isn't pandas.

You can't assign to part of a DataFrame. Put another way, the left side of the equals sign has to be a whole DataFrame, so forget about the table["SURVIVORS"] = ... syntax.

You'll mainly use the with_columns and select methods. with_columns adds columns to your existing df based on the expressions you feed it (replacing any existing column that has the same name), whereas select returns only what you ask for.

In your case, since you want to overwrite SURVIVORS and DEATHS while keeping N you'd do:

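import polars as pl

# Each product keeps the name of its first column, so SURVIVORS and DEATHS are overwritten.
table = table.with_columns([
    pl.col('SURVIVORS') * pl.col('N'),
    pl.col('DEATHS') * pl.col('N'),
])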

If you wanted to rename the columns, you might think to do this:

table = table.with_columns([
    (pl.col('SURVIVORS') * pl.col('N')).alias('SURVIVORS_N'),
    (pl.col('DEATHS') * pl.col('N')).alias('DEATHS_N'),
])

In this case, because the aliased names are new, with_columns adds them as extra columns, so you'll still have the original SURVIVORS and DEATHS columns as well.

This brings us back to select: if you want explicit control over what is returned, including the column order, use select:

table = table.select([
    'N',
    (pl.col('SURVIVORS') * pl.col('N')).alias('SURVIVORS_N'),
    (pl.col('DEATHS') * pl.col('N')).alias('DEATHS_N'),
])

One note: you can refer to a column by just giving its name as a string, like 'N' in the previous example, as long as you don't want to do anything to it. If you want to do something with it (math, rename, anything), you have to wrap it in pl.col('column_name') so that it becomes an expression.

Dean MacGregor answered Jun 05 '26 05:06