Replace part of a string with values from a dictionary?

Loving the Polars library for its fantastic speed and easy syntax!

Struggling with this question - is there an analogue in Polars for the Pandas code below? Would like to replace strings using a dictionary.

I tried the expression below, but it raises TypeError: 'dict' object is not callable

pl.col("List").str.replace_all(lambda key: key,dict())

I'm trying to replace the working Pandas code below with a Polars expression:

import pandas as pd

df = pd.DataFrame({'List': ['Systems', 'Software', 'Cleared']})

dic = {
    'Systems': 'Sys',
    'Software': 'Soft',
    'Cleared': 'Clr',
}

# regex=True treats each key as a pattern and replaces matching substrings
df["List"] = df["List"].replace(dic, regex=True)

Output:

   List
0   Sys
1  Soft
2   Clr

DBOak asked Sep 02 '25 17:09


2 Answers

There is a "stale" feature request for accepting a dictionary:

  • https://github.com/pola-rs/polars/issues/11418

One possible workaround is to stack multiple expressions in a loop:

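import polars as pl

df = pl.DataFrame({"List": ["Systems", "Software", "Cleared"]})

expr = pl.col("List")

# Chain one .str.replace_all() per dictionary entry (patterns are regexes by default)
for old, new in dic.items():
    expr = expr.str.replace_all(old, new)

df.with_columns(result=expr)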
shape: (3, 2)
┌──────────┬────────┐
│ List     ┆ result │
│ ---      ┆ ---    │
│ str      ┆ str    │
╞══════════╪════════╡
│ Systems  ┆ Sys    │
│ Software ┆ Soft   │
│ Cleared  ┆ Clr    │
└──────────┴────────┘

For non-regex cases, there is also .str.replace_many():

df.with_columns(
    pl.col("List").str.replace_many(
        ["Systems", "Software", "Cleared"],
        ["Sys", "Soft", "Clr"],
    )
    .alias("result")
)
jqurious answered Sep 05 '25 16:09


I think your best bet would be to turn your dic into a dataframe and join the two.

First, convert dic into a shape that makes a clean DataFrame, for example a list of dicts with one row per key/value pair:

dicdf = pl.DataFrame([{'List': x, 'newList': y} for x, y in dic.items()])

where List matches your existing column name, and newList is an arbitrary name for the new column that we'll get rid of later.

Join that with your original df, then select every column except List and newList, plus newList renamed to List:

df = (
    df.join(dicdf, on='List')
    .select([
        pl.exclude(['List', 'newList']),
        pl.col('newList').alias('List'),
    ])
)
Dean MacGregor answered Sep 05 '25 15:09