How to add a column with JSON representation of rows in Polars DataFrame?

Question

I want to use Polars to read a CSV input and add another column (e.g. called json_per_row) whose entry for each row is the JSON representation of that entire row. I also want to select only a subset of the columns to be included alongside the json_per_row column.

Ideally I don't want to hardcode the number or names of the input columns, but to illustrate, here is a simple example:

import json

import polars as pl

# Input: csv with columns time, var1, var2, ...
s1 = pl.Series("time", [100, 200, 300])
s2 = pl.Series("var1", [1, 2, 3])
s3 = pl.Series("var2", [4, 5, 6])

# I want to add this column with polars somehow
output_col = pl.Series("json_per_row", [
    json.dumps({"time": 100, "var1": 1, "var2": 4}),
    json.dumps({"time": 200, "var1": 2, "var2": 5}),
    json.dumps({"time": 300, "var1": 3, "var2": 6}),
])

# Desired output
df = pl.DataFrame([s1, output_col])
print(df)
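The exact strings in the desired column depend on how json.dumps formats its output: by default it puts a space after each "," and ":". This small sketch restates the example rows as plain dicts (the names rows, default_style and compact_style are only for illustration) and shows the default output next to the compact form you get with separators=(",", ":"). The difference matters if you compare the desired column against a compact encoder's output, such as the strings in the answer's output table.

```python
import json

# The example rows from the question, restated as plain dicts (illustrative only).
rows = [
    {"time": 100, "var1": 1, "var2": 4},
    {"time": 200, "var1": 2, "var2": 5},
    {"time": 300, "var1": 3, "var2": 6},
]

# Default json.dumps output has a space after "," and ":".
default_style = [json.dumps(r) for r in rows]

# Compact separators drop those spaces.
compact_style = [json.dumps(r, separators=(",", ":")) for r in rows]

print(default_style[0])  # {"time": 100, "var1": 1, "var2": 4}
print(compact_style[0])  # {"time":100,"var1":1,"var2":4}
```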

So is there a way to do this with the functions in the Polars library? I'd rather avoid json.dumps if possible since, as the docs say, calling external / user-defined Python functions can hurt performance. Thanks

Roman Pekar · Accepted Answer

You can use pl.read_csv() to read your CSV data, but here I'll just use the Series data you provided. Then:
- pl.struct() combines all the columns into one struct column.
- .struct.json_encode() converts that struct into a JSON string for each row.

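import polars as pl

# s1, s2, s3 are the Series from the question
(
    pl.DataFrame([s1, s2, s3])
    .select(
        pl.col.time,
        # pack every column of each row into a struct, then serialize it as a JSON string
        json_per_row=pl.struct(pl.all()).struct.json_encode(),
    )
)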

┌──────┬────────────────────────────────┐
│ time ┆ json_per_row                   │
│ ---  ┆ ---                            │
│ i64  ┆ str                            │
╞══════╪════════════════════════════════╡
│ 100  ┆ {"time":100,"var1":1,"var2":4} │
│ 200  ┆ {"time":200,"var1":2,"var2":5} │
│ 300  ┆ {"time":300,"var1":3,"var2":6} │
└──────┴────────────────────────────────┘

Tags: python, python-polars

SuperJeff
