 

Pyspark: explode json in column to multiple columns

The data looks like this:

+-----------+-----------+-----------------------------+
|         id|      point|                         data|
+-----------+-----------+-----------------------------+
|        abc|          6|{"key1":"124", "key2": "345"}|
|        dfl|          7|{"key1":"777", "key2": "888"}|
|        4bd|          6|{"key1":"111", "key2": "788"}|
+-----------+-----------+-----------------------------+

I am trying to break it into the following format.

+-----------+-----------+-----------+-----------+
|         id|      point|       key1|       key2|
+-----------+-----------+-----------+-----------+
|        abc|          6|        124|        345|
|        dfl|          7|        777|        888|
|        4bd|          6|        111|        788|
+-----------+-----------+-----------+-----------+

The explode function splits the DataFrame into multiple rows, which is not what I want: each JSON key should become its own column.
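To make the difference concrete, here is a minimal sketch in plain Python (not Spark) using the sample rows above. The helper names `explode_like` and `flatten` are hypothetical and only illustrate the two output shapes: one row per key/value pair versus one column per key.

```python
import json

# Sample rows copied from the question: (id, point, data as a JSON string).
rows = [
    ("abc", 6, '{"key1":"124", "key2": "345"}'),
    ("dfl", 7, '{"key1":"777", "key2": "888"}'),
    ("4bd", 6, '{"key1":"111", "key2": "788"}'),
]


def explode_like(rows):
    """Hypothetical helper: one output row per key/value pair (long format)."""
    return [
        (row_id, point, key, value)
        for row_id, point, data in rows
        for key, value in json.loads(data).items()
    ]


def flatten(rows, keys=("key1", "key2")):
    """Hypothetical helper: one output row per input row, one column per key."""
    out = []
    for row_id, point, data in rows:
        parsed = json.loads(data)
        # Missing keys become None instead of raising an error.
        out.append((row_id, point) + tuple(parsed.get(k) for k in keys))
    return out


long_rows = explode_like(rows)  # row count grows with the number of keys
wide_rows = flatten(rows)       # row count stays the same as the input
```

The long format doubles the row count here because each record has two keys; the wide format is the shape asked for in this question.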

Note: The solution in this question does not answer mine: PySpark "explode" dict in column

sjishan asked Jun 27 '18 19:06





1 Answer

As long as you are using Spark 2.1 or higher, pyspark.sql.functions.from_json should give you the desired result, but you first need to define the schema of the JSON column:

from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType

schema = StructType(
    [
        StructField('key1', StringType(), True),
        StructField('key2', StringType(), True)
    ]
)

df.withColumn("data", from_json("data", schema))\
    .select(col('id'), col('point'), col('data.*'))\
    .show()

which should give you

+---+-----+----+----+
| id|point|key1|key2|
+---+-----+----+----+
|abc|    6| 124| 345|
|dfl|    7| 777| 888|
|4bd|    6| 111| 788|
+---+-----+----+----+
Ramesh Maharjan answered Oct 14 '22 19:10
