How can I print nulls when converting a dataframe to json in Spark

I have a dataframe that I read from a csv.

CSV:
name,age,pets
Alice,23,dog
Bob,30,dog
Charlie,35,

Reading this into a DataFrame called myData:
+-------+---+----+
|   name|age|pets|
+-------+---+----+
|  Alice| 23| dog|
|    Bob| 30| dog|
|Charlie| 35|null|
+-------+---+----+

Now I want to convert each row of this DataFrame to a JSON string using myData.toJSON. This is what I get:

{"name":"Alice","age":"23","pets":"dog"}
{"name":"Bob","age":"30","pets":"dog"}
{"name":"Charlie","age":"35"}

I would like the third row's JSON to include the null value, e.g.:

{"name":"Charlie","age":"35","pets":null}

However, this doesn't seem to be possible. Debugging through the code, I saw that Spark's org.apache.spark.sql.catalyst.json.JacksonGenerator class has the following implementation:

  private def writeFields(
      row: InternalRow, schema: StructType, fieldWriters: Seq[ValueWriter]): Unit = {
    var i = 0
    while (i < row.numFields) {
      val field = schema(i)
      if (!row.isNullAt(i)) {
        gen.writeFieldName(field.name)
        fieldWriters(i).apply(row, i)
      }
      i += 1
    }
  }

This skips a column whenever its value is null. I am not sure why this is the default behavior, but is there a way to print null values in JSON using Spark's toJSON?

I am using Spark 2.1.0.

asked Aug 11 '17 03:08

Rahul

1 Answer

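One workaround that keeps the field in the output of Spark's toJSON method is to replace the nulls before converting:

myData.na.fill("null").toJSON

Note that na.fill with a string argument only affects string columns, and that the result contains the string "null" rather than the JSON literal null asked for in the question: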

+-------------------------------------------+
|value                                      |
+-------------------------------------------+
|{"name":"Alice","age":"23","pets":"dog"}   |
|{"name":"Bob","age":"30","pets":"dog"}     |
|{"name":"Charlie","age":"35","pets":"null"}|
+-------------------------------------------+
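
If a real JSON null is needed on Spark 2.1.0, one option is to render each row by hand instead of relying on toJSON. The following is only a minimal sketch in plain Scala with no Spark dependency: the object name NullPreservingJson and the function rowToJson are hypothetical names chosen for illustration, and the sketch assumes flat rows of strings, numbers, and booleans (no nested structs or arrays).

```scala
object NullPreservingJson {
  // Escape the characters that must not appear raw inside a JSON string.
  private def escape(s: String): String = {
    val sb = new StringBuilder
    s.foreach {
      case '"'          => sb.append("\\\"")
      case '\\'         => sb.append("\\\\")
      case '\n'         => sb.append("\\n")
      case '\r'         => sb.append("\\r")
      case '\t'         => sb.append("\\t")
      case c if c < ' ' => sb.append("\\").append("u%04x".format(c.toInt))
      case c            => sb.append(c)
    }
    sb.toString
  }

  private def quote(s: String): String = "\"" + escape(s) + "\""

  // Render one value; a null reference becomes the JSON literal null.
  // Assumption: NaN and infinite doubles do not occur (they are not valid JSON).
  private def render(value: Any): String = value match {
    case null                => "null"
    case b: Boolean          => b.toString
    case n: java.lang.Number => n.toString
    case other               => quote(other.toString)
  }

  // Render one flat row as a JSON object, keeping fields whose value is null.
  def rowToJson(names: Seq[String], values: Seq[Any]): String = {
    require(names.length == values.length, "one value per field name is expected")
    names.zip(values)
      .map { case (name, value) => quote(name) + ":" + render(value) }
      .mkString("{", ",", "}")
  }

  def main(args: Array[String]): Unit = {
    val names = Seq("name", "age", "pets")
    println(rowToJson(names, Seq("Alice", "23", "dog")))
    println(rowToJson(names, Seq("Charlie", "35", null)))
  }
}
```

With import spark.implicits._ in scope, this could presumably be applied to the DataFrame as myData.map(row => NullPreservingJson.rowToJson(myData.columns.toSeq, row.toSeq)), ideally with the column names captured in a local val first. That wiring is an untested assumption here, not something the original answer proposes.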

I hope it helps!
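
For readers on Spark 3.0 or later (not the 2.1.0 used in the question), the JSON generator has an ignoreNullFields option, which is also exposed as the spark.sql.jsonGenerator.ignoreNullFields configuration. The sketch below passes it to to_json; the object name KeepNullFields, the local master, and the inline data that stands in for the CSV are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, struct, to_json}

object KeepNullFields {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough to try this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("keep-null-fields")
      .getOrCreate()
    import spark.implicits._

    // Same rows as the CSV in the question; None models the missing pet.
    val myData = Seq[(String, String, Option[String])](
      ("Alice", "23", Some("dog")),
      ("Bob", "30", Some("dog")),
      ("Charlie", "35", None)
    ).toDF("name", "age", "pets")

    // Pack all columns into a struct and serialize it, asking the
    // generator not to drop fields whose value is null (Spark 3.0+ only).
    val asJson = myData.select(
      to_json(
        struct(myData.columns.map(col): _*),
        Map("ignoreNullFields" -> "false")
      ).as("value")
    )

    asJson.show(false)
    spark.stop()
  }
}
```

On those versions the third row should then carry "pets":null as a JSON literal instead of omitting the field.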

answered Oct 23 '22 12:10

himanshuIIITian

Tags: json, scala, apache-spark, apache-spark-sql
