I am writing a Spark application in Java that reads a Hive table and stores the output in HDFS in JSON format.
I read the Hive table using HiveContext, which returns a DataFrame. Below is the code snippet:
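import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;
SparkConf conf = new SparkConf().setAppName("App");
JavaSparkContext sc = new JavaSparkContext(conf);
HiveContext hiveContext = new HiveContext(sc);
DataFrame data1 = hiveContext.sql("select * from tableName");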
Now I want to convert the DataFrame to a JSON array. For example, the data in data1 looks like this:
|  A  |     B     |
-------------------
|  1  | test      |
|  2  | mytest    |
I need output like this:
[{1:"test"},{2:"mytest"}]
I tried using data1.schema.json(), and it gives me output like the following, which is not an array:
{1:"test"}
{2:"mytest"}
What is the right approach or function to convert the DataFrame to a JSON array without using any third-party libraries?
data1.schema.json gives you a JSON string describing the schema of the DataFrame, not the actual data. You will get:
String = {"type":"struct",
          "fields":
                  [{"name":"A","type":"integer","nullable":false,"metadata":{}},
                  {"name":"B","type":"string","nullable":true,"metadata":{}}]}
To convert your DataFrame to an array of JSON objects, use the toJSON method of DataFrame, which turns each row into a JSON string, and then join those strings into one array:
val df = sc.parallelize(Array( (1, "test"), (2, "mytest") )).toDF("A", "B")
df.show()
+---+------+
|  A|     B|
+---+------+
|  1|  test|
|  2|mytest|
+---+------+
df.toJSON.collect.mkString("[", "," , "]" )
String = [{"A":1,"B":"test"},{"A":2,"B":"mytest"}]
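Since the question is in Java, here is a minimal sketch of the joining step without mkString, assuming you already have the per-row JSON strings as a `List<String>`. In Spark 1.x that list can be obtained with `data1.toJSON().toJavaRDD().collect()` (toJSON returns an `RDD<String>` there); in Spark 2.x, where toJSON returns a `Dataset<String>`, `collectAsList()` serves the same purpose. The class and method names below are hypothetical, and the sample rows stand in for real collected output. Keep in mind that collecting pulls every row into driver memory, so this only suits tables small enough to fit there.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class JsonArrayJoin {

    // Joins per-row JSON strings (one JSON object per element, as produced by
    // toJSON) into a single JSON array string. Equivalent to the Scala call
    // rows.mkString("[", ",", "]"); an empty list yields "[]".
    static String toJsonArray(List<String> jsonRows) {
        return jsonRows.stream().collect(Collectors.joining(",", "[", "]"));
    }

    public static void main(String[] args) {
        // Hypothetical sample input standing in for the collected toJSON output.
        List<String> rows = Arrays.asList(
                "{\"A\":1,\"B\":\"test\"}",
                "{\"A\":2,\"B\":\"mytest\"}");
        System.out.println(toJsonArray(rows));
    }
}
```

Run as-is, main prints the same string as the Scala session above: `[{"A":1,"B":"test"},{"A":2,"B":"mytest"}]`. `Collectors.joining(delimiter, prefix, suffix)` is part of the JDK, so no third-party library is needed.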