Spark SQL DataFrame pretty print

Question

I'm not very good with Scala (I'm more of an R addict). I wish to display the content of the WrappedArray element (see sqlDf.show() below) as two rows, using Scala in spark-shell. I've tried the explode() function but couldn't get any further...

scala> val sqlDf = spark.sql("select t.articles.donneesComptablesArticle.taxes from  dau_temp t")
sqlDf: org.apache.spark.sql.DataFrame = [taxes: array<array<struct<baseImposition:bigint,codeCommunautaire:string,codeNatureTaxe:string,codeTaxe:string,droitCautionnable:boolean,droitPercu:boolean,imputationCreditCautionne:boolean,montantLiquidation:bigint,quotite:double,statutAi2:boolean,statutDeLiquidation:string,statutRessourcesPropres:boolean,typeTaxe:string>>>]

scala> sqlDf.show
16/12/21 15:13:21 WARN util.Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
+--------------------+
|               taxes|
+--------------------+
|[WrappedArray([12...|
+--------------------+


scala> sqlDf.printSchema
root
 |-- taxes: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- baseImposition: long (nullable = true)
 |    |    |    |-- codeCommunautaire: string (nullable = true)
 |    |    |    |-- codeNatureTaxe: string (nullable = true)
 |    |    |    |-- codeTaxe: string (nullable = true)
 |    |    |    |-- droitCautionnable: boolean (nullable = true)
 |    |    |    |-- droitPercu: boolean (nullable = true)
 |    |    |    |-- imputationCreditCautionne: boolean (nullable = true)
 |    |    |    |-- montantLiquidation: long (nullable = true)
 |    |    |    |-- quotite: double (nullable = true)
 |    |    |    |-- statutAi2: boolean (nullable = true)
 |    |    |    |-- statutDeLiquidation: string (nullable = true)
 |    |    |    |-- statutRessourcesPropres: boolean (nullable = true)
 |    |    |    |-- typeTaxe: string (nullable = true)

scala> val sqlDfTaxes = sqlDf.select(explode(sqlDf("taxes")))
sqlDfTaxes: org.apache.spark.sql.DataFrame = [col: array<struct<baseImposition:bigint,codeCommunautaire:string,codeNatureTaxe:string,codeTaxe:string,droitCautionnable:boolean,droitPercu:boolean,imputationCreditCautionne:boolean,montantLiquidation:bigint,quotite:double,statutAi2:boolean,statutDeLiquidation:string,statutRessourcesPropres:boolean,typeTaxe:string>>]

scala> sqlDfTaxes.show()
16/12/21 15:22:28 WARN util.Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
+--------------------+
|                 col|
+--------------------+
|[[12564,B00,TVA,A...|
+--------------------+
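sqlDfTaxes still holds one array<struct> per row, so one more explode followed by selecting tax.* should give the readable layout: one row per tax, one column per struct field. Below is a minimal sketch on a small hypothetical sample that keeps only 4 of the 13 struct fields. FlattenTaxes, sample and flatten are illustrative names, not from the question:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object FlattenTaxes {
  // Hypothetical sample shaped like the question's `taxes` column
  // (array<array<struct>>), keeping only 4 of the 13 struct fields.
  def sample(spark: SparkSession): DataFrame = spark.sql(
    """SELECT array(array(
      |  named_struct('codeTaxe', 'A445', 'codeCommunautaire', 'B00',
      |    'baseImposition', CAST(12564 AS BIGINT), 'quotite', CAST(20.0 AS DOUBLE)),
      |  named_struct('codeTaxe', 'U165', 'codeCommunautaire', 'A00',
      |    'baseImposition', CAST(12000 AS BIGINT), 'quotite', CAST(4.7 AS DOUBLE))
      |)) AS taxes""".stripMargin)

  // Each explode removes one array level; "tax.*" expands the struct into columns.
  // Note: explode drops rows whose array is null or empty.
  def flatten(df: DataFrame): DataFrame = df
    .select(explode(col("taxes")).as("taxList")) // array<array<struct>> -> array<struct>, like sqlDfTaxes
    .select(explode(col("taxList")).as("tax"))   // array<struct> -> struct, one row per tax
    .select(col("tax.*"))                        // struct -> one column per field

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("flatten-taxes").getOrCreate()
    // show(false) disables the default truncation of cells longer than 20 characters
    flatten(sample(spark)).show(false)
    spark.stop()
  }
}
```

In spark-shell, starting from the question's sqlDfTaxes, the same idea would presumably read sqlDfTaxes.select(explode($"col").as("tax")).select("tax.*").show(false); this has not been run against the real schema.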

The "readable" content looks like this (THIS IS MY GOAL: a classic rows-by-columns table with headers):

codeTaxe codeCommunautaire baseImposition quotite montantLiquidation statutDeLiquidation
    A445               B00          12564    20.0               2513                   C
    U165               A00          12000     4.7                564                   C
codeNatureTaxe typeTaxe statutRessourcesPropres statutAi2 imputationCreditCautionne
           TVA    ADVAL                   FALSE      TRUE                     FALSE
            DD    ADVAL                    TRUE     FALSE                      TRUE
droitCautionnable droitPercu
            FALSE       TRUE
            FALSE       TRUE
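Because the question already queries through spark.sql, the flattening can also be written in SQL with two chained LATERAL VIEW explode(...) clauses, and the SELECT list then fixes the column order of the table above. This is a sketch on a hypothetical 4-field sample; the view name taxes_sample and the aliases a, b, taxList and tax are made up:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object TaxesLateralView {
  // Hypothetical sample shaped like the question's `taxes` column
  // (array<array<struct>>), keeping only 4 of the 13 struct fields.
  def sample(spark: SparkSession): DataFrame = spark.sql(
    """SELECT array(array(
      |  named_struct('codeTaxe', 'A445', 'codeCommunautaire', 'B00',
      |    'baseImposition', CAST(12564 AS BIGINT), 'quotite', CAST(20.0 AS DOUBLE)),
      |  named_struct('codeTaxe', 'U165', 'codeCommunautaire', 'A00',
      |    'baseImposition', CAST(12000 AS BIGINT), 'quotite', CAST(4.7 AS DOUBLE))
      |)) AS taxes""".stripMargin)

  // Each LATERAL VIEW explode adds one output row per array element;
  // the SELECT list picks the fields in the desired display order.
  def flattenSql(spark: SparkSession, df: DataFrame): DataFrame = {
    df.createOrReplaceTempView("taxes_sample") // hypothetical view name
    spark.sql(
      """SELECT tax.codeTaxe, tax.codeCommunautaire, tax.baseImposition, tax.quotite
        |FROM taxes_sample s
        |LATERAL VIEW explode(s.taxes) a AS taxList
        |LATERAL VIEW explode(taxList) b AS tax""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("taxes-lateral-view").getOrCreate()
    flattenSql(spark, sample(spark)).show(false)
    spark.stop()
  }
}
```

On the real data the FROM part would presumably become FROM dau_temp t LATERAL VIEW explode(t.articles.donneesComptablesArticle.taxes) a AS taxList LATERAL VIEW explode(taxList) b AS tax, with the remaining fields (montantLiquidation, statutDeLiquidation, ...) added to the SELECT list in the desired order; that adaptation is untested.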

And the class of each row is (found using the R package sparklyr):

<jobj[100]>
  class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
  [12564,B00,TVA,A445,false,true,false,2513,20.0,true,C,false,ADVAL]

[[1]][[1]][[2]]
<jobj[101]>
  class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
  [12000,A00,DD,U165,false,true,true,564,4.7,false,C,true,ADVAL]
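GenericRowWithSchema is Spark's Row implementation that carries its schema, so fields can be read by name with getAs. After collect(), an array<array<struct>> column comes back as nested Scala sequences of Row (the WrappedArray seen in the show output on Spark 2.x). This is a sketch on a hypothetical 4-field sample; taxCodes is an illustrative helper name:

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object ReadTaxRows {
  // Hypothetical sample shaped like the question's `taxes` column
  // (array<array<struct>>), keeping only 4 of the 13 struct fields.
  def sample(spark: SparkSession): DataFrame = spark.sql(
    """SELECT array(array(
      |  named_struct('codeTaxe', 'A445', 'codeCommunautaire', 'B00',
      |    'baseImposition', CAST(12564 AS BIGINT), 'quotite', CAST(20.0 AS DOUBLE)),
      |  named_struct('codeTaxe', 'U165', 'codeCommunautaire', 'A00',
      |    'baseImposition', CAST(12000 AS BIGINT), 'quotite', CAST(4.7 AS DOUBLE))
      |)) AS taxes""".stripMargin)

  // Pulls every tax struct to the driver and reads two fields by name.
  // Only sensible for small results, since collect() brings all rows to the driver.
  def taxCodes(df: DataFrame): List[(String, Double)] =
    df.collect().toList.flatMap { row =>
      // array<array<struct>> arrives as nested sequences of Row; scala.collection.Seq
      // is used so the cast also holds on Scala 2.13, where the default Seq is immutable
      row.getAs[scala.collection.Seq[scala.collection.Seq[Row]]]("taxes").flatten.map { tax =>
        (tax.getAs[String]("codeTaxe"), tax.getAs[Double]("quotite"))
      }
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("read-tax-rows").getOrCreate()
    taxCodes(sample(spark)).foreach { case (code, rate) => println(s"$code\t$rate") }
    spark.stop()
  }
}
```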

toofrellik · Accepted Answer

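You can explode on each column. Since taxes is an array of arrays, explode the outer level first:

import org.apache.spark.sql.functions.explode
val flattenedtaxes = sqlDf.withColumn("taxes", explode($"taxes")).withColumn("codeCommunautaire", explode($"taxes.codeCommunautaire"))

After this, flattenedtaxes will have two columns: taxes (one inner array of tax structs per row, all fields as is) and the new column codeCommunautaire.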
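Exploding field by field, as the answer suggests, works for a single field, but every extra explode multiplies the rows again, so two per-field explodes over the same array pair every codeTaxe with every quotite. Exploding down to the struct once keeps each tax's fields together. This is a sketch comparing both on a hypothetical 4-field sample; PerFieldExplode, perField and perStruct are illustrative names:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object PerFieldExplode {
  // Hypothetical sample shaped like the question's `taxes` column
  // (array<array<struct>>), keeping only 4 of the 13 struct fields.
  def sample(spark: SparkSession): DataFrame = spark.sql(
    """SELECT array(array(
      |  named_struct('codeTaxe', 'A445', 'codeCommunautaire', 'B00',
      |    'baseImposition', CAST(12564 AS BIGINT), 'quotite', CAST(20.0 AS DOUBLE)),
      |  named_struct('codeTaxe', 'U165', 'codeCommunautaire', 'A00',
      |    'baseImposition', CAST(12000 AS BIGINT), 'quotite', CAST(4.7 AS DOUBLE))
      |)) AS taxes""".stripMargin)

  // Per-field approach: after the outer explode, each field gets its own explode.
  // The second field explode runs once per row produced by the first, so rows multiply.
  def perField(df: DataFrame): DataFrame = df
    .withColumn("taxes", explode(col("taxes")))             // array<array<struct>> -> array<struct>
    .withColumn("codeTaxe", explode(col("taxes.codeTaxe"))) // sample: 2 rows, one per code
    .withColumn("quotite", explode(col("taxes.quotite")))   // sample: 2 x 2 = 4 rows, codes and rates mixed

  // Per-struct approach: explode down to the struct, then read its fields side by side.
  def perStruct(df: DataFrame): DataFrame = df
    .select(explode(col("taxes")).as("taxList"))
    .select(explode(col("taxList")).as("tax"))
    .select(col("tax.codeTaxe"), col("tax.quotite"))        // sample: 2 rows, fields stay paired

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("per-field-explode").getOrCreate()
    val df = sample(spark)
    perField(df).select("codeTaxe", "quotite").show(false)
    perStruct(df).show(false)
    spark.stop()
  }
}
```

Exploding one field at a time is still handy when only a single field is needed; for several fields of the same struct, the per-struct form avoids the cross product.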

Tags: json, scala, apache-spark-sql

guzu92
