Spark Truncated Spark Plan

When I print the executed plan, I cannot see all of the pushed filters.

The code I run is:

println(df.queryExecution.executedPlan.treeString(true))

The whole plan is printed, but the PushedFilters field appears as follows:

 PushedFilters: [IsNotNull(X1), IsNotNull(X2), IsNotNull(X2), IsNotNull(X3..., ReadSchema: 

As you can see, the list is cut off. To work around this, I set the following property in spark-defaults.conf:

spark.debug.maxToStringFields    120000

Unfortunately, this did not solve the problem.

Any suggestions on how to overcome this?

Alessandroempire asked May 02 '19 12:05


2 Answers

As of Spark 3.0.1, the length of each metadata field in the plan is hardcoded [1, 2] to a maximum of 100 characters. This was fixed recently by a newly introduced config key, spark.sql.maxMetadataStringLength, which defaults to 100.
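
To illustrate why the output in the question ends in "...", here is a minimal sketch of a fixed-length abbreviation. The names MetadataTruncation and abbreviate are hypothetical and this is not Spark's source; it only assumes a truncate-then-append-marker behaviour with the 100-character limit mentioned above.

```scala
// Hypothetical sketch, not Spark's source: mimics cutting a plan metadata
// value to a fixed number of characters and marking the cut with "...".
object MetadataTruncation {
  // Assumed default limit, taken from the 100 characters mentioned above.
  val DefaultMaxLength: Int = 100

  def abbreviate(value: String, maxLength: Int = DefaultMaxLength): String = {
    require(maxLength > 3, "maxLength must leave room for the \"...\" marker")
    if (value.length <= maxLength) value
    else value.take(maxLength - 3) + "..." // keep the head, mark the cut
  }

  def main(args: Array[String]): Unit = {
    // Build a filter list similar in shape to the one in the question.
    val filters = (1 to 30).map(i => s"IsNotNull(X$i)").mkString("[", ", ", "]")
    println(s"PushedFilters: ${abbreviate(filters)}")        // cut to 100 characters
    println(s"PushedFilters: ${abbreviate(filters, 10000)}") // printed in full
  }
}
```

Under that assumption, a limit of 100 always yields 97 characters plus the marker, which matches the shape of the output in the question. On a Spark version that has spark.sql.maxMetadataStringLength, raising it (for example to 10000 in spark-defaults.conf or via --conf) should let the full PushedFilters list through.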

ofo answered Sep 27 '22 23:09


You can call df.explain(true), which outputs the entire plan:

== Parsed Logical Plan ==
'SerializeFromObject [validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 0, x), IntegerType) AS x#67, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 1, y), IntegerType) AS y#68, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 2, z), IntegerType) AS z#69]
+- 'MapElements <function1>, interface org.apache.spark.sql.Row, [StructField(x,IntegerType,false), StructField(y,IntegerType,false), StructField(z,IntegerType,false)], obj#66: org.apache.spark.sql.Row
   +- 'DeserializeToObject unresolveddeserializer(createexternalrow(getcolumnbyordinal(0, IntegerType), getcolumnbyordinal(1, IntegerType), getcolumnbyordinal(2, IntegerType), StructField(x,IntegerType,false), StructField(y,IntegerType,false), StructField(z,IntegerType,false))), obj#65: org.apache.spark.sql.Row
      +- Filter isnull(y#9)
         +- Filter (x#8 = 0)
            +- Project [_1#4 AS x#8, _2#5 AS y#9, _3#6 AS z#10]
               +- SerializeFromObject [assertnotnull(assertnotnull(input[0, scala.Tuple3, true]))._1 AS _1#4, assertnotnull(assertnotnull(input[0, scala.Tuple3, true]))._2 AS _2#5, assertnotnull(assertnotnull(input[0, scala.Tuple3, true]))._3 AS _3#6]
                  +- ExternalRDD [obj#3]

== Analyzed Logical Plan ==
x: int, y: int, z: int
SerializeFromObject [validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 0, x), IntegerType) AS x#67, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 1, y), IntegerType) AS y#68, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 2, z), IntegerType) AS z#69]
+- MapElements <function1>, interface org.apache.spark.sql.Row, [StructField(x,IntegerType,false), StructField(y,IntegerType,false), StructField(z,IntegerType,false)], obj#66: org.apache.spark.sql.Row
   +- DeserializeToObject createexternalrow(x#8, y#9, z#10, StructField(x,IntegerType,false), StructField(y,IntegerType,false), StructField(z,IntegerType,false)), obj#65: org.apache.spark.sql.Row
      +- Filter isnull(y#9)
         +- Filter (x#8 = 0)
            +- Project [_1#4 AS x#8, _2#5 AS y#9, _3#6 AS z#10]
               +- SerializeFromObject [assertnotnull(assertnotnull(input[0, scala.Tuple3, true]))._1 AS _1#4, assertnotnull(assertnotnull(input[0, scala.Tuple3, true]))._2 AS _2#5, assertnotnull(assertnotnull(input[0, scala.Tuple3, true]))._3 AS _3#6]
                  +- ExternalRDD [obj#3]

== Optimized Logical Plan ==
SerializeFromObject [validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 0, x), IntegerType) AS x#67, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 1, y), IntegerType) AS y#68, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 2, z), IntegerType) AS z#69]
+- MapElements <function1>, interface org.apache.spark.sql.Row, [StructField(x,IntegerType,false), StructField(y,IntegerType,false), StructField(z,IntegerType,false)], obj#66: org.apache.spark.sql.Row
   +- DeserializeToObject createexternalrow(x#8, y#9, z#10, StructField(x,IntegerType,false), StructField(y,IntegerType,false), StructField(z,IntegerType,false)), obj#65: org.apache.spark.sql.Row
      +- LocalRelation <empty>, [x#8, y#9, z#10]

== Physical Plan ==
*(1) SerializeFromObject [validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 0, x), IntegerType) AS x#67, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 1, y), IntegerType) AS y#68, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 2, z), IntegerType) AS z#69]
+- *(1) MapElements <function1>, obj#66: org.apache.spark.sql.Row
   +- *(1) DeserializeToObject createexternalrow(x#8, y#9, z#10, StructField(x,IntegerType,false), StructField(y,IntegerType,false), StructField(z,IntegerType,false)), obj#65: org.apache.spark.sql.Row
      +- LocalTableScan <empty>, [x#8, y#9, z#10]

Neenad Ingole answered Sep 27 '22 22:09