SparkSQL and explode on DataFrame in Java

Is there an easy way to use explode on an array column of a SparkSQL DataFrame? It's relatively simple in Scala, but this function seems to be unavailable in Java (as noted in the Javadoc).

An option is to use SQLContext.sql(...) with the explode function inside the query, but I'm looking for a better and, especially, cleaner way. The DataFrames are loaded from Parquet files.

JiriS asked Aug 06 '15 15:08


2 Answers

I solved it this way: say you have an array column named "positions" containing job descriptions, one entry per person identified by "fullName".

Then you go from the initial schema:

root
 |-- fullName: string (nullable = true)
 |-- positions: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- companyName: string (nullable = true)
 |    |    |-- title: string (nullable = true)
...

to the schema:

root
 |-- personName: string (nullable = true)
 |-- companyName: string (nullable = true)
 |-- positionTitle: string (nullable = true)

by doing:

    // Explode "positions" into one row per array element, aliased as "pos".
    DataFrame personPositions = persons.select(
            persons.col("fullName").as("personName"),
            org.apache.spark.sql.functions.explode(persons.col("positions")).as("pos"));

    // Pull the struct fields out of the exploded column.
    DataFrame test = personPositions.select(
            personPositions.col("personName"),
            personPositions.col("pos").getField("companyName").as("companyName"),
            personPositions.col("pos").getField("title").as("positionTitle"));
marilena.oita answered Oct 30 '22 04:10


It seems it is possible to combine org.apache.spark.sql.functions.explode(Column col) with DataFrame.withColumn(String colName, Column col) to replace the column with its exploded version.
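A minimal sketch of that approach, assuming the same persons DataFrame and "positions" array column as in the other answer (code fragment, not a complete class):

    // Replace the "positions" array column in place with one exploded row
    // per array element; the column becomes a struct after the call.
    DataFrame exploded = persons.withColumn("positions",
            org.apache.spark.sql.functions.explode(persons.col("positions")));

    // The struct fields can then be projected out as flat columns.
    DataFrame flat = exploded.select(
            exploded.col("fullName"),
            exploded.col("positions").getField("companyName").as("companyName"),
            exploded.col("positions").getField("title").as("positionTitle"));

Compared with the select-based version, this keeps the original column name and the rest of the schema untouched, at the cost of an extra step if you want the struct fields as top-level columns.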

JiriS answered Oct 30 '22 05:10