In Spark SQL (working with the Java APIs) I have a DataFrame. The DataFrame has a select method. I wonder: is it a transformation or an action? I just need a confirmation and a good reference that states it clearly.
Transformations are the ones that produce new Datasets; actions are the ones that trigger computation and return results. Example transformations include map, filter, select, and aggregate (groupBy).
When we look at the Spark API, we can easily spot the difference between transformations and actions. If a function returns a DataFrame, Dataset, or RDD, it is a transformation. If it returns anything else, or does not return a value at all (returns Unit in the Scala API), it is an action.
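As a rough analogy outside Spark, plain Scala collection views follow the same pattern: a view's map returns another view and defers the work (a "transformation"), while sum returns a plain Int immediately (an "action"). A minimal sketch, no Spark required (the object name and counter are mine, purely for illustration):

```scala
// Plain-Scala analogy for the return-type rule of thumb (not Spark itself):
// map on a view returns another view (deferred, like a transformation),
// while sum returns an Int (computed now, like an action).
object ViewAnalogy extends App {
  var applied = 0 // side effect lets us observe when the function actually runs

  val view = Vector(1, 2, 3).view.map { x =>
    applied += 1
    x * 2
  }
  println(s"after map: applied = $applied") // still 0: nothing has run yet

  val total = view.sum // forces evaluation and returns a plain Int
  println(s"sum = $total, applied = $applied") // 12, 3
}
```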
The select() method is useful when you simply need to select a subset of columns from a Spark DataFrame. On the other hand, selectExpr() comes in handy when you need to select particular columns while also applying some transformation over them.
You can select one or more columns of a Spark DataFrame by passing the column names you want to the select() function. Since a DataFrame is immutable, this creates a new DataFrame with the selected columns. The show() function is used to display the DataFrame's contents.
It is a transformation. Please refer to: https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/Dataset.html
A Dataset is a strongly typed collection of domain-specific objects that can be transformed in parallel using functional or relational operations. Each Dataset also has an untyped view called a DataFrame, which is a Dataset of Row.
Operations available on Datasets are divided into transformations and actions. Transformations are the ones that produce new Datasets; actions are the ones that trigger computation and return results. Example transformations include map, filter, select, and aggregate (groupBy). Example actions include count, show, or writing data out to file systems.
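To make the Javadoc's division concrete, here is a toy sketch (my own simplification, not Spark's actual implementation) of the design it describes: transformations return a new dataset wrapping a deferred plan, while actions force that plan and return a plain value.

```scala
// Toy sketch of Spark's lazy-plan design (NOT Spark's real code):
// a ToyDataset just carries a thunk describing how to produce its rows.
final case class ToyDataset[A](plan: () => Seq[A]) {
  // Transformations: return a new ToyDataset; nothing is computed yet.
  def map[B](f: A => B): ToyDataset[B] = ToyDataset(() => plan().map(f))
  def filter(p: A => Boolean): ToyDataset[A] = ToyDataset(() => plan().filter(p))

  // Actions: trigger computation and return an ordinary value.
  def count(): Long = plan().length.toLong
  def collect(): Seq[A] = plan()
}

object ToyDatasetDemo extends App {
  var evaluations = 0
  val ds = ToyDataset(() => Seq(1, 2, 3, 4))
    .map { x => evaluations += 1; x * 10 } // transformation: not run yet
  println(s"after map: $evaluations evaluations") // 0
  println(s"count = ${ds.count()}")               // action forces the plan: 4
  println(s"after count: $evaluations evaluations") // 4
}
```

The design choice this mirrors is that a transformation only extends the description of the computation; only an action walks the description and does the work.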
If you execute the code below, you will see the output in the console:
import org.apache.spark.sql.SparkSession

object learnSpark2 extends App {
  val sparksession = SparkSession.builder()
    .appName("Learn Spark")
    .config("spark.master", "local")
    .getOrCreate()

  val range = sparksession.range(1, 500).toDF("numbers")
  range.select(range.col("numbers"), range.col("numbers") + 10).show(2)
}
+-------+--------------+
|numbers|(numbers + 10)|
+-------+--------------+
|      1|            11|
|      2|            12|
+-------+--------------+
only showing top 2 rows
If you execute the following code with only select and no show, you will not see any output even though the code executes. This means select is just a transformation, not an action, so it is not evaluated.
import org.apache.spark.sql.SparkSession

object learnSpark2 extends App {
  val sparksession = SparkSession.builder()
    .appName("Learn Spark")
    .config("spark.master", "local")
    .getOrCreate()

  val range = sparksession.range(1, 500).toDF("numbers")
  range.select(range.col("numbers"), range.col("numbers") + 10)
}
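The same "no output until forced" effect can be reproduced without Spark using a lazy Scala view: mapping with a println prints nothing until something forces the view, just as select alone only schedules work that show, count, etc. will trigger. A small sketch, purely as an analogy for Spark's laziness (object and value names are mine):

```scala
object LazyNoOutput extends App {
  // Building the pipeline prints nothing: map on a view is deferred,
  // like select on a DataFrame.
  val pipeline = (1 to 3).view.map { x =>
    println(s"evaluating $x") // runs only when the view is forced
    x + 10
  }
  println("pipeline built, nothing evaluated yet")

  // Forcing the view (analogous to calling show) finally runs the function.
  val result = pipeline.toList
  println(s"result = $result")
}
```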
In the console:
19/01/03 22:46:25 INFO Utils: Successfully started service 'sparkDriver' on port 55531.
19/01/03 22:46:25 INFO SparkEnv: Registering MapOutputTracker
19/01/03 22:46:25 INFO SparkEnv: Registering BlockManagerMaster
19/01/03 22:46:25 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/01/03 22:46:25 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/01/03 22:46:25 INFO DiskBlockManager: Created local directory at C:\Users\swilliam\AppData\Local\Temp\blockmgr-9abc8a2c-15ee-4e4f-be04-9ef37ace1b7c
19/01/03 22:46:25 INFO MemoryStore: MemoryStore started with capacity 1992.9 MB
19/01/03 22:46:25 INFO SparkEnv: Registering OutputCommitCoordinator
19/01/03 22:46:25 INFO Utils: Successfully started service 'SparkUI' on port 4040.
19/01/03 22:46:26 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.192.99.214:4040
19/01/03 22:46:26 INFO Executor: Starting executor ID driver on host localhost
19/01/03 22:46:26 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 55540.
19/01/03 22:46:26 INFO NettyBlockTransferService: Server created on 10.192.99.214:55540
19/01/03 22:46:26 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/01/03 22:46:26 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.192.99.214, 55540, None)
19/01/03 22:46:26 INFO BlockManagerMasterEndpoint: Registering block manager 10.192.99.214:55540 with 1992.9 MB RAM, BlockManagerId(driver, 10.192.99.214, 55540, None)
19/01/03 22:46:26 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.192.99.214, 55540, None)
19/01/03 22:46:26 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.192.99.214, 55540, None)
19/01/03 22:46:26 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/C:/UDEMY/SparkJob/spark-warehouse/').
19/01/03 22:46:26 INFO SharedState: Warehouse path is 'file:/C:/UDEMY/SparkJob/spark-warehouse/'.
19/01/03 22:46:27 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
19/01/03 22:46:29 INFO SparkContext: Invoking stop() from shutdown hook
19/01/03 22:46:29 INFO SparkUI: Stopped Spark web UI at http://10.192.99.214:4040
19/01/03 22:46:29 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/01/03 22:46:29 INFO MemoryStore: MemoryStore cleared
19/01/03 22:46:29 INFO BlockManager: BlockManager stopped
19/01/03 22:46:29 INFO BlockManagerMaster: BlockManagerMaster stopped
19/01/03 22:46:29 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/01/03 22:46:29 INFO SparkContext: Successfully stopped SparkContext
19/01/03 22:46:29 INFO ShutdownHookManager: Shutdown hook called
19/01/03 22:46:29 INFO ShutdownHookManager: Deleting directory C:\Users\swilliam\AppData\Local\Temp\spark-c69bfb9b-f351-45af-9947-77950b23dd15
Picked up JAVA_TOOL_OPTIONS: -Djavax.net.ssl.trustStore="C:\Program Files\SquirrelSQL\certificates\jssecacerts"