 

How to shuffle the rows in a Spark dataframe?

I have a dataframe like this:

+---+---+
|_c0|_c1|
+---+---+
|1.0|4.0|
|1.0|4.0|
|2.1|3.0|
|2.1|3.0|
|2.1|3.0|
|2.1|3.0|
|3.0|6.0|
|4.0|5.0|
|4.0|5.0|
|4.0|5.0|
+---+---+

and I would like to shuffle all the rows using Spark in Scala.

How can I do this without going back to RDD?

Laure D asked Apr 26 '17 14:04


People also ask

How do you shuffle rows in Pyspark Dataframe?

To shuffle a DataFrame randomly by both rows and columns, you can use df.sample(frac=1, axis=1).sample(frac=1).reset_index(drop=True).

How do you shuffle rows in a Dataframe?

One of the easiest ways to shuffle a Pandas DataFrame is the sample method. df.sample returns a number of rows from the DataFrame in random order, so by asking it to return the entire DataFrame you get all of the rows back in a random order.

How do I shuffle Dataframe in spark?

If you want a "true" shuffle, where each row has an equal chance of ending up at any position in the dataset, you have to move data across the network. If you only need to shuffle rows within each partition, you can use df.mapPartitions instead (see the sketch below).
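
A rough sketch of that within-partition approach, assuming a DataFrame named df and a Spark 2.x/3.x release where RowEncoder(schema) yields the Encoder[Row] that mapPartitions needs:

import scala.util.Random
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.encoders.RowEncoder

// Shuffle rows only inside each partition: no data crosses the network,
// but each partition is materialized in memory while it is reordered.
val shuffledWithinPartitions = df.mapPartitions { rows =>
  Random.shuffle(rows.toIndexedSeq).iterator
}(RowEncoder(df.schema))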

How do I set shuffle partition in spark?

Spark's default shuffle partition setting is spark.sql.shuffle.partitions, which is set to 200 by default. You can change this value with the conf method of the SparkSession object, or through spark-submit configuration options (see the sketch below).
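
A minimal sketch of both options, assuming an active SparkSession called spark:

// Change the number of partitions Spark SQL uses for shuffles
// in joins and aggregations (the default is 200).
spark.conf.set("spark.sql.shuffle.partitions", "50")

// The equivalent setting at submit time:
//   spark-submit --conf spark.sql.shuffle.partitions=50 ...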

How to randomly shuffle Dataframe rows in pandas?

We will use the sample() method of the pandas module to randomly shuffle DataFrame rows. Algorithm: import the pandas and numpy modules, create a DataFrame, then shuffle its rows by calling sample() with the parameter frac=1, which determines what fraction of the total rows should be returned.

What is the difference between Spark shuffle and Spark data frame?

A Spark DataFrame is split into partitions, and after a shuffle operation the number and contents of those partitions generally differ from the original ones. Moving data from one partition to another so that it can be matched up, aggregated, joined, or otherwise redistributed is called a shuffle.
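
As an illustration, assuming the DataFrame from the question above is called df, an aggregation like the one below forces all rows with the same key onto the same partition, which is exactly this cross-partition movement:

// groupBy repartitions the data by _c0 before counting, i.e. it triggers a shuffle
val counts = df.groupBy("_c0").count()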

What is the use of Shuffle Index in Dataframe?

The shuffled indices are used to select rows with the .iloc[] method. You can shuffle the rows of a DataFrame by indexing with a shuffled index, for instance df.iloc[np.random.permutation(df.index)].reset_index(drop=True).


1 Answer

You need to use the orderBy method of the DataFrame:

import org.apache.spark.sql.functions.rand
val shuffledDF = dataframe.orderBy(rand())
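
rand() assigns every row a fresh random value, so each run produces a different order. As a small follow-up sketch using the same dataframe variable as above: passing a seed to rand makes the ordering repeatable for the same input data and partitioning.

// Fixing the seed gives a repeatable "random" order for the same input partitioning.
val reproducibleShuffle = dataframe.orderBy(rand(42L))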
prudenko answered Oct 17 '22 21:10