
Is there a way to take the first 1000 rows of a Spark Dataframe?

I am using the randomSplit function to get a small amount of a dataframe for dev purposes, and I end up just taking the first df returned by this function.

val df_subset = data.randomSplit(Array(0.00000001, 0.01), seed = 12345)(0) 

If I use df.take(1000) then I end up with an array of rows, not a dataframe, so that won't work for me.

Is there a better, simpler way to take, say, the first 1000 rows of the df and store it as another df?

asked Dec 10 '15 by Michael Discenza

People also ask

How do you show more than 20 rows in PySpark?

By default, Spark with Scala, Java, or Python (PySpark) fetches only 20 rows from DataFrame show(), and each column value is truncated to 20 characters. To display more than 20 rows, or full column values, pass arguments to the show() method.
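
For example, a minimal sketch in Scala, assuming a DataFrame named df is already in scope:

df.show(100, truncate = false)  // show 100 rows with full (untruncated) column values
df.show(50, 30)                 // show 50 rows, truncating each column to 30 characters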

What does First () do in Spark?

In Spark, the first() function returns the first element of the dataset. It is similar to take(1).
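
A quick illustration in Scala, again assuming a DataFrame named df:

val firstRow = df.first()  // a single Row
val firstArr = df.take(1)  // an Array[Row] of length 1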


1 Answer

The method you are looking for is .limit.

Returns a new Dataset by taking the first n rows. The difference between this function and head is that head returns an array while limit returns a new Dataset.

Example usage:

df.limit(1000) 
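
To keep the result as its own DataFrame, as the question asks, just assign it. A minimal sketch, assuming the original DataFrame is named data as in the question:

val df_subset = data.limit(1000)  // a new DataFrame of at most 1000 rows

Note that limit is a transformation, so it is evaluated lazily and nothing is collected to the driver until an action runs; that is what makes it preferable to take(1000) here.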
answered Sep 24 '22 by Markon