I have a DataFrame created by running sqlContext.read on a Parquet file. The DataFrame consists of 300 million rows. I need to use these rows as input to another function, but I want to do it in smaller batches to prevent an OOM error.
Currently, I am using df.head(1000000) to read the first 1M rows, but I cannot find a way to read the subsequent rows. I tried df.collect(), but it gives me a Java OOM error.
I want to iterate over this DataFrame. I tried adding another column with the withColumn() API to generate a unique set of values to iterate over, but none of the existing columns in the DataFrame contain only unique values. For example, I tried val df = df1.withColumn("newColumn", df1("col") + 1) as well as val df = df1.withColumn("newColumn", lit(i += 1)), but neither returns a sequential set of values.
Is there any other way to get the first n rows of a DataFrame and then the next n rows, something that works like SQLContext's range function?
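One way to get the strictly sequential ids that the withColumn() attempts above were aiming for is to index the underlying RDD with zipWithIndex and then rebuild the DataFrame. The following is only a sketch, assuming df1 and sqlContext from the question; the rowId column name is made up:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Pair every row with a strictly increasing, 0-based index.
val rddWithId = df1.rdd.zipWithIndex.map { case (row, idx) => Row.fromSeq(row.toSeq :+ idx) }

// Extend the original schema with the new id column.
val schemaWithId = StructType(df1.schema.fields :+ StructField("rowId", LongType, nullable = false))

// Rebuild a DataFrame that now carries the sequential id.
val indexed = sqlContext.createDataFrame(rddWithId, schemaWithId)

With such an id in place, each batch can be selected with a plain range filter, for example indexed.where(indexed("rowId") >= start && indexed("rowId") < start + batchSize).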
To loop through each row with map(), the PySpark DataFrame first has to be converted to an RDD, since map() is only available on RDDs. You then call map() with a lambda (or function) that processes each row, keep the resulting RDD in a variable, and finally convert that new RDD back into a DataFrame.
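Translated to the Scala API used in the question, that pattern is roughly the sketch below; the body of the map is only a placeholder for whatever per-row logic is needed:

// Convert the DataFrame to an RDD[Row], transform each row, then rebuild a DataFrame.
val transformedRdd = df.rdd.map { row =>
  row   // placeholder: return a (possibly modified) Row here
}
val transformedDf = sqlContext.createDataFrame(transformedRdd, df.schema)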
By using a SQL query with the between() operator, you can select a range of rows.
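For example, combined with a sequential id like the rowId column sketched above, a range of rows can be selected either through the DataFrame API or with a SQL query. This sketch assumes Spark 1.x-style temp tables, since the question uses sqlContext:

// DataFrame API: rows 0 .. 999999 of the indexed DataFrame.
val firstBatch = indexed.where(indexed("rowId").between(0, 999999))

// Equivalent SQL form.
indexed.registerTempTable("indexed")
val firstBatchSql = sqlContext.sql("SELECT * FROM indexed WHERE rowId BETWEEN 0 AND 999999")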
In Spark/PySpark you can use the show() action to print the top/first N rows (5, 10, 100, ...) of a DataFrame to the console or a log. There are also several actions, such as take(), tail(), collect(), head() and first(), that return the top or last n rows as a list of Row objects (Array[Row] in Scala).
To get the number of rows in a PySpark DataFrame, use the count() function; it returns the total number of rows in the DataFrame.
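In the Scala API those actions look like the sketch below (tail() is omitted because it only exists in newer Spark releases):

df.show(5)                    // print the first 5 rows to the console
val firstFive = df.take(5)    // Array[Row] with the first 5 rows
val headFive = df.head(5)     // same result as take(5)
val firstRow = df.first()     // just the first Row
val total = df.count()        // total number of rows as a Long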
You can simply use the limit and except APIs of Dataset/DataFrame, as follows:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

long count = df.count();
int limit = 50;
while (count > 0) {
    Dataset<Row> df1 = df.limit(limit);   // next batch of up to 'limit' rows
    df1.show();                           // will print 50 rows, then the next 50, and so on
    df = df.except(df1);                  // remove the rows that were just shown
    count = count - limit;
}
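Since the question itself is in Scala, the same pattern might look like the sketch below. Note that except() computes a set difference with duplicate elimination, and the lineage of the remaining DataFrame grows on every pass, so on 300 million rows this loop can become very expensive; the sequential-id-plus-range-filter approach sketched earlier avoids that repeated work.

var remaining = df            // assumes df is the full DataFrame from the question
var count = remaining.count()
val limit = 50

while (count > 0) {
  val batch = remaining.limit(limit)    // next batch of up to 'limit' rows
  batch.show()                          // prints this batch
  remaining = remaining.except(batch)   // drop the rows already shown
  count = count - limit
}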