Questions Linux Laravel Mysql Ubuntu Git Menu

HTML CSS JAVASCRIPT SQL PYTHON PHP BOOTSTRAP JAVA JQUERY R React Kotlin

Apache Spark: Get number of records per partition

Tags:

I want to check how can we get information about each partition such as total no. of records in each partition on driver side when Spark job is submitted with deploy mode as a yarn cluster in order to log or print on the console.

like image

895

asked Sep 04 '17 07:09

nilesh1212

People also ask

How many tasks does Spark run on each partition?

Spark assigns one task per partition and each worker can process one task at a time.

2 Answers

I'd use built-in function. It should be as efficient as it gets:

import org.apache.spark.sql.functions.spark_partition_id  df.groupBy(spark_partition_id).count

like image

194

answered Sep 25 '22 12:09

Alper t. Turker

You can get the number of records per partition like this :

df   .rdd   .mapPartitionsWithIndex{case (i,rows) => Iterator((i,rows.size))}   .toDF("partition_number","number_of_records")   .show

But this will also launch a Spark Job by itself (because the file must be read by spark to get the number of records).

Spark could may also read hive table statistics, but I don't know how to display those metadata..

like image

22

answered Sep 26 '22 12:09

Raphael Roth

Sign in to Comment

Related questions
                            
                                How do I use OpenCV's remap function?
                            
                                debugPaintSizeEnabled is not working in Flutter
                            
                                Flutter Navigator not working
                            
                                Dynamically instanting classes in Objective-C, possible?
                            
                                Any examples of Flot with floating tooltips?
                            
                                How to set name to a Win32 Thread?
                            
                                Forfiles Batch Script (Escaping @ character)
                            
                                document.getElementById("tweetform").submit is not a function?
                            
                                Clearing output of a terminal program Linux C/C++
                            
                                SQL group by day, with count
                            
                                How can I use a JavaScript variable as a PHP variable? [duplicate]
                            
                                Why does Javascript's OR return a value other than true/false?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With