
Increasing Parallellism in Spark Executor without increasing Cores

I am running a Spark-Kafka streaming job with 4 executors (1 core each), and the Kafka source topic has 50 partitions.

In the foreachPartition of the streaming Java program, I connect to Oracle and do some work. Apache DBCP2 is used for the connection pool.

The Spark Streaming program is making 4 connections to the database, presumably one per executor. But my expectation is that, since there are 50 partitions, there should be 50 threads running and 50 database connections open.

How do I increase the parallelism without increasing the number of cores?
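For context, the "one connection per executor" behaviour described above usually comes from the standard pattern of holding the DBCP2 pool in a static field, which is initialised once per executor JVM. The sketch below illustrates that pattern with a placeholder object standing in for the real `BasicDataSource`; the class and field names are hypothetical.

```java
// Sketch of the usual per-executor pool-holder pattern (names are
// hypothetical). In Spark, a static field is initialised once per
// executor JVM, so every task on that executor shares one pool --
// which is consistent with seeing roughly one connection per executor.
import java.util.concurrent.atomic.AtomicInteger;

public class PoolHolder {
    // Counts how many times the pool is actually built (for illustration).
    static final AtomicInteger creations = new AtomicInteger();
    private static Object pool; // stands in for a DBCP2 BasicDataSource

    public static synchronized Object getPool() {
        if (pool == null) {
            creations.incrementAndGet();
            pool = new Object(); // real code: configure a BasicDataSource here
        }
        return pool;
    }
}
```

Every call to `getPool()` within the same executor returns the same instance, so the pool (and its connections) is shared by all tasks that run on that executor.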

AKC asked Dec 13 '16 23:12


People also ask


How many cores does executor Spark have?

The consensus in most Spark tuning guides is that 5 cores per executor is the optimum number of cores in terms of parallel processing.


1 Answer

Your expectation is wrong. In Spark nomenclature, one core is one available task thread, and therefore one partition that can be processed at a time.

4 "cores" -> 4 threads -> 4 partitions processed concurrently.
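That said, when the per-record work is I/O-bound (waiting on Oracle) rather than CPU-bound, one Spark core can still drive several database connections by fanning records out to a small thread pool inside foreachPartition. The sketch below shows the idea in plain Java; the pool size and the `processRecord` body are illustrative assumptions, not the asker's actual code.

```java
// Hedged sketch: fan out a partition's records to a fixed thread pool so
// one Spark task (one core) keeps several DB connections busy at once.
// processRecord is a stand-in for the real JDBC work via the DBCP2 pool.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionWorker {
    static String processRecord(String r) {
        return r.toUpperCase(); // real code: an Oracle call per record
    }

    // Call this from inside foreachPartition with the partition's records.
    static List<String> processPartition(List<String> records, int threads)
            throws Exception {
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String r : records) {
                futures.add(exec.submit(() -> processRecord(r)));
            }
            List<String> out = new ArrayList<>();
            for (Future<String> f : futures) {
                out.add(f.get()); // propagates any failure to the Spark task
            }
            return out;
        } finally {
            exec.shutdown();
        }
    }
}
```

Note the trade-off: with 4 executors each running, say, a 12-thread pool, you would approach ~48 concurrent connections, so size the DBCP2 pool (`maxTotal`) accordingly. The simpler alternative remains giving Spark more cores so more of the 50 partitions are processed at once.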

user7293606 answered Nov 15 '22 02:11