I am running a Spark-Kafka streaming job with 4 executors (1 core each), and the Kafka source topic has 50 partitions.
In the foreachPartition of the streaming Java program, I am connecting to Oracle and doing some work, with Apache DBCP2 used for the connection pool.
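Roughly, the relevant part of the job looks like this (a simplified sketch; the pool settings, connection URL, and doWork are placeholders for the actual code):

```java
import java.sql.Connection;
import org.apache.commons.dbcp2.BasicDataSource;

public class OraclePool {
    // One pool per executor JVM, created lazily on first use.
    private static BasicDataSource dataSource;

    public static synchronized Connection getConnection() throws Exception {
        if (dataSource == null) {
            dataSource = new BasicDataSource();
            dataSource.setDriverClassName("oracle.jdbc.OracleDriver");
            dataSource.setUrl("jdbc:oracle:thin:@//dbhost:1521/SERVICE"); // placeholder
            dataSource.setUsername("user");
            dataSource.setPassword("password");
            dataSource.setMaxTotal(10);
        }
        return dataSource.getConnection();
    }
}
```

and on the streaming side (stream is the JavaInputDStream from the Kafka direct source):

```java
stream.foreachRDD(rdd ->
    rdd.foreachPartition(records -> {
        // Each running task handles one partition and borrows
        // one connection from the executor-local pool.
        try (Connection conn = OraclePool.getConnection()) {
            while (records.hasNext()) {
                doWork(conn, records.next()); // placeholder for the Oracle work
            }
        }
    })
);
```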
The Spark streaming program is making 4 connections to the database, presumably 1 per executor. But my expectation is that, since there are 50 partitions, there should be 50 threads running and 50 database connections open.
How do I increase the parallelism without increasing the number of cores?
The consensus in most Spark tuning guides is that 5 cores per executor is the optimal number for parallel processing.
Your expectations are wrong. In Spark nomenclature, one core is one available task thread, and hence one partition that can be processed at a time.
4 "cores" -> 4 threads -> 4 partitions processed concurrently.
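In other words, the number of partitions processed at once is capped by executors × cores per executor. A hypothetical sizing (following the 5-cores-per-executor guideline above) that would let all 50 partitions, and hence 50 pooled connections, be in flight at once:

```java
import org.apache.spark.SparkConf;

// Hypothetical sizing: concurrent tasks = executors x cores per executor.
SparkConf conf = new SparkConf()
    .setAppName("kafka-to-oracle")              // placeholder app name
    .set("spark.executor.instances", "10")      // 10 executors
    .set("spark.executor.cores", "5");          // x 5 cores each = 50 concurrent tasks
```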