According to the API docs:
getActiveSession(): Returns the active SparkSession for the current thread, returned by the builder.
getDefaultSession(): Returns the default SparkSession that is returned by the builder.
I was (most likely erroneously) using getActiveSession to retrieve the SparkSession or SparkContext in some functions across multiple threads. Sometimes the activeSession was not defined (most likely because the thread had just started up).
Can someone explain the difference between the two, or is the API doc sufficiently self-explanatory?
Also, when would I use getActiveSession if, in 99% of apps, there is only one session and getDefaultSession should return that session?
ActiveSession is per thread, while DefaultSession is global. By default, the DefaultSession is also the ActiveSession for the main thread.
All SparkSession objects share the same SparkContext, but they may have different state, such as SQL configurations, temporary tables, and registered functions.
"In 99% of apps there is only one session": you are right; in fact, it is more than 99%.
When would you use ActiveSession? Consider registering a temp view for each of several DataFrames:
With only the DefaultSession, you must use a different name for each DataFrame, like city_1, city_2.
With a separate ActiveSession (you can create a new session with SparkSession.newSession), you can register all the temp views under the same name city, and everything stays simple.
SparkSession.active can help here: it falls back to the DefaultSession when no ActiveSession exists.
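To make the per-thread versus global distinction concrete, here is a minimal sketch in plain Java that does not use Spark at all. ToySession, ToyRegistry, and ToyDemo are hypothetical names invented for this illustration. The sketch assumes the active slot behaves like an InheritableThreadLocal and the default slot like a single global reference, which is my understanding of how Spark keeps them, so treat it as a model of the semantics rather than Spark's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a SparkSession: each one owns its temp-view names.
final class ToySession {
    final String name;
    final Map<String, String> tempViews = new ConcurrentHashMap<>();

    ToySession(String name) { this.name = name; }

    // Loosely like SparkSession.newSession: a new session with fresh state.
    ToySession newSession(String newName) { return new ToySession(newName); }
}

// Hypothetical registry: a per-thread "active" slot and a global "default" slot.
final class ToyRegistry {
    // Assumption: the active slot is inherited by threads constructed later.
    private static final InheritableThreadLocal<ToySession> ACTIVE = new InheritableThreadLocal<>();
    private static final AtomicReference<ToySession> DEFAULT = new AtomicReference<>();

    static void setActive(ToySession s) { ACTIVE.set(s); }
    static void clearActive() { ACTIVE.remove(); }
    static void setDefault(ToySession s) { DEFAULT.set(s); }
    static Optional<ToySession> getActive() { return Optional.ofNullable(ACTIVE.get()); }
    static Optional<ToySession> getDefault() { return Optional.ofNullable(DEFAULT.get()); }

    // Modeled on SparkSession.active: the active session if set, else the default.
    static ToySession active() {
        ToySession s = ACTIVE.get();
        if (s == null) s = DEFAULT.get();
        if (s == null) throw new IllegalStateException("no active or default session");
        return s;
    }
}

final class ToyDemo {
    // Starts the thread and waits for it, so the log order is deterministic.
    static void runAndJoin(Thread t) {
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    static List<String> run() {
        ToyRegistry.clearActive(); // start with no active session on this thread
        List<String> log = new ArrayList<>();

        // Thread object constructed BEFORE any session is active: nothing to inherit.
        Thread early = new Thread(() -> {
            log.add("early getActive=" + ToyRegistry.getActive().map(s -> s.name).orElse("none"));
            log.add("early active()=" + ToyRegistry.active().name);
        });

        ToySession base = new ToySession("base");
        ToyRegistry.setDefault(base); // global: visible from every thread
        ToyRegistry.setActive(base);  // per thread: only this thread and later children
        runAndJoin(early);

        // Each worker activates its own session and reuses the view name "city".
        for (String city : new String[] {"paris", "tokyo"}) {
            Thread worker = new Thread(() -> {
                ToySession own = base.newSession(city + "-session");
                ToyRegistry.setActive(own);
                own.tempViews.put("city", city);
                log.add(own.name + " city=" + ToyRegistry.active().tempViews.get("city"));
            });
            runAndJoin(worker);
        }
        log.add("base views=" + base.tempViews.size());
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In this model, the thread constructed before the session became active sees no active session but still gets base through active(), which is one plausible way to end up with the "activeSession was not defined" symptom from the question. The two workers each register a view named city without clashing, and the base session's view table stays empty. The workers run one after another here only to keep the output order stable.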