I have a Django application that interacts with a Cassandra database and I want to try using Apache Spark to run operations on this database. I have some experience with Django and Cassandra but I'm new to Apache Spark.
I know that to interact with a Spark cluster first I need to create a SparkContext, something like this:
from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName(appName).setMaster(master)
sc = SparkContext(conf=conf)
My question is the following: how should I treat this context? Should I instantiate it when my application starts and let it live for the duration of its execution, or should I create a SparkContext every time before running an operation on the cluster and then stop it when the operation finishes?
Thank you in advance.
I've been working on this for the last few days; since no one answered, I will post the approach I ended up with.
Apparently, creating a SparkContext incurs noticeable overhead, so stopping and recreating the context around every operation is not a good idea. Conversely, there seems to be no downside to letting a single context live for as long as the application runs.
Therefore, my approach was to treat the SparkContext like a database connection: I created a singleton that instantiates the context when the application starts and reused it wherever it was needed.
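As a rough sketch of that singleton idea (the class name, app name, and master URL below are illustrative placeholders, not taken from my actual code):

```python
class SparkContextHolder:
    """Process-wide singleton wrapper around a SparkContext.

    The context is created lazily on first use and then reused for
    every subsequent operation, mirroring how a long-lived database
    connection is typically handled.
    """
    _sc = None

    @classmethod
    def get(cls):
        if cls._sc is None:
            # Import lazily so the Django app can still start on
            # machines where pyspark is not installed.
            from pyspark import SparkContext, SparkConf
            conf = (SparkConf()
                    .setAppName("my-django-app")          # placeholder name
                    .setMaster("spark://master:7077"))    # placeholder master URL
            cls._sc = SparkContext(conf=conf)
        return cls._sc
```

Anywhere in the application, `SparkContextHolder.get()` then returns the same context, so the creation overhead is paid only once. One caveat: a SparkContext is tied to the process that created it, so with a multi-process deployment (e.g. several Django workers) each worker ends up with its own context.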
I hope this is helpful to someone. I'm still new to Apache Spark, so I'm open to suggestions on better ways to handle this.