I am using Cassandra 2.0.3 and I would like to use pyspark (the Apache Spark Python API) to create an RDD object from Cassandra data.
PLEASE NOTE: I do not want to import CQL and then run a CQL query from the pyspark API; rather, I would like to create an RDD on which I can do some transformations.
I know this can be done in Scala, but I cannot find out how to do it from pyspark.
I would really appreciate it if anyone could guide me on this.
It might not be relevant to you anymore, but I was looking for the same thing and couldn't find anything I was happy with, so I did some work on this: https://github.com/TargetHolding/pyspark-cassandra. It needs a lot of testing before use in production, but I think the integration works quite nicely.
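For reference, usage looks roughly like the sketch below. This is an illustration, not a definitive example: it assumes the connector is on the Spark classpath when you launch pyspark, and `my_keyspace`, `my_table`, and the connection host are placeholders for your own setup.

```python
# Sketch of reading a Cassandra table as an RDD via pyspark-cassandra.
# Assumes the connector jar/egg was supplied when starting pyspark
# (e.g. via --jars and --py-files); keyspace/table names are placeholders.
from pyspark import SparkConf
from pyspark_cassandra import CassandraSparkContext

conf = (
    SparkConf()
    .setAppName("cassandra-rdd-example")
    .set("spark.cassandra.connection.host", "127.0.0.1")  # your Cassandra node
)

sc = CassandraSparkContext(conf=conf)

# Build an RDD straight from a Cassandra table -- no CQL query needed.
rdd = sc.cassandraTable("my_keyspace", "my_table")

# From here, ordinary RDD transformations apply.
counts = rdd.map(lambda row: (row["some_column"], 1)).reduceByKey(lambda a, b: a + b)
```

The point is that `cassandraTable` returns a regular RDD, so the usual `map`/`filter`/`reduceByKey` pipeline works on it directly.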