 

Output DStream of Apache Spark in Python

I'm trying out the technologies I will be using to build a real-time data pipeline, and I have run into some issues exporting my contents to a file.

I have set up a local Kafka cluster and a Node.js producer that sends a simple text message, just to test functionality and get a rough estimate of the complexity of implementation.

This is the Spark Streaming job that reads from Kafka; I am trying to get it to write to a file.

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

# Create a local StreamingContext with two working threads and a batch interval of 10 seconds
sc = SparkContext("local[2]", "KafkaStreamingConsumer")
ssc = StreamingContext(sc, 10)

kafkaStream = KafkaUtils.createStream(ssc, "localhost:2181", "consumer-group", {"test": 1})

kafkaStream.saveAsTextFile('out.txt')

# pprint() registers a console output action and returns None, so don't wrap it in print
kafkaStream.pprint()

ssc.start()
ssc.awaitTermination()

The error I am seeing when submitting the Spark job is:

kafkaStream.saveAsTextFile('out.txt')
AttributeError: 'TransformedDStream' object has no attribute 'saveAsTextFile'

No computations or transformations are performed on the data; I just want to build the flow. What do I need to change or add to be able to export the data to a file?

Stelios Savva asked Aug 12 '15 13:08


1 Answer

http://spark.apache.org/docs/latest/api/python/pyspark.streaming.html

Use saveAsTextFiles (note the plural).

saveAsTextFile (singular) is a method on an RDD, not a DStream.
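A minimal sketch of the corrected job, assuming the same local Kafka/ZooKeeper setup as in the question (the `out` prefix and `txt` suffix are illustrative; Spark writes one output directory per batch, named `<prefix>-<batch time>.<suffix>`):

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext("local[2]", "KafkaStreamingConsumer")
ssc = StreamingContext(sc, 10)

kafkaStream = KafkaUtils.createStream(ssc, "localhost:2181", "consumer-group", {"test": 1})

# saveAsTextFiles is the DStream method: for each batch it creates a
# directory (e.g. out-1439384400000.txt) holding that batch's partitions
kafkaStream.saveAsTextFiles('out', 'txt')

kafkaStream.pprint()  # registers a console output action; returns None

ssc.start()
ssc.awaitTermination()
```

If you want to use the singular RDD method, `kafkaStream.foreachRDD(...)` hands you each batch as an RDD, on which `saveAsTextFile` is available.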

Cody Koeninger answered Sep 28 '22 07:09