I am facing an issue with readStream on a Delta table.
The expected behaviour, per the following reference: https://docs.databricks.com/delta/delta-streaming.html#delta-table-as-a-stream-source Ex:
spark.readStream.format("delta").table("events") -- as documented, this should work fine
The issue: I have tried the same in the following way:
df.write.format("delta").saveAsTable("deltatable") -- saved the DataFrame as a Delta table
spark.readStream.format("delta").table("deltatable") -- called readStream
error:
Traceback (most recent call last):
File "<input>", line 1, in <module>
AttributeError: 'DataStreamReader' object has no attribute 'table'
Note: I am running this on localhost in the PyCharm IDE, with the latest version of PySpark installed; Spark version = 2.4.5, Scala version = 2.11.12.
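A quick sanity check (a minimal sketch, assuming an active SparkSession named spark) confirms that the method simply does not exist on this version:

print(spark.version)                       # e.g. '2.4.5'
print(hasattr(spark.readStream, "table"))  # False on 2.4.x, hence the AttributeError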
The DataStreamReader.table and DataStreamWriter.table methods are not in Apache Spark yet. Currently you need to use a Databricks notebook in order to call them.
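As a workaround on open-source Spark, you can stream from the table's storage path instead of its metastore name. Below is a minimal sketch, assuming the Delta Lake package is available (io.delta:delta-core_2.11:0.6.1 is an assumed compatible version for Spark 2.4.5 / Scala 2.11) and using an illustrative path /tmp/deltatable:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("delta-stream-workaround")
    .config("spark.jars.packages", "io.delta:delta-core_2.11:0.6.1")
    .getOrCreate()
)

# Sample batch DataFrame standing in for the question's df.
df = spark.range(0, 10).withColumnRenamed("id", "value")

# Write to an explicit path instead of relying on the metastore table name.
df.write.format("delta").mode("overwrite").save("/tmp/deltatable")

# Stream the Delta table back by path; .load() exists on 2.4.x, .table() does not.
stream_df = spark.readStream.format("delta").load("/tmp/deltatable")

query = (
    stream_df.writeStream
    .format("console")
    .trigger(once=True)  # process the current table contents, then stop
    .start()
)
query.awaitTermination()

DataStreamReader.table and DataStreamWriter.table were added to Apache Spark in the 3.1 release line, so upgrading Spark (together with a matching Delta Lake release) also removes the AttributeError.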