I am facing an issue with readStream on a Delta table.
The expected behaviour, per the following reference: https://docs.databricks.com/delta/delta-streaming.html#delta-table-as-a-stream-source Ex:
spark.readStream.format("delta").table("events") -- as documented, this should work fine
The issue: I have tried the same in the following way:
df.write.format("delta").saveAsTable("deltatable") -- saved the DataFrame as a Delta table
spark.readStream.format("delta").table("deltatable") -- called readStream
error:
Traceback (most recent call last):
File "<input>", line 1, in <module>
AttributeError: 'DataStreamReader' object has no attribute 'table'
Note: I am running this on localhost in the PyCharm IDE, with the latest version of PySpark installed; Spark version = 2.4.5, Scala version = 2.11.12.
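A quick sanity check (a minimal sketch, assuming an active SparkSession named spark) confirms that the method simply does not exist on this version:

print(spark.version)                       # e.g. '2.4.5'
print(hasattr(spark.readStream, "table"))  # False on 2.4.x, hence the AttributeError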
The DataStreamReader.table and DataStreamWriter.table methods are not in Apache Spark yet. Currently you need to use a Databricks notebook in order to call them.
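As a workaround on open-source Spark, you can stream from the table's storage path instead of its metastore name. Below is a minimal sketch, assuming the Delta Lake package is available (io.delta:delta-core_2.11:0.6.1 is an assumed compatible version for Spark 2.4.5 / Scala 2.11) and using an illustrative path /tmp/deltatable:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("delta-stream-workaround")
    .config("spark.jars.packages", "io.delta:delta-core_2.11:0.6.1")
    .getOrCreate()
)

# Sample batch DataFrame standing in for the question's df.
df = spark.range(0, 10).withColumnRenamed("id", "value")

# Write to an explicit path instead of relying on the metastore table name.
df.write.format("delta").mode("overwrite").save("/tmp/deltatable")

# Stream the Delta table back by path; .load() exists on 2.4.x, .table() does not.
stream_df = spark.readStream.format("delta").load("/tmp/deltatable")

query = (
    stream_df.writeStream
    .format("console")
    .trigger(once=True)  # process the current table contents, then stop
    .start()
)
query.awaitTermination()

DataStreamReader.table and DataStreamWriter.table were added to Apache Spark in the 3.1 release line, so upgrading Spark (together with a matching Delta Lake release) also removes the AttributeError.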