Structured Streaming exception when using append output mode with watermark

Tags: java, apache-spark, spark-structured-streaming

Even though I'm using withWatermark(), I get the following error when I run my Spark job:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Append output mode not supported when there are streaming aggregations on streaming DataFrames/DataSets without watermark;;

From what I can see in the programming guide, this exactly matches the intended usage (and the example code). Does anyone know what might be wrong?

Thanks in advance!

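Relevant code (Java 8, Spark 2.2.0):

import static org.apache.spark.sql.functions.*;
import static org.apache.spark.sql.types.DataTypes.*;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.streaming.Trigger;
import org.apache.spark.sql.types.StructType;

StructType logSchema = new StructType()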
        .add("timestamp", TimestampType)
        .add("key", IntegerType)
        .add("val", IntegerType);

Dataset<Row> kafka = spark
        .readStream()
        .format("kafka")
        .option("kafka.bootstrap.servers", brokers)
        .option("subscribe", topics)
        .load();

Dataset<Row> parsed = kafka
        .select(from_json(col("value").cast("string"), logSchema).alias("parsed_value"))
        .select("parsed_value.*");

Dataset<Row> tenSecondCounts = parsed
        .withWatermark("timestamp", "10 minutes")
        .groupBy(
            parsed.col("key"),
            window(parsed.col("timestamp"), "1 day"))
        .count();

StreamingQuery query = tenSecondCounts
        .writeStream()
        .trigger(Trigger.ProcessingTime("10 seconds"))
        .outputMode("append")
        .format("console")
        .option("truncate", false)
        .start();

asked Aug 08 '17 21:08

Ray J

1 Answer

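The problem is parsed.col. Replacing it with col (the static function from org.apache.spark.sql.functions) fixes the issue, i.e. .groupBy(col("key"), window(col("timestamp"), "1 day")). I would suggest always using the col function instead of Dataset.col.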

Dataset.col returns a resolved column, while col returns an unresolved column.

parsed.withWatermark("timestamp", "10 minutes") creates a new Dataset with new columns that have the same names. The watermark information is attached to the timestamp column of the new Dataset, not to parsed.col("timestamp"), so the columns in groupBy don't have a watermark.

When you use unresolved columns, Spark resolves them against the Dataset they are used on (here, the watermarked one), so it picks the correct columns for you.
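To make the resolved vs. unresolved distinction concrete, here is a toy model in plain Java. It is not Spark's API: the classes ToyDataset and Attribute, the methods resolveByName and appendAllowed, and the "watermarkDelay" metadata key are hypothetical stand-ins, and the append-mode check is a simplified version of what Spark does internally. It only illustrates the reasoning above: a column object captured from parsed before withWatermark() cannot carry the watermark, while a column looked up by name on the watermarked Dataset can.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only (NOT Spark's API): illustrates why parsed.col(...) loses the watermark.
public class WatermarkResolutionSketch {

    // Hypothetical stand-in for a column of one specific Dataset (Spark calls this an attribute).
    static final class Attribute {
        final String name;
        final Map<String, String> metadata;

        Attribute(String name, Map<String, String> metadata) {
            this.name = name;
            this.metadata = Collections.unmodifiableMap(new HashMap<>(metadata));
        }

        // Hypothetical metadata key; Spark uses its own internal key for the watermark delay.
        boolean hasWatermark() {
            return metadata.containsKey("watermarkDelay");
        }
    }

    // Hypothetical stand-in for a Dataset: just its list of output columns.
    static final class ToyDataset {
        final List<Attribute> output;

        ToyDataset(List<Attribute> output) {
            this.output = output;
        }

        // Analogue of Dataset.col: returns THIS Dataset's own column object (a resolved column).
        Attribute col(String name) {
            for (Attribute a : output) {
                if (a.name.equals(name)) {
                    return a;
                }
            }
            throw new IllegalArgumentException("No such column: " + name);
        }

        // Analogue of withWatermark: returns a NEW Dataset whose event-time column is a new
        // object carrying the watermark; the original Dataset's columns are left untouched.
        ToyDataset withWatermark(String eventTime, String delay) {
            List<Attribute> next = new ArrayList<>();
            for (Attribute a : output) {
                if (a.name.equals(eventTime)) {
                    Map<String, String> md = new HashMap<>(a.metadata);
                    md.put("watermarkDelay", delay);
                    next.add(new Attribute(a.name, md));
                } else {
                    next.add(a);
                }
            }
            return new ToyDataset(next);
        }
    }

    // Analogue of the col function: a name that is only looked up on the Dataset it is used with.
    static Attribute resolveByName(ToyDataset target, String name) {
        return target.col(name);
    }

    // Simplified append-mode check: some grouping column must carry the watermark.
    // (In the question the grouping key is window(timestamp); the toy model skips that step.)
    static boolean appendAllowed(List<Attribute> groupingColumns) {
        for (Attribute a : groupingColumns) {
            if (a.hasWatermark()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ToyDataset parsed = new ToyDataset(Arrays.asList(
                new Attribute("timestamp", new HashMap<>()),
                new Attribute("key", new HashMap<>())));
        ToyDataset watermarked = parsed.withWatermark("timestamp", "10 minutes");

        // Question's code: grouping columns taken from `parsed`, before the watermark existed.
        List<Attribute> fromParsed = Arrays.asList(parsed.col("key"), parsed.col("timestamp"));
        // Answer's fix: names resolved against the watermarked Dataset.
        List<Attribute> byName = Arrays.asList(
                resolveByName(watermarked, "key"), resolveByName(watermarked, "timestamp"));

        System.out.println("parsed.col(...) -> append allowed: " + appendAllowed(fromParsed));
        System.out.println("col(...)        -> append allowed: " + appendAllowed(byName));
    }
}
```

Running main prints false for the grouping built from parsed.col(...) and true for the grouping resolved by name, which mirrors the AnalysisException in the question and the fix in this answer.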

answered Oct 27 '22 10:10

zsxwing
