Spark - Non-time-based windows are not supported on streaming DataFrames/Datasets;

I need to write a Spark SQL query with an inner select and partition by. The problem is that I get an AnalysisException. I have already spent a few hours on this, but other approaches have not worked either.

Exception:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Non-time-based windows are not supported on streaming DataFrames/Datasets;;
Window [sum(cast(_w0#41 as bigint)) windowspecdefinition(deviceId#28, timestamp#30 ASC NULLS FIRST, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp#34L], [deviceId#28], [timestamp#30 ASC NULLS FIRST]
+- Project [currentTemperature#27, deviceId#28, status#29, timestamp#30, wantedTemperature#31, CASE WHEN (status#29 = cast(false as boolean)) THEN 1 ELSE 0 END AS _w0#41]

I assume the query is too complicated to implement this way, but I don't know how to fix it.

 SparkSession spark = SparkUtils.getSparkSession("RawModel");

 Dataset<RawModel> datasetMap = readFromKafka(spark);

 datasetMap.registerTempTable("test");

 Dataset<Row> res = datasetMap.sqlContext().sql(
                " select deviceId, grp, avg(currentTemperature) as averageT, min(timestamp) as minTime, max(timestamp) as maxTime, count(*) as countFrame " +
                " from (select test.*, sum(case when status = 'false' then 1 else 0 end) over (partition by deviceId order by timestamp) as grp " +
                "       from test " +
                "      ) test " +
                " group by deviceId, grp ");
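For reference, the inner windowed sum is a classic gaps-and-islands grouping: each row whose status is 'false' starts a new group per device. On a static (non-streaming) dataset the same logic can be sketched in plain Python (field names mirror the query; this is an illustration, not Spark API):

```python
from itertools import groupby
from operator import itemgetter

def assign_groups(rows):
    """Simulate: sum(case when status = 'false' then 1 else 0 end)
    over (partition by deviceId order by timestamp) as grp."""
    out = []
    rows = sorted(rows, key=itemgetter("deviceId", "timestamp"))
    for _device, device_rows in groupby(rows, key=itemgetter("deviceId")):
        grp = 0
        for row in device_rows:
            if row["status"] == "false":  # a 'false' status starts a new island
                grp += 1
            out.append({**row, "grp": grp})
    return out

rows = [
    {"deviceId": "d1", "timestamp": 1, "status": "true",  "currentTemperature": 20},
    {"deviceId": "d1", "timestamp": 2, "status": "false", "currentTemperature": 21},
    {"deviceId": "d1", "timestamp": 3, "status": "true",  "currentTemperature": 22},
]

print([r["grp"] for r in assign_groups(rows)])  # -> [0, 1, 1]
```

Rows sharing a grp value within a device can then be aggregated, which is what the outer group by deviceId, grp does.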

Any suggestion would be much appreciated. Thank you.

Raskolnikov asked Nov 14 '18 07:11

People also ask

Can I use Spark for Streaming data?

In fact, you can apply Spark's machine learning and graph processing algorithms on data streams. Internally, it works as follows. Spark Streaming receives live input data streams and divides the data into batches, which are then processed by the Spark engine to generate the final stream of results in batches.
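The micro-batch model described above can be sketched as a toy loop (pure Python, not the Spark API; batch_size stands in for the batch interval):

```python
def micro_batch(stream, batch_size, process):
    """Toy illustration of Spark Streaming's micro-batch model:
    the live stream is divided into batches, and each batch is
    handed to the processing engine to produce a batch of results."""
    results = []
    batch = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            results.append(process(batch))
            batch = []
    if batch:  # flush the final partial batch
        results.append(process(batch))
    return results

print(micro_batch(range(7), 3, sum))  # -> [3, 12, 6]
```

In real Spark Streaming the batches are cut by time interval rather than record count, but the shape of the computation is the same.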

What is data Streaming How Spark Streaming support data Streaming?

Spark Streaming is an extension of the core Spark API that allows data engineers and data scientists to process real-time data from various sources including (but not limited to) Kafka, Flume, and Amazon Kinesis. This processed data can be pushed out to file systems, databases, and live dashboards.

Which of the library supports real-time Streaming in Spark?

Kinesis: Spark Streaming 3.3.0 is compatible with Kinesis Client Library 1.2.1. See the Kinesis Integration Guide for more details.

What is the difference between Spark Streaming and structured Streaming?

Spark receives real-time data and divides it into smaller batches for the execution engine. In contrast, Structured Streaming is built on the SparkSQL API for data stream processing. In the end, all the APIs are optimized using Spark catalyst optimizer and translated into RDDs for execution under the hood.




1 Answer

I believe the issue is in the windowing specification:

over (partition by deviceId order by timestamp) 

The partition would need to be over a time-based column, in your case timestamp. The following should work:

over (partition by timestamp order by timestamp) 

That will of course not address the intent of your query. The following might be attempted, though it is unclear whether Spark supports it:

over (partition by timestamp, deviceId order by timestamp) 

Even if Spark does support it, it would still change the semantics of your query.
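An alternative that streaming Datasets do support is aggregating over a time-based window with the built-in window() grouping function instead of a row-based analytic window. A sketch, assuming a 10-minute bucket (the interval is an arbitrary choice, and this changes the grouping semantics from "runs between status flips" to fixed time buckets):

```sql
select deviceId,
       window(timestamp, '10 minutes') as timeWindow,
       avg(currentTemperature) as averageT,
       min(timestamp) as minTime,
       max(timestamp) as maxTime,
       count(*) as countFrame
from test
group by deviceId, window(timestamp, '10 minutes')
```

Note that with append output mode, streaming aggregations like this also require a watermark on the timestamp column.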

Update

Here is a definitive source from Tathagata Das, a core committer on Spark Streaming: http://apache-spark-user-list.1001560.n3.nabble.com/Does-partition-by-and-order-by-works-only-in-stateful-case-td31816.html


WestCoastProjects answered Oct 24 '22 17:10