I am trying to read messages from sqs using spark streaming using below code
import org.apache.spark.sql.streaming._
import org.apache.spark.sql.types._
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
val df = spark.readStream.format("s3-sqs").option("queueUrl", "https://sqs.us-east-1.amazonaws.com/XXXX").option("region","us-east-1").option("awsAccessKey","xxxxx").option("fileFormat", "json").option("sqsFetchInterval", "1m") .load()
spark2-shell --jars /jars_aws/hadoop-aws-2.7.3.jar,/jars_aws/aws-java-sdk-1.11.582.jar,/jars_aws/aws-java-sdk-s3-1.11.584.jar,/jars_aws/aws-java-sdk-sqs-1.11.584.jar
I am getting below Exception Saying ClassNotFound Exception
java.lang.ClassNotFoundException: Failed to find data source: s3-sqs. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:635)
at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:159)
... 53 elided
Caused by: java.lang.ClassNotFoundException: s3-sqs.DefaultSource
at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$15.apply(DataSource.scala:618)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$15.apply(DataSource.scala:618)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:618)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:618)
at scala.util.Try.orElse(Try.scala:84)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:618)
... 54 more
Please help
Added required jars
That errors says that no jar in --jars has the required classes for s3-sqs data source.
After a bit of googling and reading Optimized S3 File Source with SQS (that seems the official documentation) I think s3-sqs data source (aka Databricks S3-SQS connector) is part of Databricks Runtime (DBR) and Databricks-specific.
In other words, I think the connector is only available in Databricks notebooks and there seems no way to use it outside.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With