New posts in apache-spark

How to process logs from a distributed log broker (e.g. Kafka) exactly after 1 week?

spark-nlp: DocumentAssembler initialization failing with 'java.lang.NoClassDefFoundError: org/apache/spark/ml/util/MLWritable$class'

Why is Pandas UDF not being parallelized?

Get the difference between two versions of a Delta Lake table

Spark Structured Streaming program that reads from non-empty Kafka topic (starting from earliest) triggers batches locally, but not on EMR cluster

saveAsTextFile to S3 on Spark does not work, just hangs

amazon-s3 apache-spark

Apache Spark Native Libraries

Drawbacks of Spark Streaming in Comparison With Real Streaming Computing Systems

Multipart uploads to Amazon S3 from Apache Spark

How can I make Spark Streaming count the words in a file in a unit test?

How do I use infinite Scala streams as source in Spark Streaming?

Spark MLlib Collaborative Filtering with new user

Unable to add a new service with Cloudera Manager within Cloudera Quickstart VM 5.3.0

How do partitions map to tasks in Spark?

apache-spark rdd

Spark 1.3.1: cannot read file from S3 bucket, org/jets3t/service/ServiceException

Apache Spark-Kafka.TaskCompletionListenerException & KafkaRDD$KafkaRDDIterator.close NPE on local cluster (Client Mode)

parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file

java hadoop apache-spark hive

Why does format("kafka") fail with "Failed to find data source: kafka." (even with uber-jar)?

DataFrame error: "overloaded method value filter with alternatives"

ERROR Utils: Uncaught exception in thread SparkListenerBus

scala apache-spark