apache-spark tutorials and guides

Spark Error: executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM

Nov 14, 2022

scala apache-spark

How does PySpark calculate Doc2Vec from Word2Vec word embeddings?

May 19, 2022

apache-spark nlp pyspark word2vec doc2vec

When to execute REFRESH TABLE my_table in Spark?

Oct 26, 2022

apache-spark hive apache-spark-sql

Apache Airflow automation: how to run a spark-submit job with parameters

Sep 07, 2022

apache-spark airflow

PySpark.sql.filter not performing as it should

May 15, 2022

python-2.7 apache-spark pyspark apache-spark-sql spark-dataframe

ModuleNotFoundError in PySpark Worker on rdd.collect()

May 26, 2022

python apache-spark pyspark pyspark-sql

RuntimeError: Unsupported type in conversion to Arrow: VectorUDT

Jan 24, 2022

pandas apache-spark dataframe pyspark pyarrow

How to print the decision path / rules used to predict a specific row in PySpark?

Sep 05, 2021

apache-spark pyspark apache-spark-ml

Table loaded through Spark not accessible in Hive

Dec 15, 2018

apache-spark hadoop hive pyspark hortonworks-data-platform

pyspark: Method isBarrier([]) does not exist

Mar 25, 2022

python apache-spark pyspark

PySpark error: AnalysisException: 'Cannot resolve column name

Oct 16, 2022

apache-spark exception pyspark

What problems can arise from a non-deterministic Pandas UDF in Spark?

Oct 23, 2022

python pandas apache-spark pyspark apache-spark-sql

AttributeError: 'AioClientCreator' object has no attribute '_register_lazy_block_unknown_fips_pseudo_regions'

Oct 04, 2022

python python-3.x amazon-web-services apache-spark amazon-s3

How to bundle many files in S3 using Spark

Jun 08, 2022

scala hadoop amazon-s3 apache-spark

Spark groupBy OutOfMemory woes

Jan 30, 2019

apache-spark

How to set the number of partitions for newAPIHadoopFile?

Nov 08, 2022

hadoop apache-spark

How to make Spark Streaming (Spark 1.0.0) read the latest data from Kafka (Kafka Broker 0.8.1)

Apr 04, 2022

apache-spark apache-kafka spark-streaming offset kafka-consumer-api

Cannot deploy local Spark job, worker fails with EndPointAssociationError

Jan 13, 2020

scala akka apache-spark

How to configure automatic restart of the application driver on YARN

Oct 28, 2022

apache-spark hadoop-yarn spark-streaming

Derby version mismatch between Spark and Hive: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

Nov 04, 2022

apache-spark apache-spark-sql
