apache-spark tutorials and guides

How to bucketize a group of columns in PySpark?

Jun 29, 2022

python apache-spark pyspark

ERROR: User did not initialize spark context

May 06, 2022

apache-spark hadoop

Why does Spark's Word2Vec return a vector?

Jul 15, 2022

java apache-spark machine-learning word2vec apache-spark-ml

Set Spark configuration

Feb 27, 2022

python-3.x apache-spark pyspark apache-spark-sql

PySpark explode stringified array of dictionaries into rows

Sep 25, 2022

python apache-spark dataframe pyspark apache-spark-sql

Convert UTC timestamp to local time based on time zone in PySpark

Oct 25, 2022

apache-spark pyspark apache-spark-sql

Delta Lake without Databricks Runtime

Sep 18, 2022

apache-spark hdfs databricks delta-lake

Stream-Static Join: How to refresh (unpersist/persist) static DataFrame periodically

Sep 25, 2021

scala apache-spark apache-spark-sql spark-streaming spark-structured-streaming

API compatibility between Scala and Python?

Jul 17, 2022

apache-spark pyspark

Spark fails when running the pi.py example in yarn-client mode

May 23, 2022

apache-spark

Spark-csv data source: infer data types

Oct 25, 2022

apache-spark dataframe

Aggregation with Group By date in Spark SQL

Oct 30, 2022

sql group-by apache-spark aggregation

Convert Matrix to RowMatrix in Apache Spark using Scala

May 14, 2017

scala matrix apache-spark distributed

How to load data from a saved file with Spark

Apr 06, 2022

apache-spark rdd

org.apache.spark.SparkException: Task not serializable - JavaSparkContext

Oct 09, 2021

java serialization apache-spark

Spark DataFrame created from JavaRDD<Row> copies all columns' data into the first column

Sep 13, 2022

apache-spark apache-spark-sql

"unbound method textFile() must be called with SparkContext instance as first argument (got str instance instead)"

Apr 03, 2022

python apache-spark pyspark

How to use the Spark Naive Bayes classifier for text classification with IDF?

Nov 14, 2022

python apache-spark tf-idf text-classification apache-spark-mllib

How to avoid "Not a file" exceptions when reading from HDFS with Spark

Apr 29, 2022

apache-spark hdfs emr s3distcp

Understanding closures and parallelism in Spark

Apr 12, 2022

scala hadoop apache-spark