apache-spark tutorials and guides

run pyspark locally

Jun 17, 2022

python apache-spark pyspark

Python: How to convert Pyspark column to date type if there are null values

Nov 23, 2019

python date apache-spark null pyspark

How to use spark quantilediscretizer on multiple columns

Oct 23, 2022

scala dictionary apache-spark pipeline quantile

PySpark sampleBy using multiple columns

Oct 06, 2019

python python-2.7 apache-spark pyspark

How to interpret probability column in spark logistic regression prediction?

May 15, 2022

apache-spark machine-learning apache-spark-sql logistic-regression apache-spark-ml

How to specify the location of custom log4j.configuration when spark-submit to Amazon EMR?

Oct 27, 2022

java apache-spark log4j amazon-emr

Unbounded table in Spark Structured Streaming

Aug 27, 2022

scala apache-spark spark-structured-streaming

Visualizing topics with Spark LDA

Aug 12, 2020

apache-spark lda apache-spark-ml

R - How to replicate rows in a spark dataframe using sparklyr

Aug 21, 2022

r apache-spark sparklyr

Scala - How to split the probability column (column of vectors) that we obtain when we fit the GMM model to the data into two separate columns? [duplicate]

Aug 31, 2022

scala apache-spark apache-spark-sql apache-spark-mllib

How does Spark SQL read compressed csv files?

Sep 14, 2022

csv apache-spark apache-spark-sql

S3A: fails while S3: works in Spark EMR

Nov 17, 2022

amazon-web-services apache-spark amazon-s3

Getting null with pyspark.sql.functions unix_timestamp

May 03, 2022

python apache-spark pyspark unix-timestamp

Storing streaming data in Hive using Spark

Nov 09, 2022

scala hadoop apache-spark hive spark-streaming

How can I include additional jars when starting a Google DataProc cluster to use with Jupyter notebooks?

Nov 01, 2022

apache-spark jupyter-notebook google-cloud-dataproc

Reuse the result of a SELECT expression in the "GROUP BY" clause?

Apr 05, 2021

mysql scala apache-spark apache-spark-sql spark-dataframe

Spark DataFrame operators (nunique, multiplication)

Sep 24, 2021

python apache-spark pyspark spark-dataframe

Is it possible to print definition of a function in Scala

May 30, 2022

scala oop apache-spark user-defined-functions scala-collections

Read/write DynamoDB from Apache Spark [closed]

Nov 09, 2022

apache-spark amazon-dynamodb

java.lang.IllegalArgumentException: Invalid lambda deserialization

Dec 05, 2019

java apache-spark apache-kafka spark-streaming
