apache-spark tutorials and guides

Apache Spark: ERROR local class incompatible when initiating a SparkContext class

Apr 13, 2020

Saving / exporting transformed DataFrame back to JDBC / MySQL

Apr 11, 2022

apache-spark apache-spark-sql apache-spark-1.5

Basic linear algebra on spark matrices

Jun 18, 2022

python matrix apache-spark

Connecting/Integrating Cassandra with Spark (pyspark)

Oct 14, 2021

cassandra apache-spark pyspark

How to know when to repartition/coalesce RDD with unbalanced partitions (possibly without shuffling)?

May 19, 2022

apache-spark

Error from python worker: /bin/python: No module named pyspark

Mar 11, 2022

python apache-spark ipython ipython-notebook pyspark

Spark - Difference between sortBy and sortByKey

Jun 09, 2022

apache-spark

Connecting IPython notebook to spark master running in different machines

Mar 02, 2021

apache-spark ipython kubernetes google-kubernetes-engine google-cloud-dataproc

Spark - How can I get the Logical / Physical Query execution using - Thrift - Hive Interactor

Jan 30, 2022

apache-spark apache-spark-sql spark-dataframe

Spark DataFrame not respecting schema and considering everything as String

Jul 12, 2020

scala apache-spark apache-spark-sql apache-spark-mllib scala-collections

Spark: Is there any rule of thumb about the optimal number of partitions of an RDD and its number of elements?

Oct 01, 2022

apache-spark apache-spark-sql partitioning

Spark sql top n per group

Apr 22, 2022

apache-spark group-by apache-spark-sql top-n

org.apache.thrift.transport.TTransportException error while Reading large JSON file in zeppelin scala

Aug 18, 2021

json scala apache-spark apache-zeppelin

How to split column of vectors into two columns?

Mar 25, 2022

apache-spark pyspark apache-spark-ml

Running steps of EMR in parallel

Oct 15, 2022

web-services amazon-web-services apache-spark amazon-emr

How Spark handles data larger than cluster memory

Mar 08, 2022

apache-spark

Dropping nested column of Dataframe with PySpark

Jul 13, 2022

apache-spark dataframe pyspark struct schema

Best practice to create SparkSession object in Scala to use both in unittest and spark-submit

Aug 31, 2022

scala apache-spark spark-submit

Add months to date column in Spark dataframe

Nov 06, 2022

python apache-spark pyspark apache-spark-sql

What does "pre-built for Apache Hadoop 2.7 and later" mean?

Oct 29, 2022

apache-spark
