apache-spark tutorials and guides

Why can't I merge multiple parquet files using "cat file1.parquet file2.parquet > result.parquet"?

Oct 28, 2025

apache-spark pyspark parquet

How to union two dataframes which have same number of columns?

Oct 28, 2025

dataframe apache-spark apache-spark-sql spark-java

Count distinct values with conditions

Oct 28, 2025

apache-spark pyspark apache-spark-sql count distinct

How many executor processes run for each worker node in spark?

Oct 26, 2025

apache-spark apache-spark-standalone

How to have idempotent guarantee when writing spark dataset to hdfs?

Oct 28, 2025

apache-spark hdfs idempotent

Possible to handle multi character delimiter in spark [duplicate]

Oct 26, 2025

scala apache-spark databricks

Spark off heap memory expanding with caching

Oct 27, 2025

apache-spark pyspark

Using Scala classes as UDF with pyspark

Oct 28, 2025

scala apache-spark pyspark apache-spark-sql user-defined-functions

CSV data source does not support null data type in pyspark [duplicate]

Oct 28, 2025

python dataframe apache-spark pyspark

How to get the name of a Spark Column as String?

Oct 25, 2025

scala apache-spark

Spark Cumulative Processing on single log file

Oct 27, 2025

apache-spark spark-streaming

remove last character from string

Oct 26, 2025

apache-spark pyspark apache-spark-sql

Spark CSV package not able to handle \n within fields

Oct 25, 2025

scala apache-spark apache-spark-sql spark-csv apache-spark-1.6

Databricks - Failure Starting REPL

Oct 26, 2025

python apache-spark pyspark cluster-analysis databricks

KernelRestarter: restart failed in jupyter, Kernel died

Oct 27, 2025

python apache-spark kernel jupyter-notebook jupyter

Spark sql group by and sum changing column name?

Oct 27, 2025

scala apache-spark

Difference between spark standalone and local mode?

Oct 27, 2025

apache-spark cluster-computing mode

Create sparse RDD from scipy sparse matrix

Oct 27, 2025

python numpy apache-spark scipy pyspark

PySpark to Azure SQL Database connection issue

Oct 26, 2025

python apache-spark pyspark azure-active-directory azure-sql-database

Casting string to int null issue

Oct 27, 2025

apache-spark pyspark
