apache-spark tutorials and guides

How to cast an array of struct in a spark dataframe using selectExpr?

Mar 12, 2021

can't resolve ... given input columns

Sep 15, 2021

apache-spark pyspark apache-spark-sql

Spark DataFrame is Untyped vs DataFrame has schema?

Oct 19, 2022

apache-spark apache-spark-sql bigdata

Spark dataframe column naming conventions / restrictions

Feb 06, 2021

apache-spark hive pyspark naming-conventions amazon-athena

Extract and Visualize Model Trees from Sparklyr

Sep 01, 2021

r apache-spark random-forest decision-tree sparklyr

Spark - Reading partitioned data from S3 - how does partitioning happen?

Sep 11, 2022

apache-spark amazon-s3

How can I rename a PySpark dataframe column by index? (handle duplicated column names)

Jan 28, 2022

python apache-spark dataframe pyspark

Spark sampling options in JSON reader ignored?

Apr 14, 2022

apache-spark pyspark apache-spark-sql

PySpark DataFrame: Split column with multiple values into rows

Jun 18, 2022

apache-spark pyspark apache-spark-sql pyspark-sql

Group days into weeks with totals in PySpark

May 07, 2022

apache-spark apache-spark-sql pyspark-sql databricks

How to fix error on PySpark EMR Notebook - AnalysisException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

Aug 26, 2022

apache-spark hadoop pyspark amazon-emr hive-metastore

How To Get Local Spark on AWS to Write to S3

Feb 09, 2022

apache-spark hadoop amazon-s3

TypeError: 'JavaPackage' object is not callable (spark._jvm)

Jun 13, 2021

java python apache-spark java-package geospark

Connecting to remote Dataproc master in SparkSession

Sep 16, 2022

apache-spark hadoop google-cloud-dataproc

PySpark 2.4.5: IllegalArgumentException when using PandasUDF

Oct 03, 2022

python pandas apache-spark pyspark pyarrow

How to programmatically get information about executors in PySpark

Jun 15, 2022

apache-spark pyspark

Python / Pyspark - Correct method chaining order rules

Sep 24, 2022

python apache-spark pyspark apache-spark-sql method-chaining

Using regexp to join two dataframes in spark

Jul 08, 2022

regex scala apache-spark

How to load Snappy-compressed JSON in Hive

Jul 13, 2022

json apache-spark hadoop hive snappy

Unable to read images simultaneously [in parallel] using PySpark

Sep 14, 2022

apache-spark pyspark parallel-processing python-imaging-library
