apache-spark tutorials and guides

Mongo Spark connector and MongoDB 3.2: root user cannot read database

Feb 08, 2023

mongodb apache-spark

PySpark PCA: how to convert dataframe rows from multiple columns to a single column DenseVector?

Feb 08, 2023

apache-spark pyspark apache-spark-mllib pca apache-spark-ml

RDD to DataFrame in PySpark (columns from the RDD's first element)

Feb 07, 2023

python-2.7 apache-spark pyspark rdd pyspark-sql

Check equality for two Spark DataFrames in Scala

Feb 08, 2023

scala unit-testing apache-spark spark-dataframe

Why can't sortBy() sort the data evenly in Spark?

Feb 08, 2023

python apache-spark pyspark rdd

Convert string data in a DataFrame into double

Feb 08, 2023

scala apache-spark apache-spark-sql

RestAPI service call from Spark Streaming

Feb 07, 2023

scala rest apache-spark spark-streaming

How to create a schema from CSV file and persist/save that schema to a file?

Feb 07, 2023

scala apache-spark schema

How to convert all columns of a DataFrame to numeric in Spark Scala?

Feb 07, 2023

scala apache-spark apache-spark-sql

Starting IPython with Spark 2

Feb 07, 2023

apache-spark ipython

Can pyspark.sql.functions be used in a UDF?

Feb 07, 2023

python sql apache-spark pyspark user-defined-functions

Is Apache Zeppelin stable enough to be used in production?

Feb 06, 2023

apache-spark production amazon-emr apache-zeppelin

Scala Spark: Difference in the results returned by df.stat.sampleBy()

Feb 07, 2023

scala apache-spark

Scala-Spark (version 1.5.2) DataFrames split error

Feb 07, 2023

scala apache-spark spark-dataframe

How to retrieve YARN logs programmatically using Java

Feb 06, 2023

java hadoop apache-spark hadoop-yarn

How to filter a Spark DataFrame by an array column containing any of the values of another DataFrame/set

Feb 07, 2023

apache-spark apache-spark-sql

How can I keep the number of partitions unchanged when I use the window.partitionBy() function with Spark/Scala?

Feb 06, 2023

scala apache-spark apache-spark-sql

Access to WrappedArray elements

Feb 05, 2023

python scala apache-spark pyspark

What is the main cause of "self-suppression not permitted" in Spark?

Feb 06, 2023

apache-spark hdfs

Spark Scala: Getting Cumulative Sum (Running Total) Using Analytical Functions

Feb 06, 2023

sql scala apache-spark apache-spark-sql window-functions