New posts in apache-spark

Spark SQL change format of the number

key not found: _PYSPARK_DRIVER_CALLBACK_HOST

python apache-spark pyspark

Error while using Hive context in Spark: object hive is not a member of package org.apache.spark.sql

Scala/Spark version compatibility

scala apache-spark

Selecting only numeric/string columns names from a Spark DF in pyspark

How to allocate more executors per worker in Standalone cluster mode?

apache-spark

PySpark - Adding a Column from a list of values using a UDF

spark partition data writing by timestamp

Invalid Spark URL in local spark session

apache-spark

UnsatisfiedLinkError: no snappyjava in java.library.path when running Spark MLLib Unit test within Intellij

How can I efficiently read multiple json files into a Dataframe or JavaRDD?

java json apache-spark

Spark error: RDD type not found when creating RDD

What is the best way to define custom methods on a DataFrame?

java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession

java apache-spark

Apply same function to all fields of spark dataframe row

Pyspark: Replacing value in a column by searching a dictionary

pyspark and HDFS commands

Making histogram with Spark DataFrame column

Keep only duplicates from a DataFrame regarding some field

How to cast all columns of a DataFrame to string