
New posts in apache-spark

Spark SQL: How to append a new row to a dataframe table (from another table)

How to save a partitioned parquet file in Spark 2.1?

How do I delete files in an HDFS directory after reading them using Scala?

"File already exists" error when writing new files from a dataframe

apache-spark emr

Kafka Structured Streaming KafkaSourceProvider could not be instantiated

How to get rid of "Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties" message?

log4j apache-spark

Is there a way to filter a field not containing something in a Spark dataframe using Scala?

Spark SQL: change the format of a number

key not found: _PYSPARK_DRIVER_CALLBACK_HOST

python apache-spark pyspark

Error while using Hive context in Spark: object hive is not a member of package org.apache.spark.sql

Scala/Spark version compatibility

scala apache-spark

Selecting only numeric/string column names from a Spark DF in PySpark

How to allocate more executors per worker in Standalone cluster mode?

apache-spark

PySpark - Adding a Column from a list of values using a UDF

Spark: writing partitioned data by timestamp

Invalid Spark URL in local Spark session

apache-spark

UnsatisfiedLinkError: no snappyjava in java.library.path when running Spark MLlib unit test within IntelliJ

How can I efficiently read multiple json files into a Dataframe or JavaRDD?

java json apache-spark

Spark error: RDD type not found when creating an RDD

What is the best way to define custom methods on a DataFrame?