New posts in apache-spark

How can I obtain the DAG of an Apache Spark job without running it?

scala apache-spark

Why is there no map function for DataFrame in PySpark while the Spark equivalent has it?

apache-spark pyspark

How to set spark.driver.memory for Spark/Zeppelin on EMR

Is there a way to validate the syntax of a raw Spark SQL query?

scala apache-spark

java.lang.UnsupportedOperationException: fieldIndex on a Row without schema is undefined: Exception on row.getAs[String]

scala apache-spark

How to select multiple columns of a Dataset, given a list of column names?

Spark decimal type precision loss

Comparison of a `float` to `np.nan` in Spark Dataframe

How do I get a Spark DataFrame to print its explain plan to a string?

How to find the max String length of a column in Spark using a DataFrame?

Spark: How to aggregate/reduce records based on time difference?

Reading Excel (.xlsx) file in pyspark

What is the optimal way to read from multiple Kafka topics and write to different sinks using Spark Structured Streaming?

Elasticsearch for Spark 3.0

"'JavaPackage' object is not callable" error executing explain() in PySpark 3.0.1 via Zeppelin

apache-spark pyspark

Workaround for Scala RDD not being covariant

Apache Spark ALS Recommendation Rating values higher than range

Spark: Counting co-occurrence - Algorithm for efficient multi-pass filtering of huge collections

Joining two Spark DataFrames on time (TimestampType) in Python

Write an RDD into HDFS in a Spark Streaming context