New posts in apache-spark

Spark Dataset and java.sql.Date

Spark pulling data into RDD or dataframe or dataset

Pyspark simple re-partition and toPandas() fails to finish on just 600,000+ rows

Spark error: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

scala apache-spark

Spark is inventing its own AWS secretKey

Yarn slave nodes are not communicating with master node?

Project_Bank.csv is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [110, 111, 13, 10]

Is there any way to get the output of Spark's Dataset.show() method as a string?

How to pivot streaming dataset?

UDF causes warning: CachedKafkaConsumer is not running in UninterruptibleThread (KAFKA-1894)

How can I force spark/hadoop to ignore the .gz extension on a file and read it as uncompressed plain text?

scala hadoop apache-spark gzip

PySpark equivalent of `df.loc`?

Calling a REST service from Spark

scala apache-spark rest

Does Spark support BigInteger type?

Failed to execute user defined function($anonfun$9: (string) => double) on using String Indexer for multiple columns

Spark: Prevent shuffle/exchange when joining two identically partitioned dataframes

How to set hive.metastore.warehouse.dir in HiveContext?

Spark SQL grouping: Add to group by or wrap in first() if you don't care which value you get.;

sql group-by apache-spark udf

How to extract rules from a decision tree in Spark MLlib

Custom log4j appender in spark executor

apache-spark log4j