New posts in apache-spark
Optimize Spark job that has to calculate each to each entry similarity and output top N similar items for each
Mar 25, 2022
scala
apache-spark
cross-join
Error when converting from spark dataframe with dates to pandas dataframe
Feb 19, 2022
pandas
apache-spark
dataframe
pyspark
Use spark-submit to submit an application to an EC2 cluster
May 05, 2022
amazon-ec2
apache-spark
Spark with Cassandra input/output
Nov 17, 2022
java
cassandra
apache-spark
spring-data-cassandra
Increase memory available to Spark shell
Jul 14, 2017
scala
apache-spark
How to transform a categorical variable in Spark into a set of columns coded as {0,1}?
Sep 19, 2022
scala
apache-spark
bigdata
apache-spark-mllib
categorical-data
GeoIP2's Python library doesn't work in PySpark's map function
Oct 21, 2022
python
apache-spark
pyspark
geoip
Spark ml and PMML export
May 30, 2020
java
apache-spark
linear-regression
pmml
Why are Spark Parquet files for an aggregate larger than the original?
Oct 01, 2022
apache-spark
storage
aggregation
parquet
How to write a null value from a Spark SQL expression of a DataFrame to a database table? (IllegalArgumentException: Can't get JDBC type for null)
Dec 01, 2021
apache-spark
apache-spark-sql
Missing hive-site when using spark-submit YARN cluster mode
May 14, 2022
apache-spark
hive
hortonworks-data-platform
spark-hive
AWS connection timeout when running Spark job on EMR
Oct 31, 2022
hadoop
apache-spark
amazon-s3
apache-spark-sql
emr
Spark - how to get top N of rdd as a new rdd (without collecting at the driver)
Aug 31, 2022
scala
apache-spark
rdd
Apache Livy doesn't work with local jar file
Apr 05, 2020
scala
apache-spark
livy
RDD CountApproximate taking far longer than requested timeout
Apr 19, 2022
scala
apache-spark
Limit Kafka batch size when using Spark Structured Streaming
Apr 26, 2022
scala
apache-spark
apache-kafka
spark-streaming
spark-structured-streaming
RDD filter in Scala Spark
Nov 06, 2022
scala
apache-spark
PySpark: Create DataFrame from RDD with Key/Value
Nov 17, 2022
apache-spark
pyspark
Spark streaming data sharing between batches
Nov 13, 2022
apache-spark
spark-streaming
A list as a key for PySpark's reduceByKey
Oct 17, 2018
python
apache-spark
rdd
pyspark