I want to export data to separate text files; I can do it with this hack:
for r in sqlContext.sql("SELECT DISTINCT FIPS FROM MY_DF").map(lambda r: r.FIPS).collect():
    sqlContext.sql("SELECT * FROM MY_DF WHERE FIPS = '%s'" % r).rdd.saveAsTextFile('county_{}'.format(r))
What is the right way to do it with Spark 1.3.1/Python
dataframes? I want to do it in a single job as opposed to N (or N + 1) jobs.
Maybe something like:
saveAsTextFileByKey()
Spark generally does not have RDD operations with multiple outputs, so there is no built-in saveAsTextFileByKey(). For writing files, though, there is a well-known trick: see "Write to multiple outputs by key Spark - one Spark job".
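The linked trick relies on Hadoop's MultipleTextOutputFormat, which routes each record to a file named after its key during a single pass over the data. Here is a minimal pure-Python sketch of that idea (no Spark dependency; the FIPS values and records are made-up illustration data), bucketing rows by key and writing one file per key in one pass:

```python
import os
import tempfile
from collections import defaultdict

# Hypothetical stand-in data: (FIPS, row) pairs, as the DataFrame would hold them.
records = [("06037", "Los Angeles"), ("36061", "Manhattan"), ("06037", "Long Beach")]

out_dir = tempfile.mkdtemp()

# Single pass: bucket records by key, then write one file per key.
# This mirrors what MultipleTextOutputFormat does on each reducer partition.
buckets = defaultdict(list)
for fips, row in records:
    buckets[fips].append(row)

for fips, rows in buckets.items():
    with open(os.path.join(out_dir, "county_{}".format(fips)), "w") as f:
        f.write("\n".join(rows))

print(sorted(os.listdir(out_dir)))  # one file per distinct FIPS
```

In Spark itself, the same effect comes from repartitioning by key and using saveAsHadoopFile with a MultipleTextOutputFormat subclass (straightforward in Scala; from PySpark you would point saveAsHadoopFile at such a class on the JVM side), so the whole export stays a single job instead of one job per key.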