I want to export data to separate text files; I can do it with this hack:
for r in sqlContext.sql("SELECT DISTINCT FIPS FROM MY_DF").map(lambda r: r.FIPS).collect():
    sqlContext.sql("SELECT * FROM MY_DF WHERE FIPS = '%s'" % r).rdd.saveAsTextFile('county_{}'.format(r))
What is the right way to do it with Spark 1.3.1/Python
dataframes? I want to do it in a single job as opposed to N (or N + 1) jobs.
Maybe something like:
saveAsTextFileByKey()
Spark generally does not have RDD operations with multiple outputs, so there is no built-in saveAsTextFileByKey(). For writing files, though, there is a well-known trick: see "Write to multiple outputs by key Spark - one Spark job".
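The linked trick relies on Hadoop's MultipleTextOutputFormat, which routes each record to a file named after its key during a single pass over the data. Here is a minimal pure-Python sketch of that idea (no Spark dependency; the FIPS values and records are made-up illustration data), bucketing rows by key and writing one file per key in one pass:

```python
import os
import tempfile
from collections import defaultdict

# Hypothetical stand-in data: (FIPS, row) pairs, as the DataFrame would hold them.
records = [("06037", "Los Angeles"), ("36061", "Manhattan"), ("06037", "Long Beach")]

out_dir = tempfile.mkdtemp()

# Single pass: bucket records by key, then write one file per key.
# This mirrors what MultipleTextOutputFormat does on each reducer partition.
buckets = defaultdict(list)
for fips, row in records:
    buckets[fips].append(row)

for fips, rows in buckets.items():
    with open(os.path.join(out_dir, "county_{}".format(fips)), "w") as f:
        f.write("\n".join(rows))

print(sorted(os.listdir(out_dir)))  # one file per distinct FIPS
```

In Spark itself, the same effect comes from repartitioning by key and using saveAsHadoopFile with a MultipleTextOutputFormat subclass (straightforward in Scala; from PySpark you would point saveAsHadoopFile at such a class on the JVM side), so the whole export stays a single job instead of one job per key.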