Spark version: 1.6.1; I use the PySpark API.
DataFrame: df, which has two columns.
I have tried:
1: df.write.format('csv').save("hdfs://path/bdt_sum_vol.csv")
2: df.write.save('hdfs://path/bdt_sum_vol.csv', format='csv', mode='append')
3: df.coalesce(1).write.format('com.databricks.spark.csv').options(header='true').save('hdfs://path/')
4: df.write.format('com.databricks.spark.csv').save('hdfs://path/df.csv')
(None of the above worked; they all failed with "Failed to find data source".)
I also tried converting the DataFrame to an RDD and writing text lines:
def toCSVLine(data):
    return ','.join(str(d) for d in data)

lines = df.rdd.map(toCSVLine)
lines.saveAsTextFile('hdfs://path/df.csv')
(This failed with "Permission denied".)
Q:
1. How do I solve "Failed to find data source"?
2. I used sudo to create the directory "/path" on HDFS. If I convert the DataFrame to an RDD, how do I write the RDD to CSV on HDFS?
Thanks a lot!
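You could try changing ".save" to ".csv". Note that DataFrameWriter.csv() only exists in Spark 2.0 and later; on Spark 1.6 the CSV source comes from the external spark-csv package (format 'com.databricks.spark.csv'), which has to be loaded when you start pyspark (for example with --packages), otherwise you get "Failed to find data source":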
df.coalesce(1).write.mode('overwrite').option('header','true').csv('hdfs://path/df.csv')
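If you stay with the RDD approach from the question, be aware that ','.join(str(d) for d in data) produces broken CSV as soon as a value contains a comma, a quote, or a newline. A minimal sketch, assuming Python 3, that uses the standard csv module to quote fields correctly; the name to_csv_line is hypothetical, and the Spark lines are commented out because they need a running SparkContext and a df:

```python
import csv
import io


def to_csv_line(row):
    """Format one row (any iterable of values, e.g. a pyspark Row) as one CSV line.

    csv.writer quotes fields containing commas, quotes, or newlines and
    doubles embedded quotes, so the output stays parseable.
    """
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    # csv.writer ends each record with "\r\n"; strip it because
    # saveAsTextFile already puts every element on its own line.
    return buf.getvalue().rstrip("\r\n")


# Usage with the RDD approach (hypothetical output path):
# lines = df.rdd.map(to_csv_line)
# lines.saveAsTextFile('hdfs://path/df_csv')
```

Keep in mind that saveAsTextFile (like DataFrameWriter.save) writes a directory of part files at the given path, not a single .csv file.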