
How to save a DataFrame to a ".txt" file using PySpark

I have a DataFrame with 1000+ columns. I need to save it as a .txt file (not as .csv), with no header, and the write mode should be "append".

I used the command below, which is not working:

df.coalesce(1).write.format("text").option("header", "false").mode("append").save("<path>")

The error I got:

pyspark.sql.utils.AnalysisException: 'Text data source supports only a single column,

Note: I should not use an RDD to save, because I need to save files multiple times to the same path.

Alice asked Oct 19 '25 05:10



1 Answer

If you want to write a multi-column DataFrame out as a text file, you have to concatenate the columns into a single string column yourself, because the text data source accepts only one column. The example below separates the column values with a space and replaces null values with a *:

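import pyspark.sql.functions as F
from pyspark.sql import SparkSession

# Reuse the active session, or start one when running as a standalone script.
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([("foo", "bar"), ("baz", None)],
                           ('a', 'b'))

def myConcat(*cols):
    # Build [col1, " ", col2, " ", ..., colN], replacing nulls with "*".
    concat_columns = []
    for c in cols[:-1]:
        concat_columns.append(F.coalesce(c, F.lit("*")))
        concat_columns.append(F.lit(" "))
    concat_columns.append(F.coalesce(cols[-1], F.lit("*")))
    return F.concat(*concat_columns)

# The text writer accepts exactly one string column, so keep only "combined".
df_text = df.withColumn("combined", myConcat(*df.columns)).select("combined")

df_text.show()

# The text source never writes a header, so no "header" option is needed.
df_text.coalesce(1).write.format("text").mode("append").save("output.txt")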

df_text.show() prints:

+--------+
|combined|
+--------+
| foo bar|
|   baz *|
+--------+

Spark creates output.txt as a directory, and the part file inside it should look like this:

foo bar
baz *
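
Since mode("append") adds new part files to that directory on every save, reading the result back means collecting all of them. Here is a minimal sketch in plain Python, assuming the output sits on the local filesystem and was written uncompressed. read_text_output is a hypothetical helper name, not part of PySpark:

```python
import glob
import os


def read_text_output(output_dir):
    """Return every line from the part files found under output_dir."""
    lines = []
    # Data files are named part-*; _SUCCESS and hidden .crc files do not match.
    for part in sorted(glob.glob(os.path.join(output_dir, "part-*"))):
        with open(part, encoding="utf-8") as fh:
            lines.extend(line.rstrip("\n") for line in fh)
    return lines
```

Calling read_text_output("output.txt") after several appends returns the lines of all saves together. The files are read in filename order, which is not guaranteed to match the order in which they were written.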
Alex answered Oct 22 '25 03:10



