I would like to read in a file with the following structure with Apache Spark.
628344092\t20070220\t200702\t2007\t2007.1370
The delimiter is \t. How can I implement this while using spark.read.csv()?
The csv is much too big to use pandas because it takes ages to read this file. Is there some way which works similar to
pandas.read_csv(file, sep = '\t')
Thanks a lot!
Spark infers "," as the default delimiter. To make Spark treat another string, such as "||", as the delimiter, specify it explicitly via "sep" in option() while reading the file.
Use spark.read.option("delimiter", "\t").csv(file), or sep instead of delimiter.

If the data contains the literal characters \t, not the tab special character, escape the backslash: spark.read.option("delimiter", "\\t").csv(file)