I am trying to produce a formatted CSV file from a pipe ("|") delimited file using Apache Spark. The input file contains:
apple|ball|cat
Blacktown| Bela vista| Greenacre
x|y|z
I am trying with:
val name = sc.textFile("input.txt")
val split=name.map(line=>line.split("|")).map( x => (x(0),x(2)) )
split.foreach(println)
Output:
(x,y)
(a,p)
(B,a)
My required output is:
(apple,cat)
(Blacktown, Greenacre)
(x,z)
A String argument for the split function is a regular expression, so if you want to use a pipe it has to be escaped:
line.split("\\|")
otherwise it is interpreted as an alternation between two empty patterns.
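As a rough illustration of the difference, here is a minimal sketch in plain Scala (no Spark needed, since the behaviour comes from String.split itself). The object name SplitDemo is made up for this example, and the results assume Java 8 or later, where a zero-width match at the start of the string does not produce a leading empty field.

```scala
object SplitDemo {
  val line = "apple|ball|cat"

  // Unescaped: "|" is a regex alternation of two empty patterns, so it
  // matches the empty string between every pair of characters and each
  // character (including the pipes) ends up as its own element.
  val unescaped: Array[String] = line.split("|")

  // Escaped: "\\|" matches a literal pipe character.
  val escaped: Array[String] = line.split("\\|")

  def main(args: Array[String]): Unit = {
    println((unescaped(0), unescaped(2))) // the (a,p) seen in the question
    println((escaped(0), escaped(2)))     // the (apple,cat) that was wanted
  }
}
```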
You can also use the variant which accepts a Char literal:
line.split('|')
or an Array of Char literals:
line.split(Array('|'))
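The three forms should give the same fields for this input. A small sketch to check that, again in plain Scala; CharSplitDemo is a hypothetical name used only here:

```scala
object CharSplitDemo {
  val line = "x|y|z"

  // Regex form: the pipe must be escaped.
  val byRegex: Array[String] = line.split("\\|")

  // Char form: the separator is taken literally, no escaping needed.
  val byChar: Array[String] = line.split('|')

  // Array[Char] form: splits on any of the listed characters.
  val byChars: Array[String] = line.split(Array('|'))

  def main(args: Array[String]): Unit = {
    println(byRegex.mkString(","))
    println(byChar.mkString(","))
    println(byChars.mkString(","))
  }
}
```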
It is also better to validate the input:
name.map(_.split("\\|")).collect {
  case Array(x, _, y) => (x, y)
}
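To see what the pattern match buys you, here is a sketch that runs the same collect on an ordinary Seq instead of an RDD. The sample lines, including the deliberately malformed "bad|line", and the name ValidateDemo are assumptions added for illustration; they are not part of the original input.

```scala
object ValidateDemo {
  // Hypothetical sample: the third entry has only two fields.
  val lines: Seq[String] = Seq(
    "apple|ball|cat",
    "Blacktown| Bela vista| Greenacre",
    "bad|line",
    "x|y|z"
  )

  // collect with a partial function keeps only lines that split into
  // exactly three fields; anything else is silently dropped instead of
  // throwing ArrayIndexOutOfBoundsException on x(2).
  val pairs: Seq[(String, String)] =
    lines.map(_.split("\\|")).collect {
      case Array(first, _, third) => (first, third)
    }

  def main(args: Array[String]): Unit = pairs.foreach(println)
}
```

Fields are kept exactly as they appear, so " Greenacre" retains its leading space; add .trim to each field if that is not wanted. Note also that split drops trailing empty fields by default, so a line such as "a|b|" would yield two elements and be filtered out; passing a negative limit, as in split("\\|", -1), keeps them.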