I have the code below to read an XML file:
Dataset<Row> dataset1 = SparkConfigXMLProcessor.sparkSession.read().format("com.databricks.spark.xml")
.option("rowTag", properties.get(EventHubConsumerConstants.IG_ORDER_TAG).toString())
.load(properties.get("C:\\inputOrders.xml").toString());
The values in one of the columns contain newline characters. I want to replace them with some other character, or just remove them. Please help.
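Dataset<Row> cleaned = dataset1.withColumn("menuitemname_clean", regexp_replace(col("menuitemname"), "[\n\r]", " "));
The code above should work. It needs static imports of regexp_replace and col from org.apache.spark.sql.functions, and because a Dataset is immutable, the result of withColumn has to be assigned to a variable.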
This is what I used. I usually add a tab (\t) to the pattern, too. Matching both \r and \n covers Unix and macOS (\n), Windows (\r\n), and classic Mac OS (\r) line endings.
Dataset<Row> newDF = dataset1.withColumn("menuitemname", regexp_replace(col("menuitemname"), "\n|\r", ""));
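Spark's regexp_replace takes a Java regular expression, so the patterns from both answers can be tried in plain Java without a Spark session. The following is a minimal sketch under that assumption; the NewlineCleaner class and its method names are hypothetical helpers for illustration, not part of Spark or of the original code. It shows one difference worth knowing: a pattern that matches each \r or \n separately turns a Windows line ending (\r\n) into two replacement characters, while a pattern that matches a run of them yields only one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, used only to illustrate the regex behaviour.
final class NewlineCleaner {
    // Matches every CR or LF character individually, like "[\n\r]" or "\n|\r".
    private static final Pattern EACH = Pattern.compile("[\\r\\n]");
    // Matches a whole run of CR/LF characters, so "\r\n" is replaced once.
    private static final Pattern RUN = Pattern.compile("[\\r\\n]+");

    static String replaceEach(String value, String replacement) {
        // quoteReplacement keeps '$' and '\' in the replacement literal.
        return EACH.matcher(value).replaceAll(Matcher.quoteReplacement(replacement));
    }

    static String replaceRun(String value, String replacement) {
        return RUN.matcher(value).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // Sample value with one Windows line ending and one Unix line ending.
        String value = "Chicken\r\nBurger\nLarge";
        System.out.println(replaceEach(value, " ")); // two spaces after "Chicken"
        System.out.println(replaceRun(value, " "));  // single space everywhere
        System.out.println(replaceEach(value, ""));  // newlines removed entirely
    }
}
```

When the replacement is an empty string, both patterns give the same result. When the replacement is a space or another visible character, the same run-matching pattern string, "[\\r\\n]+", can be passed to regexp_replace in place of "[\n\r]" to avoid doubled separators. Note that it also collapses blank lines (consecutive newlines) into a single replacement.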