I am trying to execute this code in Azure HdInsigth. I have a cluster Spark that is connected with Data Lake Storage.
spark.conf.set(
"fs.azure.sas.data.spmdevsharedstorage.blob.core.windows.net",
"xxxxxxxxxxx key xxxxxxxxxxx"
)
val shared_data = "wasbs://[email protected]/"
//Read Csv
val dfCsv = spark.read.option("inferSchema", "true").option("header", true).csv(shared_data + "/test/4G-pixel.csv")
val dfCsv_final_withcolumn = dfCsv.select($"latitude",$"longitude")
val dfCsv_final = dfCsv_final_withcolumn.withColumn("new_latitude",col("latitude")*100)
//write
dfCsv_final.coalesce(1).write.format("com.databricks.spark.csv").option("header", "true").mode("overwrite").save(shared_data + "/test/4G-pixel_edit.csv")
The code reads the csv file well. So, when write the new file csv I see the following error:
20/04/03 14:58:12 ERROR AzureNativeFileSystemStore: Encountered Storage Exception for delete on Blob: https://spmdevsharedstorage.blob.core.windows.net/data/test/4G-pixel_edit.csv/_temporary/0, Exception Details: This operation is not permitted on a non-empty directory. Error Code: DirectoryIsNotEmpty
org.apache.hadoop.fs.azure.AzureException: com.microsoft.azure.storage.StorageException: This operation is not permitted on a non-empty directory.
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.delete(AzureNativeFileSystemStore.java:2627)
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.delete(AzureNativeFileSystemStore.java:2637)
The new file csv is written to the Data Lake but the code stops. I need you to not see this error. How can I fix it?
I faced a similar issue.
I resolved it by using the below configuration.. set this to true.
--conf spark.hadoop.mapreduce.fileoutputcommitter.cleanup.skipped=true
or
spark.conf.set("spark.hadoop.mapreduce.fileoutputcommitter.cleanup.skipped","true")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With