When I use Spark to parse log files, I notice that if the first character of the filename is an underscore (_), the result is empty. Here is my test code:
SparkSession spark = SparkSession
.builder()
.appName("TestLog")
.master("local")
.getOrCreate();
JavaRDD<String> input = spark.read().text("D:\\_event_2.log").javaRDD();
System.out.println("size : " + input.count());
If I rename the file to event_2.log, the code runs correctly.
I found that the text function is defined as:
@scala.annotation.varargs
def text(paths: String*): Dataset[String] = {
format("text").load(paths : _*).as[String](sparkSession.implicits.newStringEncoder)
}
I think it could be because _ is Scala's placeholder. How can I avoid this problem?
text("file_name") reads a file or directory of text files into a Spark DataFrame, and dataframe.write().text("path") writes a DataFrame out as a text file. When reading a text file, each line becomes a row in a single string column named "value" by default.
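As a rough plain-Java analogy (no Spark involved; the file contents here are made up), reading the file line by line is exactly what populates that single "value" column, one row per line:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class TextReadSketch {
    // Each line of the file becomes one "row"; Spark's text source exposes
    // the same lines as a single string column named "value".
    static List<String> readRows(Path file) throws IOException {
        return Files.readAllLines(file);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file and contents, just to mirror input.count() above.
        Path tmp = Files.createTempFile("event_2", ".log");
        Files.write(tmp, Arrays.asList("line one", "line two"));
        System.out.println("size : " + readRows(tmp).size());
    }
}
```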
Also, as with any other file system, Spark can read and write TEXT, CSV, Avro, Parquet, and JSON files in HDFS. Spark RDDs natively support reading text files; later, with DataFrames, Spark added data sources such as CSV, JSON, Avro, and Parquet.
This has nothing to do with Scala. Spark uses the Hadoop input API to read files, and that API ignores every file whose name starts with an underscore (_) or a dot (.).
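That filter amounts to a simple predicate on the file name. A plain-Java sketch of the rule (mirroring Hadoop's default hidden-file filter, which is also why _SUCCESS marker files are skipped on read):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class HiddenFileFilterDemo {
    // Mirrors the predicate Hadoop's default hidden-file filter applies to
    // each candidate input path: names starting with "_" or "." are skipped.
    static boolean isVisible(String fileName) {
        return !fileName.startsWith("_") && !fileName.startsWith(".");
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("_event_2.log", ".hidden.log", "event_2.log", "_SUCCESS");
        List<String> read = files.stream()
                .filter(HiddenFileFilterDemo::isVisible)
                .collect(Collectors.toList());
        System.out.println(read); // only event_2.log survives the filter
    }
}
```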
I don't know how to disable this in Spark though.
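The simplest workaround is to copy (or rename) such a file to a visible name before handing it to Spark. A minimal sketch with java.nio.file; the name-stripping rule and paths here are my own assumptions, not Spark or Hadoop API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class RenameBeforeRead {
    // Copies a file whose name starts with "_" or "." to a sibling file with
    // those leading characters stripped, so Hadoop's input filter no longer
    // skips it. Returns the path of the visible copy.
    static Path makeVisible(Path hidden) throws IOException {
        String name = hidden.getFileName().toString();
        Path visible = hidden.resolveSibling(name.replaceFirst("^[_.]+", ""));
        return Files.copy(hidden, visible, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

After the copy, pointing spark.read().text(...) at the visible path (e.g. D:\\event_2.log instead of D:\\_event_2.log) reads the lines as expected.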