In Flink, how to write a DataStream to a single file?

Tags:

apache-flink

The writeAsText and writeAsCsv methods of a DataStream write as many files as there are worker threads. As far as I can see, the methods only let you specify the path to these files and some formatting options.

For debugging and testing purposes, it would be really useful to be able to print everything to a single file, without having to change the setup to use a single worker thread.

Is there any way to achieve this that is not overly complicated? I suspect it should be possible by implementing a custom SinkFunction, but I am not sure about that (besides, it feels like a hassle for something that seems relatively simple).

784

asked Aug 16 '16 14:08

houcros

2 Answers

You can achieve this by setting the parallelism of the sink to 1. This way, the writing is done by a single parallel instance on one machine, while the rest of the job keeps its parallelism.

// Only the sink gets parallelism 1; dataStream and path come from your job
dataStream.writeAsText(path).setParallelism(1);
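If the sink has to keep its original parallelism, another option for debugging is to merge the per-subtask files after the job has finished. The following is only a minimal sketch using the Java standard library, not a Flink API. The class name PartFileMerger and the paths are hypothetical, and it assumes the output directory contains only the finished part files written by the sink.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: concatenates the part files of a finished job into one file.
public class PartFileMerger {

    /**
     * Appends every regular, non-hidden file in partDir to target, in file name order.
     * The target should live outside partDir. Returns the number of files merged.
     */
    public static int merge(Path partDir, Path target) throws IOException {
        List<Path> parts;
        try (Stream<Path> files = Files.list(partDir)) {
            parts = files
                .filter(Files::isRegularFile)
                .filter(p -> !p.getFileName().toString().startsWith("."))
                .sorted()
                .collect(Collectors.toList());
        }
        // Creates the target or truncates it if it already exists
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : parts) {
                Files.copy(part, out);
            }
        }
        return parts.size();
    }

    public static void main(String[] args) throws IOException {
        int merged = merge(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Merged " + merged + " part files into " + args[1]);
    }
}
```

It could be run as, for example, java PartFileMerger /tmp/out /tmp/out-merged.txt, where both paths are placeholders. Since the parallel subtasks write independently, the relative order of records from different part files carries no meaning; the sort only makes the result deterministic.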

195

answered Sep 27 '22 18:09

Robert Metzger

As of Flink 1.13, writeAsText should no longer be used for this, as it is deprecated.

The StreamingFileSink class together with the addSink operation should be used instead. The parallelism is also set to 1 differently here, by calling the setParallelism method on the StreamExecutionEnvironment. Note that outPath is treated as a base directory: the sink writes its part files into bucket subdirectories beneath it.

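import org.apache.flink.api.common.serialization.SimpleStringEncoder
import org.apache.flink.core.fs.Path
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink
import org.apache.flink.streaming.api.scala._

// Run every operator, including the sink, with a single parallel instance
val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
env.setParallelism(1)

// Row-format sink that writes each record as a UTF-8 encoded line under outPath
val sink: StreamingFileSink[String] = StreamingFileSink
  .forRowFormat(new Path(outPath), new SimpleStringEncoder[String]("UTF-8"))
  .build()

dataStream.map(_.toString).addSink(sink)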

answered Sep 27 '22 18:09

eseuteo
