Test csv files equality with random line order (Junit)

I'm developing a project with Apache Flink and I'm using JUnit to test my operators.

However, I'm facing an issue: because of the parallelism, Flink writes its output CSV file with a "random" line order, so I cannot easily assert with JUnit that the output file is equal to an expected output file.

Performance is not an issue since we are talking about small files (<100 lines) and only for tests.

Is there an easy solution?

Ben asked Apr 08 '15 12:04


1 Answer

You can check your program in two stages:

  1. Test your individual functions in isolation, e.g., a MapFunction. Here you check only your own code, and the output is deterministic (provided that your function is deterministic).

  2. Test the full program. Here your code is executed by Flink and the order of the result is not deterministic (unless you sort it). Flink has some utility classes for testing full programs (mainly used to run our own integration tests). These classes bring up a small, local Flink instance, run the program, and compare its output to an expected result (sorted or unordered). Check out the MultipleProgramsTestBase and how it is used, for example, in the DegreesITCase. You can use the MultipleProgramsTestBase by including the flink-test-utils Maven dependency. Depending on the Flink version you are using, things might look a bit different from the current master. Drop a comment here or ping the Flink user mailing list if you have questions.
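If you only need the order-insensitive comparison from the second stage and don't want to pull in the Flink test utilities, a minimal sketch in plain Java could look like the following. It is not Flink's own implementation. The class and method names (UnorderedCsvCompare, sortedLines, sameLinesIgnoringOrder) are made up for illustration, and it assumes the job writes a single output file that is small enough to read into memory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Flink or JUnit.
final class UnorderedCsvCompare {

    // Reads every line of the file and returns the lines in sorted order.
    // Sorting keeps duplicate lines, so two files only match when each
    // line occurs the same number of times in both.
    static List<String> sortedLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>(Files.readAllLines(file));
        Collections.sort(lines);
        return lines;
    }

    // True if both files contain the same lines, regardless of order.
    static boolean sameLinesIgnoringOrder(Path expected, Path actual) throws IOException {
        return sortedLines(expected).equals(sortedLines(actual));
    }
}
```

In a JUnit test you would probably call assertEquals(sortedLines(expected), sortedLines(actual)) rather than asserting on the boolean, because the failure message then shows which lines differ. For files under 100 lines the cost of sorting is negligible.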

Fabian Hueske answered Nov 12 '22 03:11