
How to handle large strings in unit tests?

I've got a question about testing methods that work on strings. Every time, I end up writing a new test for a method that takes a string as a parameter.

Now, some issues come up:

  • How to include a test string with \n, \r, \t, umlauts etc?
  • How to set the encoding?
  • Should I use external files that are opened by a FileInputStream? (too much overhead, imho)

So... what are your approaches to solve this?

guerda asked Jan 20 '09 15:01


4 Answers

  • If you have a lot of them, keep the test strings in a separate class as String constants.
  • Try not to keep the files on disk unless you must. I agree with your claim: this brings too much overhead (not to mention what happens if you start getting I/O errors).
  • Make sure you test strings with different line breaks (\n, \r\n, \r) for different OSs.
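
A minimal sketch of the first and third points, assuming Java 9+ for `List.of`. The class name `TestStrings`, the constants, and the `countLines` stand-in for a method under test are all hypothetical and not taken from the answer:

```java
import java.util.List;

// Hypothetical holder for shared test strings; every name here is illustrative.
final class TestStrings {
    private TestStrings() {}

    // The same two-line text with Unix, Windows and classic Mac line breaks.
    static final String TWO_LINES_LF = "first line\nsecond line";
    static final String TWO_LINES_CRLF = "first line\r\nsecond line";
    static final String TWO_LINES_CR = "first line\rsecond line";

    static final List<String> ALL_LINE_BREAK_VARIANTS =
            List.of(TWO_LINES_LF, TWO_LINES_CRLF, TWO_LINES_CR);

    // Stand-in for a method under test: counts lines for any of the three styles.
    // "\r\n" is listed first so a Windows line break is treated as one separator.
    static int countLines(String text) {
        return text.split("\r\n|\r|\n", -1).length;
    }
}
```

A test can then loop over `ALL_LINE_BREAK_VARIANTS` and make the same assertion for each variant, so no line-break style is forgotten.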
Yuval Adam answered Nov 18 '22 20:11


How to include a test string with \n, \r, \t, umlauts etc?

Um... just type it the way you want? You can use \n, \r, \t, umlauts, etc. in Java String literals. If you're worried about the encoding of the source code file, you can use Unicode escape sequences instead, and you can produce them with the native2ascii tool that comes with the JDK.
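
As a rough illustration, the literals below use Unicode escapes for the umlauts, so the source file stays pure ASCII. The class and constant names are made up for this sketch:

```java
// Illustrative names only; nothing here comes from the question.
final class UmlautLiterals {
    private UmlautLiterals() {}

    // Written with Unicode escapes, so the compiled string is the same
    // no matter which encoding the source file is saved in.
    static final String GREETING = "Gr\u00fc\u00dfe aus K\u00f6ln";

    // Ordinary escapes for control characters: tab, carriage return, line feed.
    static final String CONTROL_CHARS = "col1\tcol2\r\nnext line";
}
```

Each escape becomes a single char in the resulting String, exactly as if the umlaut had been typed directly.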

How to set the encoding?

Once you have a Java String, it's too late to worry about encodings: Strings use UTF-16 internally, and any encoding problems occur when translating between Strings and byte arrays (unlike C, Java keeps these concepts clearly separate).
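
The String/byte boundary can be sketched roughly as follows. The helper names are hypothetical; the only library calls used are `String.getBytes(Charset)` and the `String(byte[], Charset)` constructor:

```java
import java.nio.charset.Charset;

// Hypothetical helpers that make the String <-> byte[] boundary explicit.
final class EncodingBoundary {
    private EncodingBoundary() {}

    // Encode and decode with the same charset: lossless for every
    // character that the charset can represent.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    // Encode with one charset and decode with another: this mismatch,
    // not the String itself, is where garbled umlauts come from.
    static String decodeWithOtherCharset(String text, Charset written, Charset read) {
        return new String(text.getBytes(written), read);
    }
}
```

For example, an "ä" written as UTF-8 and read back as ISO-8859-1 comes out as two characters, which is a useful negative case for a unit test.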

Edit: If your Strings are too big to be comfortably used in source code, or you're really worried about the treatment of line breaks and white space, then keeping each String in a separate file is probably best. In that case, the encoding must be specified when reading the file (in the constructor of InputStreamReader).
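
One possible shape for such a reader is sketched below. `TestInput.readFully` is a hypothetical helper; it wraps `IOException` in an unchecked exception, which is a design assumption made here to keep test code short:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper: reads a whole stream into a String with an explicit charset.
final class TestInput {
    private TestInput() {}

    static String readFully(InputStream in, Charset charset) {
        StringBuilder text = new StringBuilder();
        // The charset is passed to the InputStreamReader constructor,
        // so the platform default encoding is never used.
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[4096];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                text.append(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }
}
```

The stream can come from a FileInputStream or from any other source. Because the characters are copied as-is, line breaks and white space in the file arrive unchanged.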

Michael Borgwardt answered Nov 18 '22 22:11


For LARGE strings, I would use files. File access is plenty fast enough for unit tests. In exchange for that small trade-off, you:

  1. Don't have to worry about escaping characters
  2. Can diff the content in source control
  3. Can validate the documents independently (e.g., XML/HTML)
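
A minimal sketch of a file-based fixture loader, assuming the fixture files are stored as UTF-8. The `Fixtures` class is hypothetical, and where the files live (for example a test resources directory) is left to the project:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical fixture loader; assumes fixture files are saved as UTF-8.
final class Fixtures {
    private Fixtures() {}

    // Returns the file content unchanged: no escaping and no line-break conversion.
    static String load(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            // Unchecked, so a missing fixture simply fails the test.
            throw new UncheckedIOException(e);
        }
    }
}
```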
Chase Seibert answered Nov 18 '22 21:11


You could use a scripting language to code your tests.

JRuby and Groovy support here-documents (multi-line string literals) that make it easier to define a big string that spans multiple lines:

# In JRuby
mystring = <<EOS
This is a long string that
spans multiple lines.
EOS

# In Groovy
def mystring = """This is a long string that
spans multiple lines."""

This will also make your test code easier to write, as both languages have a lot of shortcuts that help you write simpler code (some might say less robust code, but that matters less if it is only unit-testing code).
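
For comparison, newer Java versions (15 and later) have text blocks, which cover the same multi-line use case without switching languages. This is an addition to the answer above, not part of it, and the class name is illustrative:

```java
// Requires Java 15 or later (text blocks).
final class TextBlockExample {
    private TextBlockExample() {}

    // The closing delimiter follows the last character directly, so the
    // string has no trailing newline. The common leading indentation is
    // stripped, and line breaks inside the block are always "\n".
    static final String MY_STRING = """
            This is a long string that
            spans multiple lines.""";
}
```

Since a text block always normalizes line breaks to \n, tests that need \r\n or \r still have to use explicit escapes or files.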

Michel answered Nov 18 '22 20:11