
How do I test a word count program to uncover any bugs?

I just revisited the classic C textbook K&R and read exercise 1-11:

How would you test the word count program? What kinds of input are most likely to uncover bugs if there are any?

So far my only idea is to count the words in an existing paragraph by hand and compare that number with the result the word count program produces.

Is there anything I've missed? Is there a trick to testing it?

EDIT

Answers summary:

Semantic definition of a word, with some special cases:

  • hyphenated words: "cat-walk"
  • short words: a, b, c
  • very long words: "a fooooooooo<40MILLIONLETTERS>ooooooo a" has 3 words

Boundary conditions:

  • Texts with multiple spaces between words.
  • Texts bigger than 2GB
  • Words which contain a dash but no whitespace.
  • Non-ASCII words.
  • Files in some different encoding (if your program supports that)
  • Characters which are surrounded by whitespace but do not contain any word characters (e.g. "hello - world")
  • Texts without any words
  • Texts with all words on a single line
xiao 啸 asked Apr 19 '11 13:04


2 Answers

Well, it depends on what you semantically define as a word. Since you are the one writing the word count program, you are supposed to know what a word is.

So to test this program, you have to think about where the corner cases are: does a "linked-word" count as one word or two? Do you consider "I'm" to be one or two? And so on.

As for the K&R exercise, I guess they deliberately left out some of these corner cases, and they suggest that you analyze their code and find these pitfalls yourself.
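
To make the point about definitions concrete, here is a minimal sketch (my own, not K&R's code) of two possible definitions of a word. The function names `count_ws_words` and `count_alpha_words` are made up for illustration. The first treats any run of characters between blanks, tabs and newlines as a word, which is roughly the definition the book's program uses; the second assumes a word is a run of letters only.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Definition 1: a word is a maximal run of characters that are not
   a blank, a tab or a newline. */
size_t count_ws_words(const char *s)
{
    size_t n = 0;
    int in_word = 0;

    for (; *s != '\0'; ++s) {
        if (*s == ' ' || *s == '\n' || *s == '\t') {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            ++n;
        }
    }
    return n;
}

/* Definition 2 (an assumption, one of many possible): a word is a
   maximal run of letters, so '-' and '\'' split words. */
size_t count_alpha_words(const char *s)
{
    size_t n = 0;
    int in_word = 0;

    for (; *s != '\0'; ++s) {
        if (!isalpha((unsigned char)*s)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            ++n;
        }
    }
    return n;
}

/* The same inputs give different counts under the two definitions. */
void check_definitions(void)
{
    assert(count_ws_words("linked-word") == 1);
    assert(count_alpha_words("linked-word") == 2);
    assert(count_ws_words("I'm") == 1);
    assert(count_alpha_words("I'm") == 2);
}
```

Neither answer is wrong; the test only tells you whether the program matches the definition you picked. Calling `check_definitions()` from `main` runs the comparisons.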

Gui13 answered Sep 28 '22 05:09


Here are some examples of texts that could uncover bugs:

  • Texts with multiple spaces between words.
  • Texts bigger than 2GB
  • Words which contain a dash but no whitespace.
  • Non-ASCII words.
  • Files in some different encoding (if your program supports that)
  • Characters which are surrounded by whitespace but do not contain any word characters (e.g. "hello - world")
  • Texts without any words
  • Texts with all words on a single line
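
Several of these cases can be written down as a small table of inputs and expected counts. The following is only a sketch under stated assumptions: `count_words`, `run_wc_cases` and `count_with_huge_word` are hypothetical names, the counter works on an in-memory string rather than standard input, and the expected values assume that a word is any run of characters other than blank, tab and newline. The huge-word check is scaled down to whatever size you pass in, because a real 2GB input is impractical in a unit test; for that case the thing to watch is whether the counters are wide enough.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Whitespace-delimited word count over a NUL-terminated string. */
size_t count_words(const char *s)
{
    size_t n = 0;
    int in_word = 0;

    for (; *s != '\0'; ++s) {
        if (*s == ' ' || *s == '\n' || *s == '\t') {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            ++n;
        }
    }
    return n;
}

struct wc_case {
    const char *text;
    size_t expected;   /* expected count under the assumed definition */
};

static const struct wc_case cases[] = {
    { "one  two   three",          3 },  /* multiple spaces between words */
    { "cat-walk",                  1 },  /* dash but no whitespace */
    { "na\xc3\xafve caf\xc3\xa9",  2 },  /* non-ASCII words, UTF-8 bytes */
    { "hello - world",             3 },  /* lone dash counts as a word here */
    { "",                          0 },  /* empty text */
    { " \n\t ",                    0 },  /* only whitespace, no words */
    { "alpha beta gamma",          3 },  /* single line, no trailing newline */
};

/* Abort via assert on the first case whose count differs. */
void run_wc_cases(void)
{
    size_t i;

    for (i = 0; i < sizeof cases / sizeof cases[0]; ++i)
        assert(count_words(cases[i].text) == cases[i].expected);
}

/* Build "a ooo...o a" with the given number of letters in the middle
   word and count it. Returns 0 if the allocation fails. */
size_t count_with_huge_word(size_t letters)
{
    char *buf = malloc(letters + 5);  /* "a " + letters + " a" + NUL */
    size_t n;

    if (buf == NULL)
        return 0;
    buf[0] = 'a';
    buf[1] = ' ';
    memset(buf + 2, 'o', letters);
    buf[letters + 2] = ' ';
    buf[letters + 3] = 'a';
    buf[letters + 4] = '\0';
    n = count_words(buf);
    free(buf);
    return n;
}
```

Whether "hello - world" should really be 3 is a decision about the definition of a word; if you decide the lone dash is not a word, change the expected value and the counter together.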
Sjoerd answered Sep 28 '22 05:09