
C program, problem with newlines & tabs next to each other

Tags:

c

Here was my original code:

#include <stdio.h>

#define IN  1   // inside a word
#define OUT 0   // outside a word

// program to print input one word per line

int main(void)
{
  int c, state;

  state = OUT;
  while ((c = getchar()) != EOF) {
    if (c == ' ' || c == '\n' || c == '\t') {
      state = OUT;
      printf("\n");
    }
    else if (state == OUT) {
      state = IN;
    }
    if (state == IN) {
      putchar(c);
    }
  }
  return 0;
}

The problem was that if several blanks (spaces) or several tabs appeared next to each other, a newline was printed for each one. So I added a variable (last) to remember the previous character:

#include <stdio.h>

#define IN  1   // inside a word
#define OUT 0   // outside a word

// program to print input one word per line, corrected bug if there was
// more than one space between words to only print one \n

int main(void)
{
  int c, last, state;

  last = EOF;
  state = OUT;
  while ((c = getchar()) != EOF) {
    if (c == ' ' || c == '\n' || c == '\t') {
      if (last != c) {
        state = OUT;
        printf("\n");
      }
    }
    else if (state == OUT) {
      state = IN;
    }
    if (state == IN) {
      putchar(c);
    }
    last = c;
  }
  return 0;
}

That fixed runs of the same character, but if a blank is followed directly by a tab ([blank][tab]), a newline still gets printed for each of them.

Could someone please help?

Matt2012 asked Jan 21 '23 06:01


2 Answers

The problem with your original code is that it outputs a newline for every whitespace character. You only want to do that when transitioning from word to non-word:

Change:

if (c == ' ' || c == '\n' || c == '\t') {
    state = OUT;
    printf("\n");
}

to:

if (c == ' ' || c == '\n' || c == '\t') {
    if (state == IN) printf("\n");
    state = OUT;
}

In fact, what I originally thought I'd suggest would be an enumeration for the states along the lines of:

enum eState {IN, OUT};
:
enum eState state = OUT;

but, for a simple finite state machine with only two states, you can just use a boolean:

#include <stdio.h>

#define FALSE (1==0)
#define TRUE  (1==1)
// Or: enum eBoolean {FALSE = 0, TRUE = 1};

int main (void) {
    int ch;
    int inWord = FALSE;     // Or: enum eBoolean inWord = FALSE;

    // Process every character.
    while ((ch = getchar()) != EOF) {
        // Check for whitespace.
        if (ch == ' ' || ch == '\n' || ch == '\t') {
            // Check if transitioning nonwhite to white.
            if (inWord) {
                printf("\n");
            }

            // Mark white no matter what.
            inWord = FALSE;
        } else {
            // Mark non whitespace.
            inWord = TRUE;
        }

        // If not whitespace, output character.
        if (inWord) {
            putchar(ch);
        }
    }
    return 0;
}
paxdiablo answered Jan 30 '23 08:01


As paxdiablo said, your program is a typical finite state automaton (FSA). You have to print a newline on transitions from state IN to state OUT, and only then.

Below is how I would write such code. In this particular case it can be made simpler, but the structure is interesting because it is typical and applies to any FSA. You have a big outer switch with a case for each state. Inside each case, there is another switch that implements the transitions; here the transition events are input characters. All that is left to do is decide what should happen on each transition. This structure is also quite efficient.

You should keep it in mind; it is a very common one to have in your toolkit of pre-thought program structures. I certainly do.

#include <stdio.h>

#define IN  1   // inside a word
#define OUT 0   // outside a word

// program to print input one word per line

int main(void)
{
  int c, state;

  state = OUT;
  while ((c = getchar()) != EOF) {
    switch (state) {
    case OUT:
      switch (c) {
      case ' ': case '\n': case '\t':
        break;
      default:
        putchar(c);
        state = IN;
      }
      break;
    case IN:
      switch (c) {
      case ' ': case '\n': case '\t':
        putchar('\n');
        state = OUT;
        break;
      default:
        putchar(c);
      }
      break;
    }
  }
  return 0;
}
kriss answered Jan 30 '23 06:01