
Remove all comments from a C program - any possible improvements to this code?

Tags:

c

I'm learning C from the K&R book and for exercise 1.23 in the first chapter, I have to write a program that removes all comments given some C code that the user inputs. This is my completed program so far. Are there any improvements I can make to it?

/**
 Tuesday, 10/07/2013

 Exercise 1.23
 Write a program to remove all comments from a C 
 program. Don't forget to handle quoted strings 
 and character constants properly. C comments   
 don't nest.
**/

#include <stdio.h>
#define MAX_LENGTH 65536
#define NOT_IN_COMMENT 0
#define SINGLE_COMMENT 1
#define MULTI_COMMENT  2

main()
{
    char code[MAX_LENGTH];        /* Buffer that stores the inputted code */
    int size = 0;                 /* Length of the inputted code */
    int loop;                     /* Integer used for the for loop */
    char c;                       /* Character to input into */
    int status = NOT_IN_COMMENT;  /* Are we in a comment? What type? */
    int in_string = 0;            /* Are we inside of a string constant? */
    char last_character;          /* Value of the last character */


    /* Input all code into the buffer until escape sequence pressed */
    while ((c = getchar()) != EOF)
        code[size++] = c; 
    code[size] = '\0'; 


    /* Remove all comments from the code and display results to user */
    for (loop = 0; loop < size; loop++) {
        char current = code[loop]; 

        if (in_string) {
            if (current == '"') in_string = 0; 
            putchar(current);
        }

        else {
            if (status == NOT_IN_COMMENT) {
                if (current == '"') {
                    putchar(current);
                    in_string = 1; 
                    continue; 
                }

                if (current == '/' && last_character == '/') status = SINGLE_COMMENT;
                else if (current == '*' && last_character == '/') status = MULTI_COMMENT; 
                else if (current != '/' || (current == '/' && loop < size-1 && !(code[loop+1] == '/' || code[loop+1] == '*'))) putchar(current); 
            }

            else if (status == SINGLE_COMMENT) {
                if (current == '\n') {
                    status = NOT_IN_COMMENT; 
                    putchar('\n');
                }
            }

            else if (status == MULTI_COMMENT) {
                if (current == '/' && last_character == '*') status = NOT_IN_COMMENT; 
            }
        }

        last_character = current; 
    }
}
Ryan asked Oct 08 '13 03:10


2 Answers

Move the comment stripping into a function (more useful) and read one line at a time with fgets(). The name last_character is ambiguous (does it mean last, or previous?), so it is called prevch here. This version avoids the per-character putchar() calls, needs only one printf per line (fputs would also work), and preserves most of what you were doing:

#include <stdio.h>
#include <string.h>
#define MAX_LENGTH 65536

#define NOT_IN_COMMENT 0
#define SINGLE_COMMENT 1
#define MULTI_COMMENT  2
int status = NOT_IN_COMMENT;  /* Are we in a comment? What type? */
int in_string = 0;            /* Are we inside of a string constant? */
char* stripcomments(char* stripped,char* code)
{
    int ndx;                      /* index for code[] */
    int ondx;                     /* index for stripped[] */
    char prevch = '\0';           /* Value of the previous character */
    int len = (int)strlen(code);  /* Computed once, not on every iteration */

    /* Copy code into stripped, leaving out the comments */
    for (ndx=ondx=0; ndx < len; ndx++)
    {
        char current = code[ndx];

        if (in_string) {
            if (current == '"') in_string = 0;
            stripped[ondx++] = current;
        }
        else {
            if (status == NOT_IN_COMMENT) {
                if (current == '"') {
                    stripped[ondx++] = current;
                    in_string = 1;
                    continue;
                }

                if (current == '/' && prevch == '/') status = SINGLE_COMMENT;
                else if (current == '*' && prevch == '/') status = MULTI_COMMENT;
                else if (current != '/' || (current == '/' && ndx < strlen(code)-1 && !(code[ndx+1] == '/' || code[ndx+1] == '*'))) stripped[ondx++] = current;
            }

            else if (status == SINGLE_COMMENT) {
                if (current == '\n') {
                    status = NOT_IN_COMMENT;
                    stripped[ondx++] = '\n';
                }
            }

            else if (status == MULTI_COMMENT) {
                if (current == '/' && prevch == '*') status = NOT_IN_COMMENT;
            }
        }
        prevch = current;
    }
    stripped[ondx] = '\0';
    return(stripped);
}

int main(void)
{
    char code[MAX_LENGTH];        /* Buffer that stores the inputted code */
    char stripped[MAX_LENGTH];

    while( fgets(code,sizeof(code),stdin) )
    {
        //printf("%s\n",code);
        //strip comments...
        stripcomments(stripped,code);
        if( strlen(stripped) > 0 ) printf("%s",stripped);
    }
}

I'll leave it to you to remove extra blank lines.
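The blank-line cleanup left as an exercise could be sketched roughly as follows. The helper names is_blank and keep_line are made up for illustration and are not part of the answer's code. The assumption is that a line should be dropped only when stripping left it empty, so lines that were already blank in the input survive.

```c
#include <ctype.h>

/* Returns 1 if s is empty or contains only whitespace, else 0. */
int is_blank(const char *s)
{
    while (*s != '\0') {
        if (!isspace((unsigned char)*s))
            return 0;
        s++;
    }
    return 1;
}

/* Hypothetical helper: keep a line unless stripping is what made it blank.
   original is the line as read, stripped is the same line without comments. */
int keep_line(const char *original, const char *stripped)
{
    return !is_blank(stripped) || is_blank(original);
}
```

In the main() above, the printf condition could then become something like if (keep_line(code, stripped)).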

ChuckCottrill answered Oct 27 '22 04:10


When you're handling quoted strings, you should detect escaped quotes (\"). For example, "\"/* not a comment */\"" is a valid string, but I think your code will strip the false comment from the middle of it.
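One possible way to handle escaped quotes is a small state machine that copies any backslash-escaped character verbatim while inside a literal. This is only a minimal sketch, not the asker's or anyone's posted code: the function name strip_comments_escaped and the state names are invented for illustration, dst is assumed to be at least as large as src, and line continuations and trigraphs are ignored. It treats character constants such as '"' the same way as strings, which the exercise text also asks for.

```c
#include <stddef.h>

/* Hypothetical states for this sketch; the names are not from the post. */
enum state { CODE, STRING, CHAR_CONST, LINE_COMMENT, BLOCK_COMMENT };

/* Copy src into dst without comments and return dst.
   dst must have room for strlen(src) + 1 bytes. */
char *strip_comments_escaped(char *dst, const char *src)
{
    enum state st = CODE;
    size_t i = 0, o = 0;

    while (src[i] != '\0') {
        char c = src[i];
        char next = src[i + 1];   /* at worst this is the terminating '\0' */

        switch (st) {
        case CODE:
            if (c == '/' && next == '/') { st = LINE_COMMENT; i += 2; continue; }
            if (c == '/' && next == '*') { st = BLOCK_COMMENT; i += 2; continue; }
            if (c == '"')  st = STRING;
            if (c == '\'') st = CHAR_CONST;
            dst[o++] = c;
            break;
        case STRING:
        case CHAR_CONST:
            dst[o++] = c;
            if (c == '\\' && next != '\0') {
                dst[o++] = next;  /* escaped character: copy it, never end the literal */
                i += 2;
                continue;
            }
            if ((st == STRING && c == '"') || (st == CHAR_CONST && c == '\''))
                st = CODE;
            break;
        case LINE_COMMENT:
            if (c == '\n') { dst[o++] = c; st = CODE; }  /* keep the newline */
            break;
        case BLOCK_COMMENT:
            if (c == '*' && next == '/') { st = CODE; i += 2; continue; }
            break;
        }
        i++;
    }
    dst[o] = '\0';
    return dst;
}
```

Looking one character ahead instead of remembering the previous one also means a lone / is copied as soon as it is seen, so the long condition in the original loop is no longer needed.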

If you want to be really correct, you should also handle line continuations (a line ending with a \ continues on the next line). For added hairiness, you also ought to handle trigraphs. ??/" is an escaped quote, and ??/ at the end of a line is a continuation.
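Line continuations and the backslash trigraph could be dealt with in a separate pass before comment stripping. The sketch below is one assumed approach, with a made-up function name splice_lines: it deletes every backslash-newline pair and rewrites the backslash trigraph to a plain backslash, so the output no longer matches the input byte for byte. It handles only this one trigraph and assumes dst is at least as large as src.

```c
#include <stddef.h>

/* Hypothetical pre-pass: join continued lines and normalise the
   '?' '?' '/' trigraph to a backslash. Returns dst. */
char *splice_lines(char *dst, const char *src)
{
    size_t i = 0, o = 0;

    while (src[i] != '\0') {
        size_t esc_len = 0;   /* length of a backslash or its trigraph spelling */

        if (src[i] == '\\')
            esc_len = 1;
        else if (src[i] == '?' && src[i + 1] == '?' && src[i + 2] == '/')
            esc_len = 3;

        if (esc_len > 0 && src[i + esc_len] == '\n') {
            i += esc_len + 1;          /* continuation: drop it entirely */
        } else if (esc_len == 3) {
            dst[o++] = '\\';           /* trigraph elsewhere: emit a backslash */
            i += 3;
        } else {
            dst[o++] = src[i++];       /* ordinary character */
        }
    }
    dst[o] = '\0';
    return dst;
}
```

Running this over the whole input first means the comment stripper only ever sees plain backslashes and never a comment delimiter split across two lines.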

The style of the code looks pretty good, although main should more properly be declared as int main(void).

pburka answered Oct 27 '22 06:10