I'm learning C from the K&R book, and for exercise 1.23 in the first chapter I have to write a program that removes all comments from C code that the user inputs. This is my completed program so far. Are there any improvements I can make to it?
/**
Tuesday, 10/07/2013
Exercise 1.23
Write a program to remove all comments from a C
program. Don't forget to handle quoted strings
and character constants properly. C comments
don't nest.
**/
#include <stdio.h>
#define MAX_LENGTH 65536
#define NOT_IN_COMMENT 0
#define SINGLE_COMMENT 1
#define MULTI_COMMENT 2
main()
{
    char code[MAX_LENGTH];        /* Buffer that stores the inputted code */
    int size = 0;                 /* Length of the inputted code */
    int loop;                     /* Integer used for the for loop */
    char c;                       /* Character to input into */
    int status = NOT_IN_COMMENT;  /* Are we in a comment? What type? */
    int in_string = 0;            /* Are we inside of a string constant? */
    char last_character;          /* Value of the last character */
    /* Input all code into the buffer until end of input (EOF) */
    while ((c = getchar()) != EOF)
        code[size++] = c;
    code[size] = '\0';
    /* Remove all comments from the code and display results to user */
    for (loop = 0; loop < size; loop++) {
        char current = code[loop];
        if (in_string) {
            if (current == '"') in_string = 0;
            putchar(current);
        }
        else {
            if (status == NOT_IN_COMMENT) {
                if (current == '"') {
                    putchar(current);
                    in_string = 1;
                    continue;
                }
                if (current == '/' && last_character == '/') status = SINGLE_COMMENT;
                else if (current == '*' && last_character == '/') status = MULTI_COMMENT;
                else if (current != '/' || (current == '/' && loop < size-1 && !(code[loop+1] == '/' || code[loop+1] == '*'))) putchar(current);
            }
            else if (status == SINGLE_COMMENT) {
                if (current == '\n') {
                    status = NOT_IN_COMMENT;
                    putchar('\n');
                }
            }
            else if (status == MULTI_COMMENT) {
                if (current == '/' && last_character == '*') status = NOT_IN_COMMENT;
            }
        }
        last_character = current;
    }
}
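Move your comment stripping into a function (that makes it reusable) and read one line at a time with fgets(). The name last_character is ambiguous (does it mean last, or previous?), so here it is called prevch. Instead of one putchar() call per character, this does a single printf() per line (fputs(stripped, stdout) would work too; puts() would not, because it appends a second newline to the one fgets() already kept). status and in_string are file-scope variables so that a comment spanning several lines is still tracked from one fgets() call to the next. Otherwise it preserves most of what you were doing: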
#include <stdio.h>
#include <string.h>
#define MAX_LENGTH 65536
#define NOT_IN_COMMENT 0
#define SINGLE_COMMENT 1
#define MULTI_COMMENT 2
int status = NOT_IN_COMMENT; /* Are we in a comment? What type? */
int in_string = 0; /* Are we inside of a string constant? */
/* Copy code[] into stripped[], leaving out comments; returns stripped */
char* stripcomments(char* stripped, char* code)
{
    size_t len = strlen(code); /* length of code[], computed once */
    size_t ndx;                /* index for code[] */
    size_t ondx;               /* index for stripped[] */
    char prevch = '\0';        /* Value of the previous character */
    for (ndx = ondx = 0; ndx < len; ndx++)
    {
        char current = code[ndx];
        if (in_string) {
            if (current == '"') in_string = 0;
            stripped[ondx++] = current;
        }
        else {
            if (status == NOT_IN_COMMENT) {
                if (current == '"') {
                    stripped[ondx++] = current;
                    in_string = 1;
                    continue;
                }
                if (current == '/' && prevch == '/') status = SINGLE_COMMENT;
                else if (current == '*' && prevch == '/') {
                    status = MULTI_COMMENT;
                    current = '\0'; /* so this '*' can't also close the comment */
                }
                else if (current != '/' || (ndx + 1 < len && !(code[ndx+1] == '/' || code[ndx+1] == '*'))) stripped[ondx++] = current;
            }
            else if (status == SINGLE_COMMENT) {
                if (current == '\n') {
                    status = NOT_IN_COMMENT;
                    stripped[ondx++] = '\n';
                }
            }
            else if (status == MULTI_COMMENT) {
                if (current == '/' && prevch == '*') {
                    status = NOT_IN_COMMENT;
                    current = '\0'; /* so this '/' can't start a new comment */
                }
            }
        }
        prevch = current;
    }
    stripped[ondx] = '\0';
    return(stripped);
}
int main(void)
{
    char code[MAX_LENGTH];     /* Buffer that stores the inputted code */
    char stripped[MAX_LENGTH]; /* code[] with comments removed */
    while( fgets(code,sizeof(code),stdin) )
    {
        stripcomments(stripped,code); /* strip comments... */
        if( strlen(stripped) > 0 ) printf("%s",stripped);
    }
    return 0;
}
I'll leave it to you to remove extra blank lines.
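One way to do that, sketched under the assumption that only the blank lines created by stripping should go: compare each stripped line with the line it came from, and drop it only when it is blank now but was not blank before. keep_line and is_blank are hypothetical helper names; in the main loop above, the strlen() check would become if (keep_line(code, stripped)) fputs(stripped, stdout);.

```c
#include <ctype.h>

/* Returns nonzero if s contains only whitespace (or is empty). */
static int is_blank(const char *s)
{
    for (; *s != '\0'; s++)
        if (!isspace((unsigned char)*s))
            return 0;
    return 1;
}

/* Hypothetical helper: decide whether to print a stripped line.
   A line is dropped only when stripping is what made it blank, so blank
   lines that were already in the input survive. */
static int keep_line(const char *code, const char *stripped)
{
    return !is_blank(stripped) || is_blank(code);
}
```

A line that held nothing but a // comment comes out as "\n" and is dropped, while an originally empty line is still printed.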
When you're handling quoted strings, you should detect escaped quotes (\"). For example, "\"/* not a comment */\"" is a valid string, but your code will strip the false comment from the middle of it.
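Here is a minimal sketch of a stripper that copies escape sequences inside literals verbatim, so \" cannot end a string early. It also treats character constants like strings, which the exercise asks for and which matters for code such as c = '"';. It works on a whole buffer rather than line by line and, as the C standard does, replaces each comment with one space, so a/**/b does not turn into ab. strip_comments is a hypothetical name, and the sketch assumes line continuations have already been spliced.

```c
#include <stddef.h>

/* Hypothetical helper: copies src to dst with C comments removed.
   dst needs room for strlen(src) + 1 chars. Assumes backslash-newline
   continuations have already been spliced. */
void strip_comments(const char *src, char *dst)
{
    size_t i = 0, o = 0;
    char quote = 0;   /* '"' or '\'' while inside a literal, else 0 */

    while (src[i] != '\0') {
        char c = src[i];

        if (quote) {
            dst[o++] = c;
            if (c == '\\' && src[i + 1] != '\0') {
                /* Escape sequence: copy the next char too, so \" or \'
                   cannot end the literal early. */
                dst[o++] = src[i + 1];
                i += 2;
                continue;
            }
            if (c == quote)
                quote = 0;   /* matching quote closes the literal */
            i++;
        } else if (c == '"' || c == '\'') {
            quote = c;       /* string literal or character constant starts */
            dst[o++] = c;
            i++;
        } else if (c == '/' && src[i + 1] == '*') {
            /* Block comment: skip to just past the closing star-slash
               (or to the end of input if it is never closed). */
            i += 2;
            while (src[i] != '\0' && !(src[i] == '*' && src[i + 1] == '/'))
                i++;
            if (src[i] != '\0')
                i += 2;
            dst[o++] = ' ';  /* C replaces each comment with one space */
        } else if (c == '/' && src[i + 1] == '/') {
            /* Line comment: skip up to, but not including, the newline. */
            while (src[i] != '\0' && src[i] != '\n')
                i++;
        } else {
            dst[o++] = c;
            i++;
        }
    }
    dst[o] = '\0';
}
```

Scanning ahead for the closing */ instead of remembering the previous character also means /*/ is not mistaken for a complete comment.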
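If you want to be really correct, you should also handle line continuations: a line ending with a \ continues on the next line, so, for example, a // comment whose line ends in \ also swallows the following line. For added hairiness, you also ought to handle trigraphs. ??/ is the trigraph for \, so ??/" is an escaped quote, and ??/ at the end of a line is a continuation. (C23 removed trigraphs, but every earlier C standard has them.)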
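Both can be handled in a preprocessing pass before stripping, mirroring translation phases 1 and 2 of the C standard. This is a minimal sketch that assumes only the ??/ trigraph matters here (the other eight are left alone); replace_backslash_trigraph, splice_lines and prepare_source are hypothetical names. Because continued lines get joined, the result is no longer a line-for-line copy of the input.

```c
/* Translation phase 1, limited to one trigraph: "?" "?" "/" stands for a
   backslash. Rewrites s in place; the other eight trigraphs are ignored. */
static void replace_backslash_trigraph(char *s)
{
    char *w = s;
    const char *r = s;

    while (*r != '\0') {
        if (r[0] == '?' && r[1] == '?' && r[2] == '/') {
            *w++ = '\\';
            r += 3;
        } else {
            *w++ = *r++;
        }
    }
    *w = '\0';
}

/* Translation phase 2: delete every backslash immediately followed by a
   newline, joining the two physical lines. Rewrites s in place. */
static void splice_lines(char *s)
{
    char *w = s;
    const char *r = s;

    while (*r != '\0') {
        if (r[0] == '\\' && r[1] == '\n')
            r += 2;
        else
            *w++ = *r++;
    }
    *w = '\0';
}

/* Hypothetical wrapper: run both phases before stripping comments. */
void prepare_source(char *s)
{
    replace_backslash_trigraph(s);
    splice_lines(s);
}
```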
The style of the code looks pretty good, although main should more properly be declared as int main(void).
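In the same spirit, the c in the question's input loop should be an int rather than a char: getchar() returns an int precisely so that EOF can be told apart from every valid character. With a plain char, the loop either never ends (where char is unsigned) or stops early at a 0xFF byte (where char is signed). The loop also writes past the end of code[] once the input reaches MAX_LENGTH characters. Below is a minimal sketch of that step as a function; read_code is a hypothetical name, and it reads from any FILE * rather than only stdin.

```c
#include <stdio.h>

/* Hypothetical helper: reads all of `in` into buf, storing at most
   cap - 1 chars, and NUL-terminates it. Requires cap >= 1.
   Returns the number of chars stored. */
size_t read_code(FILE *in, char *buf, size_t cap)
{
    size_t size = 0;
    int c;   /* int, not char, so EOF stays distinct from every char value */

    /* Check the room left before reading, so no extra char is consumed. */
    while (size + 1 < cap && (c = getc(in)) != EOF)
        buf[size++] = (char)c;
    buf[size] = '\0';
    return size;
}
```

In the question's program, main would then start as int main(void), fill the buffer with read_code(stdin, code, sizeof code), and end with return 0;.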