I'm writing a program for school that asks to read text from a file, capitalizes everything, and removes the punctuation and spaces. The file "Congress.txt" contains
(Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the freedom of speech, or of the press; or the right of the people peaceably to assemble, and to petition the government for a redress of grievances.)
It reads in correctly but what I have so far to remove the punctuation, spaces, and capitalize causes some major problems with junk characters. My code so far is:
void processFile(char line[]) {
FILE *fp;
int i = 0;
char c;
if (!(fp = fopen("congress.txt", "r"))) {
printf("File could not be opened for input.\n");
exit(1);
}
line[i] = '\0';
fseek(fp, 0, SEEK_END);
fseek(fp, 0, SEEK_SET);
for (i = 0; i < MAX; ++i) {
fscanf(fp, "%c", &line[i]);
if (line[i] == ' ')
i++;
else if (ispunct((unsigned char)line[i]))
i++;
else if (islower((unsigned char)line[i])) {
line[i] = toupper((unsigned char)line[i]);
i++;
}
printf("%c", line[i]);
fprintf(csis, "%c", line[i]);
}
fclose(fp);
}
I don't know if it's an issue but I have MAX defined as 272 because that's what the text file is including punctuation and spaces.
My output I am getting is:
C╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠
╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠Press any key to continue . . .
The fundamental algorithm needs to be along the lines of:
while next character is not EOF
if it is alphabetic
save the upper case version of it in the string
null terminate the string
which translates into C as:
int c;
int i = 0;
while ((c = getc(fp)) != EOF)
{
if (isalpha(c))
line[i++] = toupper(c);
}
line[i] = '\0';
This code doesn't need the (unsigned char)
cast with the functions from <ctype.h>
because c
is guaranteed to contain either EOF (in which case it doesn't get into the body of the loop) or the value of a character converted to unsigned char
anyway. You only have to worry about the cast when you use char c
(as in the code in the question) and try to write toupper(c)
or isalpha(c)
. The problem is that plain char
can be a signed type, so some characters, notoriously ÿ (y-umlaut, U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS), will appear as a negative value, and that breaks the requirements on the inputs to the <ctype.h>
functions. This code will attempt to case-convert characters that are already upper-case, but that's probably cheaper than a second test.
What else you do in the way of printing, etc is up to you. The csis
file stream is a global scope variable; that's a bit (tr)icky. You should probably terminate the output printing with a newline.
The code shown is vulnerable to buffer overflow. If the length of line
is MAX
, then you can modify the loop condition to:
while (i < MAX - 1 && (c = getc(fp)) != EOF)
If, as would be a better design, you change the function signature to:
void processFile(int size, char line[]) {
and assert that the size is strictly positive:
assert(size > 0);
and then the loop condition changes to:
while (i < size - 1 && (c = getc(fp)) != EOF)
Obviously, you change the call too:
char line[4096];
processFile(sizeof(line), line);
in the posted code, there is no intermediate processing, so the following code ignores the 'line[]' input parameter
void processFile()
{
FILE *fp = NULL;
if (!(fp = fopen("congress.txt", "r")))
{
printf("File could not be opened for input.\n");
exit(1);
}
// implied else, fopen successful
unsigned int c; // must be integer so EOF (-1) can be recognized
while( EOF != (c =(unsigned)fgetc(fp) ) )
{
if( (isalpha(c) || isblank(c) ) && !ispunct(c) ) // a...z or A...Z or space
{
// note toupper has no effect on upper case characters
// note toupper has no effect on a space
printf("%c", toupper(c));
fprintf(csis, "%c", toupper(c));
}
}
printf( "\n" );
fclose(fp);
} // end function: processFile
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With