How to scan the rest of a line in c

Tags: c

I have several lines of integers in a file, e.g.:

100 20 300 20 9 45 -1
101 80 80 2 80 2 50 3 70 -1

I want to read the first two integers on each line into integer variables, then store the rest of the line in a string that I can iterate through later on.

do {
    fscanf(file, "%d %d", &var1,&var2);
    }while(!feof(file));

Now I want to scan the rest of the line, move to the next line, and repeat, but I am not sure how to scan the rest of the line into a string var3.

Any ideas?

asked Dec 15 '22 18:12 by James


2 Answers

Here's the first thing you do. Throw away any thought of using scanf("%s") unless you are totally in control of the input data. Otherwise you open yourself up to a buffer overflow.

This answer shows a safe way to use fgets for user input, giving buffer overflow detection/avoidance and line-clearing, which could easily be adapted to any input stream.

Once you have the line (and the whole line) as a string, and you therefore know the maximum size it could be, you can simply use:

char strBuff[1000], str1[1000]; // Ensure both big enough.
:
// Use getLine/fgets to get the line into strBuff.
:
int numScanned = sscanf (strBuff, "%d %d %[^\n]", &int1, &int2, str1);

What the %[^\n] format specifier does is scan any number of non-newline characters into a string: [] represents a character class, the ^ means "match everything but the following characters", and the only character used for (non-)matching here is the newline \n.

Standards citation follows at the bottom of this answer (a).


For example, using that function:

#include <stdio.h>
#include <string.h>

#define OK       0
#define NO_INPUT 1
#define TOO_LONG 2
static int getLine (char *prmpt, char *buff, size_t sz) {
    int ch, extra;

    // Get line with buffer overrun protection.
    if (prmpt != NULL) {
        printf ("%s", prmpt);
        fflush (stdout);
    }
    if (fgets (buff, sz, stdin) == NULL)
        return NO_INPUT;

    // If it was too long, there'll be no newline. In that case, we flush
    // to end of line so that excess doesn't affect the next call.
    if (buff[strlen(buff)-1] != '\n') {
        extra = 0;
        while (((ch = getchar()) != '\n') && (ch != EOF))
            extra = 1;
        return (extra == 1) ? TOO_LONG : OK;
    }

    // Otherwise remove newline and give string back to caller.
    buff[strlen(buff)-1] = '\0';
    return OK;
}

with the following program:

int main (void) {
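    // Initialised so the printf calls below are defined even if sscanf
    // converts fewer than three fields.
    int rc, numScanned, int1 = 0, int2 = 0;
    char strBuff[100], str1[100] = "";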

    rc = getLine ("Yes> ", strBuff, sizeof(strBuff));
    if (rc == NO_INPUT) {
        // Extra NL since my system doesn't output that on EOF.
        printf ("\nNo input\n");
        return 1;
    }

    if (rc == TOO_LONG) {
        printf ("Input too long [%s]\n", strBuff);
        return 1;
    }

    printf ("OK [%s]\n", strBuff);

    numScanned = sscanf (strBuff, "%d %d %[^\n]", &int1, &int2, str1);
    printf ("numScanned = %d\n", numScanned);
    printf ("int1       = %d\n", int1);
    printf ("int2       = %d\n", int2);
    printf ("str1       = [%s]\n", str1);

    return 0;
}

gives the following output:

Yes> 100 20 300 20 9 45 -1 blah blah blah
OK [100 20 300 20 9 45 -1 blah blah blah]
numScanned = 3
int1       = 100
int2       = 20
str1       = [300 20 9 45 -1 blah blah blah]
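
The same idea can be pointed at the file from the question instead of stdin. The following is only a sketch under a few assumptions, not part of the original answer: split_line, sum_rest and process_file are hypothetical names, the 1000-byte buffers are assumed to be longer than any line, and treating -1 as an end-of-data marker is a guess based on the sample lines.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits one line into two leading integers and the
   remaining text. Returns sscanf's result: 3 when there is remaining text,
   2 when only the two integers were present, less for malformed input. */
int split_line(const char *line, int *first, int *second,
               char *rest, size_t rest_size)
{
    /* Refuse input that could overflow rest: %[^\n] carries no width here. */
    if (rest_size == 0 || strlen(line) >= rest_size)
        return 0;
    rest[0] = '\0';
    return sscanf(line, "%d %d %[^\n]", first, second, rest);
}

/* Hypothetical helper: iterates over the integers left in rest and sums
   them, stopping at the assumed -1 sentinel or at the first non-number. */
long sum_rest(const char *rest)
{
    long total = 0;
    char *end;

    for (;;) {
        long value = strtol(rest, &end, 10);
        if (end == rest || value == -1)
            break;              /* nothing converted, or sentinel reached */
        total += value;
        rest = end;             /* continue after the number just read */
    }
    return total;
}

/* Reads every line of file; returns how many lines had both integers. */
int process_file(FILE *file)
{
    char line[1000], rest[1000];
    int first, second, count = 0;

    while (fgets(line, sizeof line, file) != NULL) {
        if (split_line(line, &first, &second, rest, sizeof rest) < 2)
            continue;           /* blank or malformed line: skip it */
        printf("%d %d rest=[%s] sum=%ld\n",
               first, second, rest, sum_rest(rest));
        count++;
    }
    return count;
}
```

Looping on the return value of fgets avoids the feof-controlled loop from the question, which tends to process the last line twice. A caller would open the file with fopen, pass the FILE pointer to process_file, and close it afterwards.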

(a) Section 7.21.6.2 The fscanf function of C11 (although this is unchanged from C99, where it is section 7.19.6.2) states this about the [ format specifier, slightly paraphrased to remove irrelevant multi-byte stuff:

The [ format specifier matches a nonempty sequence of characters from a set of expected characters (the scanset).

The corresponding argument shall be a pointer to the initial element of a character array large enough to accept the sequence and a terminating null character, which will be added automatically.

The conversion specifier includes all subsequent characters in the format string, up to and including the matching right bracket (]).

The characters between the brackets (the scanlist) compose the scanset, unless the character after the left bracket is a circumflex (^), in which case the scanset contains all characters that do not appear in the scanlist between the circumflex and the right bracket. If the conversion specifier begins with [] or [^], the right bracket character is in the scanlist and the next following right bracket character is the matching right bracket that ends the specification; otherwise the first following right bracket character is the one that ends the specification.
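
To make the quoted scanset rules concrete, here is a small sketch with two hypothetical helpers (leading_digits and up_to_bracket are illustrative names, not from the answer). The first uses a plain scanlist, the second a negated one in which the right bracket directly after the circumflex is part of the scanlist. Both add a field width so the 16-byte buffers cannot overflow.

```c
#include <stdio.h>

/* Hypothetical helper: copies the leading run of decimal digits from text
   into digits (at most 15 characters plus the terminating null).
   Returns 1 if at least one digit matched, otherwise 0. */
int leading_digits(const char *text, char digits[16])
{
    digits[0] = '\0';
    /* Plain scanlist: only the listed characters are accepted. */
    return sscanf(text, "%15[0123456789]", digits) == 1;
}

/* Hypothetical helper: copies everything before the first ']' into out.
   Returns 1 if at least one character matched, otherwise 0. */
int up_to_bracket(const char *text, char out[16])
{
    out[0] = '\0';
    /* "[^]" rule: the ']' right after '^' belongs to the scanlist, and the
       second ']' is the one that ends the specification. */
    return sscanf(text, "%15[^]]", out) == 1;
}
```

Because the specifier matches a nonempty sequence, both helpers report failure when the very first character is outside the scanset.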

answered Jan 03 '23 04:01 by paxdiablo


Nah, you can use scanf provided that you know what size your buffer is. You can avoid buffer overflow and test for when it happened. The recovery logic messes up things, but it is still possible. I'd suggest making the buffer big enough that an overflow really is a give-up-and-die kind of error.

First suppose a 256-byte buffer, plus some other variables you need to declare. The longest string you can store there is 255 bytes. You probably want to scan internal blanks, but don't want the \n newline at the end to be part of your string. (That's the main problem with fgets, in this case.) The magic sequence is:

char var[256], endchar = '\n';
int n;

n = scanf("%255[^\n]%c", var, &endchar);
if ((n < 1) || (endchar != '\n') || ferror(stdin))
{
    if (n == 2) { /* buffer overflow: the line is longer than 255 bytes */ }
    else if (ferror(stdin)) { /* an I/O error occurred */ }
    else if (n == 0) { /* empty line: nothing matched, '\n' is still unread */ }
    else { /* n == EOF: end of file on the 1st byte */ }
} else { /* OK */ }

That's pretty much bulletproof, and all the looping happens in the library. The scanf format breaks down as:

  1. %255[^\n]: A string of up to 255 of anything but a newline.
  2. %c: A single char that stores whatever is next, if anything.

The return value is the number of fields successfully stored. That, the ending value of endchar and the ferror() result tell you everything you need to know in a couple of if statements. A single if detects the normal case.

That allows for an EOF without a newline on the last line. In that case, feof(stdin) will be true for the outer loop to detect.
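
As a sketch of how this idiom might sit inside an outer loop over any stream, the hypothetical read_line wrapper below (the name and the line_status values are illustrative, not from the answer) turns the return value, endchar and ferror() into one status code. It relies on two details of the scanf family: a return of 0 means the %[ directive matched nothing, which happens on an empty line, while end of file before any conversion yields EOF.

```c
#include <stdio.h>

enum line_status { LINE_OK, LINE_TOO_LONG, LINE_EOF, LINE_ERROR };

/* Hypothetical wrapper around the "%255[^\n]%c" idiom.
   buf must have room for at least 256 bytes. */
enum line_status read_line(FILE *in, char buf[256])
{
    char endchar = '\n';
    int n;

    buf[0] = '\0';
    n = fscanf(in, "%255[^\n]%c", buf, &endchar);

    if (ferror(in))
        return LINE_ERROR;
    if (n == EOF)
        return LINE_EOF;        /* end of file before any byte was read */
    if (n == 0) {
        /* Empty line: %[ matched nothing, so the '\n' is still pending. */
        (void)fgetc(in);
        return LINE_OK;
    }
    if (n == 2 && endchar != '\n')
        return LINE_TOO_LONG;   /* the rest of the line is still unread */
    return LINE_OK;             /* full line, or last line without '\n' */
}
```

An outer loop can then call read_line until it returns LINE_EOF or LINE_ERROR. What to do on LINE_TOO_LONG (discard the remainder of the line, or give up) is left to the caller.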

PS: The arguments against scanf %s (and the related %[]) are well founded, but %nnns and %nnn[] are perfectly safe if you can ensure that the "nnn" value agrees with the buffer size. Sadly, there's no way to supply a computed buffer size to the format. The best option I know of is to dynamically generate the scanf() format with sprintf().
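
One way the format generation mentioned in the PS might look is sketched below. make_line_format is a hypothetical helper, and it uses snprintf rather than sprintf so that the format buffer itself cannot overflow; that substitution is a choice made for this example.

```c
#include <stdio.h>

/* Hypothetical helper: writes the format "%<width>[^\n]%c" into fmt, where
   width is buf_size - 1 so the scanned text plus its terminating null fits
   in a buffer of buf_size bytes.
   Returns 0 on success, -1 if buf_size is too small or fmt is too short. */
int make_line_format(char *fmt, size_t fmt_size, size_t buf_size)
{
    int len;

    if (buf_size < 2)
        return -1;              /* no room for even one character */
    /* "%%" emits a literal '%'; "%zu" prints the computed field width. */
    len = snprintf(fmt, fmt_size, "%%%zu[^\n]%%c", buf_size - 1);
    return (len < 0 || (size_t)len >= fmt_size) ? -1 : 0;
}
```

With a 256-byte buffer this produces the same format string as the literal used earlier, so the generated format can be passed to scanf, fscanf or sscanf in its place. Compilers may warn about a non-literal format string, because they can no longer check the arguments against it.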

answered Jan 03 '23 06:01 by Mike Housky