
getline() vs. fgets(): Control memory allocation

Tags: c, posix

To read lines from a file, POSIX offers getline() and fgets() (ignoring the dreaded gets()). Conventional wisdom says getline() is preferable to fgets() because it allocates and grows the line buffer as needed.

My question is: Isn’t that dangerous? What if by accident or malicious intent someone creates a 100GB file with no '\n' byte in it – won’t that make my getline() call allocate an insane amount of memory?

asked May 03 '19 12:05 by edavid



1 Answer

> My question is: Isn’t that dangerous? What if by accident or malicious intent someone creates a 100GB file with no '\n' byte in it – won’t that make my getline() call allocate an insane amount of memory?

Yes, what you describe is a plausible risk. However,

  • if the program requires loading an entire line into memory at once, then allowing getline() to attempt to do that is not inherently more risky than writing your own code to do it with fgets(); and
  • if you have a program that has such a vulnerability, then you can mitigate the risk by using setrlimit() to limit the total amount of (virtual) memory it can reserve. This can be used to cause it to fail instead of successfully allocating enough memory to interfere with the rest of the system.
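The setrlimit() mitigation could look roughly like the sketch below. The helper names cap_address_space() and count_lines() are hypothetical, chosen only for illustration, and the sketch assumes a platform that enforces RLIMIT_AS (Linux does; some other systems accept the call but do not enforce it). With the cap in place, a getline() call on a pathologically long line should fail with ENOMEM rather than exhaust system memory.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose getline() and the rlimit API */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>

/* Hypothetical helper: cap the process's virtual address space at
   max_bytes. Returns 0 on success, -1 on failure (errno is set). */
static int cap_address_space(rlim_t max_bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_AS, &rl) != 0)
        return -1;

    /* The soft limit may not exceed the hard limit. */
    if (rl.rlim_max != RLIM_INFINITY && max_bytes > rl.rlim_max)
        max_bytes = rl.rlim_max;

    rl.rlim_cur = max_bytes;
    return setrlimit(RLIMIT_AS, &rl);
}

/* Hypothetical helper: count the lines in fp using getline().
   Returns the line count, or -1 if getline() stopped for a reason
   other than end-of-file (for example, a failed allocation). */
static long count_lines(FILE *fp)
{
    char *line = NULL;   /* getline() allocates and grows this buffer */
    size_t cap = 0;
    long n = 0;

    while (getline(&line, &cap, fp) != -1)
        n++;

    /* getline() returns -1 both at EOF and on error; tell them apart. */
    int failed = !feof(fp);

    free(line);          /* the caller owns the buffer, even on failure */
    return failed ? -1 : n;
}
```

A program would call something like cap_address_space((rlim_t)512 * 1024 * 1024) once, early in main(), before reading untrusted input. The 512 MiB figure is an arbitrary example value, not a recommendation. The same limit can also be imposed from outside the program, for instance with the shell's ulimit -v.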

Best overall, I'd argue, is to write code that does not require input in units of full lines (all at once) in the first place, but such an approach has its own complexities.
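As one illustration of not requiring whole lines in memory, the sketch below scans a stream with fgets() and a fixed-size buffer, carrying only a running count across chunk boundaries. The helper longest_line() and the 4096-byte chunk size are assumptions made for this example; real programs usually need to carry more state than a counter between chunks, which is where the added complexity comes from.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the length of the longest line in fp
   (not counting the newline) using a fixed amount of memory, no matter
   how long any individual line is.
   Assumes text input: embedded NUL bytes would confuse strlen(). */
static size_t longest_line(FILE *fp)
{
    char chunk[4096];    /* fixed-size buffer; the size is arbitrary */
    size_t current = 0;  /* length of the line seen so far */
    size_t longest = 0;

    while (fgets(chunk, sizeof chunk, fp) != NULL) {
        size_t len = strlen(chunk);

        if (len > 0 && chunk[len - 1] == '\n') {
            /* This chunk completes a line. */
            current += len - 1;
            if (current > longest)
                longest = current;
            current = 0;
        } else {
            /* Partial line: the rest arrives in the next chunk(s). */
            current += len;
        }
    }

    /* Account for a final line that has no trailing newline. */
    if (current > longest)
        longest = current;

    return longest;
}
```

Because the buffer never grows, a 100GB file with no '\n' in it costs this function time but not memory.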

answered Sep 22 '22 05:09 by John Bollinger