
Read a string of length greater than 4096 bytes from stdin in C++

Tags: c++

I am trying to read a string whose length is on the order of 10^5. I get an incorrect string once its size grows beyond 4096 bytes. I am using the following code:

string a;
cin>>a;

This didn't work, so I then tried reading character by character with the following code:

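int c;                        // getchar() returns int so that EOF can be detected
vector<unsigned char> a;
while (count > 0 && (c = getchar()) != EOF) {   // count: expected number of characters
    a.push_back(static_cast<unsigned char>(c));
    count--;
}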

I have done the necessary escaping for getchar, but this also had the 4096-byte problem. Can someone suggest a workaround or point me to the correct way of reading it?

Baruntar asked Apr 05 '14 20:04


3 Answers

Using this test program, based on what you posted:

#include <iostream>
#include <string>


int main()
{
    std::string a;

    std::cin >> a;

    std::cout << a.length() << std::endl;
}

I can do:

./a.out < fact100000.txt

and get the output:

456574

However, if I copy and paste from an editor into the console, it stops at 4095. I expect that is a limit somewhere in the console's copy-and-paste handling. The easy solution is, of course, not to use copy and paste but to redirect from a file. On other systems, a 4 KB restriction on input may reside somewhere else. (Note that, at least on my system, I can happily copy and paste the 450 KB factorial result into another editor window, so on my system it is simply the console buffer that is the problem.)
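
To check that the stream extraction itself is not the limit, the same `>>` call can be exercised on an in-memory stream instead of the console. This is only a minimal sketch; `read_token` and `round_trip_length` are hypothetical helper names, not part of the question's code:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads one whitespace-delimited token, the same way
// `std::cin >> a` does, but from any input stream.
std::string read_token(std::istream& in)
{
    std::string token;
    in >> token;
    return token;
}

// Feeds `n` digits through an in-memory stream and reports how many
// characters operator>> delivered.
std::size_t round_trip_length(std::size_t n)
{
    std::istringstream in(std::string(n, '7'));
    return read_token(in).length();
}
```

Calling `round_trip_length(100000)` should give back the full 100000 characters, which points at the terminal rather than at `std::string` or `operator>>`.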

Mats Petersson answered Nov 12 '22 03:11


This happens because terminal input is buffered in the kernel's I/O queue.

Input and output queues of a terminal device implement a form of buffering within the kernel independent of the buffering implemented by I/O streams.

The terminal input queue is also sometimes referred to as its typeahead buffer. It holds the characters that have been received from the terminal but not yet read by any process.

The size of the input queue is described by the MAX_INPUT and _POSIX_MAX_INPUT parameters.

By default, your terminal is in canonical mode.

In canonical mode, all input stays in the queue until a newline character is received, so the terminal input queue can fill up when you type a very long line.
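
On a POSIX system these limits can be queried at run time with `fpathconf`. The sketch below assumes `<unistd.h>` provides `_PC_MAX_INPUT` and `_PC_MAX_CANON` (the latter is the limit on a canonical-mode line); `terminal_limit` and `print_terminal_limits` are hypothetical names. The reported value may be a conservative figure and need not match the buffer size the kernel actually uses.

```cpp
#include <cstdio>
#include <unistd.h>

// Hypothetical helper: asks the system for a terminal limit on descriptor `fd`.
// `name` is _PC_MAX_INPUT (input queue size) or _PC_MAX_CANON (longest
// canonical-mode line). fpathconf returns -1 on error (for example an
// invalid descriptor) or when the limit is indeterminate.
long terminal_limit(int fd, int name)
{
    return fpathconf(fd, name);
}

// Prints both limits for `fd`, e.g. print_terminal_limits(STDIN_FILENO).
void print_terminal_limits(int fd)
{
    std::printf("MAX_INPUT: %ld\n", terminal_limit(fd, _PC_MAX_INPUT));
    std::printf("MAX_CANON: %ld\n", terminal_limit(fd, _PC_MAX_CANON));
}
```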


We can change the terminal's input mode from canonical to non-canonical.

You can do it from the terminal:

$ stty -icanon    # switch the input mode to non-canonical
$ ./a.out         # run your program
$ stty icanon     # switch back to canonical

Or you can do it programmatically. This requires the low-level terminal interface (termios).

So you can do something like:

#include <iostream>
#include <string>
#include <stdio.h>
#include <termios.h> 
#include <unistd.h>

int clear_icanon(void)
{
  struct termios settings;
  int result;
  result = tcgetattr (STDIN_FILENO, &settings);
  if (result < 0)
    {
      perror ("error in tcgetattr");
      return 0;
    }

  settings.c_lflag &= ~ICANON;

  result = tcsetattr (STDIN_FILENO, TCSANOW, &settings);
  if (result < 0)
    {
      perror ("error in tcsetattr");
      return 0;
    }
  return 1;
}


int main()
{
    clear_icanon(); // Switch the terminal from canonical to non-canonical mode; this program does not switch it back.

    std::string a;

    std::cin >> a;

    std::cout << a.length() << std::endl;
}
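
The program above leaves the terminal in non-canonical mode. One possible way to put the old settings back automatically is a small RAII wrapper around the same `tcgetattr`/`tcsetattr` calls. This is a sketch under that assumption, not part of the original answer; `NonCanonicalGuard` is a hypothetical class name:

```cpp
#include <termios.h>
#include <unistd.h>

// Hypothetical RAII helper: switches `fd` to non-canonical mode on
// construction and restores the saved settings on destruction.
class NonCanonicalGuard
{
public:
    explicit NonCanonicalGuard(int fd) : fd_(fd), active_(false)
    {
        if (tcgetattr(fd_, &saved_) != 0)
            return;                     // not a terminal, or invalid descriptor

        termios changed = saved_;
        changed.c_lflag &= ~ICANON;     // same flag change as clear_icanon()
        active_ = (tcsetattr(fd_, TCSANOW, &changed) == 0);
    }

    ~NonCanonicalGuard()
    {
        if (active_)
            tcsetattr(fd_, TCSANOW, &saved_);   // put the terminal back
    }

    NonCanonicalGuard(const NonCanonicalGuard&) = delete;
    NonCanonicalGuard& operator=(const NonCanonicalGuard&) = delete;

    // True only if the mode was actually changed.
    bool active() const { return active_; }

private:
    int fd_;
    bool active_;
    termios saved_{};
};
```

In `main`, declaring `NonCanonicalGuard guard(STDIN_FILENO);` before the `std::cin >> a;` line would take the place of the `clear_icanon()` call. When stdin is redirected from a file, `tcgetattr` fails and the guard does nothing.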

BitFlip answered Nov 12 '22 02:11


This is much more likely to be a platform/OS problem than a C++ problem. What OS are you using, and what method are you using to get the string fed to stdin? It's pretty common for command-line arguments to be capped at a certain size.

In particular, given that you've tried reading one character at a time, and it still didn't work, this seems like a problem with getting the string to the program, rather than a C++ issue.
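
One way to check how the input reaches the program is to ask whether stdin is attached to a terminal at all. The sketch below uses POSIX `isatty`; `reads_from_terminal` and `warn_if_interactive` are hypothetical names, and the warning text is only illustrative:

```cpp
#include <cstdio>
#include <unistd.h>

// Hypothetical helper: true if `fd` refers to a terminal, false for a file,
// a pipe, or an invalid descriptor.
bool reads_from_terminal(int fd)
{
    return isatty(fd) == 1;
}

// Prints a hint on stderr when input is typed or pasted into a terminal,
// where a per-line limit may apply.
void warn_if_interactive(int fd)
{
    if (reads_from_terminal(fd))
        std::fprintf(stderr,
                     "input comes from a terminal; very long lines may be "
                     "cut short, consider redirecting from a file\n");
}
```

Calling `warn_if_interactive(STDIN_FILENO)` at the start of `main` would separate the pasted-into-console case from the redirected-file case.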

Mark Bessey answered Nov 12 '22 03:11