I'm implementing a custom lexer in C++, and when I try to read whitespace, the ifstream won't give it to me. I'm reading character by character using >>, and all the whitespace is gone. Is there any way to make the ifstream keep all the whitespace and return it to me? I know that when reading whole strings, the read stops at whitespace, but I was hoping that reading character by character would avoid this behaviour.
Attempted: .get(), recommended by many answers, but it has the same effect as std::noskipws: I now get all the spaces, but not the newline character that I need to lex some constructs.
Here's the offending code (extended comments truncated):
while (input >> current)
{
    always_next_struct val = always_next_struct(next);
    if (current == L' ' || current == L'\n' || current == L'\t' || current == L'\r')
    {
        continue;
    }
    if (current == L'/')
    {
        input >> current;
        if (current == L'/')
        {
            // explicitly empty while loop
            while(input.get(current) && current != L'\n');
            continue;
        }
        // ... (rest of the loop not shown)
    }
}
I'm breaking on the while line and looking at every value of current as it comes in, and \r or \n are definitely not among them; the input just skips to the next line in the input file.
std::ifstream: Objects of this class maintain a filebuf object as their internal stream buffer, which performs input/output operations on the file they are associated with (if any). File streams are associated with files either on construction or by calling the member function open.
std::skipws / std::noskipws: The skipws format flag is set with the skipws manipulator and cleared with noskipws. When it is set (the default), as many initial whitespace characters as necessary are read and discarded from the stream until a non-whitespace character is found. This applies to every formatted input operation performed with operator>> on the stream.
There is a manipulator to disable the whitespace skipping behavior:
stream >> std::noskipws;
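A minimal sketch of the manipulator in use, assuming an ordinary narrow std::istream (a std::wifstream works the same way with wchar_t). The helper name read_all_noskipws is hypothetical, chosen for illustration only:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: collects every character with operator>>
// after turning whitespace skipping off.
std::string read_all_noskipws(std::istream& in) {
    std::string out;
    char c;
    in >> std::noskipws;     // clear the skipws flag on this stream
    while (in >> c) {        // spaces, tabs and '\n' are now extracted too
        out += c;
    }
    return out;
}
```

Reading "a b\n\tc" this way returns the same string unchanged; without the manipulator the same loop would yield "abc".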
The operator>> eats whitespace (space, tab, newline). Use yourstream.get() to read each character.
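To make the difference concrete, here is a small sketch contrasting the two reads on the same input. Both function names are hypothetical; the point is only that operator>> (with the default skipws flag) never delivers a newline, while get(char&), being unformatted input, ignores skipws and delivers every character:

```cpp
#include <cassert>
#include <sstream>

// Formatted input: leading whitespace is skipped, so '\n' is never seen.
int count_newlines_extract(std::istream& in) {
    int n = 0;
    char c;
    while (in >> c) {
        if (c == '\n') ++n;
    }
    return n;
}

// Unformatted input: get(char&) returns every character, including '\n'.
int count_newlines_get(std::istream& in) {
    int n = 0;
    char c;
    while (in.get(c)) {
        if (c == '\n') ++n;
    }
    return n;
}
```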
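Edit:
Beware: platforms (Windows, Un*x, Mac) differ in how they encode a newline. It can be '\n' (Unix-like systems), '\r' (classic Mac OS), or both, "\r\n" (Windows). What you see also depends on how you open the file stream: in text mode the runtime translates the platform's own line ending to '\n', while in binary mode you get the raw characters.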
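If the lexer has to cope with all three conventions (for example, a file opened with std::ios::binary, or a file written on another platform), one option is to fold them into a single '\n' as characters are read. This is a sketch under that assumption; get_normalized and read_normalized are hypothetical helpers, not standard library functions:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads one character with get() and folds
// "\r\n" and a lone '\r' into '\n'. Returns false at end of input.
bool get_normalized(std::istream& in, char& c) {
    if (!in.get(c)) return false;
    if (c == '\r') {
        if (in.peek() == '\n') in.get();  // swallow the '\n' of "\r\n"
        c = '\n';
    }
    return true;
}

// Reads the whole stream, normalizing every line ending to '\n'.
std::string read_normalized(std::istream& in) {
    std::string out;
    char c;
    while (get_normalized(in, c)) out += c;
    return out;
}
```

With this, "a\r\nb\rc\nd" is read as "a\nb\nc\nd", so the rest of the lexer only ever has to test for '\n'.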
Edit (analyzing code):
After
while(input.get(current) && current != L'\n'); continue;
there will be a \n in current, unless the end of file was reached. After that you continue with the outermost while loop. There, input >> current skips leading whitespace, so the first non-whitespace character on the next line is read into current and your whitespace check never sees the \n. Is that not what you wanted?
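One way to keep that newline is to read with get() in the outer loop as well, and to fall through after the comment loop instead of using continue, so the \n left in current is still handled. The sketch below assumes a std::wistream like the question's wifstream; the function name count_lines_skipping_comments is hypothetical, and counting newlines stands in for whatever newline-sensitive lexing the real code does:

```cpp
#include <cassert>
#include <sstream>

// Hypothetical sketch of the question's loop using get() everywhere.
int count_lines_skipping_comments(std::wistream& input) {
    int newlines = 0;
    wchar_t current;
    while (input.get(current)) {
        if (current == L'/') {
            if (!input.get(current)) break;      // '/' was the last character
            if (current == L'/') {
                // Consume the comment; this stops on '\n' or at end of input.
                while (input.get(current) && current != L'\n') {}
                if (!input) break;               // comment ran to end of input
                // current is now L'\n': fall through instead of `continue`.
            }
        }
        if (current == L'\n') {
            ++newlines;                          // the newline is not lost
        }
    }
    return newlines;
}
```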
I tried to reproduce your problem (using char and cin instead of wchar_t and wifstream):
//: get.cpp : compile, then run: get < get.cpp
#include <iostream>

int main()
{
    char c;
    while (std::cin.get(c))
    {
        if (c == '/')
        {
            char last = c;
            if (std::cin.get(c) && c == '/')
            {
                // std::cout << "Read to EOL\n";
                while(std::cin.get(c) && c != '\n'); // this comment will be skipped
                // std::cout << "go to next line\n";
                std::cin.putback(c);
                continue;
            }
            else
            {
                std::cin.putback(c);
                c = last;
            }
        }
        std::cout << c;
    }
    return 0;
}
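This program, applied to its own source, removes all C++ line comments from its output. The inner while loop does not eat up the rest of the file; it stops at the first newline. Please note the putback(c) statement: it pushes that newline back into the stream so the outer loop reads and prints it. Without it, the newline would not appear.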
If it doesn't work the same way for wifstream, that would be very strange, except for one possible reason: a mismatch between the file's character encoding (e.g. 16-bit characters) and how wifstream converts it, so that the \n character ends up in the wrong byte.
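An alternative to the putback(c) call in the program above is peek(), which inspects the next character without extracting it, so the comment loop can stop just before the newline instead of reading it and pushing it back. This is a sketch of that swapped-in technique, not the answer's original code; strip_line_comments is a hypothetical name:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: removes // line comments but keeps every newline.
std::string strip_line_comments(std::istream& in) {
    using traits = std::istream::traits_type;
    std::string out;
    char c;
    while (in.get(c)) {
        if (c == '/' && in.peek() == '/') {
            // Discard the rest of the comment, leaving '\n' (if any) unread.
            while (in.peek() != traits::eof() && in.peek() != '\n') {
                in.get();
            }
            continue;  // the outer get() reads and keeps the '\n'
        }
        out += c;      // ordinary characters, including a lone '/'
    }
    return out;
}
```

Both approaches leave the newline for the outer loop; peek() simply avoids modifying the stream and then undoing it.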