Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does reading a record struct fields from std::istream fail, and how can I fix it?

Suppose we have the following situation:

  • A record struct is declared as follows

    struct Person {
        unsigned int id;
        std::string name;
        uint8_t age;
        // ...
    };
    
  • Records are stored in a file using the following format:

    ID      Forename Lastname Age
    ------------------------------
    1267867 John     Smith    32
    67545   Jane     Doe      36
    8677453 Gwyneth  Miller   56
    75543   J. Ross  Unusual  23
    ...
    

The file should be read in to collect an arbitrary number of the Person records mentioned above:

std::istream& ifs = std::ifstream("SampleInput.txt");
std::vector<Person> persons;

Person actRecord;
while(ifs >> actRecord.id >> actRecord.name >> actRecord.age) {
    persons.push_back(actRecord);
}

if(!ifs) {
    std::err << "Input format error!" << std::endl;
} 

Question:
What can I do to read in the separate values storing their values into the one actRecord variables' fields?

The above code sample ends up with run time errors:

Runtime error    time: 0 memory: 3476 signal:-1
stderr: Input format error!
like image 271
πάντα ῥεῖ Avatar asked Apr 13 '14 19:04

πάντα ῥεῖ


7 Answers

What can I do to read in the separate words forming the name into the one actRecord.name variable?

The general answer is: No, you can't do this without additional delimiter specifications and exceptional parsing for the parts forming the intended actRecord.name contents.
This is because a std::string field will be parsed just up to the next occurence of a whitespace character.

It's noteworthy that some standard formats (like e.g. .csv) may require to support distinguishing blanks (' ') from tab ('\t') or other characters, to delimit certain record fields (which may not be visible at a first glance).

Also note:
To read an uint8_t value as numeric input, you'll have to deviate using a temporary unsigned intvalue. Reading just a unsigned char (aka uint8_t) will screw up the stream parsing state.

like image 173
πάντα ῥεῖ Avatar answered Oct 11 '22 12:10

πάντα ῥεῖ


You have whitespace between firstname and lastname. Change your class to have firstname and lastname as separate strings and it should work. The other thing you can do is to read in two separate variables such as name1 and name2 and assign it as

actRecord.name = name1 + " " + name2;
like image 30
unxnut Avatar answered Oct 11 '22 12:10

unxnut


Here's an implementation of a manipulator I came up with that counts the delimiter through each extracted character. Using the number of delimiters you specify, it will extract words from the input stream. Here's a working demo.

template<class charT>
struct word_inserter_impl {
    word_inserter_impl(std::size_t words, std::basic_string<charT>& str, charT delim)
        : str_(str)
        , delim_(delim)
        , words_(words)
    { }

    friend std::basic_istream<charT>&
    operator>>(std::basic_istream<charT>& is, const word_inserter_impl<charT>& wi) {
        typename std::basic_istream<charT>::sentry ok(is);

        if (ok) {
            std::istreambuf_iterator<charT> it(is), end;
            std::back_insert_iterator<std::string> dest(wi.str_);

            while (it != end && wi.words_) {
                if (*it == wi.delim_ && --wi.words_ == 0) {
                    break;
                }
                dest++ = *it++;
            }
        }
        return is;
    }
private:
    std::basic_string<charT>& str_;
    charT delim_;
    mutable std::size_t words_;
};

template<class charT=char>
word_inserter_impl<charT> word_inserter(std::size_t words, std::basic_string<charT>& str, charT delim = charT(' ')) {
    return word_inserter_impl<charT>(words, str, delim);
}

Now you can just do:

while (ifs >> actRecord.id >> word_inserter(2, actRecord.name) >> actRecord.age) {
    std::cout << actRecord.id << " " << actRecord.name << " " << actRecord.age << '\n';
}

Live Demo

like image 34
David G Avatar answered Oct 11 '22 10:10

David G


A solution would be to read in the first entry into an ID variable.
Then read in all the other words from the line (just push them in a temporary vector) and construct the name of the individual with all the elements, except the last entry which is the Age.

This would allow you to still have the Age on the last position but be able to deal with name like "J. Ross Unusual".

Update to add some code which illustrates the theory above:

#include <memory>
#include <string>
#include <vector>
#include <iterator>
#include <fstream>
#include <sstream>
#include <iostream>

struct Person {
    unsigned int id;
    std::string name;
    int age;
};

int main()
{
    std::fstream ifs("in.txt");
    std::vector<Person> persons;

    std::string line;
    while (std::getline(ifs, line))
    {
        std::istringstream iss(line);

        // first: ID simply read it
        Person actRecord;
        iss >> actRecord.id;

        // next iteration: read in everything
        std::string temp;
        std::vector<std::string> tempvect;
        while(iss >> temp) {
            tempvect.push_back(temp);
        }

        // then: the name, let's join the vector in a way to not to get a trailing space
        // also taking care of people who do not have two names ...
        int LAST = 2;
        if(tempvect.size() < 2) // only the name and age are in there
        {
            LAST = 1;
        }
        std::ostringstream oss;
        std::copy(tempvect.begin(), tempvect.end() - LAST,
            std::ostream_iterator<std::string>(oss, " "));
        // the last element
        oss << *(tempvect.end() - LAST);
        actRecord.name = oss.str();

        // and the age
        actRecord.age = std::stoi( *(tempvect.end() - 1) );
        persons.push_back(actRecord);
    }

    for(std::vector<Person>::const_iterator it = persons.begin(); it != persons.end(); it++)
    {
        std::cout << it->id << ":" << it->name << ":" << it->age << std::endl;
    }
}
like image 41
Ferenc Deak Avatar answered Oct 11 '22 11:10

Ferenc Deak


Since we can easily split a line on whitespace and we know that the only value that can be separated is the name, a possible solution is to use a deque for each line containing the whitespace separated elements of the line. The id and the age can easily be retrieved from the deque and the remaining elements can be concatenated to retrieve the name:

#include <iostream>
#include <fstream>
#include <deque>
#include <vector>
#include <sstream>
#include <iterator>
#include <string>
#include <algorithm>
#include <utility>

struct Person {
    unsigned int id;
    std::string name;
    uint8_t age;
};

int main(int argc, char* argv[]) {

    std::ifstream ifs("SampleInput.txt");
    std::vector<Person> records;

    std::string line;
    while (std::getline(ifs,line)) {

        std::istringstream ss(line);

        std::deque<std::string> info(std::istream_iterator<std::string>(ss), {});

        Person record;
        record.id = std::stoi(info.front()); info.pop_front();
        record.age = std::stoi(info.back()); info.pop_back();

        std::ostringstream name;
        std::copy
            ( info.begin()
            , info.end()
            , std::ostream_iterator<std::string>(name," "));
        record.name = name.str(); record.name.pop_back();

        records.push_back(std::move(record));
    }

    for (auto& record : records) {
        std::cout << record.id << " " << record.name << " " 
                  << static_cast<unsigned int>(record.age) << std::endl;
    }

    return 0;
}
like image 42
Veritas Avatar answered Oct 11 '22 12:10

Veritas


Another attempt at solving the parsing problem.

int main()
{
   std::ifstream ifs("test-115.in");
   std::vector<Person> persons;

   while (true)
   {
      Person actRecord;
      // Read the ID and the first part of the name.
      if ( !(ifs >> actRecord.id >> actRecord.name ) )
      {
         break;
      }

      // Read the rest of the line.
      std::string line;
      std::getline(ifs,line);

      // Pickup the rest of the name from the rest of the line.
      // The last token in the rest of the line is the age.
      // All other tokens are part of the name.
      // The tokens can be separated by ' ' or '\t'.
      size_t pos = 0;
      size_t iter1 = 0;
      size_t iter2 = 0;
      while ( (iter1 = line.find(' ', pos)) != std::string::npos ||
              (iter2 = line.find('\t', pos)) != std::string::npos )
      {
         size_t iter = (iter1 != std::string::npos) ? iter1 : iter2;
         actRecord.name += line.substr(pos, (iter - pos + 1));
         pos = iter + 1;

         // Skip multiple whitespace characters.
         while ( isspace(line[pos]) )
         {
            ++pos;
         }
      }

      // Trim the last whitespace from the name.
      actRecord.name.erase(actRecord.name.size()-1);

      // Extract the age.
      // std::stoi returns an integer. We are assuming that
      // it will be small enough to fit into an uint8_t.
      actRecord.age = std::stoi(line.substr(pos).c_str());

      // Debugging aid.. Make sure we have extracted the data correctly.
      std::cout << "ID: " << actRecord.id
         << ", name: " << actRecord.name
         << ", age: " << (int)actRecord.age << std::endl;
      persons.push_back(actRecord);
   }

   // If came here before the EOF was reached, there was an
   // error in the input file.
   if ( !(ifs.eof()) ) {
       std::cerr << "Input format error!" << std::endl;
   } 
}
like image 39
R Sahu Avatar answered Oct 11 '22 10:10

R Sahu


When seeing such an input file, I think it is not a (new way) delimited file, but a good old fixed size fields one, like Fortran and Cobol programmers used to deal with. So I would parse it like that (note I separated forename and lastname) :

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

struct Person {
    unsigned int id;
    std::string forename;
    std::string lastname;
    uint8_t age;
    // ...
};

int main() {
    std::istream& ifs = std::ifstream("file.txt");
    std::vector<Person> persons;
    std::string line;
    int fieldsize[] = {8, 9, 9, 4};

    while(std::getline(ifs, line)) {
        Person person;
        int field = 0, start=0, last;
        std::stringstream fieldtxt;
        fieldtxt.str(line.substr(start, fieldsize[0]));
        fieldtxt >> person.id;
        start += fieldsize[0];
        person.forename=line.substr(start, fieldsize[1]);
        last = person.forename.find_last_not_of(' ') + 1;
        person.forename.erase(last);
        start += fieldsize[1];
        person.lastname=line.substr(start, fieldsize[2]);
        last = person.lastname.find_last_not_of(' ') + 1;
        person.lastname.erase(last);
        start += fieldsize[2];
        std::string a = line.substr(start, fieldsize[3]);
        fieldtxt.str(line.substr(start, fieldsize[3]));
        fieldtxt >> age;
        person.age = person.age;
        persons.push_back(person);
    }
    return 0;
}
like image 21
Serge Ballesta Avatar answered Oct 11 '22 10:10

Serge Ballesta