
How to extract data from a line which has fields separated by '|' character in C++?

I have data in the following format in a text file named empdata.txt. Note that there are no blank lines between the lines of data.

Sl|EmployeeID|Name|Department|Band|Location

1|327427|Brock Mcneil|Research and Development|U2|Pune

2|310456|Acton Golden|Advertising|P3|Hyderabad

3|305540|Hollee Camacho|Payroll|U3|Bangalore

4|218801|Simone Myers|Public Relations|U3|Pune

5|144051|Eaton Benson|Advertising|P1|Chennai

I have a class like this

class empdata
{
public:
int sl,empNO;
char name[20],department[20],band[3],location[20];
};

I created an array of objects of class empdata. How do I read the data from the file, which has n lines in the format shown above, and store it in that array of objects?

This is my code

int main () {
string line;
ifstream myfile ("empdata.txt");
for(int i=0;i<10;i++) //processing only first 10 lines of the file
{
    getline (myfile,line);
    //What should I do with this "line" so that I can extract data 
    //from this line and store it in the class object?             
     
}

  return 0;
}

So basically my question is how to extract data from a string whose fields are separated by the '|' character, and store each field in a separate variable.

Anish Kumar asked Jul 27 '15 09:07


2 Answers

I prefer to use the String Toolkit. The String Toolkit will take care of converting the numbers as it parses.

Here is how I would solve it.

#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <strtk.hpp>   // http://www.partow.net/programming/strtk

// using strings instead of character arrays
class Employee
{
    public:
    int index;
    int employee_number;
    std::string name;
    std::string department;
    std::string band;
    std::string location;
};

// Wrapper function (the name is only illustrative) so that the
// early return on failure is valid C++.
bool load_employees(const std::string& filename, std::vector<Employee>& employee_data)
{
    // assuming the file is text
    std::fstream fs;
    fs.open(filename.c_str(), std::ios::in);

    if (fs.fail()) return false;

    const char *whitespace = " \t\r\n\f";
    const char *delimiter  = "|";

    std::string line;

    // process each line in turn
    while (std::getline(fs, line))
    {
        // removing leading and trailing whitespace
        // can prevent parsing problems from different line endings.
        strtk::remove_leading_trailing(whitespace, line);

        // strtk::parse combines multiple delimiters in these cases
        Employee e;

        if (strtk::parse(line, delimiter, e.index, e.employee_number, e.name, e.department, e.band, e.location))
        {
            std::cout << "succeed" << std::endl;
            employee_data.push_back(e);
        }
    }

    return true;
}

DannyK answered Oct 31 '22 02:10


AFAIK, there is nothing that does it out of the box, but you have all the tools to build it yourself.

The C way

You read each line into a char buffer (with istream::getline(), for example myfile.getline()) and then use strtok and strcpy.
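
As a rough sketch of the C way, assuming the empdata layout from the question: the helper names copy_field and parse_line_c below are made up for illustration, and the copy uses strncpy rather than plain strcpy so that an over-long field is truncated instead of overflowing the array. Note that strtok modifies the buffer and treats consecutive '|' characters as a single separator, so empty fields are silently skipped.

```cpp
#include <cstdlib>
#include <cstring>

// Same layout as the empdata class in the question.
class empdata
{
public:
    int sl, empNO;
    char name[20], department[20], band[3], location[20];
};

// Hypothetical helper: copy tok into a fixed-size buffer, truncating if
// needed and always null-terminating. Returns false if tok is NULL.
static bool copy_field(char *dst, std::size_t dst_size, const char *tok)
{
    if (tok == NULL) return false;
    std::strncpy(dst, tok, dst_size - 1);
    dst[dst_size - 1] = '\0';
    return true;
}

// Hypothetical helper: split one line in place with strtok.
// Returns false if the line has fewer than six fields.
bool parse_line_c(char *buffer, empdata &out)
{
    char *tok = std::strtok(buffer, "|");
    if (tok == NULL) return false;
    out.sl = static_cast<int>(std::strtol(tok, NULL, 10));

    tok = std::strtok(NULL, "|");
    if (tok == NULL) return false;
    out.empNO = static_cast<int>(std::strtol(tok, NULL, 10));

    // && evaluates left to right, so the strtok calls happen in field order.
    return copy_field(out.name, sizeof(out.name), std::strtok(NULL, "|"))
        && copy_field(out.department, sizeof(out.department), std::strtok(NULL, "|"))
        && copy_field(out.band, sizeof(out.band), std::strtok(NULL, "|"))
        && copy_field(out.location, sizeof(out.location), std::strtok(NULL, "|"));
}
```

A caller would typically fill a char buffer with myfile.getline(buffer, sizeof(buffer)) and pass that buffer to parse_line_c.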

The getline way

The getline function accepts a third parameter to specify a delimiter. You can make use of that to split the line through an istringstream. Something like:

#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>

// Uses the empdata class from the question.

int main() {
    std::string line, temp;
    std::ifstream myfile("empdata.txt");
    std::getline(myfile, line);  // skip the header line
    // Testing the result of getline also handles a last line
    // that has no trailing newline.
    while (std::getline(myfile, line)) {
        empdata data;
        std::istringstream istr(line);
        std::getline(istr, temp, '|');
        data.sl = ::strtol(temp.c_str(), NULL, 10);
        std::getline(istr, temp, '|');
        data.empNO = ::strtol(temp.c_str(), NULL, 10);
        // istream::getline sets failbit if a field does not fit in its array
        istr.getline(data.name, sizeof(data.name), '|');
        istr.getline(data.department, sizeof(data.department), '|');
        istr.getline(data.band, sizeof(data.band), '|');
        istr.getline(data.location, sizeof(data.location), '|');
        // do something with data ...
    }
    return 0;
}

This is the C++ version of the previous one.

The find way

You read the lines into a string (as you currently do) and use string::find(char sep, size_t pos) to find the next occurrence of the separator, then copy the data (from string::c_str()) between the start of the substring and the separator to your fields.
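
A minimal sketch of the find way could look like this. The function name split_fields is hypothetical, and for brevity it collects the fields with substr into a vector of strings instead of copying from c_str() into the char arrays; unlike strtok, it keeps empty fields.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split a line on a separator using std::string::find.
// Empty fields are preserved, so "a||b" yields three fields.
std::vector<std::string> split_fields(const std::string &line, char sep)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type pos = line.find(sep, start);
        if (pos == std::string::npos) {
            // No more separators: the rest of the line is the last field.
            fields.push_back(line.substr(start));
            break;
        }
        fields.push_back(line.substr(start, pos - start));
        start = pos + 1;  // continue just after the separator
    }
    return fields;
}
```

Each resulting string can then be converted with strtol or copied into the matching member of empdata.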

The manual way

You just iterate over the string. If the character is a separator, you put a null character ('\0') at the end of the current field and move on to the next field. Otherwise, you just write the character at the current position of the current field.

Which to choose?

If you are more used to one of them, stick to it.

Following is just my opinion.

The getline way will be the simplest to code and to maintain.

The find way is mid level. It is still at a rather high level and avoids the usage of istringstream.

The manual way is really low level, so you should structure it to keep it maintainable. For example, you could use an explicit description of a line as an array of fields, each with a maximum size and a current position. Having both int and char[] fields makes this tricky, but you can easily configure it the way you want.

For example, your code only allows 20 characters (19 plus the terminating null) for the department field, whereas Research and Development on line 2 of the file is 24 characters long. Without special processing, the getline way leaves the istringstream in a failed state and reads nothing more. Even if you clear the state, the stream is left in the middle of the field. So you should first read into a std::string and then copy the beginning into the char[] field.
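
The advice about reading into a std::string first can be sketched as follows, again assuming the empdata layout from the question. The names read_text_field and parse_line_truncating are invented for this example, and silently truncating over-long fields is a design choice you may not want.

```cpp
#include <cstdlib>
#include <cstring>
#include <sstream>
#include <string>

// Same layout as the empdata class in the question.
class empdata
{
public:
    int sl, empNO;
    char name[20], department[20], band[3], location[20];
};

// Hypothetical helper: read the next '|'-delimited field into a std::string,
// then copy at most dst_size - 1 characters so the stream never fails on an
// over-long field.
static bool read_text_field(std::istream &in, char *dst, std::size_t dst_size)
{
    std::string temp;
    if (!std::getline(in, temp, '|')) return false;
    std::strncpy(dst, temp.c_str(), dst_size - 1);
    dst[dst_size - 1] = '\0';
    return true;
}

// Hypothetical helper: parse one line, truncating text fields that do not fit.
bool parse_line_truncating(const std::string &line, empdata &out)
{
    std::istringstream istr(line);
    std::string temp;

    if (!std::getline(istr, temp, '|')) return false;
    out.sl = static_cast<int>(std::strtol(temp.c_str(), NULL, 10));

    if (!std::getline(istr, temp, '|')) return false;
    out.empNO = static_cast<int>(std::strtol(temp.c_str(), NULL, 10));

    return read_text_field(istr, out.name, sizeof(out.name))
        && read_text_field(istr, out.department, sizeof(out.department))
        && read_text_field(istr, out.band, sizeof(out.band))
        && read_text_field(istr, out.location, sizeof(out.location));
}
```

With this variant, the fields after an over-long department are still read correctly because the whole field is consumed from the stream before it is truncated.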

Here is a working manual implementation:

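#include <cstddef>   // size_t, NULL
#include <fstream>
#include <string>

// Uses the empdata class from the question.

class Field {
public:
    virtual ~Field() {}  // fields are handled through Field* pointers
    virtual void reset() = 0;
    virtual void add(empdata& data, char c) = 0;
};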

class IField: public Field {
private:
    int (empdata::*data_field);
    bool ok;

public:
    IField(int (empdata::*field)): data_field(field) {
        ok = true;
        reset();
    }
    void reset() { ok = true; }
    void add(empdata& data, char c);
};

void IField::add(empdata& data, char c) {
    if (ok) {
        if ((c >= '0') && (c <= '9')) {
            data.*data_field = data.*data_field * 10  + (c - '0');
        }
        else {
            ok = false;
        }
    }
}


class CField: public Field {
private:
    char (empdata::*data_field);
    size_t current_pos;
    size_t size;

public:
    CField(char (empdata::*field), size_t size): data_field(field), size(size) {
        reset();
    }
    void reset() { current_pos = 0; }
    void add(empdata& data, char c);
};

void CField::add(empdata& data, char c) {
    if (current_pos < size) {
        char *ix = &(data.*data_field);
        ix[current_pos ++] = c;
        if (current_pos == size) {
            ix[size -1] = '\0';
            current_pos +=1;
        }
    }
}

int main() {
    std::string line;
    std::ifstream myfile("empdata.txt");
    // The field descriptors live on the stack, so no manual cleanup is needed.
    IField sl(&empdata::sl);
    IField empNO(&empdata::empNO);
    CField name(reinterpret_cast<char empdata::*>(&empdata::name), 20);
    CField department(reinterpret_cast<char empdata::*>(&empdata::department), 20);
    CField band(reinterpret_cast<char empdata::*>(&empdata::band), 3);
    CField location(reinterpret_cast<char empdata::*>(&empdata::location), 20);
    Field* fields[] = { &sl, &empNO, &name, &department, &band, &location, NULL };

    std::getline(myfile, line);  // skip the header line
    // Testing the result of getline also handles a last line
    // that has no trailing newline.
    while (std::getline(myfile, line)) {
        Field** f = fields;
        empdata data = {0};
        (*f)->reset();  // the first field must be reset for every line
        for (std::string::const_iterator it = line.begin(); it != line.end(); ++it) {
            char c = *it;
            if (c == '|') {
                f += 1;
                if (*f == NULL) {
                    break;  // more separators than fields: ignore the rest of the line
                }
                (*f)->reset();
            }
            else {
                (*f)->add(data, c);
            }
        }
        // do something with data ...
    }
    return 0;
}

It is robust, efficient and maintainable: adding a field is easy, and it tolerates errors in the input file. But it is much longer than the other ones and would need many more tests. So I would not advise using it without a special reason (the need to accept multiple separators, optional fields, a dynamic field order, ...).

Serge Ballesta answered Oct 31 '22 01:10