
C++ Read matrices from file with multiple delimiters [duplicate]

Tags:

c++

matrix

I am given a file containing ten matrices, and I would like to read them and store each matrix in its own vector or array. However, the format of the file makes it hard for me to read the data (I'm not good at reading from input files).

The file has the following format: the elements of each row are separated by ",", rows are separated by ";", and matrices are separated by "|". For example, three 2-by-2 matrices look like this:

1,2;3,4|0,1;1,0|5,3;3,1|

I just want to save the matrices into three different vectors, but I am not sure how to do this.

I tried

    while(getline(inFile, line)){
        stringstream linestream(line);
        string value;
        while(getline(linestream, value, ',')){
            // save into vector
        }
    }

But this is obviously very crude and only separates the data by commas. Is there a way to split the data on multiple delimiters?

Thank you!

Kook-Jin Yeo asked Feb 23 '17 07:02


4 Answers

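string line;
// inFile is the input stream from the question.
// Outer loop: one chunk per matrix, split on '|'.
while(getline(inFile, line, '|'))
{
    stringstream rowstream(line);
    string row;
    // Middle loop: one chunk per row, split on ';'.
    while(getline(rowstream, row, ';'))
    {
        stringstream elementstream(row);
        string element;
        // Inner loop: one chunk per element, split on ','.
        while(getline(elementstream, element, ','))
        {
            cout << element << endl;
        }
    }
}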

With the code above as a starting point, you can build whatever logic you like to store the individual elements.
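As one possible way to store the elements, here is a minimal sketch that keeps the same three nested getline loops and collects the values into nested vectors. The names readMatrices, IntRow and IntMatrix are hypothetical, and the sketch assumes integer elements and well-formed input.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical aliases, not part of the original answer.
using IntRow = std::vector<int>;
using IntMatrix = std::vector<IntRow>;

// Reads every matrix from `in`, using '|' between matrices,
// ';' between rows and ',' between elements.
std::vector<IntMatrix> readMatrices(std::istream& in)
{
    std::vector<IntMatrix> matrices;
    std::string chunk;
    while (std::getline(in, chunk, '|'))
    {
        // Skip an empty or whitespace-only chunk, such as the newline
        // that may follow the final '|' at the end of the file.
        if (chunk.find_first_not_of(" \t\r\n") == std::string::npos)
            continue;

        IntMatrix matrix;
        std::stringstream rowstream(chunk);
        std::string rowText;
        while (std::getline(rowstream, rowText, ';'))
        {
            IntRow row;
            std::stringstream elementstream(rowText);
            std::string element;
            while (std::getline(elementstream, element, ','))
                row.push_back(std::stoi(element));  // throws on non-numeric text
            matrix.push_back(row);
        }
        matrices.push_back(matrix);
    }
    return matrices;
}
```

Passing an opened std::ifstream (or a std::istringstream for testing) to readMatrices should give one IntMatrix per "|"-terminated block; for the sample line in the question that would be three 2-by-2 matrices.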

sameerkn answered Nov 02 '22 01:11


I use this function of my own to split a string into a vector of strings:

#include <cassert>
#include <string>
#include <vector>

/**
 * \brief   Split a string into substrings
 * \param   sep  Symbol separating the parts
 * \param   str  String to be split
 * \return  Vector containing the split parts
 * \pre     The separator can not be 0
 * \details Example :
 * \code
 * std::string str = "abc.def.ghi..jkl.";
 * std::vector<std::string> split_str = split('.', str); // the vector is ["abc", "def", "ghi", "", "jkl", ""]
 * \endcode
 */
std::vector<std::string> split(char sep, const std::string& str);

std::vector<std::string> split(char sep, const std::string& str)
{
  assert(sep != 0 && "PRE: the separator is null");
  std::vector<std::string> s;
  unsigned long int i = 0;  // start of the current part
  for(unsigned long int j = 0; j < str.length(); ++j)
  {
    if(str[j] == sep)
    {
      s.push_back(str.substr(i, j - i));
      i = j + 1;
    }
  }
  // Last part (empty if the string ends with the separator)
  s.push_back(str.substr(i, str.size() - i));
  return s;
}

Then, assuming you have a Matrix class, you can do something like this:

std::string matrices_str;
std::ifstream matrix_file(matrix_file_name.c_str());
matrix_file >> matrices_str;  // reads up to the first whitespace
// Note: a trailing '|' in the file produces a final empty string in `matrices`.
const std::vector<std::string> matrices = split('|', matrices_str);
std::vector<Matrix<double> > M(matrices.size());
for(unsigned long int i = 0; i < matrices.size(); ++i)
{
  const std::string& matrix = matrices[i];
  const std::vector<std::string> rows = split(';', matrix);
  for(unsigned long int j = 0; j < rows.size(); ++j)
  {
    const std::string& row = rows[j];  // j-th row of the i-th matrix
    const std::vector<std::string> elements = split(',', row);
    for(unsigned long int k = 0; k < elements.size(); ++k)
    {
      const std::string& element = elements[k];
      // Size the matrix once, from the row count and the first row's length
      if(j == 0 && k == 0)
        M[i].resize(rows.size(), elements.size());
      std::istringstream iss(element);
      iss >> M[i](j,k);
    }
  }
}

Or, more compactly:

std::string matrices_str;
std::ifstream matrix_file(matrix_file_name.c_str());
matrix_file >> matrices_str;
const std::vector<std::string> matrices = split('|', matrices_str);
std::vector<Matrix<double> > M(matrices.size());
for(unsigned long int i = 0; i < matrices.size(); ++i)
{
  const std::vector<std::string> rows = split(';', matrices[i]);
  for(unsigned long int j = 0; j < rows.size(); ++j)
  {
    const std::vector<std::string> elements = split(',', rows[j]);
    for(unsigned long int k = 0; k < elements.size(); ++k)
    {
      // Column count is the number of elements, not the length of one element
      if(j == 0 && k == 0)
        M[i].resize(rows.size(), elements.size());
      std::istringstream iss(elements[k]);
      iss >> M[i](j,k);
    }
  }
}
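
The loops above rely on a Matrix class that is not shown. As a rough sketch of the interface they need, a hypothetical minimal version could look like the following; the class layout, the rows() and cols() accessors, and the row-major storage are assumptions for illustration, not the answerer's actual class.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal Matrix: just enough interface for the loops above
// (default construction, resize(rows, cols), element access via operator()).
template <typename T>
class Matrix
{
public:
    Matrix() : rows_(0), cols_(0) {}

    // Resizes to rows x cols and resets every element to T().
    void resize(std::size_t rows, std::size_t cols)
    {
        rows_ = rows;
        cols_ = cols;
        data_.assign(rows * cols, T());
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Row-major element access; no bounds checking.
    T& operator()(std::size_t i, std::size_t j) { return data_[i * cols_ + j]; }
    const T& operator()(std::size_t i, std::size_t j) const { return data_[i * cols_ + j]; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<T> data_;
};
```

Because the matrix is sized from its first row only, a later row with more elements would write out of bounds with this sketch, so ragged input would need an explicit check.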

Caduchon answered Nov 02 '22 00:11


You can use the finite state machine concept. Define a state for each kind of token, read one character at a time, and decide what it is (a digit or a delimiter).

Here is a sketch of how you could do it. For further reading, search for: text parsing, finite state machine, lexical analyzer, formal grammar.

#include <cctype>  // for isdigit

enum State
{
    DECIMAL_NUMBER,
    COMMA_D,
    SEMICOLON_D,
    PIPE_D,
    ERROR_STATE,
};

char GetChar()
{
    // Placeholder: replace with proper reading from the file.
    // Returns '\0' at the end of the string, which ends the loop in main.
    static const char* input = "1,2;3,4|0,1;1,0|5,3;3,1|";
    static int index = 0;

    return input[index++];
}

State GetState(char c)
{
    if ( isdigit(c) )
    {
        return DECIMAL_NUMBER;
    }
    else if ( c == ',' )
    {
        return COMMA_D;
    }
    else if ( c == ';' )
    {
        return SEMICOLON_D;
    }
    else if ( c == '|' )
    {
        return PIPE_D;
    }

    return ERROR_STATE;
}

int main()
{
    char c;
    while ( (c = GetChar()) != '\0' )
    {
        State s = GetState(c);
        switch ( s )  // dispatch on the state, not on the raw character
        {
        case DECIMAL_NUMBER:
            // read numbers
            break;
        case COMMA_D:
            // append into row
            break;
        case SEMICOLON_D:
            // next row
            break;
        case PIPE_D:
            // finish one matrix
            break;
        case ERROR_STATE:
            // syntax error
            break;
        default:
            break;
        }
    }
    return 0;
}

elanius answered Nov 02 '22 01:11


The example you have actually maps to a very simple byte machine.

Start with a zeroed matrix and something that keeps track of where in the matrix you're writing. Read one character at a time. If the character is a digit, multiply the current number in the matrix by 10 and add the digit to it. If it is a comma, advance to the next number in the row. If it is a semicolon, go to the next row. If it is a pipe, start a new matrix.
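
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names parseMatrices and IntMatrix are hypothetical, the numbers are non-negative integers, and growable vectors stand in for the pre-zeroed matrix because the dimensions are not known in advance.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical alias for illustration.
using IntMatrix = std::vector<std::vector<int>>;

// Parses text such as "1,2;3,4|0,1;1,0|" one character at a time.
// Characters other than digits and the three delimiters are ignored.
std::vector<IntMatrix> parseMatrices(const std::string& text)
{
    std::vector<IntMatrix> matrices;
    IntMatrix matrix;         // matrix currently being filled
    std::vector<int> row;     // row currently being filled
    int number = 0;           // number currently being accumulated
    bool haveDigits = false;  // true once the current number has a digit

    for (std::string::size_type i = 0; i < text.size(); ++i)
    {
        const char c = text[i];
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            number = number * 10 + (c - '0');
            haveDigits = true;
        }
        else if (c == ',' || c == ';' || c == '|')
        {
            // Any delimiter ends the current number.
            if (haveDigits)
                row.push_back(number);
            number = 0;
            haveDigits = false;

            // ';' and '|' also end the current row.
            if (c != ',' && !row.empty())
            {
                matrix.push_back(row);
                row.clear();
            }
            // '|' also ends the current matrix.
            if (c == '|' && !matrix.empty())
            {
                matrices.push_back(matrix);
                matrix.clear();
            }
        }
    }
    return matrices;
}
```

Note that this sketch only emits a matrix when it sees "|", so input that does not end with a pipe would silently lose its last matrix.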

You might not want to do it exactly this way if the numbers are floating point. In that case I'd collect the characters of each number in a buffer and use a standard method of parsing floating-point numbers. But other than that, you don't really need to keep much complex state or build a large parser. You might want to add error handling at a later stage, but even there the error handling is pretty trivial and only depends on the current character you're scanning.
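
For the floating-point case, a rough sketch of the buffered approach could look like this. It uses std::strtod as the "standard method"; the names toDouble, parseRealMatrices and RealMatrix are hypothetical, and the strict error policy (every delimiter must follow a number, and the input must end with "|") is an assumption rather than something the question specifies.

```cpp
#include <cctype>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical alias for illustration.
using RealMatrix = std::vector<std::vector<double>>;

// Converts a buffered token to double; rejects empty or partly numeric text.
double toDouble(const std::string& token)
{
    char* end = 0;
    const double value = std::strtod(token.c_str(), &end);
    if (token.empty() || *end != '\0')
        throw std::runtime_error("bad number: '" + token + "'");
    return value;
}

std::vector<RealMatrix> parseRealMatrices(const std::string& text)
{
    std::vector<RealMatrix> matrices;
    RealMatrix matrix;
    std::vector<double> row;
    std::string token;  // characters of the number currently being read

    for (std::string::size_type i = 0; i < text.size(); ++i)
    {
        const char c = text[i];
        if (c == ',' || c == ';' || c == '|')
        {
            row.push_back(toDouble(token));  // every delimiter ends a number
            token.clear();
            if (c != ',')                    // ';' and '|' end the row
            {
                matrix.push_back(row);
                row.clear();
            }
            if (c == '|')                    // '|' ends the matrix
            {
                matrices.push_back(matrix);
                matrix.clear();
            }
        }
        else if (!std::isspace(static_cast<unsigned char>(c)))
        {
            token += c;  // whitespace is skipped, everything else is buffered
        }
    }
    if (!token.empty() || !row.empty() || !matrix.empty())
        throw std::runtime_error("input does not end with '|'");
    return matrices;
}
```

As the answer suggests, the error handling stays small: the only checks are whether the buffered token converts cleanly when a delimiter arrives and whether anything is left over at the end.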

Art answered Nov 02 '22 00:11