Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find all substring's occurrences and locations

I'm writing a program to parse some data saved as text files. What I am trying to do is find the location of every needle in a haystack. I already can read the file in and determine the number of occurrences, but I am looking to find the index also.

like image 210
Thomas Havlik Avatar asked Oct 27 '10 15:10

Thomas Havlik


People also ask

Which method find the list of all occurrences of the pattern in the?

finditer() To get all occurrences of a pattern in a given string, you can use the regular expression method re. finditer(pattern, string) . The result is an iterable of match objects—you can retrieve the indices of the match using the match.

How do you find all occurrences of a substring in a string in C++?

If the substring doesn't occur in the string, the function returns string::npos . To find all occurrences of a substring in a string, we can repeatedly call the string::find function within a loop, where the search for the next substring starts with the index of the previous match.


2 Answers

string str,sub; // str is string to search, sub is the substring to search for

vector<size_t> positions; // holds all the positions that sub occurs within str

size_t pos = str.find(sub, 0);
while(pos != string::npos)
{
    positions.push_back(pos);
    pos = str.find(sub,pos+1);
}

Edit I misread your post, you said substring, and I assumed you meant you were searching a string. This will still work if you read the file into a string.

like image 74
Benjamin Lindley Avatar answered Nov 08 '22 04:11

Benjamin Lindley


I know an answer has been accepted, but this will also work, and will save you having to load in the file to a string..

#include <iostream>
#include <fstream>
#include <vector>
#include <algorithm>

using namespace std;

int main(void)
{
  const char foo[] = "foo";
  const size_t s_len = sizeof(foo) - 1; // ignore \0
  char block[s_len] = {0};

  ifstream f_in(<some file>);

  vector<size_t> f_pos;

  while(f_in.good())
  {
    fill(block, block + s_len, 0); // pedantic I guess..
    size_t cpos = f_in.tellg();
    // Get block by block..
    f_in.read(block, s_len);
    if (equal(block, block + s_len, foo))
    {
      f_pos.push_back(cpos);
    }
    else
    {
      f_in.seekg(cpos + 1); // rewind
    }
  }
}
like image 32
Nim Avatar answered Nov 08 '22 04:11

Nim