Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

String and character mapping question for the guru's out there

Here's a problem thats got me stumped (solution wise):

Given a str S, apply character mappings Cm = {a=(m,o,p),d=(q,u),...} and print out all possible combinations using C or C++.

The string can be any length, and the number of character mappings varies, and there won't be any mappings that map to another map (thus avoiding circular dependencies).

As an example: string abba with mappings a=(e,o), d=(g,h), b=(i) would print:

abba,ebba,obba,abbe,abbo,ebbe,ebbo,obbe,obbo,aiba,aiia,abia,eiba,eiia,...... 
like image 436
Gio Avatar asked Feb 10 '10 15:02

Gio


2 Answers

Definitely possible, not really difficult... but this will generate lots of strings that's for sure.

The first thing to remark is that you know how many strings it's going to generate beforehand, so it's easy to do some sanity check :)

The second: it sounds like a recursive solution would be easy (like many traversal problems).

class CharacterMapper
{
public:
  CharacterMapper(): mGenerated(), mMapped()
  {
    for (int i = -128, max = 128; i != max; ++i)
      mMapped[i].push_back(i); // 'a' is mapped to 'a' by default
  }

  void addMapped(char origin, char target)
  {
    std::string& m = mMapped[origin];
    if (m.find(target) == std::string::npos) m.push_back(target);
  } // addMapped

  void addMapped(char origin, const std::string& target)
  {
    for (size_t i = 0, max = target.size(); i != max; ++i) this->addMapped(origin, target[i]);
  } // addMapped

  void execute(const std::string& original)
  {
    mGenerated.clear();
    this->next(original, 0);
    this->sanityCheck(original);
    this->print(original);
  }

private:
  void next(std::string original, size_t index)
  {
    if (index == original.size())
    {
      mGenerated.push_back(original);
    }
    else
    {
      const std::string& m = mMapped[original[index]];
      for (size_t i = 0, max = m.size(); i != max; ++i)
        this->next( original.substr(0, index) + m[i] + original.substr(index+1), index+1 );
    }
  } // next

  void sanityCheck(const std::string& original)
  {
    size_t total = 1;
    for (size_t i = 0, max = original.size(); i != max; ++i)
      total *= mMapped[original[i]].size();

    if (total != mGenerated.size())
      std::cout << "Failure: should have found " << total << " words, found " << mGenerated.size() << std::endl;
  }

  void print(const std::string& original) const
  {
    typedef std::map<char, std::string>::const_iterator map_iterator;
    typedef std::vector<std::string>::const_iterator vector_iterator;

    std::cout << "Original: " << original << "\n";

    std::cout << "Mapped: {";
    for (map_iterator it = mMapped.begin(), end = mMapped.end(); it != end; ++it)
      if (it->second.size() > 1) std::cout << "'" << it->first << "': '" << it->second.substr(1) << "'";
    std::cout << "}\n";

    std::cout << "Generated:\n";
    for (vector_iterator it = mGenerated.begin(), end = mGenerated.end(); it != end; ++it)
      std::cout << "  " << *it << "\n";
  }

  std::vector<std::string> mGenerated;
  std::map<char, std::string> mMapped;
}; // class CharacterMapper


int main(int argc, char* argv[])
{
  CharacterMapper mapper;
  mapper.addMapped('a', "eo");
  mapper.addMapped('d', "gh");
  mapper.addMapped('b', "i");
  mapper.execute("abba");
}

And here is the output:

Original: abba
Mapped: {'a': 'eo''b': 'i''d': 'gh'}
Generated:
  abba
  abbe
  abbo
  abia
  abie
  abio
  aiba
  aibe
  aibo
  aiia
  aiie
  aiio
  ebba
  ebbe
  ebbo
  ebia
  ebie
  ebio
  eiba
  eibe
  eibo
  eiia
  eiie
  eiio
  obba
  obbe
  obbo
  obia
  obie
  obio
  oiba
  oibe
  oibo
  oiia
  oiie
  oiio

Yeah, rather lengthy, but there's a lot that does not directly participate to the computation (initialization, checks, printing). The core methods is next which implements the recursion.

like image 167
Matthieu M. Avatar answered Oct 10 '22 05:10

Matthieu M.


EDIT: This should be the fastest and simplest possible algo. Some may argue with the style or portability; I think this is perfect for an embedded-type thing and I've spent long enough on it already. I'm leaving the original below.

This uses an array for mapping. The sign bit is used to indicate the end of a mapping cycle, so the array type has to be larger than the mapped type if you want to use the full unsigned range.

Generates 231M strings/sec or ~9.5 cycles/string on a 2.2GHz Core2. Testing conditions and usage as below.

#include <iostream>
using namespace std;

int const alphabet_size = CHAR_MAX+1;
typedef int map_t; // may be char or short, small performance penalty
int const sign_bit = 1<< CHAR_BIT*sizeof(map_t)-1;
typedef map_t cmap[ alphabet_size ];

void CreateMap( char *str, cmap &m ) {
    fill( m, m+sizeof(m)/sizeof(*m), 0 );
    char *str_end = strchr( str, 0 ) + 1;
    str_end[-1] = ' '; // space-terminated strings
    char prev = ' ';
    for ( char *pen = str; pen != str_end; ++ pen ) {
        if ( * pen == ' ' ) {
            m[ prev ] |= sign_bit;
            prev = 0;
        }
        m[ * pen ] = * pen;
        if ( prev != ' ' ) swap( m[prev], m[ *pen ] );
        prev = *pen;
    }
    for ( int mx = 0; mx != sizeof(m)/sizeof(*m); ++ mx ) {
        if ( m[mx] == 0 ) m[mx] = mx | sign_bit;
    }
}

bool NextMapping( char *s, char *s_end, cmap &m ) {
    for ( char *pen = s; pen != s_end; ++ pen ) {
        map_t oldc = *pen, newc = m[ oldc ];
        * pen = newc & sign_bit-1;
        if ( newc >= 0 ) return true;
    }
    return false;
}

int main( int argc, char **argv ) {
    uint64_t cnt = 0;
    cmap m;
    CreateMap( argv[1], m );
    char *s = argv[2], *s_end = strchr( s, 0 );
    do {
        ++ cnt;
    } while ( NextMapping( s, s_end, m ) );
    cerr << cnt;
    return 0;
}

ORIGINAL:

Not as short or robust as I'd like, but here's something.

  • Requires that the input string always contain the alphabetically first letter in each replacement set
  • Execute a la maptool 'aeo dgh bi' abbd
  • Output is in reverse-lexicographical order
  • Performance of about 22 cycles/string (100M strings/sec at 2.2 GHz Core2)
    • BUT my platform is trying to be clever with strings, slowing it down
    • If I change it to use char* strings instead, it runs at 142M strings/sec (~15.5 cycles/string)
  • Should be possible to go faster using a char[256] mapping table and another char[256] specifying which chars end a cycle.

The map data structure is an array of nodes linked into circular lists.

#include <iostream>
#include <algorithm>
using namespace std;

enum { alphabet_size = UCHAR_MAX+1 };

struct MapNode {
     MapNode *next;
     char c;
     bool last;
     MapNode() : next( this ), c(0), last(false) {}
};

void CreateMap( string s, MapNode (&m)[ alphabet_size ] ) {
    MapNode *mprev = 0;
    replace( s.begin(), s.end(), ' ', '\0' );
    char *str = const_cast<char*>(s.c_str()), *str_end = str + s.size() + 1;
    for ( char *pen = str; pen != str_end; ++ pen ) {
        if ( mprev == 0 ) sort( pen, pen + strlen( pen ) );
        if ( * pen == 0 ) {
            if ( mprev ) mprev->last = true;
            mprev = 0;
            continue;
        }
        MapNode &mnode = m[ * pen ];
        if ( mprev ) swap( mprev->next, mnode.next ); // link node in
        mnode.c = * pen; // tell it what char it is
        mprev = &mnode;
    }
       // make it easier to tell that a node isn't in any map
    for ( MapNode *mptr = m; mptr != m + alphabet_size; ++ mptr ) {
        if ( mptr->next == mptr ) mptr->next = 0;
    }
}

bool NextMapping( string &s, MapNode (&m)[ alphabet_size ] ) {
    for ( string::iterator it = s.begin(); it != s.end(); ++ it ) {
        MapNode &mnode = m[ * it ];
        if ( mnode.next ) {
            * it = mnode.next->c;
            if ( ! mnode.last ) return true;
        }
    }
    return false;
}

int main( int argc, char **argv ) {
    MapNode m[ alphabet_size ];
    CreateMap( argv[1], m );
    string s = argv[2];
    do {
        cerr << s << endl;
    } while ( NextMapping( s, m ) );
 return 0;
}
like image 45
Potatoswatter Avatar answered Oct 10 '22 07:10

Potatoswatter