Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++ : generate all subsets from set with one condition

Tags:

c++

I’m trying to write code that generate all subsets from set with one condition like If I have threshold =2, and three set:

1, 2, 3, 4, 5
1,3,5
1,3,4

Then the program will output:

The generating set at the first iteration:

1 = number of frequency = 3
2 = number of frequency = 1
3 = number of frequency = 3
4 = number of frequency = 2
5= number of frequency = 2

Since the frequency of number 2 < threshold, I will exclude this set form any further superset,

The generating set at the second iteration:

1,3 = number of frequency = 3
1,4 = number of frequency = 2
1,5 = number of frequency = 2
3,4 = number of frequency = 2
3,5= number of frequency = 2
4,5= number of frequency = 1

Since the frequency of number (4,5) < threshold, I will exclude this set form any further superset,

The generating set at the third iteration

1,3,4= number of frequency = 2
1,3,5= number of frequency = 2

The generating set at the fourth iteration

No more superset because (4,5) < threshold we can’t generate (1,3,4,5)

I wrote the program, and I already generate all the subset, but Failed in two things:

  • I couldn't search in the map std::map <int,std::pair<list<int>, int>> CandList to count the similar set (number of frequency)
  • I couldn't figure out how to apply the condition

Please I appreciate any help.

This my code:

int threshold = 2;
std::vector<std::list<int>> data;
std::map<int, int> FISupp;
typedef std::pair<list<int>, int> combo;
std::map <int,combo> CandList;
std::list<int> FrqList;



/*
input:Threshold =2, and data=
1 2 3 4 5
1 3 4 5
1 2 3 5
1 3

at first scan after PassOne function:
FISupp(1,4)
FISupp(2,2)
FISupp(3,4)
FISupp(4,4)
FISupp(5,3)

at k scan after Passk function:
---
*/
int Lsize = 2; // Level size

void ScanData()
{
    ifstream in;
    in.open("mydata.txt");
    /* mydata.txt
    1 2 3 4 5
    1 3 4 5
    1 2 3 5
    1 3
    */
    std::string line;
    int i = 0;

    while (std::getline(in, line))
    {
        std::stringstream Sline1(line);
        std::stringstream ss(line);
        std::list<int> inner;
        int info;

        while (ss >> info)
            inner.push_back(info);

        data.push_back(inner);
    }
}


/* first pass to generate first Candidates items */
void PassOne()
{
    for (unsigned i = 0; i < data.size(); ++i)
    {
        std::list<int>::iterator li;

        for (li = data[i].begin(); li != data[i].end(); ++li)
            FISupp[*li] += 1;
    }


    /*update the FFISupp by erasing all first Candidates items  with support < Threshold*/

    std::map<int, int> ::iterator current = FISupp.begin();

    std::list<int> ls; /* save Candidates itemes with support < Threshold*/
    while (current != FISupp.end())
    {
        if (current->second < threshold)
        {
            ls.push_back(current->first);
            current = FISupp.erase(current);
        }
        else
            ++current;
    }


    /*update the the orginal data by erasing all first Candidates items  with support < Threshold*/
    for (unsigned i = 0; i < data.size(); ++i)
    {
        std::list<int>::iterator li;
        std::list<int>::iterator item = ls.begin();

        while (item != ls.end())
        {
            for (li = data[i].begin(); li != data[i].end(); ++li)
            {
                if (*li == *item)
                {
                    li = data[i].erase(li);
                    break;
                }
            }
            ++item;
        }

    }


}


void FrequentItem(list<int> l,   int indx)
{
    int a = 0;
    for (list<int>::iterator it = l.begin(); it != l.end(); ++it)
    {
        //std::list <int> &m2 = CandList[indx].first;

        //auto itr = m2.find(*it);

        //auto itr = std::find(CandList.begin(), CandList.end(), *it);

        auto itr = CandList.find(*it);
        if (itr != CandList.end())
        {
            a += CandList[indx].second;
            CandList[indx].first.push_back(*it);
            CandList[indx].second = a;
        }

    }

}

int ind = 0;
void Passk(int j, std::list<int>::iterator Itm , int q = 0)
{

    if (Lsize == q)
    {
        FrequentItem(FrqList, ind);
        ++ind;
        return;
    }

    else
    {

        for (std::list<int>::iterator Itm2 = Itm; Itm2 != data[j].end(); ++Itm2)
        {
                FrqList.push_back(*Itm2);
                Passk(j,  ++Itm2, q + 1);
                FrqList.pop_back();
                --Itm2;

        }

    }


}



void main(int argc, char *argv[])
{
    int temp = 0;
    int j = -1;

    ScanData();
    PassOne();

    while (Lsize <= data.size()) // How to stop the loop when there is no more candidate >= threshold???
    {
        for (unsigned i = 0; i < data.size(); ++i)
        {
            std::list<int>::iterator items = data[i].begin();
            Passk(++j, items);  
        }

        j = -1;
        ++ Lsize;

    }

    data.clear();
    system("PAUSE");
    return;
}
like image 531
phoenix Avatar asked Oct 21 '22 07:10

phoenix


1 Answers

Ok, I'll try for an answer. But first the assumptions:

  • You are working with ordered sets, i.e. the elements are strictly ascending.
  • You consider "normal" sets, i.e. no multi-sets where duplicate elements might appear.
  • Both assumptions might easily be relaxed, but I'll use this as base.

For this scenario, it might be more natural to encode your sets by bit-vectors (for instance using std::vector<bool> or boost::dynamic_bitset<>). In such a bit-vector, if the i-th element is set, it means that the number i is present in the set.

For example, your three sets are represented by this

1 1 1 1 1
1 0 1 0 1
1 0 1 1 0

Iteration 1: In your first iteration, you simply have to sum the elements, which is rather easy in this representation. One obtains

    1 1 1 1 1
    1 0 1 0 1
    1 0 1 1 0
   -----------
    3 1 3 2 2

Next you discard all elements below your threshold, which simply amounts to setting the second row to zero:

    1 0 1 1 1
    1 0 1 0 1
    1 0 1 1 0

Iteration K: Here, you count the occurence of all K-subsets, and discard them if their number is smaller than threshold. That is, formally, you generate the K-stencils

{ 1 1 0 0 0, 1 0 1 0 0, ... , 0 0 0 1 1}   (for K=2)
{ 1 1 1 0 0, 1 1 0 1 0, ... , 0 0 1 1 1}   (for K=3)

and so on. For each of these K-stencils, you count the occurence and eventually discard (note that K may also be one). So, you have three tasks, namely

  1. Generation: is obtained by permutation of the initial bitvector {1 ... 1 0 ... 0}, where K elements are ordered to the left.

  2. Counting: loop over the vectors of your set, and use bitwise and to check whether the current vector contains the stencil. For example: 1 0 1 1 1 & 0 0 0 1 1 == 0 0 0 1 1?.

  3. Discarding: apply the inversed stencil via bitwise and (inversion is done via flip()). This will remove the relevant subsets. Finally discard any subset smaller than the number of iterations (e.g., in iteration 3, subset of size 2 are removed).

Here is an implemetation using primarily boost::dynamic_bitset<>, but std::vector<bool> for the permutations (that is because I didn't want to code a permutation by myself, but this can certainly be improved). Note that there are no maps or other more sophisticated storage schemes:

#include<vector>
#include<iostream>
#include<algorithm>
#include<string>
#include<sstream>
#include<boost/dynamic_bitset.hpp>

//only for nice output of a bitset
std::string screenOutput(const boost::dynamic_bitset<>& bits)
{
    int n=bits.size();
    std::string ret;
    for(int i=n-1;i>=0;--i)
    {
        if(bits[i])
        {
           std::stringstream out;
           out<<i+1<<" ";
           ret=out.str()+ret;
        }
    }
    return "{"+ret+"}";
}

//function implementing the actual logic
void removeSubsets(std::vector<boost::dynamic_bitset<> > &currentSet, size_t K, size_t thresh)
{
    size_t n=currentSet.front().size();

    //create initial stencil {1 ... 1 0 ... 0}
    std::vector<bool> stencil(n);
    for(size_t i=0;i<K;++i)
        stencil[i]=true;

    //apply permutations to initial stencil
    do
    {
         //generate dynamic_bitset from permuted vector<bool>
         boost::dynamic_bitset<> temp(n);
         for(size_t i=0;i<n;++i)
              temp[i]=stencil[i];

         //count the occurence of the stencil
         size_t count=0;
         for(size_t j=0;j<currentSet.size();++j)
         {
              if((currentSet[j] & temp) == temp)
                 ++count;
         }

         //remove if at least one and less than thresh is found
         if(count<thresh && count>0)
         {
              boost::dynamic_bitset<> tempFlip=temp;
              tempFlip.flip();
              for(size_t j=0;j<currentSet.size();++j)
              {
                    //remove stencil from all bitset which contain it
                    if((currentSet[j] & temp) == temp)
                      currentSet[j]= (currentSet[j] & tempFlip);
              }
         }
    }
    while(std::prev_permutation(stencil.begin(),stencil.end()));

    //further remove all supersets which contain less than K elements
    for(size_t j=0;j<currentSet.size();++j)
         if(currentSet[j].count()<K)
         {
               currentSet[j]=boost::dynamic_bitset<>(n,0);
         }
}

The code can be used like this:

int main()
{
    //initialize set of three bit-vectors (all elements to true)
    std::vector<boost::dynamic_bitset<> > yourSet(3, boost::dynamic_bitset<>(5, (1<<5)-1) );

    //set corresponding elements to false
    yourSet[1][1]=false;
    yourSet[1][3]=false;
    yourSet[2][1]=false;
    yourSet[2][4]=false;

    std::cout<<"Original sets"<<std::endl;
    for(size_t i=0;i<3;++i)
        std::cout<<screenOutput(yourSet[i])<<std::endl;
    std::cout<<std::endl;

    removeSubsets(yourSet, 1, 2);
    std::cout<<"After iteration 1:"<<std::endl;
    for(size_t i=0;i<3;++i)
        std::cout<<screenOutput(yourSet[i])<<std::endl;
    std::cout<<std::endl;

    removeSubsets(yourSet, 2, 2);
    std::cout<<"After iteration 2:"<<std::endl;
    for(size_t i=0;i<3;++i)
        std::cout<<screenOutput(yourSet[i])<<std::endl;
    std::cout<<std::endl;

    removeSubsets(yourSet, 3, 2);
    std::cout<<"After iteration 3:"<<std::endl;
    for(size_t i=0;i<3;++i)
        std::cout<<screenOutput(yourSet[i])<<std::endl;
    std::cout<<std::endl;
}

It outputs:

Original set:
{1 2 3 4 5}
{1 3 5}
{1 3 4}

After iteration 1:
{1 3 4 5}
{1 3 5}
{1 3 4}

After iteration 2:
{}
{1 3 5}
{1 3 4}

After iteration 3:
{}
{}
{}

Man, I obviously have too much time.


EDIT: corrected code. Still you have to check whether it actually brings what you wanted.

like image 190
davidhigh Avatar answered Oct 23 '22 04:10

davidhigh