I’m trying to write code that generate all subsets from set with one condition like If I have threshold =2, and three set:
1, 2, 3, 4, 5
1,3,5
1,3,4
Then the program will output:
The generating set at the first iteration:
1 = number of frequency = 3
2 = number of frequency = 1
3 = number of frequency = 3
4 = number of frequency = 2
5= number of frequency = 2
Since the frequency of number 2 < threshold, I will exclude this set form any further superset,
The generating set at the second iteration:
1,3 = number of frequency = 3
1,4 = number of frequency = 2
1,5 = number of frequency = 2
3,4 = number of frequency = 2
3,5= number of frequency = 2
4,5= number of frequency = 1
Since the frequency of number (4,5) < threshold, I will exclude this set form any further superset,
The generating set at the third iteration
1,3,4= number of frequency = 2
1,3,5= number of frequency = 2
The generating set at the fourth iteration
No more superset because (4,5) < threshold we can’t generate (1,3,4,5)
I wrote the program, and I already generate all the subset, but Failed in two things:
std::map <int,std::pair<list<int>,
int>> CandList
to count the similar set (number of frequency)Please I appreciate any help.
This my code:
int threshold = 2;
std::vector<std::list<int>> data;
std::map<int, int> FISupp;
typedef std::pair<list<int>, int> combo;
std::map <int,combo> CandList;
std::list<int> FrqList;
/*
input:Threshold =2, and data=
1 2 3 4 5
1 3 4 5
1 2 3 5
1 3
at first scan after PassOne function:
FISupp(1,4)
FISupp(2,2)
FISupp(3,4)
FISupp(4,4)
FISupp(5,3)
at k scan after Passk function:
---
*/
int Lsize = 2; // Level size
void ScanData()
{
ifstream in;
in.open("mydata.txt");
/* mydata.txt
1 2 3 4 5
1 3 4 5
1 2 3 5
1 3
*/
std::string line;
int i = 0;
while (std::getline(in, line))
{
std::stringstream Sline1(line);
std::stringstream ss(line);
std::list<int> inner;
int info;
while (ss >> info)
inner.push_back(info);
data.push_back(inner);
}
}
/* first pass to generate first Candidates items */
void PassOne()
{
for (unsigned i = 0; i < data.size(); ++i)
{
std::list<int>::iterator li;
for (li = data[i].begin(); li != data[i].end(); ++li)
FISupp[*li] += 1;
}
/*update the FFISupp by erasing all first Candidates items with support < Threshold*/
std::map<int, int> ::iterator current = FISupp.begin();
std::list<int> ls; /* save Candidates itemes with support < Threshold*/
while (current != FISupp.end())
{
if (current->second < threshold)
{
ls.push_back(current->first);
current = FISupp.erase(current);
}
else
++current;
}
/*update the the orginal data by erasing all first Candidates items with support < Threshold*/
for (unsigned i = 0; i < data.size(); ++i)
{
std::list<int>::iterator li;
std::list<int>::iterator item = ls.begin();
while (item != ls.end())
{
for (li = data[i].begin(); li != data[i].end(); ++li)
{
if (*li == *item)
{
li = data[i].erase(li);
break;
}
}
++item;
}
}
}
void FrequentItem(list<int> l, int indx)
{
int a = 0;
for (list<int>::iterator it = l.begin(); it != l.end(); ++it)
{
//std::list <int> &m2 = CandList[indx].first;
//auto itr = m2.find(*it);
//auto itr = std::find(CandList.begin(), CandList.end(), *it);
auto itr = CandList.find(*it);
if (itr != CandList.end())
{
a += CandList[indx].second;
CandList[indx].first.push_back(*it);
CandList[indx].second = a;
}
}
}
int ind = 0;
void Passk(int j, std::list<int>::iterator Itm , int q = 0)
{
if (Lsize == q)
{
FrequentItem(FrqList, ind);
++ind;
return;
}
else
{
for (std::list<int>::iterator Itm2 = Itm; Itm2 != data[j].end(); ++Itm2)
{
FrqList.push_back(*Itm2);
Passk(j, ++Itm2, q + 1);
FrqList.pop_back();
--Itm2;
}
}
}
void main(int argc, char *argv[])
{
int temp = 0;
int j = -1;
ScanData();
PassOne();
while (Lsize <= data.size()) // How to stop the loop when there is no more candidate >= threshold???
{
for (unsigned i = 0; i < data.size(); ++i)
{
std::list<int>::iterator items = data[i].begin();
Passk(++j, items);
}
j = -1;
++ Lsize;
}
data.clear();
system("PAUSE");
return;
}
Ok, I'll try for an answer. But first the assumptions:
For this scenario, it might be more natural to encode your sets by bit-vectors (for instance using std::vector<bool>
or boost::dynamic_bitset<>
). In such a bit-vector, if the i
-th element is set, it means that the number i
is present in the set.
For example, your three sets are represented by this
1 1 1 1 1
1 0 1 0 1
1 0 1 1 0
Iteration 1: In your first iteration, you simply have to sum the elements, which is rather easy in this representation. One obtains
1 1 1 1 1
1 0 1 0 1
1 0 1 1 0
-----------
3 1 3 2 2
Next you discard all elements below your threshold, which simply amounts to setting the second row to zero:
1 0 1 1 1
1 0 1 0 1
1 0 1 1 0
Iteration K: Here, you count the occurence of all K-subsets, and discard them if their number is smaller than threshold. That is, formally, you generate the K-stencils
{ 1 1 0 0 0, 1 0 1 0 0, ... , 0 0 0 1 1} (for K=2)
{ 1 1 1 0 0, 1 1 0 1 0, ... , 0 0 1 1 1} (for K=3)
and so on. For each of these K
-stencils, you count the occurence and eventually discard (note that K
may also be one). So, you have three tasks, namely
Generation: is obtained by permutation of the initial bitvector {1 ... 1 0 ... 0}
, where K
elements are ordered to the left.
Counting: loop over the vectors of your set, and use bitwise and
to check whether the current vector contains the stencil. For example: 1 0 1 1 1 & 0 0 0 1 1 == 0 0 0 1 1
?.
Discarding: apply the inversed stencil via bitwise and
(inversion is done via flip()
). This will remove the relevant subsets. Finally discard any subset smaller than the number of iterations (e.g., in iteration 3, subset of size 2 are removed).
Here is an implemetation using primarily boost::dynamic_bitset<>
, but std::vector<bool>
for the permutations (that is because I didn't want to code a permutation by myself, but this can certainly be improved). Note that there are no maps or other more sophisticated storage schemes:
#include<vector>
#include<iostream>
#include<algorithm>
#include<string>
#include<sstream>
#include<boost/dynamic_bitset.hpp>
//only for nice output of a bitset
std::string screenOutput(const boost::dynamic_bitset<>& bits)
{
int n=bits.size();
std::string ret;
for(int i=n-1;i>=0;--i)
{
if(bits[i])
{
std::stringstream out;
out<<i+1<<" ";
ret=out.str()+ret;
}
}
return "{"+ret+"}";
}
//function implementing the actual logic
void removeSubsets(std::vector<boost::dynamic_bitset<> > ¤tSet, size_t K, size_t thresh)
{
size_t n=currentSet.front().size();
//create initial stencil {1 ... 1 0 ... 0}
std::vector<bool> stencil(n);
for(size_t i=0;i<K;++i)
stencil[i]=true;
//apply permutations to initial stencil
do
{
//generate dynamic_bitset from permuted vector<bool>
boost::dynamic_bitset<> temp(n);
for(size_t i=0;i<n;++i)
temp[i]=stencil[i];
//count the occurence of the stencil
size_t count=0;
for(size_t j=0;j<currentSet.size();++j)
{
if((currentSet[j] & temp) == temp)
++count;
}
//remove if at least one and less than thresh is found
if(count<thresh && count>0)
{
boost::dynamic_bitset<> tempFlip=temp;
tempFlip.flip();
for(size_t j=0;j<currentSet.size();++j)
{
//remove stencil from all bitset which contain it
if((currentSet[j] & temp) == temp)
currentSet[j]= (currentSet[j] & tempFlip);
}
}
}
while(std::prev_permutation(stencil.begin(),stencil.end()));
//further remove all supersets which contain less than K elements
for(size_t j=0;j<currentSet.size();++j)
if(currentSet[j].count()<K)
{
currentSet[j]=boost::dynamic_bitset<>(n,0);
}
}
The code can be used like this:
int main()
{
//initialize set of three bit-vectors (all elements to true)
std::vector<boost::dynamic_bitset<> > yourSet(3, boost::dynamic_bitset<>(5, (1<<5)-1) );
//set corresponding elements to false
yourSet[1][1]=false;
yourSet[1][3]=false;
yourSet[2][1]=false;
yourSet[2][4]=false;
std::cout<<"Original sets"<<std::endl;
for(size_t i=0;i<3;++i)
std::cout<<screenOutput(yourSet[i])<<std::endl;
std::cout<<std::endl;
removeSubsets(yourSet, 1, 2);
std::cout<<"After iteration 1:"<<std::endl;
for(size_t i=0;i<3;++i)
std::cout<<screenOutput(yourSet[i])<<std::endl;
std::cout<<std::endl;
removeSubsets(yourSet, 2, 2);
std::cout<<"After iteration 2:"<<std::endl;
for(size_t i=0;i<3;++i)
std::cout<<screenOutput(yourSet[i])<<std::endl;
std::cout<<std::endl;
removeSubsets(yourSet, 3, 2);
std::cout<<"After iteration 3:"<<std::endl;
for(size_t i=0;i<3;++i)
std::cout<<screenOutput(yourSet[i])<<std::endl;
std::cout<<std::endl;
}
It outputs:
Original set:
{1 2 3 4 5}
{1 3 5}
{1 3 4}
After iteration 1:
{1 3 4 5}
{1 3 5}
{1 3 4}
After iteration 2:
{}
{1 3 5}
{1 3 4}
After iteration 3:
{}
{}
{}
Man, I obviously have too much time.
EDIT: corrected code. Still you have to check whether it actually brings what you wanted.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With