I have a table like the one below and would like to count the different combinations of factors present (1 indicates presence and 0 indicates absence). For example: the number of times all are present, the number of times the first is absent but the rest are present, the number of times the second is absent but the others are present, and likewise for two or three factors absent with the rest present.
In the shell it is quite simple to find the rows where all are present:
awk '{if (($2 == 1) && ($3==1) && ($4==1) && ($5==1) && ($6==1)) print $1}' ALL_Freq_motif_AE_Uper
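For comparison, the awk line prints the first field of each matching row rather than counting the rows. Below is a minimal Python sketch that counts the all-present rows instead. It assumes a whitespace-separated file whose first line is the header and whose columns are only the five factors, as in the sample table; the awk command's use of $2 to $6 suggests the real file may have an extra leading column, which this sketch does not handle. SAMPLE and count_all_present are hypothetical names used for illustration.

```python
import io

# Sample table from the question; in practice, pass open("data") instead.
SAMPLE = """CEBP HEB TAL1 RUNX1 SPI1
1 1 1 1 1
0 1 1 1 1
1 1 0 0 1
1 1 1 1 0
0 0 0 1 1
"""


def count_all_present(f):
    """Count data rows in which every factor column is 1."""
    next(f)  # skip the header line
    count = 0
    for line in f:
        fields = line.split()
        # "fields and" skips blank lines, since all([]) would be True
        if fields and all(x == "1" for x in fields):
            count += 1
    return count


print(count_all_present(io.StringIO(SAMPLE)))  # prints 1
```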
The problem is computing all the possible combinations that are present.
The table looks like this:
CEBP HEB TAL1 RUNX1 SPI1
1 1 1 1 1
0 1 1 1 1
1 1 0 0 1
1 1 1 1 0
0 0 0 1 1
Several different combinations arise from this table:
1. all are present
2. the first is absent and all others are present
3. the last is absent but the others are present
4. the third and fourth are absent but the others are present
5. the first three are absent but the others are present
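To make the list concrete, here is a minimal Python sketch that names the absent factors for each row of the sample table. It prints the rows in table order, so rows 3 and 4 correspond to items 4 and 3 of the list. SAMPLE and describe are hypothetical names used only for illustration.

```python
# Sample table from the question.
SAMPLE = """CEBP HEB TAL1 RUNX1 SPI1
1 1 1 1 1
0 1 1 1 1
1 1 0 0 1
1 1 1 1 0
0 0 0 1 1
"""


def describe(header, row):
    """Return a description naming the factors that are absent (0) in row."""
    absent = [name for name, value in zip(header, row) if value == "0"]
    return "all present" if not absent else "absent: " + ", ".join(absent)


rows = [line.split() for line in SAMPLE.splitlines()]
header, data = rows[0], rows[1:]
for number, row in enumerate(data, start=1):
    print(number, describe(header, row))
# 1 all present
# 2 absent: CEBP
# 3 absent: TAL1, RUNX1
# 4 absent: SPI1
# 5 absent: CEBP, HEB, TAL1
```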
In a table like this, which has a fixed number of columns and n rows, how can I count how often each of these presence/absence combinations occurs?
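With a fixed number of columns k there are exactly 2**k possible presence/absence patterns, so one option is to enumerate them all and count each one, including those that never occur. A minimal Python sketch using itertools.product, assuming the cells contain exactly the strings "0" and "1"; SAMPLE and count_patterns are hypothetical names.

```python
from collections import Counter
from itertools import product

# Sample table from the question; in practice, read the lines of your file.
SAMPLE = """CEBP HEB TAL1 RUNX1 SPI1
1 1 1 1 1
0 1 1 1 1
1 1 0 0 1
1 1 1 1 0
0 0 0 1 1
"""


def count_patterns(lines):
    """Map every possible 0/1 pattern (one value per column) to its row count.

    Patterns that never occur are included with a count of 0.
    """
    rows = [line.split() for line in lines if line.strip()]
    header, data = rows[0], rows[1:]
    observed = Counter(tuple(row) for row in data)
    # product("01", repeat=k) yields all 2**k patterns, e.g. ('0', '0', '1').
    return {p: observed[p] for p in product("01", repeat=len(header))}


counts = count_patterns(SAMPLE.splitlines())
for pattern, n in counts.items():
    if n:  # remove this check to list the unseen patterns as well
        print("".join(pattern), n)
print(sum(1 for n in counts.values() if n == 0), "patterns never occur")
```

A Counter returns 0 for missing keys, which is why the unseen patterns can be looked up directly without a special case.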
Kindly help.
Thank you
Assuming that the file "data" contains your data, this could do the job:
with open("data") as f:
    lines = [line.split() for line in f]

combinations = {}
for combination in lines[1:]:
    # Key: the names of the factors that are present (non-zero) in this row
    key = ", ".join(lines[0][i]
                    for i in range(len(combination))
                    if combination[i] != '0')
    combinations[key] = combinations.get(key, 0) + 1

for key, value in combinations.items():
    print(value, key, sep='\t')
Or, using collections.Counter from the standard library:
import collections

with open("data") as f:
    lines = [line.split() for line in f]

# Counter tallies every key produced by the generator expression
combinations = collections.Counter(
    ", ".join(lines[0][i]
              for i in range(len(combination))
              if combination[i] != '0')
    for combination in lines[1:])

for key, value in combinations.items():
    print(value, key, sep='\t')
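If you want the combinations listed from most to least frequent, Counter.most_common() does the sorting. A minimal sketch on the rows from the Perl test data (some repeated so the counts differ), with the table inlined as a hypothetical SAMPLE string instead of the "data" file; it also swaps the index loop for zip(header, row), which pairs each factor name with its value.

```python
import collections

# Rows from the Perl example's test data; in practice, read the "data" file.
SAMPLE = """CEBP HEB TAL1 RUNX1 SPI1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
0 1 1 1 1
1 1 0 0 1
1 1 1 1 0
0 0 0 1 1
0 0 0 1 1
0 0 0 1 1
"""

lines = [line.split() for line in SAMPLE.splitlines()]
header = lines[0]
combinations = collections.Counter(
    ", ".join(name for name, value in zip(header, row) if value != "0")
    for row in lines[1:])

# most_common() with no argument returns all (key, count) pairs, highest
# count first; ties keep the order in which they were first encountered.
for key, value in combinations.most_common():
    print(value, key, sep="\t")
```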
EDIT: Another version that saves memory by using a generator expression, so the file is never loaded into a list:
import collections

with open("data") as f:
    # Split lines lazily; the whole file is never held in memory
    lines = (line.split() for line in f)
    header = next(lines)
    # Build the Counter inside the with block, while f is still open
    combinations = collections.Counter(
        ", ".join(header[i]
                  for i in range(len(combination))
                  if combination[i] != '0')
        for combination in lines)

for key, value in combinations.items():
    print(value, key, sep='\t')
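One possible refinement: in the versions above, an empty line in the file (for example an extra blank line at the end) splits to an empty list and is counted as an empty combination. A minimal sketch that filters blank lines out in the generator itself; count_combinations is a hypothetical helper, and io.StringIO stands in for open("data") so the example runs on its own.

```python
import collections
import io


def count_combinations(f):
    """Count present-factor combinations, reading f one line at a time.

    Blank lines are skipped so they are not counted as an empty combination.
    """
    rows = (line.split() for line in f if line.strip())
    header = next(rows)
    # The Counter consumes the generator before the function returns,
    # so calling this inside "with open(...) as f:" is safe.
    return collections.Counter(
        ", ".join(name for name, value in zip(header, row) if value != "0")
        for row in rows)


sample = io.StringIO("CEBP HEB TAL1 RUNX1 SPI1\n1 1 1 1 1\n\n0 0 0 1 1\n\n")
for key, value in count_combinations(sample).items():
    print(value, key, sep="\t")
```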
I'm sure this could be improved.
A Perl program that counts all combinations by treating each row as a binary number. I repeated a few rows to make sure the counting works.
use strict;
use warnings;
use Bit::Vector;

# CEBP HEB TAL1 RUNX1 SPI1
my @factors = (
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [1, 1, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
);

my %combo;
for my $row (@factors) {
    # Read the row as a binary number (first column is the most significant bit)
    my $v = Bit::Vector->new_Bin(32, join('', @$row))->to_Dec;
    $combo{$v}++;
}

# Sort numerically so that, e.g., 3 comes before 15
for my $v (sort { $a <=> $b } keys %combo) {
    printf "Result: %3d %5s Count: %d\n",
        $v,
        Bit::Vector->new_Dec(5, $v)->to_Bin,
        $combo{$v};
}

Output:
Result:   3 00011 Count: 3
Result:  15 01111 Count: 1
Result:  25 11001 Count: 1
Result:  30 11110 Count: 1
Result:  31 11111 Count: 3
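The same binary-number idea can be sketched in Python without an extra module: int(s, 2) parses a string of 0s and 1s, and format(v, "05b") turns the number back into a zero-padded bit string, from which the absent factors can be read off. This is a swapped-in, standard-library variant of the technique, not the Perl author's code; HEADER, ROWS and combo are names chosen for illustration, and the rows are copied from the Perl example.

```python
from collections import Counter

HEADER = ["CEBP", "HEB", "TAL1", "RUNX1", "SPI1"]
# Same rows as the Perl example.
ROWS = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [1, 1, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

# Read each row as a binary number, first column = most significant bit.
combo = Counter(int("".join(map(str, row)), 2) for row in ROWS)

width = len(HEADER)
for v in sorted(combo):  # numeric order: 3, 15, 25, 30, 31
    bits = format(v, f"0{width}b")  # zero-padded back to one bit per factor
    absent = ", ".join(name for name, bit in zip(HEADER, bits) if bit == "0")
    print(f"Result: {v:3d} {bits} Count: {combo[v]} absent: {absent or 'none'}")
```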