
Computing the combinations for presence or absence of a factor in a table

I have a table like the one below (1 indicates presence and 0 indicates absence) and would like to count how often each combination of factors occurs. For example: the number of times all factors are present, the number of times the first is absent but the rest are present, the number of times the second is absent but the others are present, and likewise for cases where two or three factors are absent and the rest are present.

In shell it is quite simple to find the rows in which all are present:

awk '{if (($2 == 1) && ($3==1) && ($4==1) && ($5==1) && ($6==1)) print $1}' ALL_Freq_motif_AE_Uper

but the problem is computing all the possible combinations that are present.
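For comparison, a minimal Python sketch of the same "all present" check. The awk line tests $2 to $6 and prints $1, which suggests the real file has an extra leading column (for example a row identifier); this sketch instead assumes the five-column layout of the table below, with a header row of factor names. The function name count_all_present is hypothetical.

```python
import io

def count_all_present(f):
    """Count data rows in which every factor column is 1 (header row skipped)."""
    next(f)  # skip the header line with the factor names
    count = 0
    for line in f:
        fields = line.split()
        # Ignore blank lines; count the row only if every field is "1".
        if fields and all(x == "1" for x in fields):
            count += 1
    return count

table = io.StringIO(
    "CEBP HEB TAL1 RUNX1 SPI1\n"
    "1 1 1 1 1\n"
    "0 1 1 1 1\n"
    "1 1 0 0 1\n"
)
print(count_all_present(table))  # 1
```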

The table looks like this:

CEBP    HEB     TAL1    RUNX1   SPI1
1       1       1       1       1
0       1       1       1       1
1       1       0       0       1
1       1       1       1       0
0       0       0       1       1

Several different combinations arise from this table:

1. all are present;
2. the first is absent and all the others are present;
3. the last is absent but the others are present;
4. the third and fourth are absent but the others are present;
5. the first three are absent but the others are present.
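The five cases above are each described by which factors are absent. A minimal sketch of labelling every row that way and counting the labels, assuming whitespace-separated 0/1 columns under a header row of factor names (the helper absent_pattern is hypothetical):

```python
from collections import Counter

def absent_pattern(header, row):
    """Describe a 0/1 row by the names of the factors that are absent."""
    absent = [name for name, flag in zip(header, row) if flag == "0"]
    return "none absent" if not absent else "absent: " + ", ".join(absent)

lines = """CEBP HEB TAL1 RUNX1 SPI1
1 1 1 1 1
0 1 1 1 1
1 1 0 0 1
1 1 1 1 0
0 0 0 1 1""".splitlines()

header = lines[0].split()
# One label per data row; identical absence patterns share a label.
counts = Counter(absent_pattern(header, line.split()) for line in lines[1:])
for pattern, n in counts.items():
    print(n, pattern, sep="\t")
```

On the sample table each of the five listed cases occurs once, e.g. "absent: TAL1, RUNX1" for case 4.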

In a table like this, which has a fixed number of columns and n rows, how can I count these combinations of presence and absence?
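With a fixed number of columns k there are only 2^k possible presence/absence patterns (32 for the five factors here), so one way to cover "all possible combinations" is to count every pattern, including those that never occur. A sketch using itertools.product, assuming rows are already split into "0"/"1" strings (all_combination_counts is a hypothetical name):

```python
from collections import Counter
from itertools import product

def all_combination_counts(rows, n_factors):
    """Return a count for every possible 0/1 pattern of length n_factors,
    including patterns that never occur (count 0)."""
    observed = Counter(tuple(row) for row in rows)
    # product("01", repeat=n) yields every pattern, e.g. ("0", "0", "0", "0", "0").
    return {pattern: observed[pattern] for pattern in product("01", repeat=n_factors)}

rows = [
    ["1", "1", "1", "1", "1"],
    ["0", "1", "1", "1", "1"],
    ["1", "1", "0", "0", "1"],
    ["1", "1", "1", "1", "0"],
    ["0", "0", "0", "1", "1"],
]
counts = all_combination_counts(rows, 5)
print(len(counts))  # 32 possible patterns for 5 factors
for pattern, n in counts.items():
    if n:  # print only the patterns that actually occur
        print("".join(pattern), n)
```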

Kindly help.

Thank you

Angelo asked Jan 16 '23 21:01


2 Answers

Assuming that the file "data" contains your table, this could do the job:

with open("data") as f:
        lines=[line.strip().split() for line in f]
combinations={}
for combination in lines[1:]:
        # Build a key from the names of the factors that are present (non-zero).
        key=", ".join([lines[0][i]
                for i in range(len(combination))
                if combination[i] != '0'])
        combinations[key]=combinations.get(key, 0)+1
for key, value in combinations.items():
        print(value, key, sep='\t')

or, using the collections module:

import collections

with open("data") as f:
        lines=[line.strip().split() for line in f]

# Count each distinct set of present factors.
combinations=collections.Counter(
        ", ".join(lines[0][i]
                for i in range(len(combination))
                        if combination[i] != '0')
                for combination in lines[1:])

for key, value in combinations.items():
        print(value, key, sep='\t')

EDIT: Another version that saves memory by using a generator expression instead of reading the whole file into a list:

import collections

with open("data") as f:
        lines=(line.strip().split() for line in f)
        header=next(lines)
        # Counter consumes the rows lazily, so the file is never read into a list.
        combinations=collections.Counter(
                ", ".join(header[i]
                        for i in range(len(combination))
                                if combination[i] != '0')
                        for combination in lines)
        for key, value in combinations.items():
                print(value, key, sep='\t')

I'm sure this could be improved.
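A usage sketch of the same counting logic wrapped in a function (count_combinations is a hypothetical name), so it can be tried on an in-memory copy of the sample table instead of a file called "data". It also skips blank lines, which would otherwise be counted under an empty key:

```python
import io
from collections import Counter

def count_combinations(f):
    """Count how often each set of present factors occurs.

    f is any iterable of lines whose first non-blank line holds the factor names."""
    lines = (line.split() for line in f if line.strip())  # skip blank lines
    header = next(lines)
    return Counter(
        ", ".join(name for name, flag in zip(header, row) if flag != "0")
        for row in lines
    )

sample = io.StringIO(
    "CEBP HEB TAL1 RUNX1 SPI1\n"
    "1 1 1 1 1\n"
    "0 1 1 1 1\n"
    "1 1 1 1 1\n"
    "\n"
    "0 0 0 1 1\n"
)
# most_common() lists the combinations from most to least frequent.
for key, value in count_combinations(sample).most_common():
    print(value, key, sep="\t")
```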

hochl answered Jan 29 '23 14:01


A Perl program that counts all combinations by treating each row as a binary number. I repeated a few rows to make sure the counting worked.

use strict;
use warnings;
use Bit::Vector;

# CEBP       HEB     TAL1      RUNX1   SPI1
my @factors = (
    [1,      1,       1,       1,       1],
    [1,      1,       1,       1,       1],
    [1,      1,       1,       1,       1],
    [0,      1,       1,       1,       1],
    [1,      1,       0,       0,       1],
    [1,      1,       1,       1,       0],
    [0,      0,       0,       1,       1],
    [0,      0,       0,       1,       1],
    [0,      0,       0,       1,       1],
);

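my %combo;

# Treat each row as a binary number (CEBP is the most significant bit)
# and count how often each number occurs.
for my $row (@factors) {
    my $v = Bit::Vector->new_Bin(32, join('', @$row))->to_Dec;
    $combo{$v}++;
}

# Sort numerically so the output is ordered by bit pattern.
for my $v (sort { $a <=> $b } keys %combo) {
    printf "Result: %3d  %5s Count: %d\n",
        $v,
        Bit::Vector->new_Dec(5, $v)->to_Bin,
        $combo{$v}
    ;
}

Output:

Result:   3  00011 Count: 3
Result:  15  01111 Count: 1
Result:  25  11001 Count: 1
Result:  30  11110 Count: 1
Result:  31  11111 Count: 3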
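The same binary-number idea can be sketched in Python without Bit::Vector: int(s, 2) parses a string of 0s and 1s, and the format spec 05b turns the number back into a five-digit pattern. This assumes exactly five 0/1 columns in the order CEBP, HEB, TAL1, RUNX1, SPI1; the rows below are illustrative only.

```python
from collections import Counter

rows = [
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [1, 1, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

# Read each row as a binary number, first column (CEBP) = most significant bit.
combo = Counter(int("".join(map(str, row)), 2) for row in rows)

# Sorting the integer keys orders the patterns numerically.
for value in sorted(combo):
    print(f"Result: {value:3d}  {value:05b} Count: {combo[value]}")
```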
Bill Ruppert answered Jan 29 '23 16:01