
Python equivalent of R "split"-function

Tags: python, r, grouping

In R, you could split a vector according to the factors of another vector:

> a <- 1:10
  [1]  1  2  3  4  5  6  7  8  9 10
> b <- rep(1:2,5)
  [1] 1 2 1 2 1 2 1 2 1 2

> split(a,b)

   $`1`
   [1] 1 3 5 7 9
   $`2`
   [1]  2  4  6  8 10

In Python terms, this groups one list according to the values of another list, with the groups ordered by factor level.

Is there anything handy like that in Python, apart from the itertools.groupby approach?

dorvak asked Oct 25 '13 18:10


3 Answers

From your example, it looks like each element of b gives the 1-indexed group in which the corresponding element of a will be stored. Python lacks the automatically numbered names that R seems to generate, so we'll return a tuple of lists. If zero-indexed groups are acceptable and you only need two of them (where your R example uses 1 and 2, Python would use 0 and 1), the call looks like this:

>>> a = range(1, 11)
>>> b = [0,1] * 5

>>> split(a, b)
([1, 3, 5, 7, 9], [2, 4, 6, 8, 10])

You can implement this with itertools.compress, which keeps the elements of x whose selector in f is truthy:

import itertools
def split(x, f):
    # group 0 (falsy selectors) comes first, then group 1 (truthy selectors)
    return list(itertools.compress(x, (not i for i in f))), list(itertools.compress(x, f))

If you need more general input (multiple numbers), something like the following will return an n-tuple:

import itertools

def split(x, f):
    # labels in f are assumed to be integers in 0..max(f)
    count = max(f) + 1
    return tuple(list(itertools.compress(x, (el == i for el in f))) for i in range(count))

>>> split([1,2,3,4,5,6,7,8,9,10], [0,1,1,0,2,3,4,0,1,2])
([1, 4, 8], [2, 3, 9], [5, 10], [6], [7])
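
The generator-based version above rescans f once per group. As a minimal sketch, assuming the labels are non-negative integers held in a sequence (as in the example), a single pass can fill the same tuple of lists. The name split_single_pass is made up for this illustration and is not part of the original answer:

```python
def split_single_pass(x, f):
    # Hypothetical helper, not part of the original answer.
    # Assumes f is a non-empty sequence of integers in 0..max(f).
    groups = [[] for _ in range(max(f) + 1)]
    for value, label in zip(x, f):
        groups[label].append(value)  # route each value to its label's list
    return tuple(groups)


print(split_single_pass([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [0, 1, 1, 0, 2, 3, 4, 0, 1, 2]))
# ([1, 4, 8], [2, 3, 9], [5, 10], [6], [7])
```

For R-style labels that start at 1, you could pass [k - 1 for k in f] instead of f.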

John Spong answered Nov 08 '22 12:11


Edit: warning, this is a groupby solution, which is not what the OP asked for, but it may be of use to someone looking for a less specific way to split the R way in Python.


Here's one way with itertools.

import itertools
# make your sample data
a = range(1, 11)
b = list(itertools.islice(itertools.cycle((1, 2)), len(a)))

# sort the (factor, value) pairs so equal factors are adjacent, then group by factor
{k: tuple(v for _, v in g) for k, g in itertools.groupby(sorted(zip(b, a)), key=lambda x: x[0])}
# {1: (1, 3, 5, 7, 9), 2: (2, 4, 6, 8, 10)}

This gives you a dictionary, which is analogous to the named list that you get from R's split.
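
One caveat: sorting the (factor, value) pairs also orders the values within each group, which can differ from R's split when a is not already sorted. A minimal sketch, assuming you want to keep the original order within each group, sorts on the factor alone. Python's sort is stable, so values with the same factor keep their input order. The function name split_groupby is hypothetical:

```python
import itertools
from operator import itemgetter


def split_groupby(x, f):
    # Hypothetical helper, not from the original answer.
    # Sort on the factor only; the stable sort keeps values in input order.
    pairs = sorted(zip(f, x), key=itemgetter(0))
    return {k: [v for _, v in g] for k, g in itertools.groupby(pairs, key=itemgetter(0))}


print(split_groupby([30, 10, 20, 40], [2, 1, 2, 1]))
# {1: [10, 40], 2: [30, 20]}
```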

Matthew Plourde answered Nov 08 '22 11:11


As a long-time R user, I was wondering how to do the same thing. It's a very handy function for tabulating vectors. This is what I came up with:

a = [1,2,3,4,5,6,7,8,9,10]
b = [1,2,1,2,1,2,1,2,1,2]

from collections import defaultdict
def split(x, f):
    res = defaultdict(list)
    for v, k in zip(x, f):
        res[k].append(v)
    return res

>>> split(a, b)
defaultdict(list, {1: [1, 3, 5, 7, 9], 2: [2, 4, 6, 8, 10]})
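
The defaultdict keeps groups in first-seen order, whereas R's split orders them by factor level. A minimal sketch of a variant that returns a plain dict with sorted keys, assuming the keys are mutually comparable and Python 3.7+ (where dicts preserve insertion order). The name split_sorted is hypothetical:

```python
from collections import defaultdict


def split_sorted(x, f):
    # Hypothetical variant of the split above: same grouping pass,
    # but returns a plain dict with keys in sorted order.
    res = defaultdict(list)
    for v, k in zip(x, f):
        res[k].append(v)
    return {k: res[k] for k in sorted(res)}


print(split_sorted([1, 2, 3, 4], ["b", "a", "b", "a"]))
# {'a': [2, 4], 'b': [1, 3]}
```

Returning a plain dict also means that looking up a missing key raises KeyError rather than silently creating an empty group.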

Zelazny7 answered Nov 08 '22 11:11