Python: creating a new list using a "template list"

Tags:

python

numpy

Suppose I have:

x1 = [1, 3, 2, 4]

and:

x2 = [0, 1, 1, 0]

with the same shape.

Now I want to "put x2 on top of x1" and sum all the numbers of x1 that share the same number in x2.

So the end result is:

end = [1+4, 3+2]  # end[0] is the sum of all numbers of x1 where x2 is 0

This is a naive implementation using lists, to further clarify the question:

store_0 = 0
store_1 = 0
x1 = [1, 3, 2, 4]
x2 = [0, 1, 1, 0]
for value_x1, value_x2 in zip(x1, x2):
    if value_x2 == 0:
        store_0 += value_x1
    elif value_x2 == 1:
        store_1 += value_x1

So my question: is there a way to implement this in numpy without loops, or in general just faster?

user15770670 asked Apr 26 '21 18:04


3 Answers

In this particular example (and, in general, for unique, duplicated, and groupby kinds of operations), pandas is faster than a pure numpy solution:

A pandas way, using Series (credit: very similar to @mcsoini's answer):

import numpy as np
import pandas as pd

def pd_group_sum(x1, x2):
    return pd.Series(x1, index=x2).groupby(x2).sum()

A pure numpy way, using np.unique and a broadcast comparison (this builds an intermediate array of shape (number of groups, n)):

def np_group_sum(a, groups):
    _, ix, rix = np.unique(groups, return_index=True, return_inverse=True)
    return np.where(np.arange(len(ix))[:, None] == rix, a, 0).sum(axis=1)
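The inverse indices that np.unique returns can also be passed to np.bincount, which sidesteps the (number of groups, n) intermediate array of the broadcast comparison. This is only a minimal sketch and not part of the original answer; the name np_group_sum_bincount is made up for illustration. np.bincount with weights always returns float64, so the sketch casts back for integer input, assuming the sums are small enough to be represented exactly as float64.

```python
import numpy as np

def np_group_sum_bincount(a, groups):
    # Hypothetical helper for illustration, not from the original answer.
    a = np.asarray(a)
    # labels: sorted unique group values; rix: position of each element's label.
    labels, rix = np.unique(groups, return_inverse=True)
    # bincount adds up the weights that fall into each bin (result is float64).
    sums = np.bincount(rix, weights=a, minlength=len(labels))
    # Cast back to the input dtype when the input was an integer array.
    if np.issubdtype(a.dtype, np.integer):
        sums = sums.astype(a.dtype)
    return labels, sums

x1 = np.array([1, 3, 2, 4, 8])
x2 = np.array([0, 1, 1, 0, 7])
labels, sums = np_group_sum_bincount(x1, x2)
print(labels, sums)  # [0 1 7] [5 5 8]
```

Returning the labels alongside the sums keeps the pairing that the pandas result carries in its index. This variant was not timed on the benchmark in this answer.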

Note: a better pure numpy way is inspired by @Woodford's answer:

def selsum(a, g, e):
    return a[g==e].sum()

vselsum = np.vectorize(selsum, signature='(n),(n),()->()')

def np_group_sum2(a, groups):
    return vselsum(a, groups, np.unique(groups))

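Yet another pure numpy way is inspired by a comment from @mapf about using argsort(). That call alone already takes 45 ms on the benchmark below, so this version is based on np.argpartition(x2, len(x2)-1) instead, which takes only 7.5 ms by itself. Caveat: np.argpartition only guarantees that the element at position kth is in its sorted place; the order of the remaining elements is undefined, so equal group labels are not guaranteed to end up adjacent. Check the output against one of the other variants before relying on it:

def np_group_sum3(a, groups):
    # kth = len(groups)-1 only guarantees that the largest label ends up last
    ix = np.argpartition(groups, len(groups)-1)
    # last index of each run of equal labels in the partitioned order
    ends = np.nonzero(np.diff(np.r_[groups[ix], groups.max() + 1]))[0]
    # per-run sums, as differences of the cumulative sum at the run ends
    return np.diff(np.r_[0, a[ix].cumsum()[ends]])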
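For comparison, here is a sketch of the same run-based idea built on a stable np.argsort, which does guarantee that equal labels are adjacent, combined with np.add.reduceat for the per-run sums. It is not part of the original answer, the name np_group_sum_sorted is hypothetical, and it assumes a non-empty input. It pays for the full sort, so it should not be expected to match the argpartition timing.

```python
import numpy as np

def np_group_sum_sorted(a, groups):
    # Hypothetical helper for illustration; assumes non-empty 1-D input.
    a = np.asarray(a)
    groups = np.asarray(groups)
    # A stable argsort guarantees that equal labels end up adjacent.
    ix = np.argsort(groups, kind="stable")
    sorted_groups = groups[ix]
    # Start index of each run of equal labels in the sorted order.
    starts = np.r_[0, np.nonzero(np.diff(sorted_groups))[0] + 1]
    # reduceat sums a[ix] over [starts[i], starts[i+1]) for each run.
    return sorted_groups[starts], np.add.reduceat(a[ix], starts)

x1 = np.array([1, 3, 2, 4, 8])
x2 = np.array([0, 1, 1, 0, 7])
labels, sums = np_group_sum_sorted(x1, x2)
print(labels, sums)  # [0 1 7] [5 5 8]
```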

(Slightly modified) example

x1 = np.array([1, 3, 2, 4, 8])  # a third group (label 7) added for the sake of generality
x2 = np.array([0, 1, 1, 0, 7])

>>> pd_group_sum(x1, x2)
0    5
1    5
7    8

>>> np_group_sum(x1, x2)  # and all the np_group_sum() variants
array([5, 5, 8])

Speed

n = 1_000_000
x1 = np.random.randint(0, 20, n)
x2 = np.random.randint(0, 20, n)

%timeit pd_group_sum(x1, x2)
# 13.9 ms ± 65.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit np_group_sum(x1, x2)
# 171 ms ± 129 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit np_group_sum2(x1, x2)
# 66.7 ms ± 19.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit np_group_sum3(x1, x2)
# 25.6 ms ± 41.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Going via pandas is faster, in part because of numpy issue 11136.

Pierre D answered Oct 13 '22 05:10


>>> import numpy as np
>>> x1 = np.array([1, 3, 2, 7])
>>> x2 = np.array([0, 1, 1, 0])
>>> for index in np.unique(x2):
...     print(f'{index}: {x1[x2==index].sum()}')
...
0: 8
1: 5
>>> # or in one line
>>> [(index, x1[x2==index].sum()) for index in np.unique(x2)]
[(0, 8), (1, 5)]

Woodford answered Oct 13 '22 04:10


Would a pandas one-liner be OK? With x1 and x2 as in the question:

import pandas as pd
store_0, store_1 = pd.DataFrame({"x1": x1, "x2": x2}).groupby("x2").x1.sum()

Or as a dictionary, for arbitrarily many values in x2:

pd.DataFrame({"x1": x1, "x2": x2}).groupby("x2").x1.sum().to_dict()

Output:

{0: 5, 1: 5}

mcsoini answered Oct 13 '22 04:10