How to sum the values of an array based on reference values from another array

Question

How can I get the sums of an array b grouped by the unique values of an array a, assuming both have the same dimensions and shape?

In other words, I expect to have an output consisting of the sums of array b for each value of array a. (In the example below: sum for value 1 = xxx, sum for value 2 = yyy... sum for value 11 = zzz)

a = [[ 5  1 10 11  6]
     [ 5  3  8 10  9]
     [ 2  1 10  8  7]
     [ 7 10  7  8 11]
     [10 10  3  0 11]]
b = [[508 220 316 557 737]
    [625 419 161 736 426]
    [389 608 760 885 232] 
    [396 309 522 204 842]
    [403 831 225 549 797]]

Could you rephrase your question? It is hard to understand.
– YOLO, Commented Dec 23, 2018 at 17:11

OK, please post the expected result.
– RomanPerekhrest, Commented Dec 23, 2018 at 17:13

Thierry Lathuille · Accepted Answer · 2018-12-23 17:27:13Z

2

You can do that using numpy:

import numpy as np

a = np.array(
    [[ 5,  1, 10, 11,  6],
     [ 5,  3,  8, 10,  9],
     [ 2,  1, 10,  8,  7],
     [ 7, 10,  7,  8, 11],
     [10, 10,  3,  0, 11]])
b = np.array(
    [[508, 220, 316, 557, 737],
    [625, 419, 161, 736, 426],
    [389, 608, 760, 885, 232],
    [396, 309, 522, 204, 842],
    [403, 831, 225, 549, 797]])

values = np.unique(a)
# will be [ 0  1  2  3  5  6  7  8  9 10 11]

out = {}
for value in values:
    out[value] = sum(b[np.where(a==value)])

print(out)
# {0: 549, 1: 828, 2: 389, 3: 644, 5: 1133, 6: 737, 7: 1150, 8: 1250, 9: 426, 10: 3355, 11: 2196}

Or with a dict comprehension, all in one line:

out = {value: sum(b[np.where(a==value)]) for value in np.unique(a)}
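
The loop above scans the whole array once per unique value. As a rough sketch of a loop-free variant (not part of the original answer), `np.unique` with `return_inverse=True` can label every element with the position of its value, and `np.bincount` with `weights` can then sum `b` per label. The names `inverse` and `sums` are only illustrative, and the cast back to `int` assumes the sums stay small enough to be exact in float64:

```python
import numpy as np

a = np.array(
    [[ 5,  1, 10, 11,  6],
     [ 5,  3,  8, 10,  9],
     [ 2,  1, 10,  8,  7],
     [ 7, 10,  7,  8, 11],
     [10, 10,  3,  0, 11]])
b = np.array(
    [[508, 220, 316, 557, 737],
     [625, 419, 161, 736, 426],
     [389, 608, 760, 885, 232],
     [396, 309, 522, 204, 842],
     [403, 831, 225, 549, 797]])

# values: sorted unique entries of a
# inverse: for each element of a, the index of its value in `values`
values, inverse = np.unique(a, return_inverse=True)

# bincount adds up the weights (entries of b) that share the same index;
# ravel() keeps this working whether `inverse` comes back flat or 2-D
sums = np.bincount(inverse.ravel(), weights=b.ravel())

# bincount with weights returns floats, so cast back to plain ints
out = {int(v): int(s) for v, s in zip(values, sums)}
print(out)
# {0: 549, 1: 828, 2: 389, 3: 644, 5: 1133, 6: 737, 7: 1150, 8: 1250, 9: 426, 10: 3355, 11: 2196}
```

This makes one pass over the data instead of one pass per unique value, which should matter mostly when `a` has many distinct values.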


B. M. · Accepted Answer · 2018-12-23 19:39:28Z

1

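Pandas offers a direct and efficient way to do this:

import pandas as pd  # a and b are the numpy arrays from the question
df = pd.DataFrame(data=b.ravel(), index=a.ravel())
sums = df.groupby(level=0).sum()  # group rows by index label (the values of a)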

#        0
# 0    549
# 1    828
# 2    389
# 3    644
# 5   1133
# 6    737
# 7   1150
# 8   1250
# 9    426
# 10  3355
# 11  2196
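
If a plain dict like the one in the other answers is preferred, the same grouping can probably be written with a `Series` instead of a one-column `DataFrame`. This is only a sketch under the assumption that `a` and `b` are the arrays from the question; the name `out` and the final conversion to built-in ints are illustrative choices:

```python
import numpy as np
import pandas as pd

a = np.array(
    [[ 5,  1, 10, 11,  6],
     [ 5,  3,  8, 10,  9],
     [ 2,  1, 10,  8,  7],
     [ 7, 10,  7,  8, 11],
     [10, 10,  3,  0, 11]])
b = np.array(
    [[508, 220, 316, 557, 737],
     [625, 419, 161, 736, 426],
     [389, 608, 760, 885, 232],
     [396, 309, 522, 204, 842],
     [403, 831, 225, 549, 797]])

# Group the flattened values of b by the flattened values of a
sums = pd.Series(b.ravel()).groupby(a.ravel()).sum()

# Convert the resulting Series to a dict with built-in int keys and values
out = {int(k): int(v) for k, v in sums.items()}
print(out)
# {0: 549, 1: 828, 2: 389, 3: 644, 5: 1133, 6: 737, 7: 1150, 8: 1250, 9: 426, 10: 3355, 11: 2196}
```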

Benchmarks:

a=np.random.randint(0,10**4,size=10**5)
b=np.random.randint(0,10**6,size=10**5)

In [19]: %timeit pd.DataFrame(b,a).groupby(level=0).sum()
58.7 ms ± 12.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [20]: %timeit for aa, bb in zip(a,b):result[aa] += bb
223 ms ± 36.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [21]: %timeit for value in np.unique(a): out[value] = np.sum(b[np.where(a==value)])
5.67 s ± 933 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
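
The `%timeit` lines above need IPython and rely on `result` and `out` already existing in the session. A rough way to reproduce the comparison as a plain script, assuming smaller arrays so it finishes quickly, is sketched below. The function names `dict_loop` and `unique_loop` are made up for this example, and the timings it prints will differ from the figures quoted above:

```python
import timeit
from collections import defaultdict

import numpy as np

rng = np.random.default_rng(0)
# Smaller than the benchmark above (assumption) so the script runs fast
a = rng.integers(0, 10**3, size=10**4)
b = rng.integers(0, 10**6, size=10**4)

def dict_loop():
    """Pure-Python accumulation, one dict update per element."""
    result = defaultdict(int)
    for aa, bb in zip(a, b):
        result[aa] += bb
    return {int(k): int(v) for k, v in result.items()}

def unique_loop():
    """One boolean mask and one sum per unique value of a."""
    return {int(v): int(b[a == v].sum()) for v in np.unique(a)}

# Both approaches should agree before their timings are compared
assert dict_loop() == unique_loop()

for fn in (dict_loop, unique_loop):
    # Best of three single runs, reported in milliseconds
    seconds = min(timeit.repeat(fn, number=1, repeat=3))
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms")
```

Checking that the results match before timing guards against benchmarking two functions that compute different things.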

edited Dec 23, 2018 at 19:39

answered Dec 23, 2018 at 17:37


1 Comment

renan-brso Over a year ago

Thanks! I'm using the solution you recommended. However, I'm working with a Monte Carlo simulation (10,000 DataFrames are supposed to be created when I run the script), and it raises a memory error. Any idea how to overcome this problem? I'm using 32-bit Python 2.7.

Paweł Kordowski · Accepted Answer · 2018-12-23 17:29:32Z

1

Or manually:

from itertools import chain
from collections import defaultdict

a = [[ 5,  1, 10, 11,  6],
     [ 5,  3,  8, 10,  9],
     [ 2,  1, 10,  8,  7],
     [ 7, 10,  7,  8, 11],
     [10, 10,  3,  0, 11]]
b = [[508, 220, 316, 557, 737],
    [625, 419, 161, 736, 426],
    [389, 608, 760, 885, 232],
    [396, 309, 522, 204, 842],
    [403, 831, 225, 549, 797]]

result = defaultdict(int)

for aa, bb in zip(chain(*a), chain(*b)):
    result[aa] += bb

print(result)

#defaultdict(<class 'int'>, {5: 1133, 1: 828, 10: 3355, 11: 2196, 6: 737, 3: 644, 8: 1250, 9: 426, 2: 389, 7: 1150, 0: 549})

