
I have two numpy arrays: one containing values and one containing each value's category.

import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
valcats = np.array([101, 301, 201, 201, 102, 302, 302, 202, 102, 301])

I have another array containing the unique categories I'd like to sum across.

categories = np.array([101, 102, 201, 202, 301, 302])

My issue is that I will be running this same summing process a few billion times and every microsecond matters.

My current implementation is as follows.

catsums = []
for x in categories:
    catsums.append(np.sum(values[np.where(valcats == x)]))

The resulting catsums should be:

[1, 14, 7, 8, 12, 13]

My current run time is about 5 µs. I am still somewhat new to Python and was hoping to find a faster solution, perhaps by combining the first two arrays, or a lambda, or something cool I don't even know about.

Thanks for reading!


2 Answers


You can use searchsorted and bincount -

np.bincount(np.searchsorted(categories, valcats), values)
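For reference, a self-contained run of this one-liner on the arrays from the question (note that np.bincount returns floats when given weights, so a cast is added here for a clean integer result):

import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
valcats = np.array([101, 301, 201, 201, 102, 302, 302, 202, 102, 301])
categories = np.array([101, 102, 201, 202, 301, 302])

# Map each category code to its position in the (sorted) categories array,
# then sum the values that fall into each bin.
idx = np.searchsorted(categories, valcats)
catsums = np.bincount(idx, weights=values).astype(values.dtype)
print(catsums)  # [ 1 14  7  8 12 13]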

5 Comments

What would you add if the categories array wasn't already sorted?
@piRSquared I would sort it and then feed it into the solution.
I was thinking more along the lines of passing the sorter parameter, e.g. sorter=categories.argsort() (a sketch of that variant follows below).
@piRSquared There's a lot of overhead involved. Don't think it's worth it for a case like this.
@hrschbck You can do: ids = np.searchsorted(categories, valcats) and then df.groupby('ids')['values'].max().values, if you were looking for an easy way. For sum, you would reuse ids: np.bincount(ids, values).
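A minimal sketch of the sorter-based variant discussed above, assuming the categories array were not already sorted. The categories_unsorted array and the variable names are purely illustrative, not from the original answer:

# Hypothetical unsorted categories array, for illustration only.
categories_unsorted = np.array([301, 101, 202, 302, 201, 102])

order = categories_unsorted.argsort()
# sorter= lets searchsorted work against the unsorted array;
# the returned indices refer to positions in the sorted view.
idx = np.searchsorted(categories_unsorted, valcats, sorter=order)
sums_sorted = np.bincount(idx, weights=values)

# Scatter the per-bin sums back to the original category order.
catsums = np.empty_like(sums_sorted)
catsums[order] = sums_sorted
print(catsums)  # sums aligned with categories_unsorted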

@Divakar just posted a very good answer. If you already have the array of categories defined, I'd use @Divakar's answer. If you don't have the unique values already defined, I'd use mine.


I'd use pd.factorize to factorize the categories. Then use np.bincount with the weights parameter set to the values array.

import pandas as pd

f, u = pd.factorize(valcats)
np.bincount(f, values).astype(values.dtype)

array([ 1, 12,  7, 14, 13,  8])

pd.factorize also produces the unique values in the u variable. We can line up the results with u to see that we've arrived at the correct solution.

np.column_stack([u, np.bincount(f, values).astype(values.dtype)])

array([[101,   1],
       [301,  12],
       [201,   7],
       [102,  14],
       [302,  13],
       [202,   8]])

You can make this more obvious using a pd.Series:

f, u = pd.factorize(valcats)
pd.Series(np.bincount(f, values).astype(values.dtype), u)

101     1
301    12
201     7
102    14
302    13
202     8
dtype: int64

Why pd.factorize and not np.unique?

We could have done this equivalently with

 u, f = np.unique(valcats, return_inverse=True)

But np.unique sorts the values, which takes O(n log n) time. pd.factorize, on the other hand, does not sort and runs in linear time. For larger data sets, pd.factorize will dominate performance.
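A rough way to check that claim on your own machine (the array size and repeat count here are purely illustrative, and timings will vary):

from timeit import timeit

import numpy as np
import pandas as pd

big = np.random.randint(0, 1000, size=1_000_000)

# pd.factorize: order-of-appearance codes, no sorting of the data.
t_factorize = timeit(lambda: pd.factorize(big), number=10)
# np.unique with return_inverse=True: sorts, hence O(n log n).
t_unique = timeit(lambda: np.unique(big, return_inverse=True), number=10)

print(t_factorize, t_unique)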

