I have two NumPy arrays, one containing values and one containing each value's category.
import numpy as np
values=np.array([1,2,3,4,5,6,7,8,9,10])
valcats=np.array([101,301,201,201,102,302,302,202,102,301])
I have another array containing the unique categories I'd like to sum across.
categories=np.array([101,102,201,202,301,302])
My issue is that I will be running this same summing process a few billion times and every microsecond matters.
My current implementation is as follows.
catsums=[]
for x in categories:
    catsums.append(np.sum(values[np.where(valcats==x)]))
The resulting catsums should be:
[1, 14, 7, 8, 12, 13]
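For anyone who wants to reproduce this, here is a minimal self-contained version of the loop above. The function name `loop_catsums` is only an illustrative wrapper and is not part of my actual code; the arrays are the same sample data as above.

```python
import numpy as np

# Sample data from the question.
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
valcats = np.array([101, 301, 201, 201, 102, 302, 302, 202, 102, 301])
categories = np.array([101, 102, 201, 202, 301, 302])


def loop_catsums(values, valcats, categories):
    """Sum `values` per category with one masked np.sum per category.

    `loop_catsums` is a hypothetical name used only for this example.
    """
    catsums = []
    for x in categories:
        # Select the values whose category equals x, then sum them.
        catsums.append(np.sum(values[np.where(valcats == x)]))
    return catsums


loop_result = loop_catsums(values, valcats, categories)
# Convert NumPy integers to plain ints for a clean printout.
print([int(s) for s in loop_result])  # [1, 14, 7, 8, 12, 13]
```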
My current run time is about 5 µs. I am still fairly new to Python and was hoping to find a faster solution, perhaps by combining the first two arrays, using a lambda, or something cool I don't even know about.
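One candidate that might be worth timing against the loop is mapping each category to an index with `np.searchsorted` and summing with `np.bincount`. This is only an untimed sketch under two assumptions that are not guaranteed by the question: `categories` is sorted in ascending order, and every entry of `valcats` appears in `categories`. The function name `bincount_catsums` is made up for illustration.

```python
import numpy as np

# Sample data from the question.
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
valcats = np.array([101, 301, 201, 201, 102, 302, 302, 202, 102, 301])
categories = np.array([101, 102, 201, 202, 301, 302])


def bincount_catsums(values, valcats, categories):
    """Per-category sums without a Python-level loop.

    Assumptions (not checked here): `categories` is sorted ascending and
    every element of `valcats` is present in `categories`.
    `bincount_catsums` is a hypothetical name used only for this example.
    """
    # Position of each value's category within the sorted `categories`.
    idx = np.searchsorted(categories, valcats)
    # Weighted bincount adds up `values` per index; the result is float64.
    return np.bincount(idx, weights=values, minlength=len(categories))


bincount_result = bincount_catsums(values, valcats, categories)
print(bincount_result.tolist())  # [1.0, 14.0, 7.0, 8.0, 12.0, 13.0]
```

Note that `np.bincount` with `weights` returns floats, so an `astype(int)` would be needed if integer sums are required. Whether this actually beats the loop on arrays this small would have to be measured, since fixed per-call overhead may dominate.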
Thanks for reading!