I am trying to find a vectorized way to accomplish the following:
Say I have arrays of x and y values. Note that the x values are not always integers and can be negative:
import numpy as np
x = np.array([-1,-1,-1,3,2,2,2,5,4,4], dtype=float)
y = np.array([0,1,0,1,0,1,0,1,0,1])
I want to group the y array by the sorted, unique values of the x array and summarize the counts for each y class. So the example above would look like this:
array([[ 2.,  1.],
       [ 2.,  1.],
       [ 0.,  1.],
       [ 1.,  1.],
       [ 0.,  1.]])
Where the first column represents the count of '0' values for each unique value of x and the second column represents the count of '1' values for each unique value of x.
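To make the intended grouping concrete, here is a plain-Python reference for the result I am after. It is not vectorized and is only meant as a specification of the expected table; the helper name `count_by_group` is made up for this illustration, and it assumes `y` only ever contains the labels 0 and 1.

```python
from collections import Counter

import numpy as np

x = np.array([-1, -1, -1, 3, 2, 2, 2, 5, 4, 4], dtype=float)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])


def count_by_group(x, y):
    """Hypothetical reference helper: count the 0s and 1s of y per unique x."""
    # Count every (x value, y label) pair; missing pairs count as 0.
    counts = Counter(zip(x.tolist(), y.tolist()))
    # Rows of the result follow the sorted unique x values.
    keys = sorted(set(x.tolist()))
    table = np.array([[counts[(k, 0)], counts[(k, 1)]] for k in keys], dtype=float)
    return np.array(keys), table


keys, table = count_by_group(x, y)
```

Each row of `table` corresponds to one entry of `keys`, in ascending order of x.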
My current implementation looks like this:
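order = x.argsort()  # compute the sort order once and reuse it
x_sorted, y_sorted = x[order], y[order]
def collapse(x_sorted, y_sorted):
    # Index of the first occurrence of each unique x in the sorted array
    uniq_ids = np.unique(x_sorted, return_index=True)[1]
    y_collapsed = np.zeros((len(uniq_ids), 2))
    x_collapsed = x_sorted[uniq_ids]
    # Split y at the group boundaries and count each class per group
    for idx, group in enumerate(np.split(y_sorted, uniq_ids[1:])):
        y_collapsed[idx, 0] = (group == 0).sum()
        y_collapsed[idx, 1] = (group == 1).sum()
    return (x_collapsed, y_collapsed)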
collapse(x_sorted, y_sorted)
(array([-1.,  2.,  3.,  4.,  5.]),
 array([[ 2.,  1.],
        [ 2.,  1.],
        [ 0.,  1.],
        [ 1.,  1.],
        [ 0.,  1.]]))
This doesn't seem very much in the spirit of NumPy, however, and I'm hoping some vectorized method exists for this kind of operation. I am trying to do this without resorting to pandas, although I know that library has a very convenient groupby operation.
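One direction that might work is to let `np.unique` assign a group id to every element and then count (group, class) pairs with `np.bincount`. This is only a sketch, not a tested solution: the function name `collapse_vectorized` and the `n_classes` parameter are hypothetical, and it assumes `y` holds non-negative integer labels in the range `0 .. n_classes - 1`.

```python
import numpy as np

x = np.array([-1, -1, -1, 3, 2, 2, 2, 5, 4, 4], dtype=float)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])


def collapse_vectorized(x, y, n_classes=2):
    """Hypothetical loop-free variant; assumes y is in 0 .. n_classes - 1."""
    # x_unique is sorted; group_ids[i] is the row of x_unique that x[i] maps to.
    x_unique, group_ids = np.unique(x, return_inverse=True)
    # Encode each (group, class) pair as a single non-negative integer.
    flat = group_ids * n_classes + y
    # Count the occurrences of every pair, including pairs that never occur.
    counts = np.bincount(flat, minlength=len(x_unique) * n_classes)
    # Rows are groups, columns are classes; cast to float to match the loop version.
    return x_unique, counts.reshape(len(x_unique), n_classes).astype(float)


x_collapsed, y_collapsed = collapse_vectorized(x, y)
```

Because `np.unique` sorts internally, this variant does not need `x` and `y` to be sorted beforehand.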