Split a numpy array with a binary list in a single operation

Question

I can split an array into two smaller arrays like this:

>>> import numpy as np
>>> a = np.array([1,2,3,4,5])
>>> selector = np.array([True, False, True, True, False])
>>> selected, not_selected = a[selector], a[~ selector]
>>> selected
array([1, 3, 4])
>>> not_selected
array([2, 5])

But even though I am generating selected and not_selected on the same line, I am (at least I think I am) effectively operating on the array a twice: once with selector and again with its inverse. How would I generate selected and not_selected using a truly single, and presumably faster, numpy operation? Or is this still the best way to do it?

Divakar · Accepted Answer · 2019-07-08 09:38:11Z

If you are open to using numba, we can gain some memory efficiency, which translates into a noticeable performance boost -

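import numpy as np
from numba import njit

# Plain @njit: the write positions carry over from one iteration to the next,
# so the loop is sequential and parallel=True would have nothing to parallelize.
@njit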
def select_numba(a, selector, out1, out2):
    iter1 = 0
    iter2 = 0
    for i,j in zip(a,selector):
        if j:
            out1[iter1] = i
            iter1 += 1
        else:
            out2[iter2] = i
            iter2 += 1
    return out1,out2

def select(a, selector):
    # Count the True entries up front so both outputs can be preallocated.
    L = np.count_nonzero(selector)
    nL = len(selector)-L
    out1 = np.empty(L, dtype=a.dtype)
    out2 = np.empty(nL, dtype=a.dtype)
    select_numba(a, selector, out1, out2)
    return out1,out2

Sample run -

In [65]: a = np.array([1,2,3,4,5])
    ...: selector = np.array([True, False, True, True, False])

In [66]: select(a, selector)
Out[66]: (array([1, 3, 4]), array([2, 5]))
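To check that this single-pass approach gives the same result as the two-mask version from the question, a sketch along these lines could be used. Note that select_reference is a hypothetical pure-Python stand-in for the numba kernel (same loop, no JIT), introduced here only so the snippet runs without numba installed; it is far slower and is meant for verification, not for timing.

```python
import numpy as np

def select_reference(a, selector):
    """Hypothetical stand-in for the numba kernel: same loop, no JIT."""
    n_true = int(np.count_nonzero(selector))
    out1 = np.empty(n_true, dtype=a.dtype)
    out2 = np.empty(len(selector) - n_true, dtype=a.dtype)
    i1 = i2 = 0  # independent write positions for the two outputs
    for value, keep in zip(a, selector):
        if keep:
            out1[i1] = value
            i1 += 1
        else:
            out2[i2] = value
            i2 += 1
    return out1, out2

# Random test data (sizes and seed are arbitrary choices for illustration).
rng = np.random.default_rng(0)
a = rng.integers(0, 9, size=1000)
selector = rng.random(a.size) > 0.5

selected, not_selected = select_reference(a, selector)

# Both outputs should match plain boolean-mask indexing, order included.
assert np.array_equal(selected, a[selector])
assert np.array_equal(not_selected, a[~selector])
```

The same two assertions can be pointed at select(a, selector) once numba is available.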

Benchmarking on a larger dataset -

In [60]: np.random.seed(0)
    ...: a = np.random.randint(0,9,(100000))
    ...: selector = np.random.rand(len(a))>0.5

In [62]: %timeit selected, not_selected = a[selector], a[~ selector]
1000 loops, best of 3: 1.2 ms per loop

In [63]: %timeit select(a, selector)
1000 loops, best of 3: 454 µs per loop

1 Comment

David Wallace Over a year ago

I didn't know about numba, so upvoting for that. But I am waiting, in case it's possible, for an accepted answer that uses only numpy.
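For what it's worth, one NumPy-only way to get both pieces from a single gather is sketched below. It is not taken from the answer above: the helper name split_by_mask is made up for illustration, and the idea is to stable-sort the inverted mask and then cut the reordered array once with np.split. It has not been benchmarked here, and since it adds a sort it may well be no faster than the two-mask version.

```python
import numpy as np

def split_by_mask(a, selector):
    """Hypothetical helper: partition `a` by a boolean mask with one gather."""
    selector = np.asarray(selector, dtype=bool)
    # Stable argsort of the inverted mask puts the selected positions first
    # and keeps the original order inside each group.
    order = np.argsort(~selector, kind="stable")
    n_selected = int(np.count_nonzero(selector))
    # One fancy-indexing pass, then a single cut at the group boundary.
    selected, not_selected = np.split(a[order], [n_selected])
    return selected, not_selected

a = np.array([1, 2, 3, 4, 5])
selector = np.array([True, False, True, True, False])
selected, not_selected = split_by_mask(a, selector)
print(selected, not_selected)
```

The two returned arrays are views into the same reordered buffer, so copy them if they need to be modified independently.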
