Python numpy array row extraction based on another array's column - faster solution required

Question

Two 2D numpy arrays are given (arr_all and arr_sub), where the second is a random subset of the first. I need to get the rows of the first one (arr_all) that are not included in the second one (arr_sub), based on an ID in one column that exists in both arrays, e.g.:

arr_all = array([[x, y, z, id_1],
                 [x, y, z, id_2],
                 [x, y, z, id_3],
                 [x, y, z, id_4],
                 [x, y, z, id_5]])

arr_sub = array([[x, y, z, id_1],
                 [x, y, z, id_2],
                 [x, y, z, id_5]])

Wanted output:

arr_remain = array([[x, y, z, id_3],
                    [x, y, z, id_4]])

A working solution would be:

import numpy as np

list_remain = []
for i in range(len(arr_all)):
    # Keep the row if its ID (column 3) does not occur in arr_sub's ID column
    if arr_all[i][3] not in arr_sub[:, 3]:
        list_remain.append(arr_all[i])

arr_remain = np.array(list_remain)

Unfortunately, this solution is only usable for small datasets because of its very slow runtime. Since my original dataset contains over 26 million rows, this is not sufficient.

I tried to adapt solutions like this, this or this, but I didn't manage to add the check for whether the ID exists in the other array's column.

Nk03 · Accepted Answer · 2021-07-10 21:41:03Z


Here's one way:

arr_remain = arr_all[~np.in1d(arr_all[:,-1], arr_sub[:,-1])]
# or arr_remain = arr_all[~np.isin(arr_all[:,-1], arr_sub[:,-1])]

OUTPUT:

array([['x', 'y', 'z', 'id_3'],
       ['x', 'y', 'z', 'id_4']], dtype='<U4')
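
For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same idea. The numeric values and the choice of which rows go into arr_sub are made up for illustration; only the np.isin masking step comes from the answer above.

```python
import numpy as np

# Hypothetical sample data: columns are x, y, z, id (all values are made up).
arr_all = np.array([
    [0.1, 0.2, 0.3, 1],
    [1.1, 1.2, 1.3, 2],
    [2.1, 2.2, 2.3, 3],
    [3.1, 3.2, 3.3, 4],
    [4.1, 4.2, 4.3, 5],
])
arr_sub = arr_all[[0, 1, 4]]  # subset containing the rows with IDs 1, 2 and 5

# Boolean mask: True where the ID in the last column of arr_all
# also appears in the last column of arr_sub.
in_sub = np.isin(arr_all[:, -1], arr_sub[:, -1])

# Negate the mask to keep only the rows whose ID is NOT in arr_sub.
arr_remain = arr_all[~in_sub]
print(arr_remain)
```

The membership test runs once over the whole ID column instead of once per row in a Python loop, which is probably where most of the speedup comes from.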

edited Jul 10, 2021 at 21:41

answered Jul 10, 2021 at 20:04


1 Comment

The dude Over a year ago

thanks a lot! This is way faster. Just one note, my IDE complained about in1d and preferred isin. Seems to be the more recent solution for this task.
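
As a small follow-up on the isin variant: np.isin also accepts an invert=True argument, which should give the same mask as negating the result with ~. A quick sketch with made-up ID arrays (ids_all and ids_sub are illustrative names, not from the question):

```python
import numpy as np

# Hypothetical ID columns, standing in for arr_all[:, -1] and arr_sub[:, -1].
ids_all = np.array([1, 2, 3, 4, 5])
ids_sub = np.array([1, 2, 5])

# Two ways to build the "not in subset" mask.
mask_negated = ~np.isin(ids_all, ids_sub)
mask_inverted = np.isin(ids_all, ids_sub, invert=True)

# Both masks select the IDs that are missing from ids_sub.
print(ids_all[mask_inverted])
```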
