Fast array manipulation based on element inclusion in binary matrix

Question

For a large set of randomly distributed points in a 2D lattice, I want to efficiently extract a subarray, which contains only the elements that, approximated as indices, are assigned to non-zero values in a separate 2D binary matrix. Currently, my script is the following:

import numpy as np

lat_len = 100 # lattice length
input = np.random.random(size=(1000,2)) * lat_len
binary_matrix = np.random.choice(2, lat_len * lat_len).reshape(lat_len, -1)

def landed(input):
    output = []
    input_as_indices = np.floor(input)
    for i in range(len(input)):
        if binary_matrix[input_as_indices[i,0], input_as_indices[i,1]] == 1:
            output.append(input[i])
    output = np.asarray(output)
    return output

However, I suspect there must be a better way of doing this. The above script can take quite long to run for 10000 iterations.

rth · Accepted Answer · 2015-06-22 12:34:07Z

4

You are correct. The calculation above can be done more efficiently without a Python for loop by using advanced NumPy indexing:

def landed2(input):
    idx = np.floor(input).astype(int)  # np.int was removed in NumPy 1.24; use the built-in int
    mask = binary_matrix[idx[:,0], idx[:,1]] == 1
    return input[mask]

res1 = landed(input)
res2 = landed2(input)
np.testing.assert_allclose(res1, res2)

This results in a ~150x speed-up.
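To make the indexing step concrete, here is a minimal sketch on a hypothetical 3x3 grid (the names `grid` and `points` and their values are illustrative, not from the question). Passing two integer arrays pairs them element by element, so each point is looked up exactly once, and the resulting boolean mask selects the rows to keep.

```python
import numpy as np

# Hypothetical 3x3 binary grid and four points, for illustration only.
grid = np.array([[1, 0, 0],
                 [0, 1, 0],
                 [0, 0, 1]])
points = np.array([[0.2, 0.7],   # falls in cell (0, 0), which holds 1
                   [1.5, 1.1],   # falls in cell (1, 1), which holds 1
                   [2.9, 0.3],   # falls in cell (2, 0), which holds 0
                   [0.4, 2.6]])  # falls in cell (0, 2), which holds 0

idx = np.floor(points).astype(int)

# Two index arrays are paired element-wise: grid[idx[k, 0], idx[k, 1]] for each k.
values = grid[idx[:, 0], idx[:, 1]]
mask = values == 1
kept = points[mask]

print(values)  # [1 1 0 0]
print(kept)    # the first two points
```

The loop in the question performs the same lookup one point at a time; here the pairing happens inside NumPy in a single call.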

edited Jun 22, 2015 at 12:34

answered Jun 21, 2015 at 19:44

rth


2 Comments

neither-nor Over a year ago

Incredible. Can you explain what change made the most difference in performance, e.g., the use of int?

rth Over a year ago

The difference in performance is due to the fact that we avoid the for loop in python, and use advanced numpy indexing instead (I added a link above) that is coded more efficiently in C. Casting to integers is only a side effect, since indexes can't have a float dtype and must be either integers or boolean.

Community · Accepted Answer · 2017-05-23 12:21:43Z

3

It seems you can squeeze out a noticeable performance boost by working with linearly indexed arrays. Here's a vectorized implementation for this case, similar to @rth's answer, but using linear indexing:

# Get floored indices (np.int was removed in NumPy 1.24; use the built-in int)
idx = np.floor(input).astype(int)

# Calculate linear (row-major) indices
lin_idx = idx[:,0]*lat_len + idx[:,1]

# Index raveled/flattened version of binary_matrix with lin_idx
# to extract and form the desired output
out = input[binary_matrix.ravel()[lin_idx] == 1]

Thus, in short we have:

out = input[binary_matrix.ravel()[idx[:,0]*lat_len + idx[:,1]] == 1]
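The manual `row * lat_len + col` arithmetic assumes a row-major (C-ordered) matrix whose row length is `lat_len`. As a hedged sketch, NumPy's `np.ravel_multi_index` derives the same flat indices from the array shape, which avoids hard-coding the row length. The fixed seed and the name `points` (used in place of `input`, which shadows a Python built-in) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
lat_len = 100
points = rng.random((1000, 2)) * lat_len
binary_matrix = rng.integers(0, 2, size=(lat_len, lat_len))

idx = np.floor(points).astype(int)

# Manual row-major linear index, as in the answer.
lin_manual = idx[:, 0] * lat_len + idx[:, 1]
# Same flat index, derived from the matrix shape.
lin_numpy = np.ravel_multi_index((idx[:, 0], idx[:, 1]), binary_matrix.shape)

out_linear = points[binary_matrix.ravel()[lin_manual] == 1]
out_2d = points[binary_matrix[idx[:, 0], idx[:, 1]] == 1]

# Both index computations and both selection routes agree.
assert np.array_equal(lin_manual, lin_numpy)
assert np.array_equal(out_linear, out_2d)
```

Whether `np.ravel_multi_index` is as fast as the hand-written arithmetic is not measured here; the sketch only shows that the two give identical results.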

Runtime tests -

This section compares the proposed approach in this solution against the other solution that uses row-column indexing.

Case #1 (original data sizes):

In [62]: lat_len = 100 # lattice length
    ...: input = np.random.random(size=(1000,2)) * lat_len
    ...: binary_matrix = np.random.choice(2, lat_len * lat_len).reshape(lat_len, -1)
    ...: 

In [63]: idx = np.floor(input).astype(int)

In [64]: %timeit input[binary_matrix[idx[:,0], idx[:,1]] == 1]
10000 loops, best of 3: 121 µs per loop

In [65]: %timeit input[binary_matrix.ravel()[idx[:,0]*lat_len + idx[:,1]] ==1]
10000 loops, best of 3: 103 µs per loop

Case #2 (larger data sizes):

In [75]: lat_len = 1000 # lattice length
    ...: input = np.random.random(size=(100000,2)) * lat_len
    ...: binary_matrix = np.random.choice(2, lat_len * lat_len).reshape(lat_len, -1)
    ...: 

In [76]: idx = np.floor(input).astype(int)

In [77]: %timeit input[binary_matrix[idx[:,0], idx[:,1]] == 1]
100 loops, best of 3: 18.5 ms per loop

In [78]: %timeit input[binary_matrix.ravel()[idx[:,0]*lat_len + idx[:,1]] ==1]
100 loops, best of 3: 13.1 ms per loop

Thus, linear indexing cut the runtime by roughly 15% on the original data sizes (121 µs to 103 µs) and by about 30% on the larger ones (18.5 ms to 13.1 ms).
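Timings like these depend on the NumPy version and the hardware, so it is worth re-measuring locally. Below is a minimal sketch using the standard-library `timeit` module outside IPython; the fixed seed, the helper names `row_col` and `linear`, and the name `points` (in place of `input`) are assumptions made for illustration.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
lat_len = 1000
points = rng.random((100000, 2)) * lat_len
binary_matrix = rng.integers(0, 2, size=(lat_len, lat_len))
idx = np.floor(points).astype(int)

def row_col():
    # Row-column (2D) advanced indexing.
    return points[binary_matrix[idx[:, 0], idx[:, 1]] == 1]

def linear():
    # Linear indexing into the flattened matrix.
    return points[binary_matrix.ravel()[idx[:, 0] * lat_len + idx[:, 1]] == 1]

# Both variants must select the same points before their timings are compared.
assert np.array_equal(row_col(), linear())

# Best of 3 repeats, 20 calls each, reported per call.
t_row_col = min(timeit.repeat(row_col, number=20, repeat=3)) / 20
t_linear = min(timeit.repeat(linear, number=20, repeat=3)) / 20
print(f"row-column: {t_row_col * 1e3:.2f} ms, linear: {t_linear * 1e3:.2f} ms")
```

The gap between the two may be smaller or larger than the figures above on a different setup.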

edited May 23, 2017 at 12:21


answered Jun 28, 2015 at 13:20

Divakar


3 Comments

rayryeng Over a year ago

When you're not busy answering numpy questions, come visit us in the MATLAB chat room :) We miss you! chat.stackoverflow.com/rooms/81987/matlab

Divakar Over a year ago

@rayryeng Nice place for MATLABans! Oops! I didn't mean "Bans" , meant more like MATLAB people! I guess I will come back when there are more people in there :)

rayryeng Over a year ago

MATLABians :) ok I'll see you there eventually!
