5

For a large set of randomly distributed points in a 2D lattice, I want to efficiently extract a subarray, which contains only the elements that, approximated as indices, are assigned to non-zero values in a separate 2D binary matrix. Currently, my script is the following:

lat_len = 100 # lattice length
input = np.random.random(size=(1000,2)) * lat_len
binary_matrix = np.random.choice(2, lat_len * lat_len).reshape(lat_len, -1)

def landed(input):
    output = []
    input_as_indices = np.floor(input)
    for i in range(len(input)):
        if binary_matrix[input_as_indices[i,0], input_as_indices[i,1]] == 1:
            output.append(input[i])
    output = np.asarray(output)
    return output   

However, I suspect there must be a better way of doing this. The above script can take quite long to run for 10000 iterations.

2 Answers 2

4

You are correct. The calculation above, can be be done more efficiently without a for loop in python using advanced numpy indexing,

def landed2(input):
    idx = np.floor(input).astype(np.int)
    mask = binary_matrix[idx[:,0], idx[:,1]] == 1
    return input[mask]

res1 = landed(input)
res2 = landed2(input)
np.testing.assert_allclose(res1, res2)

this results in a ~150x speed-up.

Sign up to request clarification or add additional context in comments.

2 Comments

Incredible. Can you explain what change made the most difference in performance, e.g., the use of int?
The difference in performance is due to the fact that we avoid the for loop in python, and use advanced numpy indexing instead (I added a link above) that is coded more efficiently in C. Casting to integers is only a side effect, since indexes can't have a float dtype and must be either integers or boolean.
3

It seems you can squeeze in a noticeable performance boost if you work with linearly indexed arrays. Here's a vectorized implementation to solve our case, similar to @rth's answer, but using linear indexing -

# Get floor-ed indices
idx = np.floor(input).astype(np.int)

# Calculate linear indices 
lin_idx = idx[:,0]*lat_len + idx[:,1]

# Index raveled/flattened version of binary_matrix with lin_idx
# to extract and form the desired output
out = input[binary_matrix.ravel()[lin_idx] ==1]

Thus, in short we have:

out = input[binary_matrix.ravel()[idx[:,0]*lat_len + idx[:,1]] ==1]

Runtime tests -

This section compares the proposed approach in this solution against the other solution that uses row-column indexing.

Case #1(Original datasizes):

In [62]: lat_len = 100 # lattice length
    ...: input = np.random.random(size=(1000,2)) * lat_len
    ...: binary_matrix = np.random.choice(2, lat_len * lat_len).
                                             reshape(lat_len, -1)
    ...: 

In [63]: idx = np.floor(input).astype(np.int)

In [64]: %timeit input[binary_matrix[idx[:,0], idx[:,1]] == 1]
10000 loops, best of 3: 121 µs per loop

In [65]: %timeit input[binary_matrix.ravel()[idx[:,0]*lat_len + idx[:,1]] ==1]
10000 loops, best of 3: 103 µs per loop

Case #2(Larger datasizes):

In [75]: lat_len = 1000 # lattice length
    ...: input = np.random.random(size=(100000,2)) * lat_len
    ...: binary_matrix = np.random.choice(2, lat_len * lat_len).
                                             reshape(lat_len, -1)
    ...: 

In [76]: idx = np.floor(input).astype(np.int)

In [77]: %timeit input[binary_matrix[idx[:,0], idx[:,1]] == 1]
100 loops, best of 3: 18.5 ms per loop

In [78]: %timeit input[binary_matrix.ravel()[idx[:,0]*lat_len + idx[:,1]] ==1]
100 loops, best of 3: 13.1 ms per loop

Thus, the performance boost with this linear indexing seems to be about 20% - 30%.

3 Comments

When you're not busy answering numpy questions, come visit us in the MATLAB chat room :) We miss you! chat.stackoverflow.com/rooms/81987/matlab
@rayryeng Nice place for MATLABans! Oops! I didn't mean "Bans" , meant more like MATLAB people! I guess I will come back when there are more people in there :)
MATLABians :) ok I'll see you there eventually!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.