Call functions with varying parameters to modify a numpy array efficiently

Question

I want to eliminate the inefficient for loop from this code

import numpy as np

x = np.zeros((5,5))

for i in range(5):
    x[i] = np.random.choice(i+1, 5)

while still producing output like this

[[0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 2. 2. 1. 0.]
 [1. 2. 3. 1. 0.]
 [1. 0. 3. 3. 1.]]

I have tried this

i = np.arange(5)
x[i] = np.random.choice(i+1, 5)

But it outputs

[[0. 1. 1. 3. 3.]
 [0. 1. 1. 3. 3.]
 [0. 1. 1. 3. 3.]
 [0. 1. 1. 3. 3.]
 [0. 1. 1. 3. 3.]]

Is it possible to remove the loop? If not, what is the most efficient way to proceed for a large array and many repetitions?

Divakar · Accepted Answer · 2018-07-21 11:47:54Z

Create a random integer array with np.random.randint, setting its high argument to the number of columns so that every value lies in [0, n). Then apply a modulus with the 1-based row number, broadcast along the columns, so that each row gets its own upper limit. This gives a vectorized implementation like so -

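def create_rand_limited_per_row(m,n):
    # Per-row divisors 1..m, so row i (0-based) ends up with values in [0, i]
    s = np.arange(1,m+1)
    # Draw from [0, n), then reduce each row modulo its own divisor
    return np.random.randint(low=0,high=n,size=(m,n))%s[:,None]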

Sample run -

In [45]: create_rand_limited_per_row(m=5,n=5)
Out[45]: 
array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [1, 2, 0, 2, 1],
       [0, 0, 1, 3, 0],
       [1, 2, 3, 3, 2]])
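
One caveat about the modulus approach: the draws are exactly uniform only for rows whose limit divides n, and when m > n the later rows can never reach values of n or above. As a hedged alternative, not part of the original answer, the sketch below assumes a NumPy version that provides np.random.default_rng and passes a broadcast per-row upper bound directly to Generator.integers. The function name and the seed parameter are hypothetical, added for illustration only.

```python
import numpy as np

def create_rand_limited_per_row_bounds(m, n, seed=None):
    # Hypothetical helper (not from the answer): row i draws from [0, i].
    rng = np.random.default_rng(seed)
    # Exclusive upper bound per row, shaped (m, 1) so it broadcasts over columns
    high = np.arange(1, m + 1)[:, None]
    return rng.integers(low=0, high=high, size=(m, n))

x = create_rand_limited_per_row_bounds(m=5, n=5, seed=0)
# Every entry in row i is at most i
assert (x <= np.arange(5)[:, None]).all()
```

Because the bound is applied at draw time, no modulus step is needed and the result does not depend on how n relates to the row limits.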

To leverage multiple cores on large data, use the numexpr module -

import numexpr as ne

def create_rand_limited_per_row_numepxr(m,n):
    s = np.arange(1,m+1)[:,None]
    a = np.random.randint(0,n,(m,n))
    return ne.evaluate('a%s')

Benchmarking

# Original approach
def create_rand_limited_per_row_loopy(m,n):
    x = np.empty((m,n),dtype=int)
    for i in range(m):
        x[i] = np.random.choice(i+1, n)
    return x

Timings on 1k x 1k data -

In [71]: %timeit create_rand_limited_per_row_loopy(m=1000,n=1000)
10 loops, best of 3: 20.6 ms per loop

In [72]: %timeit create_rand_limited_per_row(m=1000,n=1000)
100 loops, best of 3: 14.3 ms per loop

In [73]: %timeit create_rand_limited_per_row_numepxr(m=1000,n=1000)
100 loops, best of 3: 6.98 ms per loop
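
The timings above come from IPython's %timeit. To reproduce the comparison in a plain script, a minimal sketch using the standard-library timeit module could look like the following. The helper best_time_ms is hypothetical, the numexpr variant is left out so the script needs only NumPy, and the numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

def create_rand_limited_per_row_loopy(m, n):
    # Original approach: one np.random.choice call per row
    x = np.empty((m, n), dtype=int)
    for i in range(m):
        x[i] = np.random.choice(i + 1, n)
    return x

def create_rand_limited_per_row(m, n):
    # Vectorized approach from the answer: one draw, then a per-row modulus
    s = np.arange(1, m + 1)
    return np.random.randint(low=0, high=n, size=(m, n)) % s[:, None]

def best_time_ms(func, m, n, repeat=3, number=10):
    # Hypothetical helper: best per-call time in milliseconds over `repeat` runs
    times = timeit.repeat(lambda: func(m, n), repeat=repeat, number=number)
    return min(times) / number * 1e3

for f in (create_rand_limited_per_row_loopy, create_rand_limited_per_row):
    print(f"{f.__name__}: {best_time_ms(f, 1000, 1000):.2f} ms per call")
```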
