Create a random int array whose values lie in `[0, n)`, where `n` is the number of columns. For that we can use `np.random.randint` with its `high` arg set to the number of columns. Then take the modulus with `1, 2, ..., m` down the rows, so that each row gets a different limit defined by its row number: row `i` (0-based) is capped at `i`. Thus, we would have a vectorized implementation like so -
```python
import numpy as np

def create_rand_limited_per_row(m,n):
    s = np.arange(1,m+1)
    # Row i (0-based) is taken modulo i+1, capping its values at i
    return np.random.randint(low=0,high=n,size=(m,n))%s[:,None]
```
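To see why `%s[:,None]` gives each row its own limit, here is a small deterministic illustration of the broadcasting step (not part of the original answer): a fixed row of values `0..4` stands in for the random draws, so the output is predictable.

```python
import numpy as np

# Three rows with limits 1, 2, 3 (the role of `s[:, None]`), shape (3, 1)
limits = np.arange(1, 4)[:, None]
# A fixed row of values 0..4 standing in for the random draws, shape (1, 5)
values = np.arange(5)[None, :]

# Broadcasting (3, 1) against (1, 5) yields a (3, 5) result,
# where row i is reduced modulo i+1
out = values % limits
print(out)
# [[0 0 0 0 0]
#  [0 1 0 1 0]
#  [0 1 2 0 1]]
```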
Sample run -

```
In [45]: create_rand_limited_per_row(m=5,n=5)
Out[45]: 
array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [1, 2, 0, 2, 1],
       [0, 0, 1, 3, 0],
       [1, 2, 3, 3, 2]])
```
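One caveat with the modulus trick: `randint(0, n)` draws from `n` equally likely values, so row `i` is exactly uniform over `[0, i]` only when `i+1` divides `n`; otherwise the smaller residues come up a bit more often (with `n=5`, the row with limit 3 maps `0..4` to `0, 1, 2, 0, 1`, so `2` is half as likely as `0` or `1`). Row `i` can also only reach `i` when `n > i`, so the trick assumes `n >= m`. If either matters, a possible alternative is to draw each row directly with its own bound. This sketch swaps in a different technique, assuming NumPy >= 1.17 for `np.random.default_rng` and `Generator.integers`, whose `high` argument can be an array that broadcasts against `size`; the helper name is hypothetical.

```python
import numpy as np

def create_rand_limited_per_row_unbiased(m, n, seed=None):
    # Hypothetical helper: row i is drawn uniformly from [0, i] by passing
    # a per-row exclusive upper bound that broadcasts across the columns
    rng = np.random.default_rng(seed)
    high = np.arange(1, m + 1)[:, None]   # shape (m, 1): 1, 2, ..., m
    return rng.integers(0, high, size=(m, n))

x = create_rand_limited_per_row_unbiased(5, 5, seed=0)
print(x)
```

Unlike the modulus version, this works for any `m` and `n`, at the cost of not being a single `randint` call followed by a cheap elementwise operation.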
To leverage multiple cores on large data, we can use the `numexpr` module -
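```python
import numpy as np
import numexpr as ne

def create_rand_limited_per_row_numepxr(m,n):
    s = np.arange(1,m+1)[:,None]
    a = np.random.randint(0,n,(m,n))
    # numexpr evaluates the broadcasted modulus using multiple threads
    return ne.evaluate('a%s')
```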
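If `numexpr` may not be installed everywhere the code runs, one option is to treat it as optional and fall back to plain NumPy. This is a sketch under that assumption; the helper name `mod_per_row` is hypothetical. It also passes the operands through `ne.evaluate`'s `local_dict` argument instead of relying on it picking up the caller's local variables implicitly.

```python
import numpy as np

try:
    import numexpr as ne
except ImportError:  # numexpr is optional here; fall back to plain NumPy
    ne = None

def mod_per_row(a):
    # Hypothetical helper: reduces row i of an (m, n) int array modulo i+1
    s = np.arange(1, a.shape[0] + 1)[:, None]
    if ne is not None:
        # Pass the operands explicitly rather than via implicit locals
        return ne.evaluate('a % s', local_dict={'a': a, 's': s})
    return a % s

a = np.random.randint(0, 1000, (1000, 1000))
print(np.array_equal(mod_per_row(a), a % np.arange(1, 1001)[:, None]))
```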
Benchmarking
```python
# Original approach
def create_rand_limited_per_row_loopy(m,n):
    x = np.empty((m,n),dtype=int)
    for i in range(m):
        x[i] = np.random.choice(i+1, n)
    return x
```
Timings on 1k x 1k data -
```
In [71]: %timeit create_rand_limited_per_row_loopy(m=1000,n=1000)
10 loops, best of 3: 20.6 ms per loop

In [72]: %timeit create_rand_limited_per_row(m=1000,n=1000)
100 loops, best of 3: 14.3 ms per loop

In [73]: %timeit create_rand_limited_per_row_numepxr(m=1000,n=1000)
100 loops, best of 3: 6.98 ms per loop
```
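These timings come from IPython's `%timeit` magic. Outside IPython, a rough equivalent with the standard-library `timeit` module might look like the sketch below; `best_time_ms` is a hypothetical helper, and absolute numbers will depend on the machine and NumPy version, so no results are claimed here.

```python
import timeit

import numpy as np

def create_rand_limited_per_row_loopy(m, n):
    x = np.empty((m, n), dtype=int)
    for i in range(m):
        x[i] = np.random.choice(i + 1, n)
    return x

def create_rand_limited_per_row(m, n):
    s = np.arange(1, m + 1)
    return np.random.randint(low=0, high=n, size=(m, n)) % s[:, None]

def best_time_ms(func, m=1000, n=1000, number=10, repeat=3):
    # Best of `repeat` runs, each averaging `number` calls (similar in spirit to %timeit)
    times = timeit.repeat(lambda: func(m, n), number=number, repeat=repeat)
    return min(times) / number * 1e3

for f in (create_rand_limited_per_row_loopy, create_rand_limited_per_row):
    print(f"{f.__name__}: {best_time_ms(f):.2f} ms per loop")
```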