more efficient array calculation on this python code case

Question

I have written a function which takes an N by N array and compute an output array based on it.

heres how my code looks like this:

 def calculate_output(input,N):
    output = np.zeros((N, N))
    for y in range(N):
        for x in range(N):
            val1 = 0 if y-1<0 else output[y-1][x]+input[y][x]
            val2 = 0 if x-1<0 else output[y][x-1]+input[y][x]
            output[y][x] = max(val1,val2)
    return output

N = 10000
input = np.reshape(np.random.binomial(1, [0.25] * N * N), (N, N))
output =calculate_output(input,N)

however this compution is not fast enough and takes about 300 seconds on my machine.(compared to 3 seconds when implemented on C++) is there any way to improve this without writing a C extension? I have tries using pypy but in this case the code is even slower using pypy

If yes, your solution almost does that. To be correct you need to have output[y][x] = input[y][x] + max(val1,val2) instead of adding the input in the conditional, since we count the weight of the starting square. — orlp
– orlp, Commented Jul 2, 2021 at 13:22
HI, indeed it is I applied your advice and it did infact improve the performannce. however, it is till quite slow, is there any way to improve this? — Ruben Wolfe
– Ruben Wolfe, Commented Jul 2, 2021 at 14:36

Jérôme Richard · Accepted Answer · 2021-07-02 20:29:13Z

CPython is very slow because it is an interpreter and it clearly cannot compete with C and C++ in such a case. The usual approach to reduce the cost of the interpreter is to avoid loops as much as possible and use few Numpy vectorized calls instead. However in this case, it is barely possible to write an efficient implementation using Numpy vectorized calls.

On the other hand PyPy is often much better for numerical codes because of the JIT compilation. But its implementation of Numpy is not great at all mainly because they used an implementation of Numpy rewritten in Python which is not as good as the native Numpy implementation and the native implementation would not be efficient because of the way Python modules are currently implemented. To put it shortly, AFAIK, the PyPy JIT cannot optimize Numpy access with the native implementation. As the result, the JIT can be slower than the CPython interpreter in your case.

However, you can speed up the code a lot using the Numba JIT compiler which has been written for this exact use-case. Moreover, few optimizations can be implemented to speed up the code even more (whatever the programming language used):

conditionals are generally slow, you can move them in loops performing only the borders
writing zeros initially in the output matrix is not required and is actually slower
Using 2D direct indexing is cleaner and likely a bit faster
integers can be used instead of floating-point numbers since the output contains only integers and computing integers is faster than computing the same operation with floating-point numbers.

import numba as nb

@nb.njit(['int32[:,::1](int32[:,::1],int32)', 'int64[:,::1](int64[:,::1],int64)'])
def calculate_output(input,N):
    output = np.empty((N, N), input.dtype)
    for x in range(0,N):
        val2 = 0 if x-1<0 else output[0,x-1]+input[0,x]
        output[0,x] = max(0,val2)
    for y in range(1,N):
        val1 = 0 if y-1<0 else output[y-1,0]+input[y,0]
        output[y,0] = max(val1,0)
    for y in range(1,N):
        for x in range(1,N):
            val1 = output[y-1,x]+input[y,x]
            val2 = output[y,x-1]+input[y,x]
            output[y,x] = max(val1,val2)
    return output

The resulting calculate_output call is 730 times faster on my machine.

Collectives™ on Stack Overflow

more efficient array calculation on this python code case

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related