
I am translating this code from MATLAB to Python. The code works fine, but it is painfully slow in Python: in MATLAB it runs in well under a minute, while in Python it took 30 minutes! Could someone with more experience in Python help me?

import math

# P({ai})
somai = 0
for i in range(1, n):  # note: MATLAB's 1:n corresponds to range(n) in Python; range(1, n) skips index 0
    somaj = 0
    for j in range(1, n):
        exponencial = math.exp(-((a[i] - a[j]) * (a[i] - a[j])) / dev_a2
                               - ((b[i] - b[j]) * (b[i] - b[j])) / dev_b2)
        somaj = somaj + exponencial

    somai = somai + somaj
  • what is the value of n? Commented Jul 21, 2020 at 19:36
  • Well, first a small tip: you can write somai += somaj and somaj += exponencial. More importantly, this is an O(n^2) algorithm, so the question is how big your arrays are. I am not an expert on this, but each iteration also pays for a math.exp call plus Python-level loop overhead, so if you can't reduce the number of steps I can imagine this taking 30 minutes for large n. Commented Jul 21, 2020 at 19:40

2 Answers


As with MATLAB, I'd recommend you vectorize your code. Iterating with Python for-loops can be much slower than the lower-level implementations inside MATLAB and NumPy.


Your expression (a[i] - a[j]) * (a[i] - a[j]) is the pairwise squared Euclidean distance for all N data points. You can calculate the pairwise distance matrix using SciPy's pdist and squareform functions.

Then you scale each pairwise distance matrix, combine the two, and sum the exponential decay. The vectorized code looks like this:

import numpy as np
from scipy.spatial.distance import pdist
from scipy.spatial.distance import squareform

# Example data
N = 1000
a = np.random.rand(N,1)
b = np.random.rand(N,1)
dev_a2 = np.random.rand()
dev_b2 = np.random.rand()

# `a` is an [N,1] matrix (i.e. column vector)
A = pdist(a, 'sqeuclidean')
# Change to pairwise distance matrix
A = squareform(A)
# Divide all elements by same divisor
A = A / dev_a2

# Then do the same for `b`'s
# `b` is an [N,1] matrix (i.e. column vector)
B = pdist(b, 'sqeuclidean')
B = squareform(B)
B = B / dev_b2

# Calculate exponential decay; the question's exponent is
# -da^2/dev_a2 - db^2/dev_b2, i.e. the negated sum of the scaled matrices
expo = np.exp(-(A + B))

# Sum all elements
total = np.sum(expo)

Here's a quick timing comparison between the iterative method and this vectorized code.

N: 1000 | Iter Output: 2729989.851117 | Vect Output: 2732194.924364
Iter time: 6.759 secs | Vect time: 0.031 secs

N: 5000 | Iter Output: 24855530.997400 | Vect Output: 24864471.007726
Iter time: 171.795 secs | Vect time: 0.784 secs

Note that the final results are not exactly the same. Part of that is floating-point summation order, and part is that the question's loops start at index 1 and therefore skip the i = 0 and j = 0 terms; I'll leave checking that to you.
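The order-of-summation effect can be seen in a small standalone sketch (illustration only, not the answer's data):

```python
import math
import numpy as np

# Illustration: the same values accumulated in different orders can
# produce slightly different floating-point totals.
rng = np.random.default_rng(0)
x = rng.random(100_000)

vectorized = float(np.sum(x))   # NumPy uses pairwise summation internally
naive = 0.0
for v in x:                     # plain left-to-right accumulation
    naive += v
reference = math.fsum(x)        # correctly rounded reference sum

# All three agree to high relative precision, but not necessarily bit-for-bit.
print(vectorized, naive, reference)
```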


TLDR

Use numpy

Why Numpy?

Python, by default, is slow. One of Python's strengths is that it plays nicely with C and has tons of libraries. The one that will help you here is NumPy. NumPy is mostly implemented in C and, when used properly, is blazing fast. The trick is to phrase your code so that execution stays inside NumPy rather than in Python proper.

Code and Results

import math
import numpy as np

n = 1000
np_a = np.random.rand(n)
a = list(np_a)
np_b = np.random.rand(n)
b = list(np_b)
dev_a2, dev_b2 = (1, 1)

def old():
    somai = 0.0
    for i in range(0, n):
        somaj = 0.0
        for j in range(0, n):
            tmp_1 = -((a[i] - a[j]) * (a[i] - a[j])) / dev_a2
            tmp_2 = -((b[i] - b[j]) * (b[i] - b[j])) / dev_b2
            exponencial = math.exp(tmp_1 + tmp_2)
            somaj += exponencial
        somai += somaj
    return somai


def new():
    tmp_1 = -np.square(np.subtract.outer(np_a, np_a)) / dev_a2
    tmp_2 = -np.square(np.subtract.outer(np_b, np_b)) / dev_b2  # note: dev_b2, not dev_a2
    exponential = np.exp(tmp_1 + tmp_2)
    somai = np.sum(exponential)
    return somai

old = 1.76 s ± 48.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
new = 24.6 ms ± 66.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
This is about a 70x improvement.
old yields 740919.6020840995
new yields 740919.602084099

Explanation

You'll notice I split your expression into tmp_1 and tmp_2 for clarity.

np.random.rand(n): creates an array of length n filled with random floats drawn from [0, 1) (see the NumPy documentation).

np.subtract.outer(a, b): NumPy exposes ufunc versions of all the arithmetic operators that let you do various things with them. Let's say you had np_a = [1, 2, 3]; then np.subtract.outer(np_a, np_a) would yield

array([[ 0, -1, -2],
       [ 1,  0, -1],
       [ 2,  1,  0]])

Here's a stackoverflow link if you want to go deeper on this. (Also, the word "outer" comes from the "outer product" in linear algebra.)
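If you prefer to avoid the ufunc machinery, np.subtract.outer is equivalent to plain broadcasting with an inserted axis; a quick sketch (not from the answer above):

```python
import numpy as np

np_a = np.array([1, 2, 3])

# subtract.outer builds the full matrix of pairwise differences...
outer = np.subtract.outer(np_a, np_a)

# ...which is the same as broadcasting a column vector against a row vector.
broadcast = np_a[:, None] - np_a[None, :]

print(np.array_equal(outer, broadcast))  # → True
```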

np.square: simply squares every element in the matrix.

/: when you apply arithmetic operators between scalars and arrays, NumPy broadcasts the operation and applies it to every element of the array.

np.exp: element-wise, like np.square.

np.sum: sums every element together and returns a scalar.
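Putting the pieces together, here is a sketch (with the answer's variable names, on small random data) showing that the vectorized form matches the original double loop:

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n = 50
np_a = rng.random(n)
np_b = rng.random(n)
dev_a2, dev_b2 = 1.0, 1.0

# Vectorized: the same computation as new(), written with broadcasting.
tmp_1 = -np.square(np_a[:, None] - np_a[None, :]) / dev_a2
tmp_2 = -np.square(np_b[:, None] - np_b[None, :]) / dev_b2
vect = np.sum(np.exp(tmp_1 + tmp_2))

# Reference: the original double loop from the question.
loop = sum(
    math.exp(-(np_a[i] - np_a[j]) ** 2 / dev_a2
             - (np_b[i] - np_b[j]) ** 2 / dev_b2)
    for i in range(n)
    for j in range(n)
)

print(abs(vect - loop))  # agrees up to floating-point rounding
```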

4 Comments

This code is very good indeed, but the problem is that these small rounding differences lead to wrong results at the end of my calculations, because I need a precision of at least 0.001. But I will try to improve the code. Thanks!!!
So the difference between the standard and numpy results is 5.28x10^(-10). Any differences in the answer are due to the intricacies of how float calculations are ordered. If that matters to you, you shouldn't be using NumPy, standard Python, or standard MATLAB; in that case I can find you an arbitrary-precision library.
Also, because of this inherent imprecision in floats, you shouldn't do a direct comparison (==); you should check whether the values are within some small tolerance of each other.
I changed the order of some calculations in other parts of the code and now it gives exactly the same value (to a precision of 10e-9). The run time was around 1.7 seconds, close to MATLAB (1.4 seconds) and much faster than the old code (222.1 s). @FailureGod, thanks a lot for your help!!!
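The tolerance check suggested in the comments above could be sketched like this, assuming the 0.001 precision requirement mentioned and using the two result values from the answer as sample inputs:

```python
import math

# Two results that differ only by floating-point noise
# (the old/new outputs reported in the answer above).
result_old = 740919.6020840995
result_new = 740919.602084099

# Don't compare floats with ==; check agreement within a tolerance instead.
required_precision = 0.001
print(abs(result_old - result_new) < required_precision)

# math.isclose does the same with a relative and/or absolute tolerance.
print(math.isclose(result_old, result_new, abs_tol=required_precision))  # → True
```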
