
How would you vectorize the evaluation of arrays of lambda functions?

Here's an example to illustrate what I mean. (And even though I'm using numpy arrays, I'm not limiting myself to numpy-only solutions.)

Let's say I have the following numpy arrays.

import numpy as np

# dtype=object keeps the mixed types intact; without it, numpy would
# coerce everything in array1 to strings.
array1 = np.array(["hello", 9], dtype=object)
array2 = np.array([lambda s: s == "hello", lambda num: num < 10])

(You could store these kinds of objects in numpy without throwing an error, believe it or not.) What I want is something akin to the following.

array2 * array1
# Should return np.array([True, True]). PS: An explanation of how to `AND`
# all of the booleans together quickly would be nice too.

Of course, this is impractical for arrays of size 2, but for arbitrarily large arrays I'm assuming this would yield a performance boost because of numpy's low-level optimizations.

So, anyone know how to write this weird kind of python code?

  • "this is a giant performance boost" -- Is this something you have tested, or is it just an assumption?
  • @Dolda2000: From my experience using numpy arrays on large datasets of numbers, I'd assume it's true. But vectorized function evaluation? Not so sure. I'll reword my post a little.
  • None of the things NumPy ordinarily does to speed up vectorized operations apply to this kind of thing. For NumPy to speed things up, it needs to keep as much work as possible out of interpreted Python and in low-level C routines. That's impossible here.
  • Thanks for your insight, @user2357112. I was thinking this could happen. I'm hoping a non-numpy solution will appear here, then.
  • @hlin117: Don't hold your hopes too high. The main point is that, however you choose to loop over your array, the main overhead will be in the lambda evaluations rather than in the looping construct, so you won't get rid of it no matter how you loop.

2 Answers


The simple answer, of course, is that you can't easily do this with numpy (or with standard Python, for that matter). Numpy doesn't actually vectorize most operations itself, to my knowledge: it delegates to libraries like BLAS/ATLAS that do so for certain situations. Even where it does, the work happens in C for specific numeric cases: it certainly can't vectorize the execution of arbitrary Python functions.
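
To see concretely why the `*` syntax from the question can't work as written: with object arrays, numpy just applies Python's `*` operator to each pair of elements, and function objects don't define multiplication. A quick sketch (the variable names are mine):

import numpy as np

funcs = np.array([lambda s: s == "hello", lambda num: num < 10])
vals = np.array(["hello", 9], dtype=object)

try:
    funcs * vals  # numpy applies Python's * elementwise to the objects
except TypeError as exc:
    print(exc)  # functions don't support *, so this raises TypeError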

If you want to involve multiprocessing, that's possible, but it depends on your situation. Are your individual function applications time-consuming enough to be worth dispatching one by one, or do you need a very large number of fast function executions, in which case you'd want to send batches of them to each worker process? (A sketch of the batching variant follows below.)
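
Here's a minimal sketch of the batching approach, assuming the work can be expressed with module-level functions; note that lambdas can't be pickled, so they can't be shipped to worker processes as-is. (`apply_pair` and `check_even` are illustrative names I made up, not anything from a library.)

from multiprocessing import Pool

def apply_pair(pair):
    # unpack one (function, value) pair and evaluate it
    func, val = pair
    return func(val)

def check_even(num):
    return num % 2 == 0

if __name__ == "__main__":
    funcs = [check_even] * 10000
    vals = range(10000)
    pool = Pool()
    # chunksize batches many cheap calls into each task, amortizing
    # the inter-process communication overhead
    results = pool.map(apply_pair, zip(funcs, vals), chunksize=1000)
    pool.close()
    pool.join()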

In general, because of what could be argued as poor fundamental design (e.g., the Global Interpreter Lock), it's very difficult in standard Python to get the kind of lightweight parallelization you're hoping for here. There are significantly heavier mechanisms, like the multiprocessing module or IPython.parallel, but they require some work to use.


Alright, I have an answer: numpy's `vectorize`.

Please read the edited sections below, though. You'll discover that Python actually optimizes plain list iteration for you, which defeats the purpose of using numpy arrays in this particular case. (Using numpy arrays does not decrease the performance, though.)

What the last test really shows is that Python lists are already about as efficient as they can be, so this vectorization procedure is unnecessary. This is why I didn't mark this answer as the accepted one.

Setup code:

import numpy as np

n = 10000  # n is varied in the benchmarks below

def factory(i):
    return lambda num: num == i

array1 = np.array([factory(i) for i in range(n)])
array2 = np.array(xrange(n))

The "unvectorized" version:

def evaluate(array1, array2):
    return [func(val) for func, val in zip(array1, array2)]

%timeit evaluate(array1, array2)
# 100 loops, best of 3: 10 ms per loop

The vectorized version:

def evaluate2(func, val):
    return func(val)

vec_evaluate = np.vectorize(evaluate2)

%timeit vec_evaluate(array1, array2)
# 100 loops, best of 3: 2.65 ms per loop
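
As for the PS in the question: vec_evaluate returns a numpy bool array here, so ANDing all the results together is just a single reduction, e.g.:

results = vec_evaluate(array1, array2)
all_true = np.all(results)  # one C-level reduction over the bool array
# equivalently: results.all()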

EDIT

Okay, here are more benchmarks from the same tests, just with different test cases.

I made a third edit showing what happens if you simply use Python lists. Long story short: you won't lose much. That test case is at the very bottom.

Test cases only involving integers

In summary, if n is small, then the unvectorized version is better. Otherwise, vectorized is the way to go.

With n = 30

%timeit evaluate(array1, array2)
# 10000 loops, best of 3: 35.7 µs per loop
%timeit vec_evaluate(array1, array2)
# 10000 loops, best of 3: 27.6 µs per loop

With n = 7

%timeit evaluate(array1, array2)
# 100000 loops, best of 3: 9.93 µs per loop
%timeit vec_evaluate(array1, array2)
# 10000 loops, best of 3: 21.6 µs per loop

Test cases involving strings

Vectorization wins.

Setup code:

def factory(i):
    return lambda num: str(num) == str(i)

# n is the array size used in each benchmark below
array1 = np.array([factory(i) for i in range(n)])
array2 = np.array(xrange(n))

With n = 10000

%timeit evaluate(array1, array2)
# 10 loops, best of 3: 36.7 ms per loop

%timeit vec_evaluate(array1, array2)
# 100 loops, best of 3: 6.57 ms per loop

With n = 7

%timeit evaluate(array1, array2)
# 10000 loops, best of 3: 28.3 µs per loop

%timeit vec_evaluate(array1, array2)
# 10000 loops, best of 3: 27.5 µs per loop

Random tests

Just to see whether branch prediction played a role. From what I'm seeing, it didn't change much: vectorization still usually wins.

Setup code:

from random import random

def factory(i):
    # half the lambdas compare as strings, half as ints
    if random() < 0.5:
        return lambda num: str(num) == str(i)
    return lambda num: num == i

With n = 10000

%timeit evaluate(array1, array2)
# 10 loops, best of 3: 25.7 ms per loop

%timeit vec_evaluate(array1, array2)
# 100 loops, best of 3: 4.67 ms per loop

With n = 7

%timeit evaluate(array1, array2)
# 10000 loops, best of 3: 23.1 µs per loop

%timeit vec_evaluate(array1, array2)
# 10000 loops, best of 3: 23.1 µs per loop

Using Python lists instead of numpy arrays

I ran this test to see what happened when I chose not to use the "optimized" numpy arrays, and I received some very surprising results.

The setup code is almost the same, except I'm not using numpy arrays. I'm also running this test only for the "random" case.

def factory(i):
    if random() < 0.5:
        return lambda num: str(num) == str(i)
    return lambda num: num == i

array1 = [factory(i) for i in range(10000)]
array2 = range(10000)

And the "unvectorized" version:

%timeit evaluate(array1, array2)
# 100 loops, best of 3: 4.93 ms per loop

As you can see, this is pretty surprising: it's almost the same benchmark I was getting from the random test case with the vectorized evaluate.

%timeit vec_evaluate(array1, array2)
# 10 loops, best of 3: 19.8 ms per loop

Likewise, if you convert these back into numpy arrays before calling vec_evaluate, you get roughly the same ~4.5 ms benchmark.
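
That is, just the conversion step:

array1 = np.array(array1)
array2 = np.array(array2)

%timeit vec_evaluate(array1, array2)
# back to roughly the ~4.5 ms result from the numpy-array test above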

Comments

  • What you are really comparing is a list iteration with a numpy array iteration. vectorize just conveniently wraps the array iteration; it does not speed it up (there's a note to that effect in its docs).
  • I think the last experiment really says that if you're using Python lists, stick to Python list operations, and if you're using numpy arrays, vectorize the function before doing anything. (At least for my use case, I'm interested in numpy arrays, but it's useful to know that lists already squeeze out as much performance as they can.)
