
1. Consider the following traversal of a numpy.ndarray

for ii in xrange(0,(nxTes-2)):
    if ( (xCom-dtaCri-xcTes[ii]) * (xCom-dtaCri-xcTes[ii+1]) ) <= 0.0:
        nxL=ii
    if ( (xCom+dtaCri-xcTes[ii]) * (xCom+dtaCri-xcTes[ii+1]) ) <= 0.0:
        nxR=ii+1

2. xCom, dtaCri and xcTes are of type numpy.float64, float and numpy.ndarray respectively (as reported by type())

3. The full block above is repeated for nyTes and nzTes, i.e. a total of three such blocks run in the main algorithm loop. The goal is to create a region of interest with window size dtaCri, centred on the comparison point xCom, using the positional data in xcTes (a self-contained, runnable sketch of the block follows this list)

4. The above code is more or less a straight port from Matlab, where the same block executes roughly three to four times faster.

5. Question: Is it possible to optimize the block above with respect to execution time, and if so, how?

6. So far I have tried some minor tweaks, such as altering data types and using range() instead of xrange(), but saw no noticeable change in performance.
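For reference, here is a self-contained, runnable version of the block with synthetic stand-in data (the sizes and values below are made up purely for illustration, and range() is used so it also runs on Python 3):

import numpy as np

# Made-up stand-ins for the real data, purely for illustration
nxTes = 200
xcTes = np.linspace(0.0, 10.0, nxTes)   # monotonically increasing positions
xCom = np.float64(4.2)                  # comparison point
dtaCri = 0.5                            # half-width of the region of interest

nxL = nxR = 0
for ii in range(0, nxTes - 2):
    # left edge: sign change of xcTes around xCom - dtaCri
    if (xCom - dtaCri - xcTes[ii]) * (xCom - dtaCri - xcTes[ii + 1]) <= 0.0:
        nxL = ii
    # right edge: sign change of xcTes around xCom + dtaCri
    if (xCom + dtaCri - xcTes[ii]) * (xCom + dtaCri - xcTes[ii + 1]) <= 0.0:
        nxR = ii + 1

print(nxL, nxR)   # indices bracketing [xCom - dtaCri, xCom + dtaCri]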

2 Comments
  • MATLAB is probably faster because it is doing some form of JIT compilation. That lets you think iteratively. numpy is more like old MATLAB, where you had to focus on whole-matrix operations. Can you write that in MATLAB without the iteration (with masks and so on)? Commented Dec 21, 2016 at 21:17
  • Agreed. And no, I am not sure there is a "simple" way of rewriting the Matlab code, at least not at present. I will instead spend some time trying to perfect the Python interpretation. Commented Dec 21, 2016 at 21:39

1 Answer


Pre-compute those boolean conditionals in a vectorized manner before entering the loop, making use of slicing (slices are just views into the input array), like so -

parte1 = ((xCom - dtaCri - xcTes[:nxTes-2]) * (xCom - dtaCri - xcTes[1:nxTes-1])) <= 0.0
parte2 = ((xCom + dtaCri - xcTes[:nxTes-2]) * (xCom + dtaCri - xcTes[1:nxTes-1])) <= 0.0
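(A side note on the views point: a basic slice like xcTes[1:nxTes-1] does not copy any data. You can confirm this quickly with a small throwaway array -)

import numpy as np

a = np.arange(8, dtype=np.float64)
s = a[1:7]                      # basic slice: a view into a, no data copied
print(np.shares_memory(a, s))   # True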

We can see that a few computations are repeated, so we can reuse them -

p = xCom - xcTes[:nxTes-1]          # shared term: xCom - xcTes
p0 = p - dtaCri                     # left edge terms:  xCom - dtaCri - xcTes
p1 = p + dtaCri                     # right edge terms: xCom + dtaCri - xcTes
parte1 = p0[:-1] * p0[1:] <= 0.0    # sign change across consecutive samples
parte2 = p1[:-1] * p1[1:] <= 0.0
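If you want to double-check that the reused version reproduces the direct one (the subtractions are grouped differently here, so in principle rounding could flip a borderline comparison), a quick spot check could look like -

chk1 = ((xCom - dtaCri - xcTes[:nxTes-2]) * (xCom - dtaCri - xcTes[1:nxTes-1])) <= 0.0
chk2 = ((xCom + dtaCri - xcTes[:nxTes-2]) * (xCom + dtaCri - xcTes[1:nxTes-1])) <= 0.0
print(np.array_equal(parte1, chk1), np.array_equal(parte2, chk2))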

Then, just use those bools in the loop -

for ii in xrange(0,(nxTes-2)):
    if parte1[ii]:
        nxL=ii
    if parte2[ii]:
        nxR=ii+1

The idea is to do minimal work inside the loop with focus on performance.

I am assuming you have more work going on inside the loop that uses nxL and nxR, because otherwise we are simply overwriting those two variables on every match.
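If it turns out that nothing else in the loop needs the index ii, the Python loop can be dropped altogether and the last matching positions picked out directly. A sketch of that (if either boolean array has no True entry, the original loop would leave nxL or nxR unset, so the guard below simply skips the assignment) -

idxL = np.flatnonzero(parte1)   # indices where the left-edge condition holds
idxR = np.flatnonzero(parte2)   # indices where the right-edge condition holds
if idxL.size and idxR.size:
    nxL = idxL[-1]              # last matching ii, as the loop would leave it
    nxR = idxR[-1] + 1          # last matching ii + 1, as the loop would leave it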


11 Comments

Thank you. Yes, there are several more sections happening later in the algorithm that depend on the region of interest generated from the above segment. I will try to implement your suggestion and report back with results.
Very interesting result: I am processing medical images, and the old method takes ~2.34s per processed layer (i.e. roughly 2½s per x, y or z). The new method, where one major change is the use of slicing, takes ~0.49s. Net result: a ~4.8 times performance increase. Seems xmas came early this year. Thank you.
@Peter That's not bad, is it! Awesome, have a great holiday!
You're still doing ~2x the work generating parte1 and parte2, since most values are duplicated between the arrays.
Not to drag this comment section out forever, but I just had to get a better understanding of what was going on in this new suggestion. I felt it SHOULD be somewhat faster (and a bit more than what I saw initially), so I did one more round of analysis and can now report a performance increase of ~3.8819% relative to the previous method. With this new insight I have decided to go with the updated method, especially since these sections of the algorithm are used frequently. Thank you once again!
