How to optimize the hell out of linear system solving for small matrices (10x10)? This would be used in an AR engine for a few games, but has to be done very fast.
This solver is to be executed in excess of 1 000 000 times in microseconds on an Intel CPU. I am talking about the extreme level of optimization used in graphics for computer games. I don't mind coding it in assembly and making it architecture-specific, or trading away precision or reliability and using floating-point hacks (like many games, I use the -ffast-math compile flag, no problem). The solve can even fail about 20% of the time!
Eigen's partialPivLu is the fastest in my current benchmark, outperforming LAPACK when compiled with -O3 and a good compiler. But now I am at the point of handcrafting a custom linear solver. Any advice would be greatly appreciated. I will open-source my final solution and share it here.
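For reference, here is the kind of baseline I'm starting from: a dependency-free partial-pivot LU solve with the dimension fixed at compile time, so the compiler can fully unroll and vectorize the inner loops. This is just a sketch of the standard algorithm, not my final optimized version; the function name `lu_solve10` and the singular-matrix bailout are my own choices.

```c
#include <math.h>

#define N 10

/* Sketch: in-place partial-pivot LU factorization with simultaneous
   forward elimination of b, then back substitution into x.
   Returns 0 on a (numerically) singular pivot, which is acceptable
   given the ~20% failure budget; returns 1 on success.
   A and b are destroyed. N is a compile-time constant so the
   compiler can unroll everything. */
static int lu_solve10(float A[N][N], float b[N], float x[N]) {
    for (int k = 0; k < N; k++) {
        /* Find the row with the largest pivot in column k. */
        int p = k;
        float maxv = fabsf(A[k][k]);
        for (int i = k + 1; i < N; i++) {
            float v = fabsf(A[i][k]);
            if (v > maxv) { maxv = v; p = i; }
        }
        if (maxv == 0.0f) return 0;  /* singular: let the caller retry/fail */

        /* Swap rows k and p of A, and the matching entries of b. */
        if (p != k) {
            for (int j = 0; j < N; j++) {
                float t = A[k][j]; A[k][j] = A[p][j]; A[p][j] = t;
            }
            float t = b[k]; b[k] = b[p]; b[p] = t;
        }

        /* One divide per column; the elimination uses only multiplies. */
        float inv = 1.0f / A[k][k];
        for (int i = k + 1; i < N; i++) {
            float m = A[i][k] * inv;
            for (int j = k + 1; j < N; j++)
                A[i][j] -= m * A[k][j];
            b[i] -= m * b[k];
        }
    }

    /* Back substitution. */
    for (int i = N - 1; i >= 0; i--) {
        float s = b[i];
        for (int j = i + 1; j < N; j++)
            s -= A[i][j] * x[j];
        x[i] = s / A[i][i];
    }
    return 1;
}
```

With `-O3 -ffast-math` and a fixed N, compilers tend to unroll these loops aggressively, which is roughly what Eigen's fixed-size `Matrix<float,10,10>` path achieves; the question is whether hand-tuned SIMD and a relaxed pivoting strategy can beat it.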