Code 1 shows a 'for' loop parallelized with OpenMP. I would like to achieve the same parallelization after unrolling the 'for' loops with template metaprogramming (see Code 2). Could you please help?

Code 1: Outer for loop run in parallel with four threads

void some_algorithm()
{
  // code
}

int main()
{
  #pragma omp parallel for
  for (int i=0; i<4; i++)
  {
    //some code
    for (int j=0;j<10;j++)
    {
      some_algorithm();
    }
  }
}

Code 2: Same as Code 1, but with the loops generated by templates. How can I run the outer for loop in parallel using OpenMP? [1]

template <int I, int ...N>
struct Looper{
    template <typename F, typename ...X>
    constexpr void operator()(F& f, X... x) {
        for (int i = 0; i < I; ++i) {
            Looper<N...>()(f, x..., i);
        }
    }
};

template <int I>
struct Looper<I>{
    template <typename F, typename ...X>
    constexpr void operator()(F& f, X... x) {
        for (int i = 0; i < I; ++i) {
            f(x..., i);
        }
    }
};


int main()
{
    Looper<4, 10>()(some_algorithm); 
}

[1] Thanks to Nim for Code 2; see "How to generate nested loops at compile time?"

  • Maybe OpenMP is overkill for such small loops. Only loops with a reasonably small number of iterations can be unrolled. Commented Oct 4, 2020 at 12:32
  • OpenMP manages a thread pool in background. It's the first thing you should think about when doing parallel work without OpenMP. Commented Oct 4, 2020 at 14:21
  • @πάντα ῥεῖ, I agree. However, in my case some_algorithm has very complex logic, so I prefer to run each iteration i of the outer loop in a separate thread. Basically, my question is: is it possible to combine OpenMP with template programming? Commented Oct 4, 2020 at 17:35

1 Answer

If you remove the constexpr declarations, you can use _Pragma("omp parallel for"), something like this:

#include <omp.h>

template <int I, int ...N>
struct Looper{
    template <typename F, typename ...X>
    void operator()(F& f, X... x) {
        _Pragma("omp parallel for if (!omp_in_parallel())")
        for (int i = 0; i < I; ++i) {
            Looper<N...>()(f, x..., i);
        }
    }
};

template <int I>
struct Looper<I>{
    template <typename F, typename ...X>
    void operator()(F& f, X... x) {
        for (int i = 0; i < I; ++i) {
            f(x..., i);
        }
    }
};

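// C-style variadic parameter list so that the call f(i, j) compiles;
// the loop indices are ignored here.
void some_algorithm(...) {
}
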
int main()
{
    Looper<4, 10>()(some_algorithm); 
}

You can see this being compiled to use OpenMP at https://godbolt.org/z/nPrcWP (note the call to GOMP_parallel). The code also compiles with LLVM; switch the compiler to check.
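
To check that the templated loop nest still visits every index pair once it runs in parallel, a sketch along the following lines may help. It reuses the Looper from above with a lambda in place of some_algorithm. The names in_parallel, record and all_cells_visited_once are hypothetical and introduced only for this illustration, and the #ifdef _OPENMP guard is an added assumption so that the file also builds when OpenMP is switched off.

```cpp
#include <array>
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper (not part of the answer above): true inside an active
// parallel region, always false when compiled without OpenMP.
inline bool in_parallel() {
#ifdef _OPENMP
    return omp_in_parallel() != 0;
#else
    return false;
#endif
}

// Same Looper as above, with the if-clause routed through in_parallel().
template <int I, int... N>
struct Looper {
    template <typename F, typename... X>
    void operator()(F& f, X... x) {
        _Pragma("omp parallel for if (!in_parallel())")
        for (int i = 0; i < I; ++i) {
            Looper<N...>()(f, x..., i);
        }
    }
};

template <int I>
struct Looper<I> {
    template <typename F, typename... X>
    void operator()(F& f, X... x) {
        for (int i = 0; i < I; ++i) {
            f(x..., i);
        }
    }
};

// Runs the 4 x 10 loop nest and reports whether every (i, j) pair was
// visited exactly once.
bool all_cells_visited_once() {
    std::array<std::array<int, 10>, 4> hits{};
    // Each (i, j) cell is written by exactly one iteration, so threads
    // never touch the same element and no synchronization is needed.
    auto record = [&hits](int i, int j) {
        assert(0 <= i && i < 4 && 0 <= j && j < 10);
        hits[i][j] += 1;
    };
    Looper<4, 10>()(record);  // record must be an lvalue: operator() takes F&
    for (const auto& row : hits)
        for (int h : row)
            if (h != 1) return false;
    return true;
}
```

Calling all_cells_visited_once() from main should return true. With GCC or Clang, building with -fopenmp enables the directive; without that flag the pragma is ignored and the loops run serially, so the check should hold either way.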
