To compile with multicore support, you have to invoke the configure script with the corresponding multithreading option, where the value is one of:

- cpp: adds support through C++14 threads.
- tbb: adds support through Intel's Threading Building Blocks. If you use this option, you first have to ensure that your CXXFLAGS and LDFLAGS point to the right include and library directories, respectively. LDFLAGS also has to comprise either -ltbb or -ltbb_debug.
- tbb_extension: an alternative TBB version that does not require the very latest TBB extensions.
- openmp: adds OpenMP support. We currently develop against OpenMP 4.x, though some of our routines use OpenMP target and thus are developed against OpenMP 5.
- sycl: adds SYCL support for the multithreading, though, within the Intel toolchain, it might be more appropriate to combine SYCL on the GPU with the tbb backend for multithreading.

Our vision is that each code should be totally independent of the multithreading implementation chosen. Indeed, Peano 4 itself does not contain any direct multithreading library calls. It solely relies on the classes and functions from tarch::multicore.
The central instance managing the threads on a system is tarch::multicore::Core. This is a singleton, so the name is slightly misleading: it does not really represent one core but rather the landscape of cores. You can set up the multithreading environment through Core's configure() routine, but this is optional. Indeed, multithreading should work without calling configure() at all. Each multithreading backend offers its own realisation of the Core class.
For multithreaded code, it is important that the code can lock (protect) code regions and free them. For this, the multithreading layer offers different semaphores. Each multithreading backend maps these logical concepts onto its internal synchronisation mechanism. Usually, we use the semaphores through lock objects. As they rely on the semaphore implementations, they are generic and work for any backend.
The standard use is that you hold a semaphore, e.g. as a class attribute or global object, and then protect a code region by creating a lock object over this semaphore.
Consult the class tarch::multicore::BooleanSemaphore for more detailed documentation of the thread locks.
The plain semaphore does not work recursively, i.e. if a thread locks a semaphore and then locks it again, it deadlocks itself. You can avoid this by switching to a recursive semaphore. They are more expensive yet allow a thread to lock the same semaphore repeatedly.
Each semaphore also supports a try_lock() operation, so you can do busy polling. try_lock() is not supported by the tarch::multicore::Lock object: you have to access the semaphore directly and free it manually later, whereas the Lock object automatically frees the semaphore in its destructor (unless you have done so manually before).
At the moment, the tarch does not provide a backend-independent abstraction of atomics. We hope that such a thing would be obsolete anyway, as C++'s std::atomic works with all backends. If this is not the case and if you need atomics, you have to write backend-specific variants protected by preprocessor macros (see below).
If you have a piece of code that locks a semaphore recursively, the second call to lock will deadlock. You will have to use the tarch::multicore::RecursiveLock.
Sometimes, it is fine for many threads to access a code block concurrently as they only read data. At the same time, if one thread writes some data, no one else should read or write concurrently. In this case, you have to use tarch::multicore::MultiReadSingleWriteSemaphore.
The namespace tarch::multicore provides a set of preprocessor macros and further information on how to write bespoke code for codes that have to distinguish the single-threaded realisation from bespoke multithreaded variants.
All the tasking is modelled through the class tarch::multicore::Task. That is, Peano expects each task to be a subclass of this class. However, there is already a specialisation of the class accepting a functor if you prefer to work with lambdas.
The actual task submission, dependency tracking, and synchronisation are all realised through functions within the namespace tarch::multicore. Consult the function tarch::multicore::spawnTask().
Peano models all of its internals as tasks. Each Peano 4 task is a subclass of tarch::multicore::Task. However, these classes might not be mapped 1:1 onto native tasks. In line with other APIs such as oneTBB, we distinguish different task types or task graph types, respectively:
In Peano, task DAGs are built up along the task workflow. That is, each task that is not used within a fork-join region or is totally free is assigned a unique number when we spawn it.
Whenever we define a task, we can also define its dependencies. These are pure completion dependencies: you tell the task system which tasks have to be completed before the currently submitted one is allowed to start. A DAG thus can be built up layer by layer. We start with the first task. This task might be executed immediately - we do not care - and then we continue to work our way down through the graph, adding node by node.
In line with OpenMP and TBB - where we significantly influenced the development of the dynamic task API - outgoing dependencies should be declared before we use them.
The core idea behind task fusion is that there's no right, generic, global task granularity on modern HPC systems. Instead, we may assume that
In Peano, we try to use tasks as small as possible, but equip the runtime to bundle (fuse) these tasks later on at runtime. This is different from other codes which require the application to come up with the large, fused tasks a priori - for example by working with larger patches.
Tasks that can be fused have to implement the canFuse() predicate, and they have to provide a function which can handle a set of tasks of the same type. Details on how the fusion is implemented are provided by tarch::multicore::taskfusion.
Most implementation details can be found in the namespace documentation.