Peano

benchmarks::exahype2::ccz4::MulticoreOrchestration Class Reference
Hard-coded strategy for the single black hole setup. More...
#include <MulticoreOrchestration.h>
Public Member Functions

MulticoreOrchestration ()

virtual ~MulticoreOrchestration ()=default

virtual void startBSPSection (int nestedParallelismLevel) override
  Start a fork/join section. More...

virtual void endBSPSection (int nestedParallelismLevel) override
  End fork/join section. More...

virtual int getNumberOfTasksToHoldBack (int taskType) override
  How many tasks should be held back. More...

virtual FuseInstruction getNumberOfTasksToFuseAndTargetDevice (int taskType) override
  Ensure right cardinality ends up on GPU. More...

virtual bool fuseTasksImmediatelyWhenSpawned (int taskType) override
  Ensure Finite Volume tasks end up on GPU asap. More...

virtual ExecutionPolicy paralleliseForkJoinSection (int nestedParallelismLevel, int numberOfTasks, int taskType) override
  Determine how to parallelise a fork/join section. More...

Private Attributes

int _nestedBSPLevels
  Number of nested fork/join levels. More...

int _maxFiniteVolumeTasks
  Maximum number of finite volume tasks in the system. More...

int _finiteVolumeTasksInThisBSPSection
  Current number of finite volume tasks that already have been spawned. More...
Detailed Description

Hard-coded strategy for the single black hole setup.
The single black hole setup is a fixed setup: a large domain is covered by higher-order patches, plus a small area in the centre which is covered by Finite Volumes. The latter are very compute-heavy and hence quickly become the bottleneck, so we have to process them as soon as possible. We could realise this by giving those tasks a higher priority than all other tasks, but I prefer to realise all scheduling within this orchestration object; priorities might well do a fine job, too.
I hijack getNumberOfTasksToHoldBack() to keep track of the total number of enclave tasks. This number is constant here, so I can derive it via a max over the per-sweep counts and know that the right value is available after the first grid sweep.
The "magic" happens in getNumberOfTasksToHoldBack(), and the documentation of this routine provides further details.
We disable any nested parallelism. See paralleliseForkJoinSection()'s documentation.
Definition at line 55 of file MulticoreOrchestration.h.
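To illustrate the counting idea, here is a small, self-contained sketch of the max-over-sweeps bookkeeping. The helper type, its method names, and the task count of 42 are purely illustrative and are not part of Peano's API.

```cpp
#include <algorithm>
#include <cassert>

// Minimal sketch: every BSP section counts the Finite Volume tasks it spawns;
// the running maximum converges to the (constant) total after the first
// complete grid sweep.
struct FiniteVolumeTaskCounter {        // hypothetical helper, not Peano code
  int tasksInCurrentSection = 0;
  int maxTasksSeen          = 0;

  void onSectionStart() { tasksInCurrentSection = 0; }
  void onTaskSpawned()  { ++tasksInCurrentSection; }
  void onSectionEnd()   { maxTasksSeen = std::max(maxTasksSeen, tasksInCurrentSection); }
};

int main() {
  FiniteVolumeTaskCounter counter;
  for (int sweep = 0; sweep < 3; ++sweep) {
    counter.onSectionStart();
    for (int task = 0; task < 42; ++task) counter.onTaskSpawned();  // 42 FV tasks per sweep
    counter.onSectionEnd();
  }
  assert(counter.maxTasksSeen == 42);   // the value is already known after the first sweep
  return 0;
}
```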
benchmarks::exahype2::ccz4::MulticoreOrchestration::MulticoreOrchestration ( )
Definition at line 18 of file MulticoreOrchestration.cpp.
benchmarks::exahype2::ccz4::MulticoreOrchestration::~MulticoreOrchestration ( )  [virtual], [default]
void benchmarks::exahype2::ccz4::MulticoreOrchestration::endBSPSection ( int nestedParallelismLevel )  [override], [virtual]
End fork/join section.
Decrement the counter _nestedBSPLevels. If the outermost parallel region joins, we can update _maxFiniteVolumeTasks.
Definition at line 32 of file MulticoreOrchestration.cpp.
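A minimal sketch of this join logic, assuming the three counters documented under Private Attributes; the file-scope variables merely stand in for the class members.

```cpp
#include <algorithm>

// Stand-ins for the private members documented further down; in the real class
// they are fields of MulticoreOrchestration, not globals.
static int _nestedBSPLevels                   = 0;
static int _maxFiniteVolumeTasks              = 0;
static int _finiteVolumeTasksInThisBSPSection = 0;

void endBSPSection(int /*nestedParallelismLevel*/) {
  _nestedBSPLevels--;                      // one fork/join level fewer
  if (_nestedBSPLevels == 0) {             // outermost parallel region joins
    _maxFiniteVolumeTasks =
      std::max(_maxFiniteVolumeTasks, _finiteVolumeTasksInThisBSPSection);
  }
}
```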
bool benchmarks::exahype2::ccz4::MulticoreOrchestration::fuseTasksImmediatelyWhenSpawned ( int taskType )  [override], [virtual]
Ensure Finite Volume tasks end up on GPU asap.
Always returns true, as we want to get the Finite Volume tasks to the accelerator as soon as possible. All other tasks might remain on the CPU or be offloaded as well; here, it makes no big difference.
Definition at line 75 of file MulticoreOrchestration.cpp.
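The corresponding sketch is essentially a one-liner; the task-type argument is ignored because the answer is the same for every type.

```cpp
// Sketch: every task type may be fused and offloaded right away. For the
// Finite Volume tasks this is what pushes them towards the accelerator
// immediately; for the remaining tasks the answer barely matters.
bool fuseTasksImmediatelyWhenSpawned(int /*taskType*/) {
  return true;
}
```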
FuseInstruction benchmarks::exahype2::ccz4::MulticoreOrchestration::getNumberOfTasksToFuseAndTargetDevice ( int taskType )  [override], [virtual]
Ensure right cardinality ends up on GPU.
If we have a Finite Volume task, we send it off to the GPU immediately. If we have FD4 tasks, we wait until we have 16 of them and then send them off as one batch. The value 16 is arbitrary; I just needed some number.
Definition at line 58 of file MulticoreOrchestration.cpp.
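A sketch of this fusion rule. The FuseInstruction layout, the task-type ids, and the device numbering are assumptions made to keep the example self-contained; the real definitions come from Peano.

```cpp
// Hypothetical stand-ins: the actual FuseInstruction, task-type ids, and
// device numbering come from Peano and look different.
struct FuseInstruction { int numberOfTasksToFuse; int targetDevice; };

constexpr int FiniteVolumeTaskType = 0;   // illustrative ids only
constexpr int FD4TaskType          = 1;
constexpr int FirstGPU             = 0;

FuseInstruction getNumberOfTasksToFuseAndTargetDevice(int taskType) {
  if (taskType == FiniteVolumeTaskType) {
    return {1, FirstGPU};    // ship each Finite Volume task off immediately
  }
  if (taskType == FD4TaskType) {
    return {16, FirstGPU};   // batch FD4 tasks; the 16 is an arbitrary choice
  }
  return {16, FirstGPU};     // default: treat anything else like FD4
}
```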
int benchmarks::exahype2::ccz4::MulticoreOrchestration::getNumberOfTasksToHoldBack ( int taskType )  [override], [virtual]
How many tasks should be held back.
If we have a GPU and are given an FV task, we hold it back, as we know that fuseTasksImmediatelyWhenSpawned(int taskType) yields true immediately and we hence offload it. If there is no GPU, we map FV tasks onto proper tasks immediately.
This routine is where the magic happens and where the logic sketched in the class description is realised.
Definition at line 39 of file MulticoreOrchestration.cpp.
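A sketch of this hold-back rule, including the hijacked counting mentioned in the class description. The task-type id, the device query, and the convention that a large return value means "hold everything back" are assumptions; consult Peano's Strategy interface for the actual semantics.

```cpp
#include <limits>

constexpr int FiniteVolumeTaskType = 0;             // illustrative id only
static int _finiteVolumeTasksInThisBSPSection = 0;  // stand-in for the member

bool gpuIsAvailable();   // assumed device query; Peano offers its own means

int getNumberOfTasksToHoldBack(int taskType) {
  if (taskType == FiniteVolumeTaskType) {
    _finiteVolumeTasksInThisBSPSection++;  // "hijacked" bookkeeping (see class docs)
    if (gpuIsAvailable()) {
      // Hold all FV tasks back: fuseTasksImmediatelyWhenSpawned() is true, so
      // they are fused and offloaded to the accelerator straight away.
      return std::numeric_limits<int>::max();
    }
    return 0;   // no GPU: map FV tasks onto proper (CPU) tasks immediately
  }
  return 0;     // other task types are never held back
}
```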
ExecutionPolicy benchmarks::exahype2::ccz4::MulticoreOrchestration::paralleliseForkJoinSection ( int nestedParallelismLevel, int numberOfTasks, int taskType )  [override], [virtual]
Determine how to parallelise a fork/join section.
I found nested parallelism to be brutally slow, so I always return tarch::multicore::orchestration::Strategy::ExecutionPolicy::RunSerially whenever more than one parallel level is nested within another. Otherwise, I am happy for a section to be processed in parallel.
Definition at line 79 of file MulticoreOrchestration.cpp.
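A sketch of this decision. ExecutionPolicy is declared locally as a stand-in; only RunSerially is confirmed above, the second enumerator name is a guess, and whether the check uses the argument or the _nestedBSPLevels counter is an implementation detail not spelled out here.

```cpp
// Local stand-in for tarch::multicore::orchestration::Strategy::ExecutionPolicy.
enum class ExecutionPolicy { RunSerially, RunParallel };

ExecutionPolicy paralleliseForkJoinSection(int nestedParallelismLevel,
                                           int /*numberOfTasks*/,
                                           int /*taskType*/) {
  // Nested parallel regions turned out to be very slow, so only the outermost
  // fork/join level is run concurrently.
  return nestedParallelismLevel > 1 ? ExecutionPolicy::RunSerially
                                    : ExecutionPolicy::RunParallel;
}
```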
void benchmarks::exahype2::ccz4::MulticoreOrchestration::startBSPSection ( int nestedParallelismLevel )  [override], [virtual]
Start a fork/join section.
Reset _finiteVolumeTasksInThisBSPSection if this is the start of the outermost parallel region.
Definition at line 24 of file MulticoreOrchestration.cpp.
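A minimal sketch of this reset, again with file-scope stand-ins for the class members.

```cpp
// Stand-ins for the private counters documented below; the real code keeps
// them as members of MulticoreOrchestration.
static int _nestedBSPLevels                   = 0;
static int _finiteVolumeTasksInThisBSPSection = 0;

void startBSPSection(int /*nestedParallelismLevel*/) {
  if (_nestedBSPLevels == 0) {
    // Outermost parallel region starts: begin a fresh count of FV tasks.
    _finiteVolumeTasksInThisBSPSection = 0;
  }
  _nestedBSPLevels++;   // one more open fork/join level
}
```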
int benchmarks::exahype2::ccz4::MulticoreOrchestration::_finiteVolumeTasksInThisBSPSection  [private]
Current number of finite volume tasks that already have been spawned.
Definition at line 75 of file MulticoreOrchestration.h.
int benchmarks::exahype2::ccz4::MulticoreOrchestration::_maxFiniteVolumeTasks  [private]
Maximum number of finite volume tasks in the system.
I don't use this value at the moment, but might want to use it for GPUs.
Definition at line 70 of file MulticoreOrchestration.h.
int benchmarks::exahype2::ccz4::MulticoreOrchestration::_nestedBSPLevels  [private]
Number of nested fork/join levels.
Important for paralleliseForkJoinSection() to decide if the parallel region should actually be processed concurrently.
Definition at line 62 of file MulticoreOrchestration.h.