[OpenMP] Add an information flag for device data transfers

This patch adds an information flag that indicated when data is being copied to
and from the device. This will be helpful for finding redundant or unnecessary
data transfers in applications.

Reviewed By: jdoerfert, grokos

Differential Revision: https://reviews.llvm.org/D103927
This commit is contained in:
Joseph Huber 2021-06-08 16:43:59 -04:00 committed by Huber, Joseph
parent f9649d123d
commit df965513a9
4 changed files with 81 additions and 40 deletions

View file

@ -89,6 +89,7 @@ with `-g` for full debug information. A full list of flags supported by
* Dump the contents of the device pointer map at kernel exit: ``0x04``
* Indicate when an entry is changed in the device mapping table: ``0x08``
* Print OpenMP kernel information from device plugins: ``0x10``
* Indicate when data is copied to and from the device: ``0x20``
Any combination of these flags can be used by setting the appropriate bits. For
example, to enable printing all data active in an OpenMP target region along
@ -137,44 +138,53 @@ provide the following output from the runtime library.
.. code-block:: text
Info: Device supports up to 65536 CUDA blocks and 1024 threads with a warp size of 32
Info: Entering OpenMP data region at zaxpy.cpp:14:1 with 2 arguments:
Info: to(X[0:N])[16384]
Info: tofrom(Y[0:N])[16384]
Info: Creating new map entry with HstPtrBegin=0x00007fff963f4000,
TgtPtrBegin=0x00007fff963f4000, Size=16384, Name=X[0:N]
Info: Creating new map entry with HstPtrBegin=0x00007fff963f8000,
TgtPtrBegin=0x00007fff963f00000, Size=16384, Name=Y[0:N]
Info: to(X[0:N])[16384]
Info: tofrom(Y[0:N])[16384]
Info: Creating new map entry with HstPtrBegin=0x00007ffde9e99000,
TgtPtrBegin=0x00007f15dc600000, Size=16384, Name=X[0:N]
Info: Copying data from host to device, HstPtr=0x00007ffde9e99000,
TgtPtr=0x00007f15dc600000, Size=16384, Name=X[0:N]
Info: Creating new map entry with HstPtrBegin=0x00007ffde9e95000,
TgtPtrBegin=0x00007f15dc604000, Size=16384, Name=Y[0:N]
Info: Copying data from host to device, HstPtr=0x00007ffde9e95000,
TgtPtr=0x00007f15dc604000, Size=16384, Name=Y[0:N]
Info: OpenMP Host-Device pointer mappings after block at zaxpy.cpp:14:1:
Info: Host Ptr Target Ptr Size (B) RefCount Declaration
Info: 0x00007fff963f4000 0x00007fd225004000 16384 1 Y[0:N] at zaxpy.cpp:13:17
Info: 0x00007fff963f8000 0x00007fd225000000 16384 1 X[0:N] at zaxpy.cpp:13:11
Info: 0x00007ffde9e95000 0x00007f15dc604000 16384 1 Y[0:N] at zaxpy.cpp:13:17
Info: 0x00007ffde9e99000 0x00007f15dc600000 16384 1 X[0:N] at zaxpy.cpp:13:11
Info: Entering OpenMP kernel at zaxpy.cpp:6:1 with 4 arguments:
Info: firstprivate(N)[8] (implicit)
Info: use_address(Y)[0] (implicit)
Info: tofrom(D)[16] (implicit)
Info: use_address(X)[0] (implicit)
Info: Mapping exists (implicit) with HstPtrBegin=0x00007ffe37d8be80,
TgtPtrBegin=0x00007f90ff004000, Size=0, updated RefCount=2, Name=Y
Info: Creating new map entry with HstPtrBegin=0x00007fff963f33ff0,
TgtPtrBegin=0x00007fd225003ff0, Size=16, Name=D
Info: Mapping exists (implicit) with HstPtrBegin=0x00007ffe37d8fe80,
TgtPtrBegin=0x00007f90ff000000, Size=0, updated RefCount=2, Name=X
Info: Launching kernel __omp_offloading_fd02_c2c4ac1a__Z5daxpyPNSt3__17complexIdEES2_S1_m_l6
Info: Mapping exists (implicit) with HstPtrBegin=0x00007ffde9e95000,
TgtPtrBegin=0x00007f15dc604000, Size=0, updated RefCount=2, Name=Y
Info: Creating new map entry with HstPtrBegin=0x00007ffde9e94fb0,
TgtPtrBegin=0x00007f15dc608000, Size=16, Name=D
Info: Copying data from host to device, HstPtr=0x00007ffde9e94fb0,
TgtPtr=0x00007f15dc608000, Size=16, Name=D
Info: Mapping exists (implicit) with HstPtrBegin=0x00007ffde9e99000,
TgtPtrBegin=0x00007f15dc600000, Size=0, updated RefCount=2, Name=X
Info: Launching kernel __omp_offloading_fd02_e25f6e76__Z5zaxpyPSt7complexIdES1_S0_m_l6
with 8 blocks and 128 threads in SPMD mode
Info: Removing map entry with HstPtrBegin=0x00007fff963f33ff0,
TgtPtrBegin=0x00007fd225003ff0, Size=16, Name=D
Info: Copying data from device to host, TgtPtr=0x00007f15dc608000,
HstPtr=0x00007ffde9e94fb0, Size=16, Name=D
Info: Removing map entry with HstPtrBegin=0x00007ffde9e94fb0,
TgtPtrBegin=0x00007f15dc608000, Size=16, Name=D
Info: OpenMP Host-Device pointer mappings after block at zaxpy.cpp:6:1:
Info: Host Ptr Target Ptr Size (B) RefCount Declaration
Info: 0x00007fff963f4000 0x00007fd225004000 16384 1 Y[0:N] at zaxpy.cpp:13:17
Info: 0x00007fff963f8000 0x00007fd225000000 16384 1 X[0:N] at zaxpy.cpp:13:11
Info: 0x00007ffde9e95000 0x00007f15dc604000 16384 1 Y[0:N] at zaxpy.cpp:13:17
Info: 0x00007ffde9e99000 0x00007f15dc600000 16384 1 X[0:N] at zaxpy.cpp:13:11
Info: Exiting OpenMP data region at zaxpy.cpp:14:1 with 2 arguments:
Info: to(X[0:N])[16384]
Info: tofrom(Y[0:N])[16384]
Info: Removing map entry with HstPtrBegin=0x00007fff963f4000,
TgtPtrBegin=0x00007fff963f4000, Size=16384, Name=X[0:N]
Info: Removing map entry with HstPtrBegin=0x00007fff963f8000,
TgtPtrBegin=0x00007fff963f00000, Size=16384, Name=Y[0:N]
Info: to(X[0:N])[16384]
Info: tofrom(Y[0:N])[16384]
Info: Copying data from device to host, TgtPtr=0x00007f15dc604000,
HstPtr=0x00007ffde9e95000, Size=16384, Name=Y[0:N]
Info: Removing map entry with HstPtrBegin=0x00007ffde9e95000,
TgtPtrBegin=0x00007f15dc604000, Size=16384, Name=Y[0:N]
Info: Removing map entry with HstPtrBegin=0x00007ffde9e99000,
TgtPtrBegin=0x00007f15dc600000, Size=16384, Name=X[0:N]
From this information, we can see the OpenMP kernel being launched on the CUDA
device with enough threads and blocks for all ``1024`` iterations of the loop in

View file

@ -52,6 +52,8 @@ enum OpenMPInfoType : uint32_t {
OMP_INFOTYPE_MAPPING_CHANGED = 0x0008,
// Print kernel information from target device plugins.
OMP_INFOTYPE_PLUGIN_KERNEL = 0x0010,
// Print whenever data is transferred to the device
OMP_INFOTYPE_DATA_TRANSFER = 0x0020,
// Enable every flag.
OMP_INFOTYPE_ALL = 0xffffffff,
};

View file

@ -420,6 +420,18 @@ int32_t DeviceTy::deleteData(void *TgtPtrBegin) {
// Submit data to device
int32_t DeviceTy::submitData(void *TgtPtrBegin, void *HstPtrBegin, int64_t Size,
AsyncInfoTy &AsyncInfo) {
if (getInfoLevel() & OMP_INFOTYPE_DATA_TRANSFER) {
LookupResult LR = lookupMapping(HstPtrBegin, Size);
auto *HT = &*LR.Entry;
INFO(OMP_INFOTYPE_DATA_TRANSFER, DeviceID,
"Copying data from host to device, HstPtr=" DPxMOD ", TgtPtr=" DPxMOD
", Size=%" PRId64 ", Name=%s\n",
DPxPTR(HstPtrBegin), DPxPTR(TgtPtrBegin), Size,
(HT && HT->HstPtrName) ? getNameFromMapping(HT->HstPtrName).c_str()
: "unknown");
}
if (!AsyncInfo || !RTL->data_submit_async || !RTL->synchronize)
return RTL->data_submit(RTLDeviceID, TgtPtrBegin, HstPtrBegin, Size);
else
@ -430,6 +442,17 @@ int32_t DeviceTy::submitData(void *TgtPtrBegin, void *HstPtrBegin, int64_t Size,
// Retrieve data from device
int32_t DeviceTy::retrieveData(void *HstPtrBegin, void *TgtPtrBegin,
int64_t Size, AsyncInfoTy &AsyncInfo) {
if (getInfoLevel() & OMP_INFOTYPE_DATA_TRANSFER) {
LookupResult LR = lookupMapping(HstPtrBegin, Size);
auto *HT = &*LR.Entry;
INFO(OMP_INFOTYPE_DATA_TRANSFER, DeviceID,
"Copying data from device to host, TgtPtr=" DPxMOD ", HstPtr=" DPxMOD
", Size=%" PRId64 ", Name=%s\n",
DPxPTR(TgtPtrBegin), DPxPTR(HstPtrBegin), Size,
(HT && HT->HstPtrName) ? getNameFromMapping(HT->HstPtrName).c_str()
: "unknown");
}
if (!RTL->data_retrieve_async || !RTL->synchronize)
return RTL->data_retrieve(RTLDeviceID, HstPtrBegin, TgtPtrBegin, Size);
else

View file

@ -1,4 +1,4 @@
// RUN: %libomptarget-compile-nvptx64-nvidia-cuda -gline-tables-only && env LIBOMPTARGET_INFO=31 %libomptarget-run-nvptx64-nvidia-cuda 2>&1 | %fcheck-nvptx64-nvidia-cuda -allow-empty -check-prefix=INFO
// RUN: %libomptarget-compile-nvptx64-nvidia-cuda -gline-tables-only && env LIBOMPTARGET_INFO=63 %libomptarget-run-nvptx64-nvidia-cuda 2>&1 | %fcheck-nvptx64-nvidia-cuda -allow-empty -check-prefix=INFO
// REQUIRES: nvptx64-nvidia-cuda
#include <stdio.h>
@ -14,28 +14,34 @@ int main() {
int C[N];
int val = 1;
// INFO: CUDA device 0 info: Device supports up to {{.*}} CUDA blocks and {{.*}} threads with a warp size of {{.*}}
// INFO: Libomptarget device 0 info: Entering OpenMP data region at info.c:{{[0-9]+}}:1 with 3 arguments:
// INFO: CUDA device 0 info: Device supports up to {{[0-9]+}} CUDA blocks and {{[0-9]+}} threads with a warp size of {{[0-9]+}}
// INFO: Libomptarget device 0 info: Entering OpenMP data region at info.c:{{[0-9]+}}:{{[0-9]+}} with 3 arguments:
// INFO: Libomptarget device 0 info: alloc(A[0:64])[256]
// INFO: Libomptarget device 0 info: tofrom(B[0:64])[256]
// INFO: Libomptarget device 0 info: to(C[0:64])[256]
// INFO: Libomptarget device 0 info: Creating new map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, Name=A[0:64]
// INFO: Libomptarget device 0 info: Creating new map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, Name=B[0:64]
// INFO: Libomptarget device 0 info: Copying data from host to device, HstPtr={{.*}}, TgtPtr={{.*}}, Size=256, Name=B[0:64]
// INFO: Libomptarget device 0 info: Creating new map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, Name=C[0:64]
// INFO: Libomptarget device 0 info: OpenMP Host-Device pointer mappings after block at info.c:{{[0-9]+}}:1:
// INFO: Libomptarget device 0 info: Copying data from host to device, HstPtr={{.*}}, TgtPtr={{.*}}, Size=256, Name=C[0:64]
// INFO: Libomptarget device 0 info: OpenMP Host-Device pointer mappings after block at info.c:{{[0-9]+}}:{{[0-9]+}}:
// INFO: Libomptarget device 0 info: Host Ptr Target Ptr Size (B) RefCount Declaration
// INFO: Libomptarget device 0 info: {{.*}} {{.*}} 256 1 C[0:64] at info.c:{{[0-9]+}}:7
// INFO: Libomptarget device 0 info: {{.*}} {{.*}} 256 1 B[0:64] at info.c:{{[0-9]+}}:7
// INFO: Libomptarget device 0 info: {{.*}} {{.*}} 256 1 A[0:64] at info.c:{{[0-9]+}}:7
// INFO: Libomptarget device 0 info: Entering OpenMP kernel at info.c:{{[0-9]+}}:1 with 1 arguments:
// INFO: Libomptarget device 0 info: {{.*}} {{.*}} 256 1 C[0:64] at info.c:{{[0-9]+}}:{{[0-9]+}}
// INFO: Libomptarget device 0 info: {{.*}} {{.*}} 256 1 B[0:64] at info.c:{{[0-9]+}}:{{[0-9]+}}
// INFO: Libomptarget device 0 info: {{.*}} {{.*}} 256 1 A[0:64] at info.c:{{[0-9]+}}:{{[0-9]+}}
// INFO: Libomptarget device 0 info: Entering OpenMP kernel at info.c:{{[0-9]+}}:{{[0-9]+}} with 1 arguments:
// INFO: Libomptarget device 0 info: firstprivate(val)[4]
// INFO: CUDA device 0 info: Launching kernel {{.*}} with {{.*}} and {{.*}} threads in {{.*}} mode
// INFO: Libomptarget device 0 info: OpenMP Host-Device pointer mappings after block at info.c:{{[0-9]+}}:1:
// INFO: CUDA device 0 info: Launching kernel __omp_offloading_{{.*}}main{{.*}} with {{[0-9]+}} blocks and {{[0-9]+}} threads in Generic mode
// INFO: Libomptarget device 0 info: OpenMP Host-Device pointer mappings after block at info.c:{{[0-9]+}}:{{[0-9]+}}:
// INFO: Libomptarget device 0 info: Host Ptr Target Ptr Size (B) RefCount Declaration
// INFO: Libomptarget device 0 info: 0x{{.*}} 0x{{.*}} 256 1 C[0:64] at info.c:{{[0-9]+}}:7
// INFO: Libomptarget device 0 info: 0x{{.*}} 0x{{.*}} 256 1 B[0:64] at info.c:{{[0-9]+}}:7
// INFO: Libomptarget device 0 info: 0x{{.*}} 0x{{.*}} 256 1 A[0:64] at info.c:{{[0-9]+}}:7
// INFO: Libomptarget device 0 info: Exiting OpenMP data region at info.c:{{[0-9]+}}:1
// INFO: Libomptarget device 0 info: {{.*}} {{.*}} 256 1 C[0:64] at info.c:{{[0-9]+}}:{{[0-9]+}}
// INFO: Libomptarget device 0 info: {{.*}} {{.*}} 256 1 B[0:64] at info.c:{{[0-9]+}}:{{[0-9]+}}
// INFO: Libomptarget device 0 info: {{.*}} {{.*}} 256 1 A[0:64] at info.c:{{[0-9]+}}:{{[0-9]+}}
// INFO: Libomptarget device 0 info: Exiting OpenMP data region at info.c:{{[0-9]+}}:{{[0-9]+}} with 3 arguments:
// INFO: Libomptarget device 0 info: alloc(A[0:64])[256]
// INFO: Libomptarget device 0 info: tofrom(B[0:64])[256]
// INFO: Libomptarget device 0 info: to(C[0:64])[256]
// INFO: Libomptarget device 0 info: Copying data from device to host, TgtPtr={{.*}}, HstPtr={{.*}}, Size=256, Name=B[0:64]
// INFO: Libomptarget device 0 info: Removing map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, Name=C[0:64]
// INFO: Libomptarget device 0 info: Removing map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, Name=B[0:64]
// INFO: Libomptarget device 0 info: Removing map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, Name=A[0:64]