[OpenMP] Improve ref count debug messages

For example, without this patch:

```
$ cat test.c
int main() {
  int x;
  #pragma omp target enter data map(alloc: x)
  #pragma omp target exit data map(release: x)
  ;
  return 0;
}
$ clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda test.c
$ LIBOMPTARGET_DEBUG=1 ./a.out |& grep 'Creating\|Mapping exists'
Libomptarget --> Creating new map entry with HstPtrBegin=0x00007ffcace8e448, TgtPtrBegin=0x00007f12ef600000, Size=4, Name=unknown
Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffcace8e448, TgtPtrBegin=0x00007f12ef600000, Size=4, updated RefCount=1
```

There are two problems in this example:

* `RefCount` is not reported when a mapping is created, but it might
  be 1 or infinite.  In this case, because it's created by `omp target
  enter data`, it's 1.  Seeing that would make later `RefCount`
  messages easier to understand.
* `RefCount` is still 1 at the `omp target exit data`, but it's
  reported as `updated`.  The reason it's still 1 is that, upon
  deletions, the reference count is generally not updated in
  `DeviceTy::getTgtPtrBegin`, where the report is produced.  Instead,
  it's zeroed later in `DeviceTy::deallocTgtPtr`, where it's actually
  removed from the mapping table.

This patch makes the following changes:

* Report the reference count when creating a mapping.
* Where an existing mapping is reported, always report a reference
  count action:
    * `update suppressed` when `UpdateRefCount=false`
    * `incremented`
    * `decremented`
    * `deferred final decrement`, which replaces the misleading
      `updated` in the above example
* Add comments to `DeviceTy::getTgtPtrBegin` to explain why it does
  not zero the reference count.  (Please advise if these comments miss
  the point.)
* For unified shared memory, don't report confusing messages like
  `RefCount=` or `RefCount= updated` given that reference counts are
  irrelevant in this case.  Instead, just report `for unified shared
  memory`.
* Use `INFO` not `DP` consistently for `Mapping exists` messages.
* Fix device table dumps to print `INF` instead of `-1` for an
  infinite reference count.

Reviewed By: jhuber6, grokos

Differential Revision: https://reviews.llvm.org/D104559
This commit is contained in:
Joel E. Denny 2021-06-23 09:37:54 -04:00
parent 0c0628c92c
commit 48421ac441
5 changed files with 112 additions and 68 deletions

View file

@ -141,50 +141,66 @@ provide the following output from the runtime library.
Info: Entering OpenMP data region at zaxpy.cpp:14:1 with 2 arguments:
Info: to(X[0:N])[16384]
Info: tofrom(Y[0:N])[16384]
Info: Creating new map entry with HstPtrBegin=0x00007ffde9e99000,
TgtPtrBegin=0x00007f15dc600000, Size=16384, Name=X[0:N]
Info: Copying data from host to device, HstPtr=0x00007ffde9e99000,
TgtPtr=0x00007f15dc600000, Size=16384, Name=X[0:N]
Info: Creating new map entry with HstPtrBegin=0x00007ffde9e95000,
TgtPtrBegin=0x00007f15dc604000, Size=16384, Name=Y[0:N]
Info: Copying data from host to device, HstPtr=0x00007ffde9e95000,
TgtPtr=0x00007f15dc604000, Size=16384, Name=Y[0:N]
Info: Creating new map entry with HstPtrBegin=0x00007fff0d259a40,
TgtPtrBegin=0x00007fdba5800000, Size=16384, RefCount=1, Name=X[0:N]
Info: Copying data from host to device, HstPtr=0x00007fff0d259a40,
TgtPtr=0x00007fdba5800000, Size=16384, Name=X[0:N]
Info: Creating new map entry with HstPtrBegin=0x00007fff0d255a40,
TgtPtrBegin=0x00007fdba5804000, Size=16384, RefCount=1, Name=Y[0:N]
Info: Copying data from host to device, HstPtr=0x00007fff0d255a40,
TgtPtr=0x00007fdba5804000, Size=16384, Name=Y[0:N]
Info: OpenMP Host-Device pointer mappings after block at zaxpy.cpp:14:1:
Info: Host Ptr Target Ptr Size (B) RefCount Declaration
Info: 0x00007ffde9e95000 0x00007f15dc604000 16384 1 Y[0:N] at zaxpy.cpp:13:17
Info: 0x00007ffde9e99000 0x00007f15dc600000 16384 1 X[0:N] at zaxpy.cpp:13:11
Info: 0x00007fff0d255a40 0x00007fdba5804000 16384 1 Y[0:N] at zaxpy.cpp:13:17
Info: 0x00007fff0d259a40 0x00007fdba5800000 16384 1 X[0:N] at zaxpy.cpp:13:11
Info: Entering OpenMP kernel at zaxpy.cpp:6:1 with 4 arguments:
Info: firstprivate(N)[8] (implicit)
Info: use_address(Y)[0] (implicit)
Info: tofrom(D)[16] (implicit)
Info: use_address(X)[0] (implicit)
Info: Mapping exists (implicit) with HstPtrBegin=0x00007ffde9e95000,
TgtPtrBegin=0x00007f15dc604000, Size=0, updated RefCount=2, Name=Y
Info: Creating new map entry with HstPtrBegin=0x00007ffde9e94fb0,
TgtPtrBegin=0x00007f15dc608000, Size=16, Name=D
Info: Copying data from host to device, HstPtr=0x00007ffde9e94fb0,
TgtPtr=0x00007f15dc608000, Size=16, Name=D
Info: Mapping exists (implicit) with HstPtrBegin=0x00007ffde9e99000,
TgtPtrBegin=0x00007f15dc600000, Size=0, updated RefCount=2, Name=X
Info: Launching kernel __omp_offloading_fd02_e25f6e76__Z5zaxpyPSt7complexIdES1_S0_m_l6
Info: Mapping exists (implicit) with HstPtrBegin=0x00007fff0d255a40,
TgtPtrBegin=0x00007fdba5804000, Size=0, RefCount=2 (incremented), Name=Y
Info: Creating new map entry with HstPtrBegin=0x00007fff0d2559f0,
TgtPtrBegin=0x00007fdba5808000, Size=16, RefCount=1, Name=D
Info: Copying data from host to device, HstPtr=0x00007fff0d2559f0,
TgtPtr=0x00007fdba5808000, Size=16, Name=D
Info: Mapping exists (implicit) with HstPtrBegin=0x00007fff0d259a40,
TgtPtrBegin=0x00007fdba5800000, Size=0, RefCount=2 (incremented), Name=X
Info: Mapping exists with HstPtrBegin=0x00007fff0d255a40,
TgtPtrBegin=0x00007fdba5804000, Size=0, RefCount=2 (update suppressed)
Info: Mapping exists with HstPtrBegin=0x00007fff0d2559f0,
TgtPtrBegin=0x00007fdba5808000, Size=16, RefCount=1 (update suppressed)
Info: Mapping exists with HstPtrBegin=0x00007fff0d259a40,
TgtPtrBegin=0x00007fdba5800000, Size=0, RefCount=2 (update suppressed)
Info: Launching kernel __omp_offloading_10305_c08c86__Z5zaxpyPSt7complexIdES1_S0_m_l6
with 8 blocks and 128 threads in SPMD mode
Info: Copying data from device to host, TgtPtr=0x00007f15dc608000,
HstPtr=0x00007ffde9e94fb0, Size=16, Name=D
Info: Removing map entry with HstPtrBegin=0x00007ffde9e94fb0,
TgtPtrBegin=0x00007f15dc608000, Size=16, Name=D
Info: Mapping exists with HstPtrBegin=0x00007fff0d259a40,
TgtPtrBegin=0x00007fdba5800000, Size=0, RefCount=1 (decremented)
Info: Mapping exists with HstPtrBegin=0x00007fff0d2559f0,
TgtPtrBegin=0x00007fdba5808000, Size=16, RefCount=1 (deferred final decrement)
Info: Copying data from device to host, TgtPtr=0x00007fdba5808000,
HstPtr=0x00007fff0d2559f0, Size=16, Name=D
Info: Mapping exists with HstPtrBegin=0x00007fff0d255a40,
TgtPtrBegin=0x00007fdba5804000, Size=0, RefCount=1 (decremented)
Info: Removing map entry with HstPtrBegin=0x00007fff0d2559f0,
TgtPtrBegin=0x00007fdba5808000, Size=16, Name=D
Info: OpenMP Host-Device pointer mappings after block at zaxpy.cpp:6:1:
Info: Host Ptr Target Ptr Size (B) RefCount Declaration
Info: 0x00007ffde9e95000 0x00007f15dc604000 16384 1 Y[0:N] at zaxpy.cpp:13:17
Info: 0x00007ffde9e99000 0x00007f15dc600000 16384 1 X[0:N] at zaxpy.cpp:13:11
Info: 0x00007fff0d255a40 0x00007fdba5804000 16384 1 Y[0:N] at zaxpy.cpp:13:17
Info: 0x00007fff0d259a40 0x00007fdba5800000 16384 1 X[0:N] at zaxpy.cpp:13:11
Info: Exiting OpenMP data region at zaxpy.cpp:14:1 with 2 arguments:
Info: to(X[0:N])[16384]
Info: tofrom(Y[0:N])[16384]
Info: Copying data from device to host, TgtPtr=0x00007f15dc604000,
HstPtr=0x00007ffde9e95000, Size=16384, Name=Y[0:N]
Info: Removing map entry with HstPtrBegin=0x00007ffde9e95000,
TgtPtrBegin=0x00007f15dc604000, Size=16384, Name=Y[0:N]
Info: Removing map entry with HstPtrBegin=0x00007ffde9e99000,
TgtPtrBegin=0x00007f15dc600000, Size=16384, Name=X[0:N]
Info: Mapping exists with HstPtrBegin=0x00007fff0d255a40,
TgtPtrBegin=0x00007fdba5804000, Size=16384, RefCount=1 (deferred final decrement)
Info: Copying data from device to host, TgtPtr=0x00007fdba5804000,
HstPtr=0x00007fff0d255a40, Size=16384, Name=Y[0:N]
Info: Mapping exists with HstPtrBegin=0x00007fff0d259a40,
TgtPtrBegin=0x00007fdba5800000, Size=16384, RefCount=1 (deferred final decrement)
Info: Removing map entry with HstPtrBegin=0x00007fff0d255a40,
TgtPtrBegin=0x00007fdba5804000, Size=16384, Name=Y[0:N]
Info: Removing map entry with HstPtrBegin=0x00007fff0d259a40,
TgtPtrBegin=0x00007fdba5800000, Size=16384, Name=X[0:N]
From this information, we can see the OpenMP kernel being launched on the CUDA
device with enough threads and blocks for all ``1024`` iterations of the loop in

View file

@ -76,16 +76,20 @@ int DeviceTy::associatePtr(void *HstPtrBegin, void *TgtPtrBegin, int64_t Size) {
}
// Mapping does not exist, allocate it with refCount=INF
auto Res = HostDataToTargetMap.emplace(
(uintptr_t)HstPtrBegin /*HstPtrBase*/,
(uintptr_t)HstPtrBegin /*HstPtrBegin*/,
(uintptr_t)HstPtrBegin + Size /*HstPtrEnd*/,
(uintptr_t)TgtPtrBegin /*TgtPtrBegin*/, nullptr, true /*IsRefCountINF*/);
auto NewEntry = Res.first;
const HostDataToTargetTy &newEntry =
*HostDataToTargetMap
.emplace(
/*HstPtrBase=*/(uintptr_t)HstPtrBegin,
/*HstPtrBegin=*/(uintptr_t)HstPtrBegin,
/*HstPtrEnd=*/(uintptr_t)HstPtrBegin + Size,
/*TgtPtrBegin=*/(uintptr_t)TgtPtrBegin, /*Name=*/nullptr,
/*IsRefCountINF=*/true)
.first;
DP("Creating new map entry: HstBase=" DPxMOD ", HstBegin=" DPxMOD
", HstEnd=" DPxMOD ", TgtBegin=" DPxMOD "\n",
DPxPTR(NewEntry->HstPtrBase), DPxPTR(NewEntry->HstPtrBegin),
DPxPTR(NewEntry->HstPtrEnd), DPxPTR(NewEntry->TgtPtrBegin));
", HstEnd=" DPxMOD ", TgtBegin=" DPxMOD ", RefCount=%s\n",
DPxPTR(newEntry.HstPtrBase), DPxPTR(newEntry.HstPtrBegin),
DPxPTR(newEntry.HstPtrEnd), DPxPTR(newEntry.TgtPtrBegin),
newEntry.refCountToStr().c_str());
DataMapMtx.unlock();
@ -211,18 +215,16 @@ void *DeviceTy::getOrAllocTgtPtr(void *HstPtrBegin, void *HstPtrBase,
((lr.Flags.ExtendsBefore || lr.Flags.ExtendsAfter) && IsImplicit)) {
auto &HT = *lr.Entry;
IsNew = false;
if (UpdateRefCount)
HT.incRefCount();
uintptr_t tp = HT.TgtPtrBegin + ((uintptr_t)HstPtrBegin - HT.HstPtrBegin);
INFO(OMP_INFOTYPE_MAPPING_EXISTS, DeviceID,
"Mapping exists%s with HstPtrBegin=" DPxMOD ", TgtPtrBegin=" DPxMOD
", "
"Size=%" PRId64 ",%s RefCount=%s, Name=%s\n",
"Size=%" PRId64 ", RefCount=%s (%s), Name=%s\n",
(IsImplicit ? " (implicit)" : ""), DPxPTR(HstPtrBegin), DPxPTR(tp),
Size, (UpdateRefCount ? " updated" : ""),
HT.isRefCountInf() ? "INF" : std::to_string(HT.getRefCount()).c_str(),
Size, HT.refCountToStr().c_str(),
UpdateRefCount ? "incremented" : "update suppressed",
(HstPtrName) ? getNameFromMapping(HstPtrName).c_str() : "unknown");
rc = (void *)tp;
} else if ((lr.Flags.ExtendsBefore || lr.Flags.ExtendsAfter) && !IsImplicit) {
@ -246,9 +248,9 @@ void *DeviceTy::getOrAllocTgtPtr(void *HstPtrBegin, void *HstPtrBase,
// In addition to the mapping rules above, the close map modifier forces the
// mapping of the variable to the device.
if (Size) {
DP("Return HstPtrBegin " DPxMOD " Size=%" PRId64 " RefCount=%s\n",
DPxPTR((uintptr_t)HstPtrBegin), Size,
(UpdateRefCount ? " updated" : ""));
DP("Return HstPtrBegin " DPxMOD " Size=%" PRId64 " for unified shared "
"memory\n",
DPxPTR((uintptr_t)HstPtrBegin), Size);
IsHostPtr = true;
rc = HstPtrBegin;
}
@ -263,13 +265,18 @@ void *DeviceTy::getOrAllocTgtPtr(void *HstPtrBegin, void *HstPtrBase,
// If it is not contained and Size > 0, we should create a new entry for it.
IsNew = true;
uintptr_t tp = (uintptr_t)allocData(Size, HstPtrBegin);
const HostDataToTargetTy &newEntry =
*HostDataToTargetMap
.emplace((uintptr_t)HstPtrBase, (uintptr_t)HstPtrBegin,
(uintptr_t)HstPtrBegin + Size, tp, HstPtrName)
.first;
INFO(OMP_INFOTYPE_MAPPING_CHANGED, DeviceID,
"Creating new map entry with "
"HstPtrBegin=" DPxMOD ", TgtPtrBegin=" DPxMOD ", Size=%ld, Name=%s\n",
"HstPtrBegin=" DPxMOD ", TgtPtrBegin=" DPxMOD ", Size=%ld, "
"RefCount=%s, Name=%s\n",
DPxPTR(HstPtrBegin), DPxPTR(tp), Size,
newEntry.refCountToStr().c_str(),
(HstPtrName) ? getNameFromMapping(HstPtrName).c_str() : "unknown");
HostDataToTargetMap.emplace((uintptr_t)HstPtrBase, (uintptr_t)HstPtrBegin,
(uintptr_t)HstPtrBegin + Size, tp, HstPtrName);
rc = (void *)tp;
}
@ -292,25 +299,35 @@ void *DeviceTy::getTgtPtrBegin(void *HstPtrBegin, int64_t Size, bool &IsLast,
if (lr.Flags.IsContained ||
(!MustContain && (lr.Flags.ExtendsBefore || lr.Flags.ExtendsAfter))) {
auto &HT = *lr.Entry;
// We do not decrement the reference count to zero here. deallocTgtPtr does
// that atomically with removing the mapping. Otherwise, before this thread
// removed the mapping in deallocTgtPtr, another thread could retrieve the
// mapping, increment and decrement back to zero, and then both threads
// would try to remove the mapping, resulting in a double free.
IsLast = HT.getRefCount() == 1;
if (!IsLast && UpdateRefCount)
const char *RefCountAction;
if (!UpdateRefCount)
RefCountAction = "update suppressed";
else if (IsLast)
RefCountAction = "deferred final decrement";
else {
RefCountAction = "decremented";
HT.decRefCount();
}
uintptr_t tp = HT.TgtPtrBegin + ((uintptr_t)HstPtrBegin - HT.HstPtrBegin);
DP("Mapping exists with HstPtrBegin=" DPxMOD ", TgtPtrBegin=" DPxMOD ", "
"Size=%" PRId64 ",%s RefCount=%s\n",
DPxPTR(HstPtrBegin), DPxPTR(tp), Size,
(UpdateRefCount ? " updated" : ""),
HT.isRefCountInf() ? "INF" : std::to_string(HT.getRefCount()).c_str());
INFO(OMP_INFOTYPE_MAPPING_EXISTS, DeviceID,
"Mapping exists with HstPtrBegin=" DPxMOD ", TgtPtrBegin=" DPxMOD ", "
"Size=%" PRId64 ", RefCount=%s (%s)\n",
DPxPTR(HstPtrBegin), DPxPTR(tp), Size, HT.refCountToStr().c_str(),
RefCountAction);
rc = (void *)tp;
} else if (PM->RTLs.RequiresFlags & OMP_REQ_UNIFIED_SHARED_MEMORY) {
// If the value isn't found in the mapping and unified shared memory
// is on then it means we have stumbled upon a value which we need to
// use directly from the host.
DP("Get HstPtrBegin " DPxMOD " Size=%" PRId64 " RefCount=%s\n",
DPxPTR((uintptr_t)HstPtrBegin), Size,
(UpdateRefCount ? " updated" : ""));
DP("Get HstPtrBegin " DPxMOD " Size=%" PRId64 " for unified shared "
"memory\n",
DPxPTR((uintptr_t)HstPtrBegin), Size);
IsHostPtr = true;
rc = HstPtrBegin;
}

View file

@ -88,6 +88,10 @@ public:
}
bool isRefCountInf() const { return RefCount == INFRefCount; }
std::string refCountToStr() const {
return isRefCountInf() ? "INF" : std::to_string(getRefCount());
}
};
typedef uintptr_t HstPtrBeginTy;

View file

@ -118,11 +118,11 @@ static inline void dumpTargetPointerMappings(const ident_t *Loc,
for (const auto &HostTargetMap : Device.HostDataToTargetMap) {
SourceInfo Info(HostTargetMap.HstPtrName);
INFO(OMP_INFOTYPE_ALL, Device.DeviceID,
DPxMOD " " DPxMOD " %-8" PRIuPTR " %-8" PRId64 " %s at %s:%d:%d\n",
DPxMOD " " DPxMOD " %-8" PRIuPTR " %-8s %s at %s:%d:%d\n",
DPxPTR(HostTargetMap.HstPtrBegin), DPxPTR(HostTargetMap.TgtPtrBegin),
HostTargetMap.HstPtrEnd - HostTargetMap.HstPtrBegin,
HostTargetMap.getRefCount(), Info.getName(), Info.getFilename(),
Info.getLine(), Info.getColumn());
HostTargetMap.refCountToStr().c_str(), Info.getName(),
Info.getFilename(), Info.getLine(), Info.getColumn());
}
Device.DataMapMtx.unlock();
}

View file

@ -6,6 +6,10 @@
#define N 64
#pragma omp declare target
int global;
#pragma omp end declare target
extern void __tgt_set_info_flag(unsigned);
int main() {
@ -19,10 +23,10 @@ int main() {
// INFO: Libomptarget device 0 info: alloc(A[0:64])[256]
// INFO: Libomptarget device 0 info: tofrom(B[0:64])[256]
// INFO: Libomptarget device 0 info: to(C[0:64])[256]
// INFO: Libomptarget device 0 info: Creating new map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, Name=A[0:64]
// INFO: Libomptarget device 0 info: Creating new map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, Name=B[0:64]
// INFO: Libomptarget device 0 info: Creating new map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, RefCount=1, Name=A[0:64]
// INFO: Libomptarget device 0 info: Creating new map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, RefCount=1, Name=B[0:64]
// INFO: Libomptarget device 0 info: Copying data from host to device, HstPtr={{.*}}, TgtPtr={{.*}}, Size=256, Name=B[0:64]
// INFO: Libomptarget device 0 info: Creating new map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, Name=C[0:64]
// INFO: Libomptarget device 0 info: Creating new map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, RefCount=1, Name=C[0:64]
// INFO: Libomptarget device 0 info: Copying data from host to device, HstPtr={{.*}}, TgtPtr={{.*}}, Size=256, Name=C[0:64]
// INFO: Libomptarget device 0 info: OpenMP Host-Device pointer mappings after block at info.c:{{[0-9]+}}:{{[0-9]+}}:
// INFO: Libomptarget device 0 info: Host Ptr Target Ptr Size (B) RefCount Declaration
@ -45,6 +49,9 @@ int main() {
// INFO: Libomptarget device 0 info: Removing map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, Name=C[0:64]
// INFO: Libomptarget device 0 info: Removing map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, Name=B[0:64]
// INFO: Libomptarget device 0 info: Removing map entry with HstPtrBegin={{.*}}, TgtPtrBegin={{.*}}, Size=256, Name=A[0:64]
// INFO: Libomptarget device 0 info: OpenMP Host-Device pointer mappings after block at info.c:[[#%u,]]:[[#%u,]]:
// INFO: Libomptarget device 0 info: Host Ptr Target Ptr Size (B) RefCount Declaration
// INFO: Libomptarget device 0 info: [[#%#x,]] [[#%#x,]] 4 INF unknown at unknown:0:0
#pragma omp target data map(alloc:A[0:N]) map(tofrom:B[0:N]) map(to:C[0:N])
#pragma omp target firstprivate(val)
{ val = 1; }