[MachO] Don't fold compact unwind entries with LSDA

Folding them causes the unwinder to compute an incorrect function start
address for the folded entries, which in turn causes the personality
function to misinterpret the LSDA, breaking exception handling.

You can verify the end-to-end flow by creating a simple C++ file:
```
void h();
int main() { h(); }
```

and then linking this file against the liblsda.dylib produced by the
test case added here. Before this change, the resulting program
terminated with an uncaught exception. With this change, it runs
correctly.

Reviewed By: #lld-macho, thevinster

Differential Revision: https://reviews.llvm.org/D132845

(cherry picked from commit 56bd3185cdd8d79731acd6c75bf41869284a12ed)
Shoaib Meenai 2022-08-29 01:09:56 +05:00 committed by Tobias Hieta
parent a5ae700c67
commit 6fe69891d1
2 changed files with 184 additions and 12 deletions


@@ -196,13 +196,13 @@ UnwindInfoSection::UnwindInfoSection()
 // Record function symbols that may need entries emitted in __unwind_info, which
 // stores unwind data for address ranges.
 //
-// Note that if several adjacent functions have the same unwind encoding, LSDA,
-// and personality function, they share one unwind entry. For this to work,
-// functions without unwind info need explicit "no unwind info" unwind entries
-// -- else the unwinder would think they have the unwind info of the closest
-// function with unwind info right before in the image. Thus, we add function
-// symbols for each unique address regardless of whether they have associated
-// unwind info.
+// Note that if several adjacent functions have the same unwind encoding and
+// personality function and no LSDA, they share one unwind entry. For this to
+// work, functions without unwind info need explicit "no unwind info" unwind
+// entries -- else the unwinder would think they have the unwind info of the
+// closest function with unwind info right before in the image. Thus, we add
+// function symbols for each unique address regardless of whether they have
+// associated unwind info.
 void UnwindInfoSection::addSymbol(const Defined *d) {
   if (d->unwindEntry)
     allEntriesAreOmitted = false;
@@ -427,9 +427,9 @@ void UnwindInfoSectionImpl::finalize() {
   // assigned, so we can relocate the __LD,__compact_unwind entries
   // into a temporary buffer. Relocation is necessary in order to sort
   // the CU entries by function address. Sorting is necessary so that
-  // we can fold adjacent CU entries with identical
-  // encoding+personality+lsda. Folding is necessary because it reduces
-  // the number of CU entries by as much as 3 orders of magnitude!
+  // we can fold adjacent CU entries with identical encoding+personality
+  // and without any LSDA. Folding is necessary because it reduces the
+  // number of CU entries by as much as 3 orders of magnitude!
   cuEntries.resize(symbols.size());
   // The "map" part of the symbols MapVector was only needed for deduplication
   // in addSymbol(). Now that we are done adding, move the contents to a plain
@@ -445,7 +445,7 @@ void UnwindInfoSectionImpl::finalize() {
     return cuEntries[a].functionAddress < cuEntries[b].functionAddress;
   });
-  // Fold adjacent entries with matching encoding+personality+lsda
+  // Fold adjacent entries with matching encoding+personality and without LSDA
   // We use three iterators on the same cuIndices to fold in-situ:
   // (1) `foldBegin` is the first of a potential sequence of matching entries
   // (2) `foldEnd` is the first non-matching entry after `foldBegin`.
@@ -455,11 +455,32 @@ void UnwindInfoSectionImpl::finalize() {
   auto foldWrite = cuIndices.begin();
   for (auto foldBegin = cuIndices.begin(); foldBegin < cuIndices.end();) {
     auto foldEnd = foldBegin;
+    // Common LSDA encodings (e.g. for C++ and Objective-C) contain offsets from
+    // a base address. The base address is normally not contained directly in
+    // the LSDA, and in that case, the personality function treats the starting
+    // address of the function (which is computed by the unwinder) as the base
+    // address and interprets the LSDA accordingly. The unwinder computes the
+    // starting address of a function as the address associated with its CU
+    // entry. For this reason, we cannot fold adjacent entries if they have an
+    // LSDA, because folding would make the unwinder compute the wrong starting
+    // address for the functions with the folded entries, which in turn would
+    // cause the personality function to misinterpret the LSDA for those
+    // functions. In the very rare case where the base address is encoded
+    // directly in the LSDA, two functions at different addresses would
+    // necessarily have different LSDAs, so their CU entries would not have been
+    // folded anyway.
     while (++foldEnd < cuIndices.end() &&
            cuEntries[*foldBegin].encoding == cuEntries[*foldEnd].encoding &&
+           !cuEntries[*foldBegin].lsda && !cuEntries[*foldEnd].lsda &&
+           // If we've gotten to this point, we don't have an LSDA, which should
+           // also imply that we don't have a personality function, since in all
+           // likelihood a personality function needs the LSDA to do anything
+           // useful. It can be technically valid to have a personality function
+           // and no LSDA though (e.g. the C++ personality __gxx_personality_v0
+           // is just a no-op without LSDA), so we still check for personality
+           // function equivalence to handle that case.
            cuEntries[*foldBegin].personality ==
                cuEntries[*foldEnd].personality &&
-           cuEntries[*foldBegin].lsda == cuEntries[*foldEnd].lsda &&
            canFoldEncoding(cuEntries[*foldEnd].encoding))
       ;
     *foldWrite++ = *foldBegin;


@@ -0,0 +1,151 @@
## Verify that the compact unwind entries for two functions with identical
## unwind information and LSDA aren't folded together; see the comment in
## UnwindInfoSectionImpl::finalize for why.
# REQUIRES: x86
# RUN: rm -rf %t; mkdir %t
# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-macos11.0 -o %t/lsda.o %s
# RUN: %lld -dylib --icf=all -lSystem -lc++ -o %t/liblsda.dylib %t/lsda.o
# RUN: llvm-objdump --macho --syms --unwind-info %t/liblsda.dylib | FileCheck %s
## Check that f and g have the same unwind encoding and LSDA offset (we need to
## link with ICF above in order to get the LSDA deduplicated), and that their
## compact unwind entries aren't folded.
# CHECK-LABEL: SYMBOL TABLE:
# CHECK: [[#%x,G_ADDR:]] {{.*}} __Z1gv
# CHECK: [[#%x,H_ADDR:]] {{.*}} __Z1hv
# CHECK-LABEL: Contents of __unwind_info section:
# CHECK: LSDA descriptors
# CHECK-NEXT: [0]: function offset=[[#%#.8x,G_ADDR]], LSDA offset=[[#%#x,LSDA:]]
# CHECK-NEXT: [1]: function offset=[[#%#.8x,H_ADDR]], LSDA offset=[[#%#.8x,LSDA]]
# CHECK-NEXT: Second level indices:
# CHECK: [1]: function offset=[[#%#.8x,G_ADDR]], encoding[0]=[[#%#x,ENCODING:]]
# CHECK: [2]: function offset=[[#%#.8x,H_ADDR]], encoding[0]=[[#%#.8x,ENCODING]]
## Generated from the following C++ code built with:
## clang -target x86_64-apple-macosx11.0 -S -Os -fno-inline -fomit-frame-pointer
## void f(int i) { throw i; }
## void g() { try { f(1); } catch (int) {} }
## void h() { try { f(2); } catch (int) {} }
.section __TEXT,__text,regular,pure_instructions
.globl __Z1fi ## -- Begin function _Z1fi
__Z1fi: ## @_Z1fi
.cfi_startproc
pushq %rbx
.cfi_def_cfa_offset 16
.cfi_offset %rbx, -16
movl %edi, %ebx
movl $4, %edi
callq ___cxa_allocate_exception
movl %ebx, (%rax)
movq __ZTIi@GOTPCREL(%rip), %rsi
movq %rax, %rdi
xorl %edx, %edx
callq ___cxa_throw
.cfi_endproc
## -- End function
.globl __Z1gv ## -- Begin function _Z1gv
__Z1gv: ## @_Z1gv
Lfunc_begin0:
.cfi_startproc
.cfi_personality 155, ___gxx_personality_v0
.cfi_lsda 16, Lexception0
pushq %rax
.cfi_def_cfa_offset 16
Ltmp0:
movl $1, %edi
callq __Z1fi
Ltmp1:
ud2
LBB1_2: ## %lpad
Ltmp2:
movq %rax, %rdi
callq ___cxa_begin_catch
popq %rax
jmp ___cxa_end_catch ## TAILCALL
Lfunc_end0:
.cfi_endproc
.section __TEXT,__gcc_except_tab
.p2align 2, 0x0
GCC_except_table1:
Lexception0:
.byte 255 ## @LPStart Encoding = omit
.byte 155 ## @TType Encoding = indirect pcrel sdata4
.uleb128 Lttbase0-Lttbaseref0
Lttbaseref0:
.byte 1 ## Call site Encoding = uleb128
.uleb128 Lcst_end0-Lcst_begin0
Lcst_begin0:
.uleb128 Ltmp0-Lfunc_begin0 ## >> Call Site 1 <<
.uleb128 Ltmp1-Ltmp0 ## Call between Ltmp0 and Ltmp1
.uleb128 Ltmp2-Lfunc_begin0 ## jumps to Ltmp2
.byte 1 ## On action: 1
.uleb128 Ltmp1-Lfunc_begin0 ## >> Call Site 2 <<
.uleb128 Lfunc_end0-Ltmp1 ## Call between Ltmp1 and Lfunc_end0
.byte 0 ## has no landing pad
.byte 0 ## On action: cleanup
Lcst_end0:
.byte 1 ## >> Action Record 1 <<
## Catch TypeInfo 1
.byte 0 ## No further actions
.p2align 2, 0x0
## >> Catch TypeInfos <<
.long __ZTIi@GOTPCREL+4 ## TypeInfo 1
Lttbase0:
.p2align 2, 0x0
## -- End function
.section __TEXT,__text,regular,pure_instructions
.globl __Z1hv ## -- Begin function _Z1hv
__Z1hv: ## @_Z1hv
Lfunc_begin1:
.cfi_startproc
.cfi_personality 155, ___gxx_personality_v0
.cfi_lsda 16, Lexception1
pushq %rax
.cfi_def_cfa_offset 16
Ltmp3:
movl $2, %edi
callq __Z1fi
Ltmp4:
ud2
LBB2_2: ## %lpad
Ltmp5:
movq %rax, %rdi
callq ___cxa_begin_catch
popq %rax
jmp ___cxa_end_catch ## TAILCALL
Lfunc_end1:
.cfi_endproc
.section __TEXT,__gcc_except_tab
.p2align 2, 0x0
GCC_except_table2:
Lexception1:
.byte 255 ## @LPStart Encoding = omit
.byte 155 ## @TType Encoding = indirect pcrel sdata4
.uleb128 Lttbase1-Lttbaseref1
Lttbaseref1:
.byte 1 ## Call site Encoding = uleb128
.uleb128 Lcst_end1-Lcst_begin1
Lcst_begin1:
.uleb128 Ltmp3-Lfunc_begin1 ## >> Call Site 1 <<
.uleb128 Ltmp4-Ltmp3 ## Call between Ltmp3 and Ltmp4
.uleb128 Ltmp5-Lfunc_begin1 ## jumps to Ltmp5
.byte 1 ## On action: 1
.uleb128 Ltmp4-Lfunc_begin1 ## >> Call Site 2 <<
.uleb128 Lfunc_end1-Ltmp4 ## Call between Ltmp4 and Lfunc_end1
.byte 0 ## has no landing pad
.byte 0 ## On action: cleanup
Lcst_end1:
.byte 1 ## >> Action Record 1 <<
## Catch TypeInfo 1
.byte 0 ## No further actions
.p2align 2, 0x0
## >> Catch TypeInfos <<
.long __ZTIi@GOTPCREL+4 ## TypeInfo 1
Lttbase1:
.p2align 2, 0x0
## -- End function
.subsections_via_symbols