
Exploring the CoreCLR Source (Part 8): How the JIT Works (In Depth)


In the previous article we built up a basic picture of the JIT in CoreCLR;
this article analyzes its implementation in more detail.

The JIT implementation lives mainly under https://github.com/dotnet/coreclr/tree/master/src/jit.
The best way to analyze the JIT process for a specific function in detail is to read its JitDump.
To get a JitDump you need to build a Debug version of CoreCLR yourself (see here for Windows, here for Linux).
After building, set the environment variable COMPlus_JitDump=Main (replace Main with the name of whatever function you are interested in), then run your program with that Debug build of CoreCLR.

An example JitDump can be found here; it includes both Debug-mode and Release-mode output.

Next we will walk through each stage of the JIT alongside the code.
The analysis below is based on CoreCLR 1.1.0 targeting x86/x64; newer versions may differ.
(Why 1.1.0? Because I spent half a year reading the JIT code, and 2.0 was not out yet when I started.)

Triggering the JIT

As I mentioned in the previous article, JIT compilation is triggered the first time a function is called, starting from a stub.


This is what a JIT stub actually looks like. Here is the state of the Fixup Precode before the function's first call:

Fixup Precode:
(lldb) di --frame --bytes
->  0x7fff7c21f5a8: e8 2b 6c fe ff              callq  0x7fff7c2061d8
    0x7fff7c21f5ad: 5e                          popq   %rsi
    0x7fff7c21f5ae: 19 05 e8 23 6c fe           sbbl   %eax, -0x193dc18(%rip)
    0x7fff7c21f5b4: ff 5e a8                    lcalll *-0x58(%rsi)
    0x7fff7c21f5b7: 04 e8                       addb   $-0x18, %al
    0x7fff7c21f5b9: 1b 6c fe ff                 sbbl   -0x1(%rsi,%rdi,8), %ebp
    0x7fff7c21f5bd: 5e                          popq   %rsi
    0x7fff7c21f5be: 00 03                       addb   %al, (%rbx)
    0x7fff7c21f5c0: e8 13 6c fe ff              callq  0x7fff7c2061d8
    0x7fff7c21f5c5: 5e                          popq   %rsi
    0x7fff7c21f5c6: b0 02                       movb   $0x2, %al

(lldb) di --frame --bytes
->  0x7fff7c2061d8: e9 13 3f 9d 79              jmp    0x7ffff5bda0f0    ; PrecodeFixupThunk
    0x7fff7c2061dd: cc                          int3
    0x7fff7c2061de: cc                          int3
    0x7fff7c2061df: cc                          int3
    0x7fff7c2061e0: 49 ba 00 da d0 7b ff 7f 00 00  movabsq $0x7fff7bd0da00, %r10
    0x7fff7c2061ea: 40 e9 e0 ff ff ff           jmp    0x7fff7c2061d0

In these two fragments only the first instruction matters. Note the bytes 5e 19 05 after the callq: they are not instructions at all but information about the method (the disassembler simply decodes them as garbage); we will come back to them below.
The callq then lands at the Fixup Precode Chunk; the code from this point on is shared by all functions:

Fixup Precode Chunk:
(lldb) di --frame --bytes
->  0x7ffff5bda0f0 <PrecodeFixupThunk>:    58              popq   %rax               ; rax = 0x7fff7c21f5ad
    0x7ffff5bda0f1 <PrecodeFixupThunk+1>:  4c 0f b6 50 02  movzbq 0x2(%rax), %r10    ; r10 = 0x05 (precode chunk index)
    0x7ffff5bda0f6 <PrecodeFixupThunk+6>:  4c 0f b6 58 01  movzbq 0x1(%rax), %r11    ; r11 = 0x19 (methoddesc chunk index)
    0x7ffff5bda0fb <PrecodeFixupThunk+11>: 4a 8b 44 d0 03  movq   0x3(%rax,%r10,8), %rax ; rax = 0x7fff7bdd5040 (methoddesc chunk)
    0x7ffff5bda100 <PrecodeFixupThunk+16>: 4e 8d 14 d8     leaq   (%rax,%r11,8), %r10 ; r10 = 0x7fff7bdd5108 (methoddesc)
    0x7ffff5bda104 <PrecodeFixupThunk+20>: e9 37 ff ff ff  jmp    0x7ffff5bda040     ; ThePreStub

The source of this code is in vm\amd64\unixasmhelpers.S:

LEAF_ENTRY PrecodeFixupThunk, _TEXT

        pop     rax     // Pop the return address. It points right after the call instruction in the precode.

        // Inline computation done by FixupPrecode::GetMethodDesc()
        movzx   r10, byte ptr [rax+2]   // m_PrecodeChunkIndex
        movzx   r11, byte ptr [rax+1]   // m_MethodDescChunkIndex
        mov     rax, qword ptr [rax+r10*8+3]
        lea     METHODDESC_REGISTER, [rax+r11*8]

        // Tail call to prestub
        jmp     C_FUNC(ThePreStub)

LEAF_END PrecodeFixupThunk, _TEXT
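
For clarity, here is the same address arithmetic in C form, a rough sketch of what FixupPrecode::GetMethodDesc computes; ret stands for the return address popped into rax:

#include <stdint.h>

// "ret" points right after the callq inside the precode (0x7fff7c21f5ad above).
static uint8_t* GetMethodDescFromPrecode(uint8_t* ret)
{
    uint8_t precodeChunkIndex    = ret[2]; // m_PrecodeChunkIndex (the 0x05 byte)
    uint8_t methodDescChunkIndex = ret[1]; // m_MethodDescChunkIndex (the 0x19 byte)

    // The pointer to the MethodDesc chunk is stored after the precodes of this
    // chunk: each precode is 8 bytes, and the +3 skips the rest of the current one.
    uint8_t* methodDescChunk = *(uint8_t**)(ret + precodeChunkIndex * 8 + 3);

    // Each MethodDesc in the chunk is addressed in 8-byte units.
    return methodDescChunk + methodDescChunkIndex * 8;
}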

After the popq %rax, rax points to the address right after the original callq, and from the index bytes stored there the MethodDesc of the function to compile can be located. Execution then jumps to ThePreStub:

ThePreStub:
(lldb) di --frame --bytes
->  0x7ffff5bda040 <ThePreStub>:     55                       pushq  %rbp
    0x7ffff5bda041 <ThePreStub+1>:   48 89 e5                 movq   %rsp, %rbp
    0x7ffff5bda044 <ThePreStub+4>:   53                       pushq  %rbx
    0x7ffff5bda045 <ThePreStub+5>:   41 57                    pushq  %r15
    0x7ffff5bda047 <ThePreStub+7>:   41 56                    pushq  %r14
    0x7ffff5bda049 <ThePreStub+9>:   41 55                    pushq  %r13
    0x7ffff5bda04b <ThePreStub+11>:  41 54                    pushq  %r12
    0x7ffff5bda04d <ThePreStub+13>:  41 51                    pushq  %r9
    0x7ffff5bda04f <ThePreStub+15>:  41 50                    pushq  %r8
    0x7ffff5bda051 <ThePreStub+17>:  51                       pushq  %rcx
    0x7ffff5bda052 <ThePreStub+18>:  52                       pushq  %rdx
    0x7ffff5bda053 <ThePreStub+19>:  56                       pushq  %rsi
    0x7ffff5bda054 <ThePreStub+20>:  57                       pushq  %rdi
    0x7ffff5bda055 <ThePreStub+21>:  48 8d a4 24 78 ff ff ff  leaq   -0x88(%rsp), %rsp ; allocate transition block
    0x7ffff5bda05d <ThePreStub+29>:  66 0f 7f 04 24           movdqa %xmm0, (%rsp)     ; fill transition block
    0x7ffff5bda062 <ThePreStub+34>:  66 0f 7f 4c 24 10        movdqa %xmm1, 0x10(%rsp) ; fill transition block
    0x7ffff5bda068 <ThePreStub+40>:  66 0f 7f 54 24 20        movdqa %xmm2, 0x20(%rsp) ; fill transition block
    0x7ffff5bda06e <ThePreStub+46>:  66 0f 7f 5c 24 30        movdqa %xmm3, 0x30(%rsp) ; fill transition block
    0x7ffff5bda074 <ThePreStub+52>:  66 0f 7f 64 24 40        movdqa %xmm4, 0x40(%rsp) ; fill transition block
    0x7ffff5bda07a <ThePreStub+58>:  66 0f 7f 6c 24 50        movdqa %xmm5, 0x50(%rsp) ; fill transition block
    0x7ffff5bda080 <ThePreStub+64>:  66 0f 7f 74 24 60        movdqa %xmm6, 0x60(%rsp) ; fill transition block
    0x7ffff5bda086 <ThePreStub+70>:  66 0f 7f 7c 24 70        movdqa %xmm7, 0x70(%rsp) ; fill transition block
    0x7ffff5bda08c <ThePreStub+76>:  48 8d bc 24 88 00 00 00  leaq   0x88(%rsp), %rdi  ; arg 1 = transition block*
    0x7ffff5bda094 <ThePreStub+84>:  4c 89 d6                 movq   %r10, %rsi        ; arg 2 = methoddesc
    0x7ffff5bda097 <ThePreStub+87>:  e8 44 7e 11 00           callq  0x7ffff5cf1ee0    ; PreStubWorker at prestub.cpp:958
    0x7ffff5bda09c <ThePreStub+92>:  66 0f 6f 04 24           movdqa (%rsp), %xmm0
    0x7ffff5bda0a1 <ThePreStub+97>:  66 0f 6f 4c 24 10        movdqa 0x10(%rsp), %xmm1
    0x7ffff5bda0a7 <ThePreStub+103>: 66 0f 6f 54 24 20        movdqa 0x20(%rsp), %xmm2
    0x7ffff5bda0ad <ThePreStub+109>: 66 0f 6f 5c 24 30        movdqa 0x30(%rsp), %xmm3
    0x7ffff5bda0b3 <ThePreStub+115>: 66 0f 6f 64 24 40        movdqa 0x40(%rsp), %xmm4
    0x7ffff5bda0b9 <ThePreStub+121>: 66 0f 6f 6c 24 50        movdqa 0x50(%rsp), %xmm5
    0x7ffff5bda0bf <ThePreStub+127>: 66 0f 6f 74 24 60        movdqa 0x60(%rsp), %xmm6
    0x7ffff5bda0c5 <ThePreStub+133>: 66 0f 6f 7c 24 70        movdqa 0x70(%rsp), %xmm7
    0x7ffff5bda0cb <ThePreStub+139>: 48 8d a4 24 88 00 00 00  leaq   0x88(%rsp), %rsp
    0x7ffff5bda0d3 <ThePreStub+147>: 5f                       popq   %rdi
    0x7ffff5bda0d4 <ThePreStub+148>: 5e                       popq   %rsi
    0x7ffff5bda0d5 <ThePreStub+149>: 5a                       popq   %rdx
    0x7ffff5bda0d6 <ThePreStub+150>: 59                       popq   %rcx
    0x7ffff5bda0d7 <ThePreStub+151>: 41 58                    popq   %r8
    0x7ffff5bda0d9 <ThePreStub+153>: 41 59                    popq   %r9
    0x7ffff5bda0db <ThePreStub+155>: 41 5c                    popq   %r12
    0x7ffff5bda0dd <ThePreStub+157>: 41 5d                    popq   %r13
    0x7ffff5bda0df <ThePreStub+159>: 41 5e                    popq   %r14
    0x7ffff5bda0e1 <ThePreStub+161>: 41 5f                    popq   %r15
    0x7ffff5bda0e3 <ThePreStub+163>: 5b                       popq   %rbx
    0x7ffff5bda0e4 <ThePreStub+164>: 5d                       popq   %rbp
    0x7ffff5bda0e5 <ThePreStub+165>: 48 ff e0                 jmpq   *%rax
; %rax should be patched fixup precode = 0x7fff7c21f5a8
; (%rsp) should be the return address before calling "Fixup Precode"

It looks rather long, but what it does is simple. Its source is in vm\amd64\theprestubamd64.S:

NESTED_ENTRY ThePreStub, _TEXT, NoHandler
        PROLOG_WITH_TRANSITION_BLOCK 0, 0, 0, 0, 0

        //
        // call PreStubWorker
        //
        lea     rdi, [rsp + __PWTB_TransitionBlock] // pTransitionBlock*
        mov     rsi, METHODDESC_REGISTER
        call    C_FUNC(PreStubWorker)

        EPILOG_WITH_TRANSITION_BLOCK_TAILCALL
        TAILJMP_RAX
NESTED_END ThePreStub, _TEXT

It saves the registers to the stack, calls the PreStubWorker function, restores the registers from the stack,
and jumps to PreStubWorker's return value, which is the address of the now-patched Fixup Precode (0x7fff7c21f5a8).

PreStubWorker is a C++ function (exported with C linkage); it invokes the JIT to compile the method and then patches the Fixup Precode.
During patching it reads the leading 5e byte mentioned earlier, which identifies the precode type as PRECODE_FIXUP; the patching itself is done by FixupPrecode::SetTargetInterlocked.
After patching, the Fixup Precode looks like this:

Fixup Precode:
(lldb) di --bytes -s 0x7fff7c21f5a8
    0x7fff7c21f5a8: e9 a3 87 3a 00              jmp    0x7fff7c5c7d50
    0x7fff7c21f5ad: 5f                          popq   %rdi
    0x7fff7c21f5ae: 19 05 e8 23 6c fe           sbbl   %eax, -0x193dc18(%rip)
    0x7fff7c21f5b4: ff 5e a8                    lcalll *-0x58(%rsi)
    0x7fff7c21f5b7: 04 e8                       addb   $-0x18, %al
    0x7fff7c21f5b9: 1b 6c fe ff                 sbbl   -0x1(%rsi,%rdi,8), %ebp
    0x7fff7c21f5bd: 5e                          popq   %rsi
    0x7fff7c21f5be: 00 03                       addb   %al, (%rbx)
    0x7fff7c21f5c0: e8 13 6c fe ff              callq  0x7fff7c2061d8
    0x7fff7c21f5c5: 5e                          popq   %rsi
    0x7fff7c21f5c6: b0 02                       movb   $0x2, %al

The next time the function is called, it jmps straight to the compiled code.
The JIT stub mechanism lets the runtime compile only the functions that actually run, which greatly reduces program startup time, while the cost of the second and later calls (a single jmp) is tiny.
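
As a minimal sketch (assuming nothing beyond what the dumps above show), the patch amounts to rewriting the first five bytes of the precode from a callq into a jmp to the compiled code; the real FixupPrecode::SetTargetInterlocked publishes the bytes with an interlocked operation and also flips the type byte from 5e to 5f, as visible in the dumps:

#include <stdint.h>
#include <string.h>

static void PatchPrecodeSketch(uint8_t* precode, uint8_t* compiledCode)
{
    // x64 "jmp rel32": the displacement is relative to the next instruction.
    // For the dump above: 0x7fff7c5c7d50 - (0x7fff7c21f5a8 + 5) = 0x003a87a3.
    int32_t rel32 = (int32_t)(compiledCode - (precode + 5));

    uint8_t jmp[5];
    jmp[0] = 0xE9;              // jmp opcode, replacing the e8 (call) opcode
    memcpy(&jmp[1], &rel32, 4); // little-endian displacement
    memcpy(precode, jmp, 5);    // the real code publishes this atomically
}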

Note that the flow for virtual method calls differs slightly from the above: a virtual method's address is kept in the method table,
and it is the method table slot rather than the precode that gets patched, so the next call finds the compiled address in the table. Try analyzing it yourself if you are interested.

Next, let's look at what happens inside PreStubWorker.

The JIT entry point

The source of PreStubWorker is as follows:

extern "C" PCODE STDCALL PreStubWorker(TransitionBlock* pTransitionBlock, MethodDesc* pMD)
{
    PCODE pbRetVal = NULL;

    BEGIN_PRESERVE_LAST_ERROR;

    STATIC_CONTRACT_THROWS;
    STATIC_CONTRACT_GC_TRIGGERS;
    STATIC_CONTRACT_MODE_COOPERATIVE;
    STATIC_CONTRACT_ENTRY_POINT;

    MAKE_CURRENT_THREAD_AVAILABLE();

#ifdef _DEBUG
    Thread::ObjectRefFlush(CURRENT_THREAD);
#endif

    FrameWithCookie<PrestubMethodFrame> frame(pTransitionBlock, pMD);
    PrestubMethodFrame* pPFrame = &frame;

    pPFrame->Push(CURRENT_THREAD);

    INSTALL_MANAGED_EXCEPTION_DISPATCHER;
    INSTALL_UNWIND_AND_CONTINUE_HANDLER;

    ETWOnStartup(PrestubWorker_V1, PrestubWorkerEnd_V1);

    _ASSERTE(!NingenEnabled() && "You cannot invoke managed code inside the ngen compilation process.");

    // Running the PreStubWorker on a method causes us to access its MethodTable
    g_IBCLogger.LogMethodDescAccess(pMD);

    // Make sure the method table is restored, and method instantiation if present
    pMD->CheckRestore();

    CONSISTENCY_CHECK(GetAppDomain()->CheckCanExecuteManagedCode(pMD));

    // Note this is redundant with the above check but we do it anyway for safety
    //
    // This has been disabled so we have a better chance of catching these. Note that this check is
    // NOT sufficient for domain neutral and ngen cases.
    //
    // pMD->EnsureActive();

    MethodTable* pDispatchingMT = NULL;

    if (pMD->IsVtableMethod())
    {
        OBJECTREF curobj = pPFrame->GetThis();

        if (curobj != NULL) // Check for virtual function called non-virtually on a NULL object
        {
            pDispatchingMT = curobj->GetTrueMethodTable();

#ifdef FEATURE_ICASTABLE
            if (pDispatchingMT->IsICastable())
            {
                MethodTable* pMDMT = pMD->GetMethodTable();
                TypeHandle objectType(pDispatchingMT);
                TypeHandle methodType(pMDMT);

                GCStress<cfg_any>::MaybeTrigger();
                INDEBUG(curobj = NULL); // curobj is unprotected and CanCastTo() can trigger GC
                if (!objectType.CanCastTo(methodType))
                {
                    // Apperantly ICastable magic was involved when we chose this method to be called
                    // that's why we better stick to the MethodTable it belongs to, otherwise
                    // DoPrestub() will fail not being able to find implementation for pMD in pDispatchingMT.

                    pDispatchingMT = pMDMT;
                }
            }
#endif // FEATURE_ICASTABLE

            // For value types, the only virtual methods are interface implementations.
            // Thus pDispatching == pMT because there
            // is no inheritance in value types. Note the BoxedEntryPointStubs are shared
            // between all sharable generic instantiations, so the == test is on
            // canonical method tables.
#ifdef _DEBUG
            MethodTable* pMDMT = pMD->GetMethodTable(); // put this here to see what the MT is in debug mode
            _ASSERTE(!pMD->GetMethodTable()->IsValueType() ||
                     (pMD->IsUnboxingStub() && (pDispatchingMT->GetCanonicalMethodTable() == pMDMT->GetCanonicalMethodTable())));
#endif // _DEBUG
        }
    }

    GCX_PREEMP_THREAD_EXISTS(CURRENT_THREAD);
    pbRetVal = pMD->DoPrestub(pDispatchingMT);

    UNINSTALL_UNWIND_AND_CONTINUE_HANDLER;
    UNINSTALL_MANAGED_EXCEPTION_DISPATCHER;

    {
        HardwareExceptionHolder

        // Give debugger opportunity to stop here
        ThePreStubPatch();
    }

    pPFrame->Pop(CURRENT_THREAD);

    POSTCONDITION(pbRetVal != NULL);

    END_PRESERVE_LAST_ERROR;

    return pbRetVal;
}

This function receives two arguments.
The first is the TransitionBlock, which is really just a pointer into the stack where the registers were saved.
The second is the MethodDesc describing the method being compiled; in lldb, dumpmd pMD shows its details.

It then calls MethodDesc::DoPrestub, passing the MethodTable of the this object's type if the method is virtual.
The source of MethodDesc::DoPrestub is as follows:

PCODE MethodDesc::DoPrestub(MethodTable* pDispatchingMT)
{
    CONTRACT(PCODE)
    {
        STANDARD_VM_CHECK;
        POSTCONDITION(RETVAL != NULL);
    }
    CONTRACT_END;

    Stub* pStub = NULL;
    PCODE pCode = NULL;

    Thread* pThread = GetThread();

    MethodTable* pMT = GetMethodTable();

    // Running a prestub on a method causes us to access its MethodTable
    g_IBCLogger.LogMethodDescAccess(this);

    // A secondary layer of defense against executing code in inspection-only assembly.
    // This should already have been taken care of by not allowing inspection assemblies
    // to be activated. However, this is a very inexpensive piece of insurance in the name
    // of security.
    if (IsIntrospectionOnly())
    {
        _ASSERTE(!"A ReflectionOnly assembly reached the prestub. This should not have happened.");
        COMPlusThrow(kInvalidOperationException, IDS_EE_CODEEXECUTION_IN_INTROSPECTIVE_ASSEMBLY);
    }

    if (ContainsGenericVariables())
    {
        COMPlusThrow(kInvalidOperationException, IDS_EE_CODEEXECUTION_CONTAINSGENERICVAR);
    }

    /**************************   DEBUG CHECKS   *************************/
    /*-----------------------------------------------------------------
    // Halt if needed, GC stress, check the sharing count etc.
    */

#ifdef _DEBUG
    static unsigned ctr = 0;
    ctr++;

    if (g_pConfig->ShouldPrestubHalt(this))
    {
        _ASSERTE(!"PreStubHalt");
    }

    LOG((LF_CLASSLOADER, LL_INFO10000, "In PreStubWorker for %s::%s\n",
         m_pszDebugClassName, m_pszDebugMethodName));

    // This is a nice place to test out having some fatal EE errors. We do this only in a checked build, and only
    // under the InjectFatalError key.
    if (g_pConfig->InjectFatalError() == 1)
    {
        EEPOLICY_HANDLE_FATAL_ERROR(COR_E_EXECUTIONENGINE);
    }
    else if (g_pConfig->InjectFatalError() == 2)
    {
        EEPOLICY_HANDLE_FATAL_ERROR(COR_E_STACKOVERFLOW);
    }
    else if (g_pConfig->InjectFatalError() == 3)
    {
        TestSEHGuardPageRestore();
    }

    // Useful to test GC with the prestub on the call stack
    if (g_pConfig->ShouldPrestubGC(this))
    {
        GCX_COOP();
        GCHeap::GetGCHeap()->GarbageCollect(-1);
    }
#endif // _DEBUG

    STRESS_LOG1(LF_CLASSLOADER, LL_INFO10000, "Prestubworker: method %pM\n", this);

    GCStress<cfg_any, EeconfigFastGcSPolicy, CoopGcModePolicy>::MaybeTrigger();

    // Are we in the prestub because of a rejit request?  If so, let the ReJitManager
    // take it from here.
    pCode = ReJitManager::DoReJitIfNecessary(this);
    if (pCode != NULL)
    {
        // A ReJIT was performed, so nothing left for DoPrestub() to do. Return now.
        //
        // The stable entrypoint will either be a pointer to the original JITted code
        // (with a jmp at the top to jump to the newly-rejitted code) OR a pointer to any
        // stub code that must be executed first (e.g., a remoting stub), which in turn
        // will call the original JITted code (which then jmps to the newly-rejitted
        // code).
        RETURN GetStableEntryPoint();
    }

#ifdef FEATURE_PREJIT
    // If this method is the root of a CER call graph and we've recorded this fact in the ngen image then we're in the prestub in
    // order to trip any runtime level preparation needed for this graph (P/Invoke stub generation/library binding, generic
    // dictionary prepopulation etc.).
    GetModule()->RestoreCer(this);
#endif // FEATURE_PREJIT

#ifdef FEATURE_COMINTEROP
    /**************************   INTEROP   *************************/
    /*-----------------------------------------------------------------
    // Some method descriptors are COMPLUS-to-COM call descriptors
    // they are not your every day method descriptors, for example
    // they don't have an IL or code.
    */
    if (IsComPlusCall() || IsGenericComPlusCall())
    {
        pCode = GetStubForInteropMethod(this);

        GetPrecode()->SetTargetInterlocked(pCode);

        RETURN GetStableEntryPoint();
    }
#endif // FEATURE_COMINTEROP

    // workaround: This is to handle a punted work item dealing with a skipped module constructor
    //       due to appdomain unload. Basically shared code was JITted in domain A, and then
    //       this caused a link to another shared module with a module CCTOR, which was skipped
    //       or aborted in another appdomain we were trying to propagate the activation to.
    //
    //       Note that this is not a fix, but that it just minimizes the window in which the
    //       issue can occur.
    if (pThread->IsAbortRequested())
    {
        pThread->HandleThreadAbort();
    }

    /***************************   CLASS CONSTRUCTOR   ********************/
    // Make sure .cctor has been run

    if (IsClassConstructorTriggeredViaPrestub())
    {
        pMT->CheckRunClassInitThrowing();
    }

    /***************************   BACKPATCHING   *************************/
    // See if the addr of code has changed from the pre-stub
#ifdef FEATURE_INTERPRETER
    if (!IsReallyPointingToPrestub())
#else
    if (!IsPointingToPrestub())
#endif
    {
        LOG((LF_CLASSLOADER, LL_INFO10000,
             "    In PreStubWorker, method already jitted, backpatching call point\n"));

        RETURN DoBackpatch(pMT, pDispatchingMT, TRUE);
    }

    // record if remoting needs to intercept this call
    BOOL fRemotingIntercepted = IsRemotingInterceptedViaPrestub();

    BOOL fReportCompilationFinished = FALSE;

    /**************************   CODE CREATION   *************************/
    if (IsUnboxingStub())
    {
        pStub = MakeUnboxingStubWorker(this);
    }
#ifdef FEATURE_REMOTING
    else if (pMT->IsInterface() && !IsStatic() && !IsFCall())
    {
        pCode = CRemotingServices::GetDispatchInterfaceHelper(this);
        GetOrCreatePrecode();
    }
#endif // FEATURE_REMOTING
#if defined(FEATURE_SHARE_GENERIC_CODE)
    else if (IsInstantiatingStub())
    {
        pStub = MakeInstantiatingStubWorker(this);
    }
#endif // defined(FEATURE_SHARE_GENERIC_CODE)
    else if (IsIL() || IsNoMetadata())
    {
        // remember if we need to backpatch the MethodTable slot
        BOOL fBackpatch = !fRemotingIntercepted && !IsEnCMethod();

#ifdef FEATURE_PREJIT
        //
        // See if we have any prejitted code to use.
        //
        pCode = GetPreImplementedCode();

#ifdef PROFILING_SUPPORTED
        if (pCode != NULL)
        {
            BOOL fShouldSearchCache = TRUE;

            {
                BEGIN_PIN_PROFILER(CORProfilerTrackCacheSearches());
                g_profControlBlock.pProfInterface->
                    JITCachedFunctionSearchStarted((FunctionID) this, &fShouldSearchCache);
                END_PIN_PROFILER();
            }

            if (!fShouldSearchCache)
            {
#ifdef FEATURE_INTERPRETER
                SetNativeCodeInterlocked(NULL, pCode, FALSE);
#else
                SetNativeCodeInterlocked(NULL, pCode);
#endif
                _ASSERTE(!IsPreImplemented());
                pCode = NULL;
            }
        }
#endif // PROFILING_SUPPORTED

        if (pCode != NULL)
        {
            LOG((LF_ZAP, LL_INFO10000,
                 "ZAP: Using code" FMT_ADDR "for %s.%s sig=\"%s\" (token %x).\n",
                 DBG_ADDR(pCode),
                 m_pszDebugClassName,
                 m_pszDebugMethodName,
                 m_pszDebugMethodSignature,
                 GetMemberDef()));

            TADDR pFixupList = GetFixupList();
            if (pFixupList != NULL)
            {
                Module* pZapModule = GetZapModule();
                _ASSERTE(pZapModule != NULL);
                if (!pZapModule->FixupDelayList(pFixupList))
                {
                    _ASSERTE(!"FixupDelayList failed");
                    ThrowHR(COR_E_BADIMAGEFORMAT);
                }
            }

#ifdef HAVE_GCCOVER
            if (GCStress<cfg_instr_ngen>::IsEnabled())
                SetupGcCoverage(this, (BYTE*) pCode);
#endif // HAVE_GCCOVER

#ifdef PROFILING_SUPPORTED
            /*
             * This notifies the profiler that a search to find a
             * cached jitted function has been made.
             */
            {
                BEGIN_PIN_PROFILER(CORProfilerTrackCacheSearches());
                g_profControlBlock.pProfInterface->
                    JITCachedFunctionSearchFinished((FunctionID) this, COR_PRF_CACHED_FUNCTION_FOUND);
                END_PIN_PROFILER();
            }
#endif // PROFILING_SUPPORTED
        }

        //
        // If not, try to jit it
        //
#endif // FEATURE_PREJIT

#ifdef FEATURE_READYTORUN
        if (pCode == NULL)
        {
            Module* pModule = GetModule();
            if (pModule->IsReadyToRun())
            {
                pCode = pModule->GetReadyToRunInfo()->GetEntryPoint(this);
                if (pCode != NULL)
                    fReportCompilationFinished = TRUE;
            }
        }
#endif // FEATURE_READYTORUN

        if (pCode == NULL)
        {
            NewHolder<COR_ILMETHOD_DECODER> pHeader(NULL);
            // Get the information on the method
            if (!IsNoMetadata())
            {
                COR_ILMETHOD* ilHeader = GetILHeader(TRUE);

                if (ilHeader == NULL)
                {
#ifdef FEATURE_COMINTEROP
                    // Abstract methods can be called through WinRT derivation if the deriving type
                    // is not implemented in managed code, and calls through the CCW to the abstract
                    // method. Throw a sensible exception in that case.
                    if (pMT->IsExportedToWinRT() && IsAbstract())
                    {
                        COMPlusThrowHR(E_NOTIMPL);
                    }
#endif // FEATURE_COMINTEROP

                    COMPlusThrowHR(COR_E_BADIMAGEFORMAT, BFA_BAD_IL);
                }

                COR_ILMETHOD_DECODER::DecoderStatus status = COR_ILMETHOD_DECODER::FORMAT_ERROR;

                {
                    // Decoder ctor can AV on a malformed method header
                    AVInRuntimeImplOkayHolder AVOkay;
                    pHeader = new COR_ILMETHOD_DECODER(ilHeader, GetMDImport(), &status);
                    if (pHeader == NULL)
                        status = COR_ILMETHOD_DECODER::FORMAT_ERROR;
                }

                if (status == COR_ILMETHOD_DECODER::VERIFICATION_ERROR &&
                    Security::CanSkipVerification(GetModule()->GetDomainAssembly()))
                {
                    status = COR_ILMETHOD_DECODER::SUCCESS;
                }

                if (status != COR_ILMETHOD_DECODER::SUCCESS)
                {
                    if (status == COR_ILMETHOD_DECODER::VERIFICATION_ERROR)
                    {
                        // Throw a verification HR
                        COMPlusThrowHR(COR_E_VERIFICATION);
                    }
                    else
                    {
                        COMPlusThrowHR(COR_E_BADIMAGEFORMAT, BFA_BAD_IL);
                    }
                }

#ifdef _VER_EE_VERIFICATION_ENABLED
                static ConfigDWORD peVerify;

                if (peVerify.val(CLRConfig::EXTERNAL_PEVerify))
                    Verify(pHeader, TRUE, FALSE); // Throws a VerifierException if verification fails
#endif // _VER_EE_VERIFICATION_ENABLED
            } // end if (!IsNoMetadata())

            // JIT it
            LOG((LF_CLASSLOADER, LL_INFO1000000,
                 "    In PreStubWorker, calling MakeJitWorker\n"));

            // Create the precode eagerly if it is going to be needed later.
            if (!fBackpatch)
            {
                GetOrCreatePrecode();
            }

            // Mark the code as hot in case the method ends up in the native image
            g_IBCLogger.LogMethodCodeAccess(this);

            pCode = MakeJitWorker(pHeader, 0, 0);

#ifdef FEATURE_INTERPRETER
            if ((pCode != NULL) && !HasStableEntryPoint())
            {
                // We don't yet have a stable entry point, so don't do backpatching yet.
                // But we do have to handle some extra cases that occur in backpatching.
                // (Perhaps I *should* get to the backpatching code, but in a mode where we know
                // we're not dealing with the stable entry point...)
                if (HasNativeCodeSlot())
                {
                    // We called "SetNativeCodeInterlocked" in MakeJitWorker, which updated the native
                    // code slot, but I think we also want to update the regular slot...
                    PCODE tmpEntry = GetTemporaryEntryPoint();
                    PCODE pFound = FastInterlockCompareExchangePointer(GetAddrOfSlot(), pCode, tmpEntry);
                    // Doesn't matter if we failed -- if we did, it's because somebody else made progress.
                    if (pFound != tmpEntry) pCode = pFound;
                }

                // Now we handle the case of a FuncPtrPrecode.
                FuncPtrStubs* pFuncPtrStubs = GetLoaderAllocator()->GetFuncPtrStubsNoCreate();
                if (pFuncPtrStubs != NULL)
                {
                    Precode* pFuncPtrPrecode = pFuncPtrStubs->Lookup(this);
                    if (pFuncPtrPrecode != NULL)
                    {
                        // If there is a funcptr precode to patch, attempt to patch it. If we lose, that's OK,
                        // somebody else made progress.
                        pFuncPtrPrecode->SetTargetInterlocked(pCode);
                    }
                }
            }
#endif // FEATURE_INTERPRETER
        } // end if (pCode == NULL)
    }     // end else if (IsIL() || IsNoMetadata())
    else if (IsNDirect())
    {
        if (!GetModule()->GetSecurityDescriptor()->CanCallUnmanagedCode())
            Security::ThrowSecurityException(g_SecurityPermissionClassName, SPFLAGSUNMANAGEDCODE);

        pCode = GetStubForInteropMethod(this);
        GetOrCreatePrecode();
    }
    else if (IsFCall())
    {
        // Get the fcall implementation
        BOOL fSharedOrDynamicFCallImpl;
        pCode = ECall::GetFCallImpl(this, &fSharedOrDynamicFCallImpl);

        if (fSharedOrDynamicFCallImpl)
        {
            // Fake ctors share one implementation that has to be wrapped by prestub
            GetOrCreatePrecode();
        }
    }
    else if (IsArray())
    {
        pStub = GenerateArrayOpStub((ArrayMethodDesc*)this);
    }
    else if (IsEEImpl())
    {
        _ASSERTE(GetMethodTable()->IsDelegate());
        pCode = COMDelegate::GetInvokeMethodStub((EEImplMethodDesc*)this);
        GetOrCreatePrecode();
    }
    else
    {
        // This is a method type we don't handle yet
        _ASSERTE(!"Unknown Method Type");
    }

    /**************************   POSTJIT   *************************/
#ifndef FEATURE_INTERPRETER
    _ASSERTE(pCode == NULL || GetNativeCode() == NULL || pCode == GetNativeCode());
#else // FEATURE_INTERPRETER
    // Interpreter adds a new possiblity == someone else beat us to installing an intepreter stub.
    _ASSERTE(pCode == NULL || GetNativeCode() == NULL || pCode == GetNativeCode()
             || Interpreter::InterpretationStubToMethodInfo(pCode) == this);
#endif // FEATURE_INTERPRETER

    // At this point we must have either a pointer to managed code or to a stub. All of the above code
    // should have thrown an exception if it couldn't make a stub.
    _ASSERTE((pStub != NULL) ^ (pCode != NULL));

    /**************************   SECURITY   *************************/

    // Lets check to see if we need declarative security on this stub, If we have
    // security checks on this method or class then we need to add an intermediate
    // stub that performs declarative checks prior to calling the real stub.
    // record if security needs to intercept this call (also depends on whether we plan to use stubs for declarative security)

#if !defined(HAS_REMOTING_PRECODE) && defined(FEATURE_REMOTING)
    /**************************   REMOTING   *************************/

    // check for MarshalByRef scenarios ... we need to intercept
    // Non-virtual calls on MarshalByRef types
    if (fRemotingIntercepted)
    {
        // let us setup a remoting stub to intercept all the calls
        Stub* pRemotingStub = CRemotingServices::GetStubForNonVirtualMethod(this,
            (pStub != NULL) ? (LPVOID)pStub->GetEntryPoint() : (LPVOID)pCode, pStub);

        if (pRemotingStub != NULL)
        {
            pStub = pRemotingStub;
            pCode = NULL;
        }
    }
#endif // HAS_REMOTING_PRECODE

    _ASSERTE((pStub != NULL) ^ (pCode != NULL));

#if defined(_TARGET_X86_) || defined(_TARGET_AMD64_)
    //
    // We are seeing memory reordering race around fixups (see DDB 193514 and related bugs). We get into
    // situation where the patched precode is visible by other threads, but the resolved fixups
    // are not. IT SHOULD NEVER HAPPEN according to our current understanding of x86/x64 memory model.
    // (see email thread attached to the bug for details).
    //
    // We suspect that there may be bug in the hardware or that hardware may have shortcuts that may be
    // causing grief. We will try to avoid the race by executing an extra memory barrier.
    //
    MemoryBarrier();
#endif

    if (pCode != NULL)
    {
        if (HasPrecode())
            GetPrecode()->SetTargetInterlocked(pCode);
        else if (!HasStableEntryPoint())
        {
            // Is the result an interpreter stub?
#ifdef FEATURE_INTERPRETER
            if (Interpreter::InterpretationStubToMethodInfo(pCode) == this)
            {
                SetEntryPointInterlocked(pCode);
            }
            else
#endif // FEATURE_INTERPRETER
            {
                SetStableEntryPointInterlocked(pCode);
            }
        }
    }
    else
    {
        if (!GetOrCreatePrecode()->SetTargetInterlocked(pStub->GetEntryPoint()))
        {
            pStub->DecRef();
        }
        else if (pStub->HasExternalEntryPoint())
        {
            // If the Stub wraps code that is outside of the Stub allocation, then we
            // need to free the Stub allocation now.
            pStub->DecRef();
        }
    }

#ifdef FEATURE_INTERPRETER
    _ASSERTE(!IsReallyPointingToPrestub());
#else // FEATURE_INTERPRETER
    _ASSERTE(!IsPointingToPrestub());
    _ASSERTE(HasStableEntryPoint());
#endif // FEATURE_INTERPRETER

    if (fReportCompilationFinished)
        DACNotifyCompilationFinished(this);

    RETURN DoBackpatch(pMT, pDispatchingMT, FALSE);
}

This function is quite long, but we only need to pay attention to two places:

pCode = MakeJitWorker(pHeader, 0, 0);

MakeJitWorker calls the JIT to compile the method; pCode is the address of the resulting machine code.

if (HasPrecode())
    GetPrecode()->SetTargetInterlocked(pCode);

SetTargetInterlocked patches the precode, so the second call to the function jumps directly to the compiled code.

The source of MakeJitWorker is as follows:

PCODE MethodDesc::MakeJitWorker(COR_ILMETHOD_DECODER* ILHeader, DWORD flags, DWORD flags2)
{
    STANDARD_VM_CONTRACT;

    BOOL fIsILStub = IsILStub(); // @TODO: understand the need for this special case

    LOG((LF_JIT, LL_INFO1000000,
         "MakeJitWorker(" FMT_ADDR ", %s) for %s:%s\n",
         DBG_ADDR(this),
         fIsILStub ? " TRUE" : "FALSE",
         GetMethodTable()->GetDebugClassName(),
         m_pszDebugMethodName));

    PCODE pCode = NULL;
    ULONG sizeOfCode = 0;
#ifdef FEATURE_INTERPRETER
    PCODE pPreviousInterpStub = NULL;
    BOOL fInterpreted = FALSE;
    BOOL fStable = TRUE; // True iff the new code address (to be stored in pCode), is a stable entry point.
#endif

#ifdef FEATURE_MULTICOREJIT
    MulticoreJitManager& mcJitManager = GetAppDomain()->GetMulticoreJitManager();

    bool fBackgroundThread = (flags & CORJIT_FLG_MCJIT_BACKGROUND) != 0;
#endif

    {
        // Enter the global lock which protects the list of all functions being JITd
        ListLockHolder pJitLock(GetDomain()->GetJitLock());

        // It is possible that another thread stepped in before we entered the global lock for the first time.
        pCode = GetNativeCode();
        if (pCode != NULL)
        {
#ifdef FEATURE_INTERPRETER
            if (Interpreter::InterpretationStubToMethodInfo(pCode) == this)
            {
                pPreviousInterpStub = pCode;
            }
            else
#endif // FEATURE_INTERPRETER
            goto Done;
        }

        const char* description = "jit lock";
        INDEBUG(description = m_pszDebugMethodName;)
        ListLockEntryHolder pEntry(ListLockEntry::Find(pJitLock, this, description));

        // We have an entry now, we can release the global lock
        pJitLock.Release();

        // Take the entry lock
        {
            ListLockEntryLockHolder pEntryLock(pEntry, FALSE);

            if (pEntryLock.DeadlockAwareAcquire())
            {
                if (pEntry->m_hrResultCode == S_FALSE)
                {
                    // Nobody has jitted the method yet
                }
                else
                {
                    // We came in to jit but someone beat us so return the
                    // jitted method!

                    // We can just fall through because we will notice below that
                    // the method has code.

                    // @todo: Note that we may have a failed HRESULT here -
                    // we might want to return an early error rather than
                    // repeatedly failing the jit.
                }
            }
            else
            {
                // Taking this lock would cause a deadlock (presumably because we
                // are involved in a class constructor circular dependency.)  For
                // instance, another thread may be waiting to run the class constructor
                // that we are jitting, but is currently jitting this function.
                //
                // To remedy this, we want to go ahead and do the jitting anyway.
                // The other threads contending for the lock will then notice that
                // the jit finished while they were running class constructors, and abort their
                // current jit effort.
                //
                // We don't have to do anything special right here since we
                // can check HasNativeCode() to detect this case later.
                //
                // Note that at this point we don't have the lock, but that's OK because the
                // thread which does have the lock is blocked waiting for us.
            }

            // It is possible that another thread stepped in before we entered the lock.
            pCode = GetNativeCode();
#ifdef FEATURE_INTERPRETER
            if (pCode != NULL && (pCode != pPreviousInterpStub))
#else
            if (pCode != NULL)
#endif // FEATURE_INTERPRETER
            {
                goto Done;
            }

            SString namespaceOrClassName, methodName, methodSignature;

            PCODE pOtherCode = NULL; // Need to move here due to 'goto GotNewCode'

#ifdef FEATURE_MULTICOREJIT

            bool fCompiledInBackground = false;

            // If not called from multi-core JIT thread,
            if (!fBackgroundThread)
            {
                // Quick check before calling expensive out of line function on this method's domain has code JITted by background thread
                if (mcJitManager.GetMulticoreJitCodeStorage().GetRemainingMethodCount() > 0)
                {
                    if (MulticoreJitManager::IsMethodSupported(this))
                    {
                        pCode = mcJitManager.RequestMethodCode(this); // Query multi-core JIT manager for compiled code

                        // Multicore JIT manager starts background thread to pre-compile methods, but it does not back-patch it/notify profiler/notify DAC,
                        // Jumtp to GotNewCode to do so
                        if (pCode != NULL)
                        {
                            fCompiledInBackground = true;

#ifdef DEBUGGING_SUPPORTED
                            // Notify the debugger of the jitted function
                            if (g_pDebugInterface != NULL)
                            {
                                g_pDebugInterface->JITComplete(this, pCode);
                            }
#endif

                            goto GotNewCode;
                        }
                    }
                }
            }
#endif

            if (fIsILStub)
            {
                // we race with other threads to JIT the code for an IL stub and the
                // IL header is released once one of the threads completes.  As a result
                // we must be inside the lock to reliably get the IL header for the
                // stub.

                ILStubResolver* pResolver = AsDynamicMethodDesc()->GetILStubResolver();
                ILHeader = pResolver->GetILHeader();
            }

#ifdef MDA_SUPPORTED
            MdaJitCompilationStart* pProbe = MDA_GET_ASSISTANT(JitCompilationStart);
            if (pProbe)
                pProbe->NowCompiling(this);
#endif // MDA_SUPPORTED

#ifdef PROFILING_SUPPORTED
            // If profiling, need to give a chance for a tool to examine and modify
            // the IL before it gets to the JIT.  This allows one to add probe calls for
            // things like code coverage, performance, or whatever.
            {
                BEGIN_PIN_PROFILER(CORProfilerTrackJITInfo());

                // Multicore JIT should be disabled when CORProfilerTrackJITInfo is on
                // But there could be corner case in which profiler is attached when multicore background thread is calling MakeJitWorker
                // Disable this block when calling from multicore JIT background thread
                if (!IsNoMetadata()
#ifdef FEATURE_MULTICOREJIT
                    && (!fBackgroundThread)
#endif
                    )
                {
                    g_profControlBlock.pProfInterface->JITCompilationStarted((FunctionID) this, TRUE);
                    // The profiler may have changed the code on the callback.  Need to
                    // pick up the new code.  Note that you have to be fully trusted in
                    // this mode and the code will not be verified.
                    COR_ILMETHOD* pilHeader = GetILHeader(TRUE);
                    new (ILHeader) COR_ILMETHOD_DECODER(pilHeader, GetMDImport(), NULL);
                }
                END_PIN_PROFILER();
            }
#endif // PROFILING_SUPPORTED

#ifdef FEATURE_INTERPRETER
            // We move the ETW event for start of JITting inward, after we make the decision
            // to JIT rather than interpret.
#else // FEATURE_INTERPRETER
            // Fire an ETW event to mark the beginning of JIT'ing
            ETW::MethodLog::MethodJitting(this, &namespaceOrClassName, &methodName, &methodSignature);
#endif // FEATURE_INTERPRETER

#ifdef FEATURE_STACK_SAMPLING
#ifdef FEATURE_MULTICOREJIT
            if (!fBackgroundThread)
#endif // FEATURE_MULTICOREJIT
            {
                StackSampler::RecordJittingInfo(this, flags, flags2);
            }
#endif // FEATURE_STACK_SAMPLING

            EX_TRY
            {
                pCode = UnsafeJitFunction(this, ILHeader, flags, flags2, &sizeOfCode);
            }
            EX_CATCH
            {
                // If the current thread threw an exception, but a competing thread
                // somehow succeeded at JITting the same function (e.g., out of memory
                // encountered on current thread but not competing thread), then go ahead
                // and swallow this current thread's exception, since we somehow managed
                // to successfully JIT the code on the other thread.
                //
                // Note that if a deadlock cycle is broken, that does not result in an
                // exception--the thread would just pass through the lock and JIT the
                // function in competition with the other thread (with the winner of the
                // race decided later on when we do SetNativeCodeInterlocked). This
                // try/catch is purely to deal with the (unusual) case where a competing
                // thread succeeded where we aborted.

                pOtherCode = GetNativeCode();

                if (pOtherCode == NULL)
                {
                    pEntry->m_hrResultCode = E_FAIL;
                    EX_RETHROW;
                }
            }
            EX_END_CATCH(RethrowTerminalExceptions)

            if (pOtherCode != NULL)
            {
                // Somebody finished jitting recursively while we were jitting the method.
                // Just use their method & leak the one we finished. (Normally we hope
                // not to finish our JIT in this case, as we will abort early if we notice
                // a reentrant jit has occurred.  But we may not catch every place so we
                // do a definitive final check here.
                pCode = pOtherCode;
                goto Done;
            }

            _ASSERTE(pCode != NULL);

#ifdef HAVE_GCCOVER
            if (GCStress<cfg_instr_jit>::IsEnabled())
            {
                SetupGcCoverage(this, (BYTE*) pCode);
            }
#endif // HAVE_GCCOVER

#ifdef FEATURE_INTERPRETER
            // Determine whether the new code address is "stable"...= is not an interpreter stub.
            fInterpreted = (Interpreter::InterpretationStubToMethodInfo(pCode) == this);
            fStable = !fInterpreted;
#endif // FEATURE_INTERPRETER

#ifdef FEATURE_MULTICOREJIT

            // If called from multi-core JIT background thread, store code under lock, delay patching until code is queried from application threads
            if (fBackgroundThread)
            {
                // Fire an ETW event to mark the end of JIT'ing
                ETW::MethodLog::MethodJitted(this, &namespaceOrClassName, &methodName, &methodSignature, pCode, 0 /* ReJITID */);

#ifdef FEATURE_PERFMAP
                // Save the JIT'd method information so that perf can resolve JIT'd call frames.
                PerfMap::LogJITCompiledMethod(this, pCode, sizeOfCode);
#endif

                mcJitManager.GetMulticoreJitCodeStorage().StoreMethodCode(this, pCode);

                goto Done;
            }

GotNewCode:
#endif
            // If this function had already been requested for rejit (before its original
            // code was jitted), then give the rejit manager a chance to jump-stamp the
            // code we just compiled so the first thread entering the function will jump
            // to the prestub and trigger the rejit. Note that the PublishMethodHolder takes
            // a lock to avoid a particular kind of rejit race. See
            // code:ReJitManager::PublishMethodHolder::PublishMethodHolder#PublishCode for
            // details on the rejit race.
            //
            // Aside from rejit, performing a SetNativeCodeInterlocked at this point
            // generally ensures that there is only one winning version of the native
            // code. This also avoid races with profiler overriding ngened code (see
            // matching SetNativeCodeInterlocked done after
            // JITCachedFunctionSearchStarted)
#ifdef FEATURE_INTERPRETER
            PCODE pExpected = pPreviousInterpStub;
            if (pExpected == NULL) pExpected = GetTemporaryEntryPoint();
#endif
            {
                ReJitPublishMethodHolder publishWorker(this, pCode);
                if (!SetNativeCodeInterlocked(pCode
#ifdef FEATURE_INTERPRETER
                    , pExpected, fStable
#endif
                    ))
                {
                    // Another thread beat us to publishing its copy of the JITted code.
                    pCode = GetNativeCode();
                    goto Done;
                }
            }

#ifdef FEATURE_INTERPRETER
            // State for dynamic methods cannot be freed if the method was ever interpreted,
            // since there is no way to ensure that it is not in use at the moment.
            if (IsDynamicMethod() && !fInterpreted && (pPreviousInterpStub == NULL))
            {
                AsDynamicMethodDesc()->GetResolver()->FreeCompileTimeState();
            }
#endif // FEATURE_INTERPRETER

            // We succeeded in jitting the code, and our jitted code is the one that's going to run now.
            pEntry->m_hrResultCode = S_OK;

#ifdef PROFILING_SUPPORTED
            // Notify the profiler that JIT completed.
            // Must do this after the address has been set.
            // @ToDo: Why must we set the address before notifying the profiler ??
            //        Note that if IsInterceptedForDeclSecurity is set no one should access the jitted code address anyway.
            {
                BEGIN_PIN_PROFILER(CORProfilerTrackJITInfo());
                if (!IsNoMetadata())
                {
                    g_profControlBlock.pProfInterface->
                        JITCompilationFinished((FunctionID) this,
                                               pEntry->m_hrResultCode,
                                               TRUE);
                }
                END_PIN_PROFILER();
            }
#endif // PROFILING_SUPPORTED

#ifdef FEATURE_MULTICOREJIT
            if (!fCompiledInBackground)
#endif
#ifdef FEATURE_INTERPRETER
            // If we didn't JIT, but rather, created an interpreter stub (i.e., fStable is false), don't tell ETW that we did.
            if (fStable)
#endif // FEATURE_INTERPRETER
            {
                // Fire an ETW event to mark the end of JIT'ing
                ETW::MethodLog::MethodJitted(this, &namespaceOrClassName, &methodName, &methodSignature, pCode, 0 /* ReJITID */);

#ifdef FEATURE_PERFMAP
                // Save the JIT'd method information so that perf can resolve JIT'd call frames.
                PerfMap::LogJITCompiledMethod(this, pCode, sizeOfCode);
#endif
            }

#ifdef FEATURE_MULTICOREJIT

            // If not called from multi-core JIT thread, not got code from storage, quick check before calling out of line function
            if (!fBackgroundThread && !fCompiledInBackground && mcJitManager.IsRecorderActive())
            {
                if (MulticoreJitManager::IsMethodSupported(this))
                {
                    mcJitManager.RecordMethodJit(this); // Tell multi-core JIT manager to record method on successful JITting
                }
            }
#endif

            if (!fIsILStub)
            {
                // The notification will only occur if someone has registered for this method.
                DACNotifyCompilationFinished(this);
            }
        }
    }

Done:

    // We must have a code by now.
    _ASSERTE(pCode != NULL);

    LOG((LF_CORDB, LL_EVERYTHING, "MethodDesc::MakeJitWorker finished. Stub is" FMT_ADDR "\n",
         DBG_ADDR(pCode)));

    return pCode;
}

This is the thread-safe JIT wrapper: if multiple threads try to compile the same function, one of them performs the compilation and the others wait for it to finish.
Each AppDomain holds a collection of locks, and each function currently being compiled owns a ListLockEntry.
The function first locks the collection, gets or creates the ListLockEntry for this function, and then releases the collection lock;
at this point every thread working on the same function holds the same ListLockEntry, which is then locked itself.
With the entry locked, it calls the non-thread-safe JIT function:

pCode = UnsafeJitFunction(this, ILHeader, flags, flags2, &sizeOfCode)
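
The locking pattern described above can be sketched like this (with hypothetical Map/Entry types; the real code uses ListLock/ListLockEntry and is deadlock-aware):

#include <map>
#include <memory>
#include <mutex>

struct JitEntry { std::mutex lock; void* code = nullptr; };

static std::mutex g_tableLock; // guards the table of in-flight compilations
static std::map<void*, std::shared_ptr<JitEntry>> g_entries;

void* JitOnce(void* method, void* (*unsafeJit)(void*))
{
    std::shared_ptr<JitEntry> entry;
    {
        std::lock_guard<std::mutex> guard(g_tableLock); // short-lived global lock
        std::shared_ptr<JitEntry>& slot = g_entries[method];
        if (!slot) slot = std::make_shared<JitEntry>();
        entry = slot;
    }
    std::lock_guard<std::mutex> guard(entry->lock); // per-function lock
    if (entry->code == nullptr)                     // if we lost the race, reuse the result
        entry->code = unsafeJit(method);
    return entry->code;
}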

Several more layers of calls lead from here to the main JIT function; I will describe them only briefly:

UnsafeJitFunction

This function creates an instance of CEEJitInfo (the class the JIT layer uses to report back to the EE layer), derives the compile flags from the method information (e.g. whether to compile in Debug mode),
and calls CallCompileMethodWithSEHWrapper; if a rel32 displacement overflows, it disables rel32 addressing (fAllowRel32) and retries the compilation, as sketched below.
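
A rough sketch of that retry, with tryCompile as a hypothetical stand-in for one CallCompileMethodWithSEHWrapper attempt:

#include <stdexcept>

void* CompileWithRetry(void* method, void* (*tryCompile)(void* method, bool allowRel32))
{
    // First attempt allows rel32 addressing; on overflow it reports failure.
    if (void* code = tryCompile(method, /*allowRel32=*/true))
        return code;
    // Retry with rel32 disabled (the equivalent of clearing fAllowRel32).
    if (void* code = tryCompile(method, /*allowRel32=*/false))
        return code;
    throw std::runtime_error("JIT compilation failed");
}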

CallCompileMethodWithSEHWrapper

This function calls invokeCompileMethod inside a try block.

invokeCompileMethod

This function switches the current thread into preemptive mode (so the GC does not need to suspend it) and then calls invokeCompileMethodHelper.

invokeCompileMethodHelper

In the common case this function calls jitMgr->m_jit->compileMethod.

CILJit::compileMethod

In the common case this function calls jitNativeCode.

jitNativeCode

Creates and initializes a Compiler instance and calls pParam->pComp->compCompile (the 7-argument version).
Inlining also enters through this function; for inlining, the Compiler instance is reused after its first creation.
A Compiler is responsible for the whole JIT process of a single function.

Compiler::compCompile (7-argument version)

This function does some initialization on the Compiler instance and then calls Compiler::compCompileHelper.

compCompileHelper

This function first creates the local variable table lvaTable and the linked list of BasicBlocks,
adding an internal block (BB01) when necessary, and then parses the IL to add more blocks, as explained below.
It then calls compCompile (the 3-argument version).

compCompile (3-argument version)

This is the main JIT function; it drives each phase of the JIT, described in detail below.

Creating the local variable table

compCompileHelper calls lvaInitTypeRef, which creates the local variable table. Its source is as follows:

void Compiler::lvaInitTypeRef()
{
    /* x86 args look something like this:
        [this ptr] [hidden return buffer] [declared arguments]* [generic context] [var arg cookie]

       x64 is closer to the native ABI:
        [this ptr] [hidden return buffer] [generic context] [var arg cookie] [declared arguments]*
        (Note: prior to .NET Framework 4.5.1 for Windows 8.1 (but not .NET Framework 4.5.1 "downlevel"),
        the "hidden return buffer" came before the "this ptr". Now, the "this ptr" comes first. This
        is different from the C++ order, where the "hidden return buffer" always comes first.)

       ARM and ARM64 are the same as the current x64 convention:
        [this ptr] [hidden return buffer] [generic context] [var arg cookie] [declared arguments]*

       Key difference:
           The var arg cookie and generic context are swapped with respect to the user arguments
    */

    /* Set compArgsCount and compLocalsCount */

    info.compArgsCount = info.compMethodInfo->args.numArgs;

    // Is there a 'this' pointer

    if (!info.compIsStatic)
    {
        info.compArgsCount++;
    }
    else
    {
        info.compThisArg = BAD_VAR_NUM;
    }

    info.compILargsCount = info.compArgsCount;

#ifdef FEATURE_SIMD
    if (featureSIMD && (info.compRetNativeType == TYP_STRUCT))
    {
        var_types structType = impNormStructType(info.compMethodInfo->args.retTypeClass);
        info.compRetType = structType;
    }
#endif // FEATURE_SIMD

    // Are we returning a struct using a return buffer argument?
    //
    const bool hasRetBuffArg = impMethodInfo_hasRetBuffArg(info.compMethodInfo);

    // Possibly change the compRetNativeType from TYP_STRUCT to a "primitive" type
    // when we are returning a struct by value and it fits in one register
    //
    if (!hasRetBuffArg && varTypeIsStruct(info.compRetNativeType))
    {
        CORINFO_CLASS_HANDLE retClsHnd = info.compMethodInfo->args.retTypeClass;

        Compiler::structPassingKind howToReturnStruct;
        var_types returnType = getReturnTypeForStruct(retClsHnd, &howToReturnStruct);

        if (howToReturnStruct == SPK_PrimitiveType)
        {
            assert(returnType != TYP_UNKNOWN);
            assert(returnType != TYP_STRUCT);

            info.compRetNativeType = returnType;

            // ToDo: Refactor this common code sequence into its own method as it is used 4+ times
            if ((returnType == TYP_LONG) && (compLongUsed == false))
            {
                compLongUsed = true;
            }
            else if (((returnType == TYP_FLOAT) || (returnType == TYP_DOUBLE)) && (compFloatingPointUsed == false))
            {
                compFloatingPointUsed = true;
            }
        }
    }

    // Do we have a RetBuffArg?

    if (hasRetBuffArg)
    {
        info.compArgsCount++;
    }
    else
    {
        info.compRetBuffArg = BAD_VAR_NUM;
    }

    /* There is a 'hidden' cookie pushed last when the
       calling convention is varargs */

    if (info.compIsVarArgs)
    {
        info.compArgsCount++;
    }

    // Is there an extra parameter used to pass instantiation info to
    // shared generic methods and shared generic struct instance methods?
    if (info.compMethodInfo->args.callConv & CORINFO_CALLCONV_PARAMTYPE)
    {
        info.compArgsCount++;
    }
    else
    {
        info.compTypeCtxtArg = BAD_VAR_NUM;
    }

    lvaCount = info.compLocalsCount = info.compArgsCount + info.compMethodInfo->locals.numArgs;

    info.compILlocalsCount = info.compILargsCount + info.compMethodInfo->locals.numArgs;

    /* Now allocate the variable descriptor table */

    if (compIsForInlining())
    {
        lvaTable    = impInlineInfo->InlinerCompiler->lvaTable;
        lvaCount    = impInlineInfo->InlinerCompiler->lvaCount;
        lvaTableCnt = impInlineInfo->InlinerCompiler->lvaTableCnt;

        // No more stuff needs to be done.
        return;
    }

    lvaTableCnt = lvaCount * 2;

    if (lvaTableCnt < 16)
    {
        lvaTableCnt = 16;
    }

    lvaTable         = (LclVarDsc*)compGetMemArray(lvaTableCnt, sizeof(*lvaTable), CMK_LvaTable);
    size_t tableSize = lvaTableCnt * sizeof(*lvaTable);
    memset(lvaTable, 0, tableSize);
    for (unsigned i = 0; i < lvaTableCnt; i++)
    {
        new (&lvaTable[i], jitstd::placement_t()) LclVarDsc(this); // call the constructor.
    }

    //-------------------------------------------------------------------------
    // Count the arguments and initialize the respective lvaTable[] entries
    //
    // First the implicit arguments
    //-------------------------------------------------------------------------

    InitVarDscInfo varDscInfo;
    varDscInfo.Init(lvaTable, hasRetBuffArg);

    lvaInitArgs(&varDscInfo);

    //-------------------------------------------------------------------------
    // Finally the local variables
    //-------------------------------------------------------------------------

    unsigned                varNum    = varDscInfo.varNum;
    LclVarDsc*              varDsc    = varDscInfo.varDsc;
    CORINFO_ARG_LIST_HANDLE localsSig = info.compMethodInfo->locals.args;

    for (unsigned i = 0; i < info.compMethodInfo->locals.numArgs;
         i++, varNum++, varDsc++, localsSig = info.compCompHnd->getArgNext(localsSig))
    {
        CORINFO_CLASS_HANDLE typeHnd;
        CorInfoTypeWithMod   corInfoType =
            info.compCompHnd->getArgType(&info.compMethodInfo->locals, localsSig, &typeHnd);
        lvaInitVarDsc(varDsc, varNum, strip(corInfoType), typeHnd, localsSig, &info.compMethodInfo->locals);

        varDsc->lvPinned  = ((corInfoType & CORINFO_TYPE_MOD_PINNED) != 0);
        varDsc->lvOnFrame = true; // The final home for this local variable might be our local stack frame
    }

    if ( // If there already exist unsafe buffers, don't mark more structs as unsafe
         // as that will cause them to be placed along with the real unsafe buffers,
         // unnecessarily exposing them to overruns. This can affect GS tests which
         // intentionally do buffer-overruns.
        !getNeedsGSSecurityCookie() &&
        // GS checks require the stack to be re-ordered, which can't be done with EnC
        !opts.compDbgEnC && compStressCompile(STRESS_UNSAFE_BUFFER_CHECKS, 25))
    {
        setNeedsGSSecurityCookie();
        compGSReorderStackLayout = true;

        for (unsigned i = 0; i < lvaCount; i++)
        {
            if ((lvaTable[i].lvType == TYP_STRUCT) && compStressCompile(STRESS_GENERIC_VARN, 60))
            {
                lvaTable[i].lvIsUnsafeBuffer = true;
            }
        }
    }

    if (getNeedsGSSecurityCookie())
    {
        // Ensure that there will be at least one stack variable since
        // we require that the GSCookie does not have a 0 stack offset.
        unsigned dummy         = lvaGrabTempWithImplicitUse(false DEBUGARG("GSCookie dummy"));
        lvaTable[dummy].lvType = TYP_INT;
    }

#ifdef DEBUG
    if (verbose)
    {
        lvaTableDump(INITIAL_FRAME_LAYOUT);
    }
#endif
}

The initial number of local variables is info.compArgsCount + info.compMethodInfo->locals.numArgs, i.e. the number of IL arguments plus the number of IL locals.
Because more temporary variables may be added later, the table uses a length+capacity scheme:
lvaTable is the table pointer, lvaCount the current length, and lvaTableCnt the capacity.
The arguments from the IL are stored at the start of the table, followed by the IL locals;
for example, with 3 arguments and 2 locals the table is [arg 0, arg 1, arg 2, local 0, local 1, empty, empty, empty, ...].
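
A tiny illustration of that scheme, mirroring the allocation rule visible in lvaInitTypeRef above (capacity = max(2 * count, 16)):

#include <algorithm>
#include <cstdio>

int main()
{
    unsigned argsCount   = 3, localsCount = 2;
    unsigned lvaCount    = argsCount + localsCount;     // length: args first, then locals
    unsigned lvaTableCnt = std::max(lvaCount * 2, 16u); // capacity, leaving room for later temps

    // IL argument i lives at slot i; IL local j lives at slot argsCount + j.
    std::printf("length=%u capacity=%u, local 1 is at slot %u\n",
                lvaCount, lvaTableCnt, argsCount + 1);
    return 0;
}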

In addition, if the current compilation is being done for inlining, the local variable table of the callsite's compiler (the inliner) is reused.

Creating BasicBlocks from IL

Before entering the main JIT function, compCompileHelper parses the IL and creates BasicBlocks from the instructions.
As mentioned in the previous article, a BasicBlock is a logical block that contains no internal jumps:
in principle jump instructions appear only at the end of a block, and jump targets can only be the start of a block; a rough sketch of this splitting rule follows.
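
Here is the sketch (treating each IL offset as one instruction for simplicity; the real code walks variable-length opcodes and lives in fgMakeBasicBlocks, called from the listing below):

#include <vector>

struct BlockSpan { unsigned start, end; }; // IL offsets, [start, end)

std::vector<BlockSpan> SplitBlocks(const std::vector<bool>& isJumpTarget,
                                   const std::vector<bool>& endsWithBranch,
                                   unsigned codeSize)
{
    std::vector<BlockSpan> blocks;
    unsigned start = 0;
    for (unsigned off = 0; off < codeSize; off++)
    {
        bool boundary = (off + 1 == codeSize)  // end of the method
                     || endsWithBranch[off]    // a branch must end its block
                     || isJumpTarget[off + 1]; // a jump target must start a block
        if (boundary)
        {
            blocks.push_back({start, off + 1});
            start = off + 1;
        }
    }
    return blocks;
}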

The BasicBlock creation logic is in the function fgFindBasicBlocks; let's look at its source:

/*****************************************************************************
 *
 *  Main entry point to discover the basic blocks for the current function.
 */

void Compiler::fgFindBasicBlocks()
{
#ifdef DEBUG
    if (verbose)
    {
        printf("*************** In fgFindBasicBlocks() for %s\n", info.compFullName);
    }
#endif

    /* Allocate the 'jump target' vector
     *
     *  We need one extra byte as we mark
     *  jumpTarget[info.compILCodeSize] with JT_ADDR
     *  when we need to add a dummy block
     *  to record the end of a try or handler region.
     */
    BYTE* jumpTarget = new (this, CMK_Unknown) BYTE[info.compILCodeSize + 1];
    memset(jumpTarget, JT_NONE, info.compILCodeSize + 1);
    noway_assert(JT_NONE == 0);

    /* Walk the instrs to find all jump targets */

    fgFindJumpTargets(info.compCode, info.compILCodeSize, jumpTarget);
    if (compDonotInline())
    {
        return;
    }

    unsigned XTnum;

    /* Are there any exception handlers? */

    if (info.compXcptnsCount > 0)
    {
        noway_assert(!compIsForInlining());

        /* Check and mark all the exception handlers */

        for (XTnum = 0; XTnum < info.compXcptnsCount; XTnum++)
        {
            DWORD             tmpOffset;
            CORINFO_EH_CLAUSE clause;
            info.compCompHnd->getEHinfo(info.compMethodHnd, XTnum, &clause);
            noway_assert(clause.HandlerLength != (unsigned)-1);

            if (clause.TryLength <= 0)
            {
                BADCODE("try block length <=0");
            }

            /* Mark the 'try' block extent and the handler itself */

            if (clause.TryOffset > info.compILCodeSize)
            {
                BADCODE("try offset is > codesize");
            }
            if (jumpTarget[clause.TryOffset] == JT_NONE)
            {
                jumpTarget[clause.TryOffset] = JT_ADDR;
            }

            tmpOffset = clause.TryOffset + clause.TryLength;
            if (tmpOffset > info.compILCodeSize)
            {
                BADCODE("try end is > codesize");
            }
            if (jumpTarget[tmpOffset] == JT_NONE)
            {
                jumpTarget[tmpOffset] = JT_ADDR;
            }

            if (clause.HandlerOffset > info.compILCodeSize)
            {
                BADCODE("handler offset > codesize");
            }
            if (jumpTarget[clause.HandlerOffset] == JT_NONE)
            {
                jumpTarget[clause.HandlerOffset] = JT_ADDR;
            }

            tmpOffset = clause.HandlerOffset + clause.HandlerLength;
            if (tmpOffset > info.compILCodeSize)
            {
                BADCODE("handler end > codesize");
            }
            if (jumpTarget[tmpOffset] == JT_NONE)
            {
                jumpTarget[tmpOffset] = JT_ADDR;
            }

            if (clause.Flags & CORINFO_EH_CLAUSE_FILTER)
            {
                if (clause.FilterOffset > info.compILCodeSize)
                {
                    BADCODE("filter offset > codesize");
                }
                if (jumpTarget[clause.FilterOffset] == JT_NONE)
                {
                    jumpTarget[clause.FilterOffset] = JT_ADDR;
                }
            }
        }
    }

#ifdef DEBUG
    if (verbose)
    {
        bool anyJumpTargets = false;
        printf("Jump targets:\n");
        for (unsigned i = 0; i < info.compILCodeSize + 1; i++)
        {
            if (jumpTarget[i] == JT_NONE)
            {
                continue;
            }

            anyJumpTargets = true;
            printf("  IL_%04x", i);

            if (jumpTarget[i] & JT_ADDR)
            {
                printf(" addr");
            }
            if (jumpTarget[i] & JT_MULTI)
            {
                printf(" multi");
            }
            printf("\n");
        }
        if (!anyJumpTargets)
        {
            printf("  none\n");
        }
    }
#endif // DEBUG

    /* Now create the basic blocks */

    fgMakeBasicBlocks(info.compCode, info.compILCodeSize, jumpTarget);

    if (compIsForInlining())
    {
        if (compInlineResult->IsFailure())
        {
            return;
        }

        bool hasReturnBlocks           = false;
        bool hasMoreThanOneReturnBlock = false;

        for (BasicBlock* block = fgFirstBB; block != nullptr; block = block->bbNext)
        {
            if (block->bbJumpKind == BBJ_RETURN)
            {
                if (hasReturnBlocks)
                {
                    hasMoreThanOneReturnBlock = true;
                    break;
                }

                hasReturnBlocks = true;
            }
        }

        if (!hasReturnBlocks && !compInlineResult->UsesLegacyPolicy())
        {
            //
            // Mark the call node as "no return". The inliner might ignore CALLEE_DOES_NOT_RETURN and
            // fail inline for a different reasons. In that case we still want to make the "no return"
            // information available to the caller as it can impact caller's code quality.
            //

            impInlineInfo->iciCall->gtCallMoreFlags |= GTF_CALL_M_DOES_NOT_RETURN;
        }

        compInlineResult->NoteBool(InlineObservation::CALLEE_DOES_NOT_RETURN, !hasReturnBlocks);

        if (compInlineResult->IsFailure())
        {
            return;
        }

        noway_assert(info.compXcptnsCount == 0);
        compHndBBtab = impInlineInfo->InlinerCompiler->compHndBBtab;
        compHndBBtabAllocCount =
            impInlineInfo->InlinerCompiler->compHndBBtabAllocCount; // we probably only use the table, not add to it.
        compHndBBtabCount    = impInlineInfo->InlinerCompiler->compHndBBtabCount;
        info.compXcptnsCount = impInlineInfo->InlinerCompiler->info.compXcptnsCount;

        if (info.compRetNativeType != TYP_VOID && hasMoreThanOneReturnBlock)
        {
            // The lifetime of this var might expand multiple BBs. So it is a long lifetime compiler temp.
            lvaInlineeReturnSpillTemp = lvaGrabTemp(false DEBUGARG("Inline candidate multiple BBJ_RETURN spill temp"));
            lvaTable[lvaInlineeReturnSpillTemp].lvType = info.compRetNativeType;
        }

        return;
    }

    /* Mark all blocks within 'try' blocks as such */

    if (info.compXcptnsCount == 0)
    {
        return;
    }

    if (info.compXcptnsCount > MAX_XCPTN_INDEX)
    {
        IMPL_LIMITATION("too many exception clauses");
    }

    /* Allocate the exception handler table */

    fgAllocEHTable();

    /* Assume we don't need to sort the EH table (such that nested try/catch
     * appear before their try or handler parent). The EH verifier will notice
     * when we do need to sort it.
     */

    fgNeedToSortEHTable = false;

    verInitEHTree(info.compXcptnsCount);
    EHNodeDsc* initRoot = ehnNext; // remember the original root since
                                   // it may get modified during insertion

    // Annotate BBs with exception handling information required for generating correct eh code
    // as well as checking for correct IL

    EHblkDsc* HBtab;

    for (XTnum = 0, HBtab = compHndBBtab; XTnum < compHndBBtabCount; XTnum++, HBtab++)
    {
        CORINFO_EH_CLAUSE clause;
        info.compCompHnd->getEHinfo(info.compMethodHnd, XTnum, &clause);
        noway_assert(clause.HandlerLength != (unsigned)-1); // @DEPRECATED

#ifdef DEBUG
        if (verbose)
        {
            dispIncomingEHClause(XTnum, clause);
        }
#endif // DEBUG

        IL_OFFSET tryBegOff    = clause.TryOffset;
        IL_OFFSET tryEndOff    = tryBegOff + clause.TryLength;
        IL_OFFSET filterBegOff = 0;
        IL_OFFSET hndBegOff    = clause.HandlerOffset;
        IL_OFFSET hndEndOff    = hndBegOff + clause.HandlerLength;

        if (clause.Flags & CORINFO_EH_CLAUSE_FILTER)
        {
            filterBegOff = clause.FilterOffset;
        }

        if (tryEndOff > info.compILCodeSize)
        {
            BADCODE3("end of try block beyond end of method for try", " at offset %04X", tryBegOff);
        }
        if (hndEndOff > info.compILCodeSize)
        {
            BADCODE3("end of hnd block beyond end of method for try", " at offset %04X", tryBegOff);
  221. }
  222. HBtab->ebdTryBegOffset = tryBegOff;
  223. HBtab->ebdTryEndOffset = tryEndOff;
  224. HBtab->ebdFilterBegOffset = filterBegOff;
  225. HBtab->ebdHndBegOffset = hndBegOff;
  226. HBtab->ebdHndEndOffset = hndEndOff;
  227. /* Convert the various addresses to basic blocks */
  228. BasicBlock* tryBegBB = fgLookupBB(tryBegOff);
  229. BasicBlock* tryEndBB =
  230. fgLookupBB(tryEndOff); // note: this can be NULL if the try region is at the end of the function
  231. BasicBlock* hndBegBB = fgLookupBB(hndBegOff);
  232. BasicBlock* hndEndBB = nullptr;
  233. BasicBlock* filtBB = nullptr;
  234. BasicBlock* block;
  235. //
  236. // Assert that the try/hnd beginning blocks are set up correctly
  237. //
  238. if (tryBegBB == nullptr)
  239. {
  240. BADCODE("Try Clause is invalid");
  241. }
  242. if (hndBegBB == nullptr)
  243. {
  244. BADCODE("Handler Clause is invalid");
  245. }
  246. tryBegBB->bbFlags |= BBF_HAS_LABEL;
  247. hndBegBB->bbFlags |= BBF_HAS_LABEL | BBF_JMP_TARGET;
  248. #if HANDLER_ENTRY_MUST_BE_IN_HOT_SECTION
  249. // This will change the block weight from 0 to 1
  250. // and clear the rarely run flag
  251. hndBegBB->makeBlockHot();
  252. #else
  253. hndBegBB->bbSetRunRarely(); // handler entry points are rarely executed
  254. #endif
  255. if (hndEndOff < info.compILCodeSize)
  256. {
  257. hndEndBB = fgLookupBB(hndEndOff);
  258. }
  259. if (clause.Flags & CORINFO_EH_CLAUSE_FILTER)
  260. {
  261. filtBB = HBtab->ebdFilter = fgLookupBB(clause.FilterOffset);
  262. filtBB->bbCatchTyp = BBCT_FILTER;
  263. filtBB->bbFlags |= BBF_HAS_LABEL | BBF_JMP_TARGET;
  264. hndBegBB->bbCatchTyp = BBCT_FILTER_HANDLER;
  265. #if HANDLER_ENTRY_MUST_BE_IN_HOT_SECTION
  266. // This will change the block weight from 0 to 1
  267. // and clear the rarely run flag
  268. filtBB->makeBlockHot();
  269. #else
  270. filtBB->bbSetRunRarely(); // filter entry points are rarely executed
  271. #endif
  272. // Mark all BBs that belong to the filter with the XTnum of the corresponding handler
  273. for (block = filtBB; /**/; block = block->bbNext)
  274. {
  275. if (block == nullptr)
  276. {
  277. BADCODE3("Missing endfilter for filter", " at offset %04X", filtBB->bbCodeOffs);
  278. return;
  279. }
  280. // Still inside the filter
  281. block->setHndIndex(XTnum);
  282. if (block->bbJumpKind == BBJ_EHFILTERRET)
  283. {
  284. // Mark catch handler as successor.
  285. block->bbJumpDest = hndBegBB;
  286. assert(block->bbJumpDest->bbCatchTyp == BBCT_FILTER_HANDLER);
  287. break;
  288. }
  289. }
  290. if (!block->bbNext || block->bbNext != hndBegBB)
  291. {
  292. BADCODE3("Filter does not immediately precede handler for filter", " at offset %04X",
  293. filtBB->bbCodeOffs);
  294. }
  295. }
  296. else
  297. {
  298. HBtab->ebdTyp = clause.ClassToken;
  299. /* Set bbCatchTyp as appropriate */
  300. if (clause.Flags & CORINFO_EH_CLAUSE_FINALLY)
  301. {
  302. hndBegBB->bbCatchTyp = BBCT_FINALLY;
  303. }
  304. else
  305. {
  306. if (clause.Flags & CORINFO_EH_CLAUSE_FAULT)
  307. {
  308. hndBegBB->bbCatchTyp = BBCT_FAULT;
  309. }
  310. else
  311. {
  312. hndBegBB->bbCatchTyp = clause.ClassToken;
  313. // These values should be non-zero value that will
  314. // not collide with real tokens for bbCatchTyp
  315. if (clause.ClassToken == 0)
  316. {
  317. BADCODE("Exception catch type is Null");
  318. }
  319. noway_assert(clause.ClassToken != BBCT_FAULT);
  320. noway_assert(clause.ClassToken != BBCT_FINALLY);
  321. noway_assert(clause.ClassToken != BBCT_FILTER);
  322. noway_assert(clause.ClassToken != BBCT_FILTER_HANDLER);
  323. }
  324. }
  325. }
  326. /* Mark the initial block and last blocks in the 'try' region */
  327. tryBegBB->bbFlags |= BBF_TRY_BEG | BBF_HAS_LABEL;
  328. /* Prevent future optimizations of removing the first block */
  329. /* of a TRY block and the first block of an exception handler */
  330. tryBegBB->bbFlags |= BBF_DONT_REMOVE;
  331. hndBegBB->bbFlags |= BBF_DONT_REMOVE;
  332. hndBegBB->bbRefs++; // The first block of a handler gets an extra, "artificial" reference count.
  333. if (clause.Flags & CORINFO_EH_CLAUSE_FILTER)
  334. {
  335. filtBB->bbFlags |= BBF_DONT_REMOVE;
  336. filtBB->bbRefs++; // The first block of a filter gets an extra, "artificial" reference count.
  337. }
  338. tryBegBB->bbFlags |= BBF_DONT_REMOVE;
  339. hndBegBB->bbFlags |= BBF_DONT_REMOVE;
  340. //
  341. // Store the info to the table of EH block handlers
  342. //
  343. HBtab->ebdHandlerType = ToEHHandlerType(clause.Flags);
  344. HBtab->ebdTryBeg = tryBegBB;
  345. HBtab->ebdTryLast = (tryEndBB == nullptr) ? fgLastBB : tryEndBB->bbPrev;
  346. HBtab->ebdHndBeg = hndBegBB;
  347. HBtab->ebdHndLast = (hndEndBB == nullptr) ? fgLastBB : hndEndBB->bbPrev;
  348. //
  349. // Assert that all of our try/hnd blocks are setup correctly.
  350. //
  351. if (HBtab->ebdTryLast == nullptr)
  352. {
  353. BADCODE("Try Clause is invalid");
  354. }
  355. if (HBtab->ebdHndLast == nullptr)
  356. {
  357. BADCODE("Handler Clause is invalid");
  358. }
  359. //
  360. // Verify that it's legal
  361. //
  362. verInsertEhNode(&clause, HBtab);
  363. } // end foreach handler table entry
  364. fgSortEHTable();
  365. // Next, set things related to nesting that depend on the sorting being complete.
  366. for (XTnum = 0, HBtab = compHndBBtab; XTnum < compHndBBtabCount; XTnum++, HBtab++)
  367. {
  368. /* Mark all blocks in the finally/fault or catch clause */
  369. BasicBlock* tryBegBB = HBtab->ebdTryBeg;
  370. BasicBlock* hndBegBB = HBtab->ebdHndBeg;
  371. IL_OFFSET tryBegOff = HBtab->ebdTryBegOffset;
  372. IL_OFFSET tryEndOff = HBtab->ebdTryEndOffset;
  373. IL_OFFSET hndBegOff = HBtab->ebdHndBegOffset;
  374. IL_OFFSET hndEndOff = HBtab->ebdHndEndOffset;
  375. BasicBlock* block;
  376. for (block = hndBegBB; block && (block->bbCodeOffs < hndEndOff); block = block->bbNext)
  377. {
  378. if (!block->hasHndIndex())
  379. {
  380. block->setHndIndex(XTnum);
  381. }
  382. // All blocks in a catch handler or filter are rarely run, except the entry
  383. if ((block != hndBegBB) && (hndBegBB->bbCatchTyp != BBCT_FINALLY))
  384. {
  385. block->bbSetRunRarely();
  386. }
  387. }
  388. /* Mark all blocks within the covered range of the try */
  389. for (block = tryBegBB; block && (block->bbCodeOffs < tryEndOff); block = block->bbNext)
  390. {
  391. /* Mark this BB as belonging to a 'try' block */
  392. if (!block->hasTryIndex())
  393. {
  394. block->setTryIndex(XTnum);
  395. }
  396. #ifdef DEBUG
  397. /* Note: the BB can't span the 'try' block */
  398. if (!(block->bbFlags & BBF_INTERNAL))
  399. {
  400. noway_assert(tryBegOff <= block->bbCodeOffs);
  401. noway_assert(tryEndOff >= block->bbCodeOffsEnd || tryEndOff == tryBegOff);
  402. }
  403. #endif
  404. }
  405. /* Init ebdHandlerNestingLevel of current clause, and bump up value for all
  406. * enclosed clauses (which have to be before it in the table).
  407. * Innermost try-finally blocks must precede outermost
  408. * try-finally blocks.
  409. */
  410. #if !FEATURE_EH_FUNCLETS
  411. HBtab->ebdHandlerNestingLevel = 0;
  412. #endif // !FEATURE_EH_FUNCLETS
  413. HBtab->ebdEnclosingTryIndex = EHblkDsc::NO_ENCLOSING_INDEX;
  414. HBtab->ebdEnclosingHndIndex = EHblkDsc::NO_ENCLOSING_INDEX;
  415. noway_assert(XTnum < compHndBBtabCount);
  416. noway_assert(XTnum == ehGetIndex(HBtab));
  417. for (EHblkDsc* xtab = compHndBBtab; xtab < HBtab; xtab++)
  418. {
  419. #if !FEATURE_EH_FUNCLETS
  420. if (jitIsBetween(xtab->ebdHndBegOffs(), hndBegOff, hndEndOff))
  421. {
  422. xtab->ebdHandlerNestingLevel++;
  423. }
  424. #endif // !FEATURE_EH_FUNCLETS
  425. /* If we haven't recorded an enclosing try index for xtab then see
  426. * if this EH region should be recorded. We check if the
  427. * first offset in the xtab lies within our region. If so,
  428. * the last offset also must lie within the region, due to
  429. * nesting rules. verInsertEhNode(), below, will check for proper nesting.
  430. */
  431. if (xtab->ebdEnclosingTryIndex == EHblkDsc::NO_ENCLOSING_INDEX)
  432. {
  433. bool begBetween = jitIsBetween(xtab->ebdTryBegOffs(), tryBegOff, tryEndOff);
  434. if (begBetween)
  435. {
  436. // Record the enclosing scope link
  437. xtab->ebdEnclosingTryIndex = (unsigned short)XTnum;
  438. }
  439. }
  440. /* Do the same for the enclosing handler index.
  441. */
  442. if (xtab->ebdEnclosingHndIndex == EHblkDsc::NO_ENCLOSING_INDEX)
  443. {
  444. bool begBetween = jitIsBetween(xtab->ebdTryBegOffs(), hndBegOff, hndEndOff);
  445. if (begBetween)
  446. {
  447. // Record the enclosing scope link
  448. xtab->ebdEnclosingHndIndex = (unsigned short)XTnum;
  449. }
  450. }
  451. }
  452. } // end foreach handler table entry
  453. #if !FEATURE_EH_FUNCLETS
  454. EHblkDsc* HBtabEnd;
  455. for (HBtab = compHndBBtab, HBtabEnd = compHndBBtab + compHndBBtabCount; HBtab < HBtabEnd; HBtab++)
  456. {
  457. if (ehMaxHndNestingCount <= HBtab->ebdHandlerNestingLevel)
  458. ehMaxHndNestingCount = HBtab->ebdHandlerNestingLevel + 1;
  459. }
  460. #endif // !FEATURE_EH_FUNCLETS
  461. #ifndef DEBUG
  462. if (tiVerificationNeeded)
  463. #endif
  464. {
  465. // always run these checks for a debug build
  466. verCheckNestingLevel(initRoot);
  467. }
  468. #ifndef DEBUG
  469. // fgNormalizeEH assumes that this test has been passed. And Ssa assumes that fgNormalizeEHTable
  470. // has been run. So do this unless we're in minOpts mode (and always in debug).
  471. if (tiVerificationNeeded || !opts.MinOpts())
  472. #endif
  473. {
  474. fgCheckBasicBlockControlFlow();
  475. }
  476. #ifdef DEBUG
  477. if (verbose)
  478. {
  479. JITDUMP("*************** After fgFindBasicBlocks() has created the EH table\n");
  480. fgDispHandlerTab();
  481. }
  482. // We can't verify the handler table until all the IL legality checks have been done (above), since bad IL
  483. // (such as illegal nesting of regions) will trigger asserts here.
  484. fgVerifyHandlerTab();
  485. #endif
  486. fgNormalizeEH();
  487. }

fgFindBasicBlocks first allocates a byte array one byte longer than the IL code, so that every IL offset maps to one byte (the extra slot is used to mark the end of a try or handler region),
and then calls fgFindJumpTargets to find all the jump targets. Take this IL as an example:

  1. IL_0000 00 nop
  2. IL_0001 16 ldc.i4.0
  3. IL_0002 0a stloc.0
  4. IL_0003 2b 0d br.s 13 (IL_0012)
  5. IL_0005 00 nop
  6. IL_0006 06 ldloc.0
  7. IL_0007 28 0c 00 00 0a call 0xA00000C
  8. IL_000c 00 nop
  9. IL_000d 00 nop
  10. IL_000e 06 ldloc.0
  11. IL_000f 17 ldc.i4.1
  12. IL_0010 58 add
  13. IL_0011 0a stloc.0
  14. IL_0012 06 ldloc.0
  15. IL_0013 19 ldc.i4.3
  16. IL_0014 fe 04 clt
  17. IL_0016 0b stloc.1
  18. IL_0017 07 ldloc.1
  19. IL_0018 2d eb brtrue.s -21 (IL_0005)
  20. IL_001a 2a ret

Two jump targets are found in this IL:

  1. Jump targets:
  2. IL_0005
  3. IL_0012

fgFindBasicBlocks then derives more jump targets from the function's exception information; for example, the start of a try and the start of a catch are both treated as jump targets.
Note that after parsing the IL, fgFindJumpTargets also judges whether the method is worth inlining; the inlining-related handling is covered further below.
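
To make the marking concrete, here is a minimal, self-contained sketch (not the real fgFindJumpTargets, which handles every opcode): it decodes just the short-branch opcodes used by the example above and marks their targets in a jumpTarget array, printing IL_0005 and IL_0012, matching the dump above.

#include <cstdio>
#include <cstdint>

int main()
{
    // The IL bytes of the example method above (27 bytes)
    const uint8_t il[] = {
        0x00, 0x16, 0x0a, 0x2b, 0x0d, 0x00, 0x06, 0x28, 0x0c, 0x00, 0x00, 0x0a,
        0x00, 0x00, 0x06, 0x17, 0x58, 0x0a, 0x06, 0x19, 0xfe, 0x04, 0x0b, 0x07,
        0x2d, 0xeb, 0x2a };
    uint8_t jumpTarget[sizeof(il) + 1] = { 0 }; // JT_NONE == 0, plus one extra slot like the JIT

    for (size_t offs = 0; offs < sizeof(il); )
    {
        switch (il[offs])
        {
        case 0x2b: case 0x2d: // br.s / brtrue.s: 1-byte signed relative offset
        {
            int8_t delta = (int8_t)il[offs + 1];
            jumpTarget[offs + 2 + delta] = 1; // target is relative to the next instruction
            offs += 2;
            break;
        }
        case 0x28: offs += 5; break; // call: a 4-byte token follows
        case 0xfe: offs += 2; break; // two-byte opcode (clt here)
        default:   offs += 1; break; // everything else in this method is single-byte
        }
    }
    for (size_t i = 0; i < sizeof(il); i++)
        if (jumpTarget[i]) printf("IL_%04x\n", (unsigned)i);
    return 0;
}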

fgMakeBasicBlocks is then called to create the BasicBlocks; it starts a new block whenever it encounters a jump instruction or a jump target.
For the example IL above this yields roughly four blocks: [000..005) ending with the br.s, [005..012) falling through, [012..01A) ending with the brtrue.s back to the second block, and [01A..01B) holding the ret.
After fgMakeBasicBlocks returns, the compiler holds a linked list of BasicBlocks (starting from fgFirstBB), each corresponding to a range of the IL.

After the BasicBlocks are created, an exception table compHndBBtab (also known as the EH table) is built from the exception information; its length is compHndBBtabCount.
Each record in the table holds the block where the try begins, the block where the handler (catch, finally, fault) begins, and the index of the enclosing try (when trys are nested).

As shown in the figure below:

881857-20171028110045851-188393709.jpg
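
As a quick illustration of how this table is consumed, here is a hedged sketch (the field names are real, but this exact loop is not in the JIT) that walks the finished EH table from inside a Compiler member function:

// Sketch only: dump each EH clause's try/handler blocks and its nesting link
for (unsigned XTnum = 0; XTnum < compHndBBtabCount; XTnum++)
{
    EHblkDsc* HBtab = &compHndBBtab[XTnum];
    printf("EH#%u: try begins at BB%02u, handler begins at BB%02u, enclosing try %u\n",
           XTnum, HBtab->ebdTryBeg->bbNum, HBtab->ebdHndBeg->bbNum,
           HBtab->ebdEnclosingTryIndex);
}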

The JIT main function

Once compCompileHelper has divided the IL into BasicBlocks, it calls the 3-argument overload of Compiler::compCompile, which is the JIT's main function.

The source of Compiler::compCompile is as follows:

  1. //*********************************************************************************************
  2. // #Phases
  3. //
  4. // This is the most interesting 'toplevel' function in the JIT. It goes through the operations of
  5. // importing, morphing, optimizations and code generation. This is called from the EE through the
  6. // code:CILJit::compileMethod function.
  7. //
  8. // For an overview of the structure of the JIT, see:
  9. // https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-overview.md
  10. //
  11. void Compiler::compCompile(void** methodCodePtr, ULONG* methodCodeSize, CORJIT_FLAGS* compileFlags)
  12. {
  13. if (compIsForInlining())
  14. {
  15. // Notify root instance that an inline attempt is about to import IL
  16. impInlineRoot()->m_inlineStrategy->NoteImport();
  17. }
  18. hashBv::Init(this);
  19. VarSetOps::AssignAllowUninitRhs(this, compCurLife, VarSetOps::UninitVal());
  20. /* The temp holding the secret stub argument is used by fgImport() when importing the intrinsic. */
  21. if (info.compPublishStubParam)
  22. {
  23. assert(lvaStubArgumentVar == BAD_VAR_NUM);
  24. lvaStubArgumentVar = lvaGrabTempWithImplicitUse(false DEBUGARG("stub argument"));
  25. lvaTable[lvaStubArgumentVar].lvType = TYP_I_IMPL;
  26. }
  27. EndPhase(PHASE_PRE_IMPORT);
  28. compFunctionTraceStart();
  29. /* Convert the instrs in each basic block to a tree based intermediate representation */
  30. fgImport();
  31. assert(!fgComputePredsDone);
  32. if (fgCheapPredsValid)
  33. {
  34. // Remove cheap predecessors before inlining; allowing the cheap predecessor lists to be inserted
  35. // with inlined blocks causes problems.
  36. fgRemovePreds();
  37. }
  38. if (compIsForInlining())
  39. {
  40. /* Quit inlining if fgImport() failed for any reason. */
  41. if (compDonotInline())
  42. {
  43. return;
  44. }
  45. /* Filter out unimported BBs */
  46. fgRemoveEmptyBlocks();
  47. return;
  48. }
  49. assert(!compDonotInline());
  50. EndPhase(PHASE_IMPORTATION);
  51. // Maybe the caller was not interested in generating code
  52. if (compIsForImportOnly())
  53. {
  54. compFunctionTraceEnd(nullptr, 0, false);
  55. return;
  56. }
  57. #if !FEATURE_EH
  58. // If we aren't yet supporting EH in a compiler bring-up, remove as many EH handlers as possible, so
  59. // we can pass tests that contain try/catch EH, but don't actually throw any exceptions.
  60. fgRemoveEH();
  61. #endif // !FEATURE_EH
  62. if (compileFlags->corJitFlags & CORJIT_FLG_BBINSTR)
  63. {
  64. fgInstrumentMethod();
  65. }
  66. // We could allow ESP frames. Just need to reserve space for
  67. // pushing EBP if the method becomes an EBP-frame after an edit.
  68. // Note that requiring a EBP Frame disallows double alignment. Thus if we change this
  69. // we either have to disallow double alignment for E&C some other way or handle it in EETwain.
  70. if (opts.compDbgEnC)
  71. {
  72. codeGen->setFramePointerRequired(true);
  73. // Since we need a slots for security near ebp, its not possible
  74. // to do this after an Edit without shifting all the locals.
  75. // So we just always reserve space for these slots in case an Edit adds them
  76. opts.compNeedSecurityCheck = true;
  77. // We don't care about localloc right now. If we do support it,
  78. // EECodeManager::FixContextForEnC() needs to handle it smartly
  79. // in case the localloc was actually executed.
  80. //
  81. // compLocallocUsed = true;
  82. }
  83. EndPhase(PHASE_POST_IMPORT);
  84. /* Initialize the BlockSet epoch */
  85. NewBasicBlockEpoch();
  86. /* Massage the trees so that we can generate code out of them */
  87. fgMorph();
  88. EndPhase(PHASE_MORPH);
  89. /* GS security checks for unsafe buffers */
  90. if (getNeedsGSSecurityCookie())
  91. {
  92. #ifdef DEBUG
  93. if (verbose)
  94. {
  95. printf("\n*************** -GS checks for unsafe buffers \n");
  96. }
  97. #endif
  98. gsGSChecksInitCookie();
  99. if (compGSReorderStackLayout)
  100. {
  101. gsCopyShadowParams();
  102. }
  103. #ifdef DEBUG
  104. if (verbose)
  105. {
  106. fgDispBasicBlocks(true);
  107. printf("\n");
  108. }
  109. #endif
  110. }
  111. EndPhase(PHASE_GS_COOKIE);
  112. /* Compute bbNum, bbRefs and bbPreds */
  113. JITDUMP("\nRenumbering the basic blocks for fgComputePred\n");
  114. fgRenumberBlocks();
  115. noway_assert(!fgComputePredsDone); // This is the first time full (not cheap) preds will be computed.
  116. fgComputePreds();
  117. EndPhase(PHASE_COMPUTE_PREDS);
  118. /* If we need to emit GC Poll calls, mark the blocks that need them now. This is conservative and can
  119. * be optimized later. */
  120. fgMarkGCPollBlocks();
  121. EndPhase(PHASE_MARK_GC_POLL_BLOCKS);
  122. /* From this point on the flowgraph information such as bbNum,
  123. * bbRefs or bbPreds has to be kept updated */
  124. // Compute the edge weights (if we have profile data)
  125. fgComputeEdgeWeights();
  126. EndPhase(PHASE_COMPUTE_EDGE_WEIGHTS);
  127. #if FEATURE_EH_FUNCLETS
  128. /* Create funclets from the EH handlers. */
  129. fgCreateFunclets();
  130. EndPhase(PHASE_CREATE_FUNCLETS);
  131. #endif // FEATURE_EH_FUNCLETS
  132. if (!opts.MinOpts() && !opts.compDbgCode)
  133. {
  134. optOptimizeLayout();
  135. EndPhase(PHASE_OPTIMIZE_LAYOUT);
  136. // Compute reachability sets and dominators.
  137. fgComputeReachability();
  138. }
  139. // Transform each GT_ALLOCOBJ node into either an allocation helper call or
  140. // local variable allocation on the stack.
  141. ObjectAllocator objectAllocator(this);
  142. objectAllocator.Run();
  143. if (!opts.MinOpts() && !opts.compDbgCode)
  144. {
  145. /* Perform loop inversion (i.e. transform "while" loops into
  146. "repeat" loops) and discover and classify natural loops
  147. (e.g. mark iterative loops as such). Also marks loop blocks
  148. and sets bbWeight to the loop nesting levels
  149. */
  150. optOptimizeLoops();
  151. EndPhase(PHASE_OPTIMIZE_LOOPS);
  152. // Clone loops with optimization opportunities, and
  153. // choose the one based on dynamic condition evaluation.
  154. optCloneLoops();
  155. EndPhase(PHASE_CLONE_LOOPS);
  156. /* Unroll loops */
  157. optUnrollLoops();
  158. EndPhase(PHASE_UNROLL_LOOPS);
  159. }
  160. #ifdef DEBUG
  161. fgDebugCheckLinks();
  162. #endif
  163. /* Create the variable table (and compute variable ref counts) */
  164. lvaMarkLocalVars();
  165. EndPhase(PHASE_MARK_LOCAL_VARS);
  166. // IMPORTANT, after this point, every place where trees are modified or cloned
  167. // the local variable reference counts must be updated
  168. // You can test the value of the following variable to see if
  169. // the local variable ref counts must be updated
  170. //
  171. assert(lvaLocalVarRefCounted == true);
  172. if (!opts.MinOpts() && !opts.compDbgCode)
  173. {
  174. /* Optimize boolean conditions */
  175. optOptimizeBools();
  176. EndPhase(PHASE_OPTIMIZE_BOOLS);
  177. // optOptimizeBools() might have changed the number of blocks; the dominators/reachability might be bad.
  178. }
  179. /* Figure out the order in which operators are to be evaluated */
  180. fgFindOperOrder();
  181. EndPhase(PHASE_FIND_OPER_ORDER);
  182. // Weave the tree lists. Anyone who modifies the tree shapes after
  183. // this point is responsible for calling fgSetStmtSeq() to keep the
  184. // nodes properly linked.
  185. // This can create GC poll calls, and create new BasicBlocks (without updating dominators/reachability).
  186. fgSetBlockOrder();
  187. EndPhase(PHASE_SET_BLOCK_ORDER);
  188. // IMPORTANT, after this point, every place where tree topology changes must redo evaluation
  189. // order (gtSetStmtInfo) and relink nodes (fgSetStmtSeq) if required.
  190. CLANG_FORMAT_COMMENT_ANCHOR;
  191. #ifdef DEBUG
  192. // Now we have determined the order of evaluation and the gtCosts for every node.
  193. // If verbose, dump the full set of trees here before the optimization phases mutate them
  194. //
  195. if (verbose)
  196. {
  197. fgDispBasicBlocks(true); // 'true' will call fgDumpTrees() after dumping the BasicBlocks
  198. printf("\n");
  199. }
  200. #endif
  201. // At this point we know if we are fully interruptible or not
  202. if (!opts.MinOpts() && !opts.compDbgCode)
  203. {
  204. bool doSsa = true;
  205. bool doEarlyProp = true;
  206. bool doValueNum = true;
  207. bool doLoopHoisting = true;
  208. bool doCopyProp = true;
  209. bool doAssertionProp = true;
  210. bool doRangeAnalysis = true;
  211. #ifdef DEBUG
  212. doSsa = (JitConfig.JitDoSsa() != 0);
  213. doEarlyProp = doSsa && (JitConfig.JitDoEarlyProp() != 0);
  214. doValueNum = doSsa && (JitConfig.JitDoValueNumber() != 0);
  215. doLoopHoisting = doValueNum && (JitConfig.JitDoLoopHoisting() != 0);
  216. doCopyProp = doValueNum && (JitConfig.JitDoCopyProp() != 0);
  217. doAssertionProp = doValueNum && (JitConfig.JitDoAssertionProp() != 0);
  218. doRangeAnalysis = doAssertionProp && (JitConfig.JitDoRangeAnalysis() != 0);
  219. #endif
  220. if (doSsa)
  221. {
  222. fgSsaBuild();
  223. EndPhase(PHASE_BUILD_SSA);
  224. }
  225. if (doEarlyProp)
  226. {
  227. /* Propagate array length and rewrite getType() method call */
  228. optEarlyProp();
  229. EndPhase(PHASE_EARLY_PROP);
  230. }
  231. if (doValueNum)
  232. {
  233. fgValueNumber();
  234. EndPhase(PHASE_VALUE_NUMBER);
  235. }
  236. if (doLoopHoisting)
  237. {
  238. /* Hoist invariant code out of loops */
  239. optHoistLoopCode();
  240. EndPhase(PHASE_HOIST_LOOP_CODE);
  241. }
  242. if (doCopyProp)
  243. {
  244. /* Perform VN based copy propagation */
  245. optVnCopyProp();
  246. EndPhase(PHASE_VN_COPY_PROP);
  247. }
  248. #if FEATURE_ANYCSE
  249. /* Remove common sub-expressions */
  250. optOptimizeCSEs();
  251. #endif // FEATURE_ANYCSE
  252. #if ASSERTION_PROP
  253. if (doAssertionProp)
  254. {
  255. /* Assertion propagation */
  256. optAssertionPropMain();
  257. EndPhase(PHASE_ASSERTION_PROP_MAIN);
  258. }
  259. if (doRangeAnalysis)
  260. {
  261. /* Optimize array index range checks */
  262. RangeCheck rc(this);
  263. rc.OptimizeRangeChecks();
  264. EndPhase(PHASE_OPTIMIZE_INDEX_CHECKS);
  265. }
  266. #endif // ASSERTION_PROP
  267. /* update the flowgraph if we modified it during the optimization phase*/
  268. if (fgModified)
  269. {
  270. fgUpdateFlowGraph();
  271. EndPhase(PHASE_UPDATE_FLOW_GRAPH);
  272. // Recompute the edge weight if we have modified the flow graph
  273. fgComputeEdgeWeights();
  274. EndPhase(PHASE_COMPUTE_EDGE_WEIGHTS2);
  275. }
  276. }
  277. #ifdef _TARGET_AMD64_
  278. // Check if we need to add the Quirk for the PPP backward compat issue
  279. compQuirkForPPPflag = compQuirkForPPP();
  280. #endif
  281. fgDetermineFirstColdBlock();
  282. EndPhase(PHASE_DETERMINE_FIRST_COLD_BLOCK);
  283. #ifdef DEBUG
  284. fgDebugCheckLinks(compStressCompile(STRESS_REMORPH_TREES, 50));
  285. // Stash the current estimate of the function's size if necessary.
  286. if (verbose)
  287. {
  288. compSizeEstimate = 0;
  289. compCycleEstimate = 0;
  290. for (BasicBlock* block = fgFirstBB; block != nullptr; block = block->bbNext)
  291. {
  292. for (GenTreeStmt* stmt = block->firstStmt(); stmt != nullptr; stmt = stmt->getNextStmt())
  293. {
  294. compSizeEstimate += stmt->GetCostSz();
  295. compCycleEstimate += stmt->GetCostEx();
  296. }
  297. }
  298. }
  299. #endif
  300. #ifndef LEGACY_BACKEND
  301. // rationalize trees
  302. Rationalizer rat(this); // PHASE_RATIONALIZE
  303. rat.Run();
  304. #endif // !LEGACY_BACKEND
  305. // Here we do "simple lowering". When the RyuJIT backend works for all
  306. // platforms, this will be part of the more general lowering phase. For now, though, we do a separate
  307. // pass of "final lowering." We must do this before (final) liveness analysis, because this creates
  308. // range check throw blocks, in which the liveness must be correct.
  309. fgSimpleLowering();
  310. EndPhase(PHASE_SIMPLE_LOWERING);
  311. #ifdef LEGACY_BACKEND
  312. /* Local variable liveness */
  313. fgLocalVarLiveness();
  314. EndPhase(PHASE_LCLVARLIVENESS);
  315. #endif // !LEGACY_BACKEND
  316. #ifdef DEBUG
  317. fgDebugCheckBBlist();
  318. fgDebugCheckLinks();
  319. #endif
  320. /* Enable this to gather statistical data such as
  321. * call and register argument info, flowgraph and loop info, etc. */
  322. compJitStats();
  323. #ifdef _TARGET_ARM_
  324. if (compLocallocUsed)
  325. {
  326. // We reserve REG_SAVED_LOCALLOC_SP to store SP on entry for stack unwinding
  327. codeGen->regSet.rsMaskResvd |= RBM_SAVED_LOCALLOC_SP;
  328. }
  329. #endif // _TARGET_ARM_
  330. #ifdef _TARGET_ARMARCH_
  331. if (compRsvdRegCheck(PRE_REGALLOC_FRAME_LAYOUT))
  332. {
  333. // We reserve R10/IP1 in this case to hold the offsets in load/store instructions
  334. codeGen->regSet.rsMaskResvd |= RBM_OPT_RSVD;
  335. assert(REG_OPT_RSVD != REG_FP);
  336. }
  337. #ifdef DEBUG
  338. //
  339. // Display the pre-regalloc frame offsets that we have tentatively decided upon
  340. //
  341. if (verbose)
  342. lvaTableDump();
  343. #endif
  344. #endif // _TARGET_ARMARCH_
  345. /* Assign registers to variables, etc. */
  346. CLANG_FORMAT_COMMENT_ANCHOR;
  347. #ifndef LEGACY_BACKEND
  348. ///
  349. // Dominator and reachability sets are no longer valid. They haven't been
  350. // maintained up to here, and shouldn't be used (unless recomputed).
  351. ///
  352. fgDomsComputed = false;
  353. /* Create LSRA before Lowering, this way Lowering can initialize the TreeNode Map */
  354. m_pLinearScan = getLinearScanAllocator(this);
  355. /* Lower */
  356. Lowering lower(this, m_pLinearScan); // PHASE_LOWERING
  357. lower.Run();
  358. assert(lvaSortAgain == false); // We should have re-run fgLocalVarLiveness() in lower.Run()
  359. lvaTrackedFixed = true; // We can not add any new tracked variables after this point.
  360. /* Now that lowering is completed we can proceed to perform register allocation */
  361. m_pLinearScan->doLinearScan();
  362. EndPhase(PHASE_LINEAR_SCAN);
  363. // Copied from rpPredictRegUse()
  364. genFullPtrRegMap = (codeGen->genInterruptible || !codeGen->isFramePointerUsed());
  365. #else // LEGACY_BACKEND
  366. lvaTrackedFixed = true; // We cannot add any new tracked variables after this point.
  367. // For the classic JIT32 at this point lvaSortAgain can be set and raAssignVars() will call lvaSortOnly()
  368. // Now do "classic" register allocation.
  369. raAssignVars();
  370. EndPhase(PHASE_RA_ASSIGN_VARS);
  371. #endif // LEGACY_BACKEND
  372. #ifdef DEBUG
  373. fgDebugCheckLinks();
  374. #endif
  375. /* Generate code */
  376. codeGen->genGenerateCode(methodCodePtr, methodCodeSize);
  377. #ifdef FEATURE_JIT_METHOD_PERF
  378. if (pCompJitTimer)
  379. pCompJitTimer->Terminate(this, CompTimeSummaryInfo::s_compTimeSummary);
  380. #endif
  381. RecordStateAtEndOfCompilation();
  382. #ifdef FEATURE_TRACELOGGING
  383. compJitTelemetry.NotifyEndOfCompilation();
  384. #endif
  385. #if defined(DEBUG)
  386. ++Compiler::jitTotalMethodCompiled;
  387. #endif // defined(DEBUG)
  388. compFunctionTraceEnd(*methodCodePtr, *methodCodeSize, false);
  389. #if FUNC_INFO_LOGGING
  390. if (compJitFuncInfoFile != nullptr)
  391. {
  392. assert(!compIsForInlining());
  393. #ifdef DEBUG // We only have access to info.compFullName in DEBUG builds.
  394. fprintf(compJitFuncInfoFile, "%s\n", info.compFullName);
  395. #elif FEATURE_SIMD
  396. fprintf(compJitFuncInfoFile, " %s\n", eeGetMethodFullName(info.compMethodHnd));
  397. #endif
  398. fprintf(compJitFuncInfoFile, ""); // in our logic this causes a flush
  399. }
  400. #endif // FUNC_INFO_LOGGING
  401. }

The JIT main function contains the calls to the individual phases; for example, EndPhase(PHASE_PRE_IMPORT) marks the end of that phase.

There are a few more phases here than the ones Microsoft lists:

881857-20171028110106930-46112063.png

Next, let's analyze these phases one by one.

PHASE_PRE_IMPORT

This phase performs the work required before importing HIR (GenTree) from the IL; it consists of the following code:

  1. if (compIsForInlining())
  2. {
  3. // Notify root instance that an inline attempt is about to import IL
  4. impInlineRoot()->m_inlineStrategy->NoteImport();
  5. }
  6. hashBv::Init(this);
  7. VarSetOps::AssignAllowUninitRhs(this, compCurLife, VarSetOps::UninitVal());
  8. /* The temp holding the secret stub argument is used by fgImport() when importing the intrinsic. */
  9. if (info.compPublishStubParam)
  10. {
  11. assert(lvaStubArgumentVar == BAD_VAR_NUM);
  12. lvaStubArgumentVar = lvaGrabTempWithImplicitUse(false DEBUGARG("stub argument"));
  13. lvaTable[lvaStubArgumentVar].lvType = TYP_I_IMPL;
  14. }
  15. EndPhase(PHASE_PRE_IMPORT);

This performs some initialization before the import:
hashBv::Init creates a bitvector allocator for the compiler,
VarSetOps::AssignAllowUninitRhs sets compCurLife to the uninitialized value (this variable holds the set of currently live local variables),
and when the compPublishStubParam option is enabled, an extra local variable is added (it holds the value of rax on entry to the function).

PHASE_IMPORTATION

This phase imports HIR (GenTree) from the IL; it consists of the following code:

  1. compFunctionTraceStart();
  2. /* Convert the instrs in each basic block to a tree based intermediate representation */
  3. fgImport();
  4. assert(!fgComputePredsDone);
  5. if (fgCheapPredsValid)
  6. {
  7. // Remove cheap predecessors before inlining; allowing the cheap predecessor lists to be inserted
  8. // with inlined blocks causes problems.
  9. fgRemovePreds();
  10. }
  11. if (compIsForInlining())
  12. {
  13. /* Quit inlining if fgImport() failed for any reason. */
  14. if (compDonotInline())
  15. {
  16. return;
  17. }
  18. /* Filter out unimported BBs */
  19. fgRemoveEmptyBlocks();
  20. return;
  21. }
  22. assert(!compDonotInline());
  23. EndPhase(PHASE_IMPORTATION);

compFunctionTraceStart prints some debugging information.

fgImport parses the IL and creates GenTree nodes; since the BasicBlocks were created earlier, the GenTrees built from the IL are appended to their corresponding BasicBlocks.
BasicBlock + GenTree is what we usually call the IR. The IR has two forms: the tree form is called HIR (used by the JIT front end) and the linear list form is called LIR (used by the JIT back end); what is built here is HIR.
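
As a rough illustration, here is a hedged sketch (not actual importer code) of the HIR that importing the sequence `ldloc.0; ldc.i4.1; add; stloc.0` would produce, built with the real GenTree constructors:

GenTreePtr op1 = gtNewLclvNode(0, TYP_INT);                        // ldloc.0
GenTreePtr op2 = gtNewIconNode(1);                                 // ldc.i4.1
GenTreePtr add = gtNewOperNode(GT_ADD, TYP_INT, op1, op2);         // add
GenTreePtr asg = gtNewAssignNode(gtNewLclvNode(0, TYP_INT), add);  // stloc.0
impAppendTree(asg, (unsigned)CHECK_SPILL_NONE, impCurStmtOffs);    // append the statement to the current block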

The source of fgImport is as follows:

  1. void Compiler::fgImport()
  2. {
  3. fgHasPostfix = false;
  4. impImport(fgFirstBB);
  5. if (!(opts.eeFlags & CORJIT_FLG_SKIP_VERIFICATION))
  6. {
  7. CorInfoMethodRuntimeFlags verFlag;
  8. verFlag = tiIsVerifiableCode ? CORINFO_FLG_VERIFIABLE : CORINFO_FLG_UNVERIFIABLE;
  9. info.compCompHnd->setMethodAttribs(info.compMethodHnd, verFlag);
  10. }
  11. }

It calls impImport on the first BasicBlock.

The source of impImport is as follows:

  1. /*****************************************************************************
  2. *
  3. * Convert the instrs ("import") into our internal format (trees). The
  4. * basic flowgraph has already been constructed and is passed in.
  5. */
  6. void Compiler::impImport(BasicBlock* method)
  7. {
  8. #ifdef DEBUG
  9. if (verbose)
  10. {
  11. printf("*************** In impImport() for %s\n", info.compFullName);
  12. }
  13. #endif
  14. /* Allocate the stack contents */
  15. if (info.compMaxStack <= sizeof(impSmallStack) / sizeof(impSmallStack[0]))
  16. {
  17. /* Use local variable, don't waste time allocating on the heap */
  18. impStkSize = sizeof(impSmallStack) / sizeof(impSmallStack[0]);
  19. verCurrentState.esStack = impSmallStack;
  20. }
  21. else
  22. {
  23. impStkSize = info.compMaxStack;
  24. verCurrentState.esStack = new (this, CMK_ImpStack) StackEntry[impStkSize];
  25. }
  26. // initialize the entry state at start of method
  27. verInitCurrentState();
  28. // Initialize stuff related to figuring "spill cliques" (see spec comment for impGetSpillTmpBase).
  29. Compiler* inlineRoot = impInlineRoot();
  30. if (this == inlineRoot) // These are only used on the root of the inlining tree.
  31. {
  32. // We have initialized these previously, but to size 0. Make them larger.
  33. impPendingBlockMembers.Init(getAllocator(), fgBBNumMax * 2);
  34. impSpillCliquePredMembers.Init(getAllocator(), fgBBNumMax * 2);
  35. impSpillCliqueSuccMembers.Init(getAllocator(), fgBBNumMax * 2);
  36. }
  37. inlineRoot->impPendingBlockMembers.Reset(fgBBNumMax * 2);
  38. inlineRoot->impSpillCliquePredMembers.Reset(fgBBNumMax * 2);
  39. inlineRoot->impSpillCliqueSuccMembers.Reset(fgBBNumMax * 2);
  40. impBlockListNodeFreeList = nullptr;
  41. #ifdef DEBUG
  42. impLastILoffsStmt = nullptr;
  43. impNestedStackSpill = false;
  44. #endif
  45. impBoxTemp = BAD_VAR_NUM;
  46. impPendingList = impPendingFree = nullptr;
  47. /* Add the entry-point to the worker-list */
  48. // Skip leading internal blocks. There can be one as a leading scratch BB, and more
  49. // from EH normalization.
  50. // NOTE: It might be possible to always just put fgFirstBB on the pending list, and let everything else just fall
  51. // out.
  52. for (; method->bbFlags & BBF_INTERNAL; method = method->bbNext)
  53. {
  54. // Treat these as imported.
  55. assert(method->bbJumpKind == BBJ_NONE); // We assume all the leading ones are fallthrough.
  56. JITDUMP("Marking leading BBF_INTERNAL block BB%02u as BBF_IMPORTED\n", method->bbNum);
  57. method->bbFlags |= BBF_IMPORTED;
  58. }
  59. impImportBlockPending(method);
  60. /* Import blocks in the worker-list until there are no more */
  61. while (impPendingList)
  62. {
  63. /* Remove the entry at the front of the list */
  64. PendingDsc* dsc = impPendingList;
  65. impPendingList = impPendingList->pdNext;
  66. impSetPendingBlockMember(dsc->pdBB, 0);
  67. /* Restore the stack state */
  68. verCurrentState.thisInitialized = dsc->pdThisPtrInit;
  69. verCurrentState.esStackDepth = dsc->pdSavedStack.ssDepth;
  70. if (verCurrentState.esStackDepth)
  71. {
  72. impRestoreStackState(&dsc->pdSavedStack);
  73. }
  74. /* Add the entry to the free list for reuse */
  75. dsc->pdNext = impPendingFree;
  76. impPendingFree = dsc;
  77. /* Now import the block */
  78. if (dsc->pdBB->bbFlags & BBF_FAILED_VERIFICATION)
  79. {
  80. #ifdef _TARGET_64BIT_
  81. // On AMD64, during verification we have to match JIT64 behavior since the VM is very tighly
  82. // coupled with the JIT64 IL Verification logic. Look inside verHandleVerificationFailure
  83. // method for further explanation on why we raise this exception instead of making the jitted
  84. // code throw the verification exception during execution.
  85. if (tiVerificationNeeded && (opts.eeFlags & CORJIT_FLG_IMPORT_ONLY) != 0)
  86. {
  87. BADCODE("Basic block marked as not verifiable");
  88. }
  89. else
  90. #endif // _TARGET_64BIT_
  91. {
  92. verConvertBBToThrowVerificationException(dsc->pdBB DEBUGARG(true));
  93. impEndTreeList(dsc->pdBB);
  94. }
  95. }
  96. else
  97. {
  98. impImportBlock(dsc->pdBB);
  99. if (compDonotInline())
  100. {
  101. return;
  102. }
  103. if (compIsForImportOnly() && !tiVerificationNeeded)
  104. {
  105. return;
  106. }
  107. }
  108. }
  109. #ifdef DEBUG
  110. if (verbose && info.compXcptnsCount)
  111. {
  112. printf("\nAfter impImport() added block for try,catch,finally");
  113. fgDispBasicBlocks();
  114. printf("\n");
  115. }
  116. // Used in impImportBlockPending() for STRESS_CHK_REIMPORT
  117. for (BasicBlock* block = fgFirstBB; block; block = block->bbNext)
  118. {
  119. block->bbFlags &= ~BBF_VISITED;
  120. }
  121. #endif
  122. assert(!compIsForInlining() || !tiVerificationNeeded);
  123. }

impImport first initializes the execution stack verCurrentState.esStack; when maxstack is no larger than 16 (the size of impSmallStack) the small stack is used, otherwise a new one is allocated on the heap.
It then initializes the members used to track "spill cliques" (groups of spill temps, the temporary variables that hold values spilled from the execution stack; see the spec comment on impGetSpillTmpBase).
Next it marks internally added (BBF_INTERNAL) BasicBlocks as imported (BBF_IMPORTED), since those blocks have no corresponding IL range.
It then adds the first non-internal BasicBlock to the pending list impPendingList and keeps processing that list until it is empty.
Processing a BasicBlock from the list calls impImportBlock(dsc->pdBB).
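
That loop is essentially a worklist algorithm. A simplified sketch (assumption: this hypothetical helper lives inside Compiler, verification state and block re-imports are ignored, and firstNonInternalBlock stands in for the skip-leading-internal-blocks logic above):

#include <queue>

void Compiler::importAllSketch(BasicBlock* firstNonInternalBlock) // hypothetical helper
{
    std::queue<BasicBlock*> pending;
    pending.push(firstNonInternalBlock);
    while (!pending.empty())
    {
        BasicBlock* block = pending.front(); pending.pop();
        impImportBlock(block);                       // IL -> GenTree for this block (sets BBF_IMPORTED)
        for (unsigned i = 0; i < block->NumSucc(); i++)
        {
            BasicBlock* succ = block->GetSucc(i);
            if ((succ->bbFlags & BBF_IMPORTED) == 0) // queue each successor only once
                pending.push(succ);
        }
    }
}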

The source of impImportBlock is as follows:

  1. //***************************************************************
  2. // Import the instructions for the given basic block. Perform
  3. // verification, throwing an exception on failure. Push any successor blocks that are enabled for the first
  4. // time, or whose verification pre-state is changed.
  5. #ifdef _PREFAST_
  6. #pragma warning(push)
  7. #pragma warning(disable : 21000) // Suppress PREFast warning about overly large function
  8. #endif
  9. void Compiler::impImportBlock(BasicBlock* block)
  10. {
  11. // BBF_INTERNAL blocks only exist during importation due to EH canonicalization. We need to
  12. // handle them specially. In particular, there is no IL to import for them, but we do need
  13. // to mark them as imported and put their successors on the pending import list.
  14. if (block->bbFlags & BBF_INTERNAL)
  15. {
  16. JITDUMP("Marking BBF_INTERNAL block BB%02u as BBF_IMPORTED\n", block->bbNum);
  17. block->bbFlags |= BBF_IMPORTED;
  18. for (unsigned i = 0; i < block->NumSucc(); i++)
  19. {
  20. impImportBlockPending(block->GetSucc(i));
  21. }
  22. return;
  23. }
  24. bool markImport;
  25. assert(block);
  26. /* Make the block globaly available */
  27. compCurBB = block;
  28. #ifdef DEBUG
  29. /* Initialize the debug variables */
  30. impCurOpcName = "unknown";
  31. impCurOpcOffs = block->bbCodeOffs;
  32. #endif
  33. /* Set the current stack state to the merged result */
  34. verResetCurrentState(block, &verCurrentState);
  35. /* Now walk the code and import the IL into GenTrees */
  36. struct FilterVerificationExceptionsParam
  37. {
  38. Compiler* pThis;
  39. BasicBlock* block;
  40. };
  41. FilterVerificationExceptionsParam param;
  42. param.pThis = this;
  43. param.block = block;
  44. PAL_TRY(FilterVerificationExceptionsParam*, pParam, &param)
  45. {
  46. /* @VERIFICATION : For now, the only state propagation from try
  47. to it's handler is "thisInit" state (stack is empty at start of try).
  48. In general, for state that we track in verification, we need to
  49. model the possibility that an exception might happen at any IL
  50. instruction, so we really need to merge all states that obtain
  51. between IL instructions in a try block into the start states of
  52. all handlers.
  53. However we do not allow the 'this' pointer to be uninitialized when
  54. entering most kinds try regions (only try/fault are allowed to have
  55. an uninitialized this pointer on entry to the try)
  56. Fortunately, the stack is thrown away when an exception
  57. leads to a handler, so we don't have to worry about that.
  58. We DO, however, have to worry about the "thisInit" state.
  59. But only for the try/fault case.
  60. The only allowed transition is from TIS_Uninit to TIS_Init.
  61. So for a try/fault region for the fault handler block
  62. we will merge the start state of the try begin
  63. and the post-state of each block that is part of this try region
  64. */
  65. // merge the start state of the try begin
  66. //
  67. if (pParam->block->bbFlags & BBF_TRY_BEG)
  68. {
  69. pParam->pThis->impVerifyEHBlock(pParam->block, true);
  70. }
  71. pParam->pThis->impImportBlockCode(pParam->block);
  72. // As discussed above:
  73. // merge the post-state of each block that is part of this try region
  74. //
  75. if (pParam->block->hasTryIndex())
  76. {
  77. pParam->pThis->impVerifyEHBlock(pParam->block, false);
  78. }
  79. }
  80. PAL_EXCEPT_FILTER(FilterVerificationExceptions)
  81. {
  82. verHandleVerificationFailure(block DEBUGARG(false));
  83. }
  84. PAL_ENDTRY
  85. if (compDonotInline())
  86. {
  87. return;
  88. }
  89. assert(!compDonotInline());
  90. markImport = false;
  91. SPILLSTACK:
  92. unsigned baseTmp = NO_BASE_TMP; // input temps assigned to successor blocks
  93. bool reimportSpillClique = false;
  94. BasicBlock* tgtBlock = nullptr;
  95. /* If the stack is non-empty, we might have to spill its contents */
  96. if (verCurrentState.esStackDepth != 0)
  97. {
  98. impBoxTemp = BAD_VAR_NUM; // if a box temp is used in a block that leaves something
  99. // on the stack, its lifetime is hard to determine, simply
  100. // don't reuse such temps.
  101. GenTreePtr addStmt = nullptr;
  102. /* Do the successors of 'block' have any other predecessors ?
  103. We do not want to do some of the optimizations related to multiRef
  104. if we can reimport blocks */
  105. unsigned multRef = impCanReimport ? unsigned(~0) : 0;
  106. switch (block->bbJumpKind)
  107. {
  108. case BBJ_COND:
  109. /* Temporarily remove the 'jtrue' from the end of the tree list */
  110. assert(impTreeLast);
  111. assert(impTreeLast->gtOper == GT_STMT);
  112. assert(impTreeLast->gtStmt.gtStmtExpr->gtOper == GT_JTRUE);
  113. addStmt = impTreeLast;
  114. impTreeLast = impTreeLast->gtPrev;
  115. /* Note if the next block has more than one ancestor */
  116. multRef |= block->bbNext->bbRefs;
  117. /* Does the next block have temps assigned? */
  118. baseTmp = block->bbNext->bbStkTempsIn;
  119. tgtBlock = block->bbNext;
  120. if (baseTmp != NO_BASE_TMP)
  121. {
  122. break;
  123. }
  124. /* Try the target of the jump then */
  125. multRef |= block->bbJumpDest->bbRefs;
  126. baseTmp = block->bbJumpDest->bbStkTempsIn;
  127. tgtBlock = block->bbJumpDest;
  128. break;
  129. case BBJ_ALWAYS:
  130. multRef |= block->bbJumpDest->bbRefs;
  131. baseTmp = block->bbJumpDest->bbStkTempsIn;
  132. tgtBlock = block->bbJumpDest;
  133. break;
  134. case BBJ_NONE:
  135. multRef |= block->bbNext->bbRefs;
  136. baseTmp = block->bbNext->bbStkTempsIn;
  137. tgtBlock = block->bbNext;
  138. break;
  139. case BBJ_SWITCH:
  140. BasicBlock** jmpTab;
  141. unsigned jmpCnt;
  142. /* Temporarily remove the GT_SWITCH from the end of the tree list */
  143. assert(impTreeLast);
  144. assert(impTreeLast->gtOper == GT_STMT);
  145. assert(impTreeLast->gtStmt.gtStmtExpr->gtOper == GT_SWITCH);
  146. addStmt = impTreeLast;
  147. impTreeLast = impTreeLast->gtPrev;
  148. jmpCnt = block->bbJumpSwt->bbsCount;
  149. jmpTab = block->bbJumpSwt->bbsDstTab;
  150. do
  151. {
  152. tgtBlock = (*jmpTab);
  153. multRef |= tgtBlock->bbRefs;
  154. // Thanks to spill cliques, we should have assigned all or none
  155. assert((baseTmp == NO_BASE_TMP) || (baseTmp == tgtBlock->bbStkTempsIn));
  156. baseTmp = tgtBlock->bbStkTempsIn;
  157. if (multRef > 1)
  158. {
  159. break;
  160. }
  161. } while (++jmpTab, --jmpCnt);
  162. break;
  163. case BBJ_CALLFINALLY:
  164. case BBJ_EHCATCHRET:
  165. case BBJ_RETURN:
  166. case BBJ_EHFINALLYRET:
  167. case BBJ_EHFILTERRET:
  168. case BBJ_THROW:
  169. NO_WAY("can't have 'unreached' end of BB with non-empty stack");
  170. break;
  171. default:
  172. noway_assert(!"Unexpected bbJumpKind");
  173. break;
  174. }
  175. assert(multRef >= 1);
  176. /* Do we have a base temp number? */
  177. bool newTemps = (baseTmp == NO_BASE_TMP);
  178. if (newTemps)
  179. {
  180. /* Grab enough temps for the whole stack */
  181. baseTmp = impGetSpillTmpBase(block);
  182. }
  183. /* Spill all stack entries into temps */
  184. unsigned level, tempNum;
  185. JITDUMP("\nSpilling stack entries into temps\n");
  186. for (level = 0, tempNum = baseTmp; level < verCurrentState.esStackDepth; level++, tempNum++)
  187. {
  188. GenTreePtr tree = verCurrentState.esStack[level].val;
  189. /* VC generates code where it pushes a byref from one branch, and an int (ldc.i4 0) from
  190. the other. This should merge to a byref in unverifiable code.
  191. However, if the branch which leaves the TYP_I_IMPL on the stack is imported first, the
  192. successor would be imported assuming there was a TYP_I_IMPL on
  193. the stack. Thus the value would not get GC-tracked. Hence,
  194. change the temp to TYP_BYREF and reimport the successors.
  195. Note: We should only allow this in unverifiable code.
  196. */
  197. if (tree->gtType == TYP_BYREF && lvaTable[tempNum].lvType == TYP_I_IMPL && !verNeedsVerification())
  198. {
  199. lvaTable[tempNum].lvType = TYP_BYREF;
  200. impReimportMarkSuccessors(block);
  201. markImport = true;
  202. }
  203. #ifdef _TARGET_64BIT_
  204. if (genActualType(tree->gtType) == TYP_I_IMPL && lvaTable[tempNum].lvType == TYP_INT)
  205. {
  206. if (tiVerificationNeeded && tgtBlock->bbEntryState != nullptr &&
  207. (tgtBlock->bbFlags & BBF_FAILED_VERIFICATION) == 0)
  208. {
  209. // Merge the current state into the entry state of block;
  210. // the call to verMergeEntryStates must have changed
  211. // the entry state of the block by merging the int local var
  212. // and the native-int stack entry.
  213. bool changed = false;
  214. if (verMergeEntryStates(tgtBlock, &changed))
  215. {
  216. impRetypeEntryStateTemps(tgtBlock);
  217. impReimportBlockPending(tgtBlock);
  218. assert(changed);
  219. }
  220. else
  221. {
  222. tgtBlock->bbFlags |= BBF_FAILED_VERIFICATION;
  223. break;
  224. }
  225. }
  226. // Some other block in the spill clique set this to "int", but now we have "native int".
  227. // Change the type and go back to re-import any blocks that used the wrong type.
  228. lvaTable[tempNum].lvType = TYP_I_IMPL;
  229. reimportSpillClique = true;
  230. }
  231. else if (genActualType(tree->gtType) == TYP_INT && lvaTable[tempNum].lvType == TYP_I_IMPL)
  232. {
  233. // Spill clique has decided this should be "native int", but this block only pushes an "int".
  234. // Insert a sign-extension to "native int" so we match the clique.
  235. verCurrentState.esStack[level].val = gtNewCastNode(TYP_I_IMPL, tree, TYP_I_IMPL);
  236. }
  237. // Consider the case where one branch left a 'byref' on the stack and the other leaves
  238. // an 'int'. On 32-bit, this is allowed (in non-verifiable code) since they are the same
  239. // size. JIT64 managed to make this work on 64-bit. For compatibility, we support JIT64
  240. // behavior instead of asserting and then generating bad code (where we save/restore the
  241. // low 32 bits of a byref pointer to an 'int' sized local). If the 'int' side has been
  242. // imported already, we need to change the type of the local and reimport the spill clique.
  243. // If the 'byref' side has imported, we insert a cast from int to 'native int' to match
  244. // the 'byref' size.
  245. if (!tiVerificationNeeded)
  246. {
  247. if (genActualType(tree->gtType) == TYP_BYREF && lvaTable[tempNum].lvType == TYP_INT)
  248. {
  249. // Some other block in the spill clique set this to "int", but now we have "byref".
  250. // Change the type and go back to re-import any blocks that used the wrong type.
  251. lvaTable[tempNum].lvType = TYP_BYREF;
  252. reimportSpillClique = true;
  253. }
  254. else if (genActualType(tree->gtType) == TYP_INT && lvaTable[tempNum].lvType == TYP_BYREF)
  255. {
  256. // Spill clique has decided this should be "byref", but this block only pushes an "int".
  257. // Insert a sign-extension to "native int" so we match the clique size.
  258. verCurrentState.esStack[level].val = gtNewCastNode(TYP_I_IMPL, tree, TYP_I_IMPL);
  259. }
  260. }
  261. #endif // _TARGET_64BIT_
  262. #if FEATURE_X87_DOUBLES
  263. // X87 stack doesn't differentiate between float/double
  264. // so promoting is no big deal.
  265. // For everybody else keep it as float until we have a collision and then promote
  266. // Just like for x64's TYP_INT<->TYP_I_IMPL
  267. if (multRef > 1 && tree->gtType == TYP_FLOAT)
  268. {
  269. verCurrentState.esStack[level].val = gtNewCastNode(TYP_DOUBLE, tree, TYP_DOUBLE);
  270. }
  271. #else // !FEATURE_X87_DOUBLES
  272. if (tree->gtType == TYP_DOUBLE && lvaTable[tempNum].lvType == TYP_FLOAT)
  273. {
  274. // Some other block in the spill clique set this to "float", but now we have "double".
  275. // Change the type and go back to re-import any blocks that used the wrong type.
  276. lvaTable[tempNum].lvType = TYP_DOUBLE;
  277. reimportSpillClique = true;
  278. }
  279. else if (tree->gtType == TYP_FLOAT && lvaTable[tempNum].lvType == TYP_DOUBLE)
  280. {
  281. // Spill clique has decided this should be "double", but this block only pushes a "float".
  282. // Insert a cast to "double" so we match the clique.
  283. verCurrentState.esStack[level].val = gtNewCastNode(TYP_DOUBLE, tree, TYP_DOUBLE);
  284. }
  285. #endif // FEATURE_X87_DOUBLES
  286. /* If addStmt has a reference to tempNum (can only happen if we
  287. are spilling to the temps already used by a previous block),
  288. we need to spill addStmt */
  289. if (addStmt && !newTemps && gtHasRef(addStmt->gtStmt.gtStmtExpr, tempNum, false))
  290. {
  291. GenTreePtr addTree = addStmt->gtStmt.gtStmtExpr;
  292. if (addTree->gtOper == GT_JTRUE)
  293. {
  294. GenTreePtr relOp = addTree->gtOp.gtOp1;
  295. assert(relOp->OperIsCompare());
  296. var_types type = genActualType(relOp->gtOp.gtOp1->TypeGet());
  297. if (gtHasRef(relOp->gtOp.gtOp1, tempNum, false))
  298. {
  299. unsigned temp = lvaGrabTemp(true DEBUGARG("spill addStmt JTRUE ref Op1"));
  300. impAssignTempGen(temp, relOp->gtOp.gtOp1, level);
  301. type = genActualType(lvaTable[temp].TypeGet());
  302. relOp->gtOp.gtOp1 = gtNewLclvNode(temp, type);
  303. }
  304. if (gtHasRef(relOp->gtOp.gtOp2, tempNum, false))
  305. {
  306. unsigned temp = lvaGrabTemp(true DEBUGARG("spill addStmt JTRUE ref Op2"));
  307. impAssignTempGen(temp, relOp->gtOp.gtOp2, level);
  308. type = genActualType(lvaTable[temp].TypeGet());
  309. relOp->gtOp.gtOp2 = gtNewLclvNode(temp, type);
  310. }
  311. }
  312. else
  313. {
  314. assert(addTree->gtOper == GT_SWITCH && genActualType(addTree->gtOp.gtOp1->gtType) == TYP_I_IMPL);
  315. unsigned temp = lvaGrabTemp(true DEBUGARG("spill addStmt SWITCH"));
  316. impAssignTempGen(temp, addTree->gtOp.gtOp1, level);
  317. addTree->gtOp.gtOp1 = gtNewLclvNode(temp, TYP_I_IMPL);
  318. }
  319. }
  320. /* Spill the stack entry, and replace with the temp */
  321. if (!impSpillStackEntry(level, tempNum
  322. #ifdef DEBUG
  323. ,
  324. true, "Spill Stack Entry"
  325. #endif
  326. ))
  327. {
  328. if (markImport)
  329. {
  330. BADCODE("bad stack state");
  331. }
  332. // Oops. Something went wrong when spilling. Bad code.
  333. verHandleVerificationFailure(block DEBUGARG(true));
  334. goto SPILLSTACK;
  335. }
  336. }
  337. /* Put back the 'jtrue'/'switch' if we removed it earlier */
  338. if (addStmt)
  339. {
  340. impAppendStmt(addStmt, (unsigned)CHECK_SPILL_NONE);
  341. }
  342. }
  343. // Some of the append/spill logic works on compCurBB
  344. assert(compCurBB == block);
  345. /* Save the tree list in the block */
  346. impEndTreeList(block);
  347. // impEndTreeList sets BBF_IMPORTED on the block
  348. // We do *NOT* want to set it later than this because
  349. // impReimportSpillClique might clear it if this block is both a
  350. // predecessor and successor in the current spill clique
  351. assert(block->bbFlags & BBF_IMPORTED);
  352. // If we had a int/native int, or float/double collision, we need to re-import
  353. if (reimportSpillClique)
  354. {
  355. // This will re-import all the successors of block (as well as each of their predecessors)
  356. impReimportSpillClique(block);
  357. // For blocks that haven't been imported yet, we still need to mark them as pending import.
  358. for (unsigned i = 0; i < block->NumSucc(); i++)
  359. {
  360. BasicBlock* succ = block->GetSucc(i);
  361. if ((succ->bbFlags & BBF_IMPORTED) == 0)
  362. {
  363. impImportBlockPending(succ);
  364. }
  365. }
  366. }
  367. else // the normal case
  368. {
  369. // otherwise just import the successors of block
  370. /* Does this block jump to any other blocks? */
  371. for (unsigned i = 0; i < block->NumSucc(); i++)
  372. {
  373. impImportBlockPending(block->GetSucc(i));
  374. }
  375. }
  376. }
  377. #ifdef _PREFAST_
  378. #pragma warning(pop)
  379. #endif

This function first calls impImportBlockCode, which does the main work of generating GenTrees from the IL.
After a block has been imported, if the execution stack is not empty (the instructions after a jump may need the values pushed before the jump), the values on the stack have to be spilled into temporary variables.
The index of the first temp spilled at the end of a block is stored in bbStkTempsOut, and the index of the first temp a block reads on entry is stored in bbStkTempsIn.
Values on the execution stack rarely cross BasicBlock boundaries in IL compiled from C#, so I won't analyze this logic in detail, but one case where it does happen is the conditional operator:
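
For `x = cond ? 1 : 2` the C# compiler emits IL shaped like this (illustrative offsets):

IL_0000  ldarg.0
IL_0001  brtrue.s IL_0006
IL_0003  ldc.i4.2
IL_0004  br.s     IL_0007
IL_0006  ldc.i4.1
IL_0007  stloc.0
IL_0008  ret

Both predecessors of the block starting at IL_0007 leave one value on the stack, so the blocks ending at IL_0004 and at IL_0006 spill that value into the same temp (recorded through bbStkTempsOut), and the block at IL_0007 reloads it (through bbStkTempsIn).
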
Next, let's look at impImportBlockCode.

impImportBlockCode is over 5,000 lines long, so only part of it is quoted below.
Its overall shape is a large loop over the block's IL bytes with a switch on each opcode, pushing and popping verCurrentState.esStack as it goes.
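
A heavily simplified sketch of that dispatch loop (not the real code; only two opcodes shown, no verification or prefix handling):

while (codeAddr < codeEndp)
{
    OPCODE opcode = (OPCODE)getU1LittleEndian(codeAddr);
    codeAddr += 1;
    switch (opcode)
    {
        case CEE_LDC_I4_0: // push the constant 0 onto the execution stack
            impPushOnStack(gtNewIconNode(0), typeInfo(TI_INT));
            break;
        case CEE_ADD: // pop two operands, push a GT_ADD tree
        {
            GenTreePtr op2 = impPopStack().val;
            GenTreePtr op1 = impPopStack().val;
            impPushOnStack(gtNewOperNode(GT_ADD, TYP_INT, op1, op2), typeInfo(TI_INT));
            break;
        }
        // ... the real function has several hundred more cases ...
    }
}

The actual source (excerpt):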

  1. #ifdef _PREFAST_
  2. #pragma warning(push)
  3. #pragma warning(disable : 21000) // Suppress PREFast warning about overly large function
  4. #endif
  5. /*****************************************************************************
  6. * Import the instr for the given basic block
  7. */
  8. void Compiler::impImportBlockCode(BasicBlock* block)
  9. {
  10. #define _impResolveToken(kind) impResolveToken(codeAddr, &resolvedToken, kind)
  11. #ifdef DEBUG
  12. if (verbose)
  13. {
  14. printf("\nImporting BB%02u (PC=%03u) of '%s'", block->bbNum, block->bbCodeOffs, info.compFullName);
  15. }
  16. #endif
  17. unsigned nxtStmtIndex = impInitBlockLineInfo();
  18. IL_OFFSET nxtStmtOffs;
  19. GenTreePtr arrayNodeFrom, arrayNodeTo, arrayNodeToIndex;
  20. bool expandInline;
  21. CorInfoHelpFunc helper;
  22. CorInfoIsAccessAllowedResult accessAllowedResult;
  23. CORINFO_HELPER_DESC calloutHelper;
  24. const BYTE* lastLoadToken = nullptr;
  25. // reject cyclic constraints
  26. if (tiVerificationNeeded)
  27. {
  28. Verify(!info.hasCircularClassConstraints, "Method parent has circular class type parameter constraints.");
  29. Verify(!info.hasCircularMethodConstraints, "Method has circular method type parameter constraints.");
  30. }
  31. /* Get the tree list started */
  32. impBeginTreeList();
  33. /* Walk the opcodes that comprise the basic block */
  34. const BYTE* codeAddr = info.compCode + block->bbCodeOffs;
  35. const BYTE* codeEndp = info.compCode + block->bbCodeOffsEnd;
  36. IL_OFFSET opcodeOffs = block->bbCodeOffs;
  37. IL_OFFSET lastSpillOffs = opcodeOffs;
  38. signed jmpDist;
  39. /* remember the start of the delegate creation sequence (used for verification) */
  40. const BYTE* delegateCreateStart = nullptr;
  41. int prefixFlags = 0;
  42. bool explicitTailCall, constraintCall, readonlyCall;
  43. bool insertLdloc = false; // set by CEE_DUP and cleared by following store
  44. typeInfo tiRetVal;
  45. unsigned numArgs = info.compArgsCount;
  46. /* Now process all the opcodes in the block */
  47. var_types callTyp = TYP_COUNT;
  48. OPCODE prevOpcode = CEE_ILLEGAL;
  49. if (block->bbCatchTyp)
  50. {
  51. if (info.compStmtOffsetsImplicit & ICorDebugInfo::CALL_SITE_BOUNDARIES)
  52. {
  53. impCurStmtOffsSet(block->bbCodeOffs);
  54. }
  55. // We will spill the GT_CATCH_ARG and the input of the BB_QMARK block
  56. // to a temp. This is a trade off for code simplicity
  57. impSpillSpecialSideEff();
  58. }
  59. while (codeAddr < codeEndp)
  60. {
  61. bool usingReadyToRunHelper = false;
  62. CORINFO_RESOLVED_TOKEN resolvedToken;
  63. CORINFO_RESOLVED_TOKEN constrainedResolvedToken;
  64. CORINFO_CALL_INFO callInfo;
  65. CORINFO_FIELD_INFO fieldInfo;
  66. tiRetVal = typeInfo(); // Default type info
  67. //---------------------------------------------------------------------
  68. /* We need to restrict the max tree depth as many of the Compiler
  69. functions are recursive. We do this by spilling the stack */
  70. if (verCurrentState.esStackDepth)
  71. {
  72. /* Has it been a while since we last saw a non-empty stack (which
  73. guarantees that the tree depth isnt accumulating. */
  74. if ((opcodeOffs - lastSpillOffs) > 200)
  75. {
  76. impSpillStackEnsure();
  77. lastSpillOffs = opcodeOffs;
  78. }
  79. }
  80. else
  81. {
  82. lastSpillOffs = opcodeOffs;
  83. impBoxTempInUse = false; // nothing on the stack, box temp OK to use again
  84. }
  85. /* Compute the current instr offset */
  86. opcodeOffs = (IL_OFFSET)(codeAddr - info.compCode);
  87. #if defined(DEBUGGING_SUPPORT) || defined(DEBUG)
  88. #ifndef DEBUG
  89. if (opts.compDbgInfo)
  90. #endif
  91. {
  92. if (!compIsForInlining())
  93. {
  94. nxtStmtOffs =
  95. (nxtStmtIndex < info.compStmtOffsetsCount) ? info.compStmtOffsets[nxtStmtIndex] : BAD_IL_OFFSET;
  96. /* Have we reached the next stmt boundary ? */
  97. if (nxtStmtOffs != BAD_IL_OFFSET && opcodeOffs >= nxtStmtOffs)
  98. {
  99. assert(nxtStmtOffs == info.compStmtOffsets[nxtStmtIndex]);
  100. if (verCurrentState.esStackDepth != 0 && opts.compDbgCode)
  101. {
  102. /* We need to provide accurate IP-mapping at this point.
  103. So spill anything on the stack so that it will form
  104. gtStmts with the correct stmt offset noted */
  105. impSpillStackEnsure(true);
  106. }
  107. // Has impCurStmtOffs been reported in any tree?
  108. if (impCurStmtOffs != BAD_IL_OFFSET && opts.compDbgCode)
  109. {
  110. GenTreePtr placeHolder = new (this, GT_NO_OP) GenTree(GT_NO_OP, TYP_VOID);
  111. impAppendTree(placeHolder, (unsigned)CHECK_SPILL_NONE, impCurStmtOffs);
  112. assert(impCurStmtOffs == BAD_IL_OFFSET);
  113. }
  114. if (impCurStmtOffs == BAD_IL_OFFSET)
  115. {
  116. /* Make sure that nxtStmtIndex is in sync with opcodeOffs.
  117. If opcodeOffs has gone past nxtStmtIndex, catch up */
  118. while ((nxtStmtIndex + 1) < info.compStmtOffsetsCount &&
  119. info.compStmtOffsets[nxtStmtIndex + 1] <= opcodeOffs)
  120. {
  121. nxtStmtIndex++;
  122. }
  123. /* Go to the new stmt */
  124. impCurStmtOffsSet(info.compStmtOffsets[nxtStmtIndex]);
  125. /* Update the stmt boundary index */
  126. nxtStmtIndex++;
  127. assert(nxtStmtIndex <= info.compStmtOffsetsCount);
  128. /* Are there any more line# entries after this one? */
  129. if (nxtStmtIndex < info.compStmtOffsetsCount)
  130. {
  131. /* Remember where the next line# starts */
  132. nxtStmtOffs = info.compStmtOffsets[nxtStmtIndex];
  133. }
  134. else
  135. {
  136. /* No more line# entries */
  137. nxtStmtOffs = BAD_IL_OFFSET;
  138. }
  139. }
  140. }
  141. else if ((info.compStmtOffsetsImplicit & ICorDebugInfo::STACK_EMPTY_BOUNDARIES) &&
  142. (verCurrentState.esStackDepth == 0))
  143. {
  144. /* At stack-empty locations, we have already added the tree to
  145. the stmt list with the last offset. We just need to update
  146. impCurStmtOffs
  147. */
  148. impCurStmtOffsSet(opcodeOffs);
  149. }
  150. else if ((info.compStmtOffsetsImplicit & ICorDebugInfo::CALL_SITE_BOUNDARIES) &&
  151. impOpcodeIsCallSiteBoundary(prevOpcode))
  152. {
  153. /* Make sure we have a type cached */
  154. assert(callTyp != TYP_COUNT);
  155. if (callTyp == TYP_VOID)
  156. {
  157. impCurStmtOffsSet(opcodeOffs);
  158. }
  159. else if (opts.compDbgCode)
  160. {
  161. impSpillStackEnsure(true);
  162. impCurStmtOffsSet(opcodeOffs);
  163. }
  164. }
  165. else if ((info.compStmtOffsetsImplicit & ICorDebugInfo::NOP_BOUNDARIES) && (prevOpcode == CEE_NOP))
  166. {
  167. if (opts.compDbgCode)
  168. {
  169. impSpillStackEnsure(true);
  170. }
  171. impCurStmtOffsSet(opcodeOffs);
  172. }
  173. assert(impCurStmtOffs == BAD_IL_OFFSET || nxtStmtOffs == BAD_IL_OFFSET ||
  174. jitGetILoffs(impCurStmtOffs) <= nxtStmtOffs);
  175. }
  176. }
  177. #endif // defined(DEBUGGING_SUPPORT) || defined(DEBUG)
  178. CORINFO_CLASS_HANDLE clsHnd = DUMMY_INIT(NULL);
  179. CORINFO_CLASS_HANDLE ldelemClsHnd = DUMMY_INIT(NULL);
  180. CORINFO_CLASS_HANDLE stelemClsHnd = DUMMY_INIT(NULL);
  181. var_types lclTyp, ovflType = TYP_UNKNOWN;
  182. GenTreePtr op1 = DUMMY_INIT(NULL);
  183. GenTreePtr op2 = DUMMY_INIT(NULL);
  184. GenTreeArgList* args = nullptr; // What good do these "DUMMY_INIT"s do?
  185. GenTreePtr newObjThisPtr = DUMMY_INIT(NULL);
  186. bool uns = DUMMY_INIT(false);
  187. /* Get the next opcode and the size of its parameters */
  188. OPCODE opcode = (OPCODE)getU1LittleEndian(codeAddr);
  189. codeAddr += sizeof(__int8);
  190. #ifdef DEBUG
  191. impCurOpcOffs = (IL_OFFSET)(codeAddr - info.compCode - 1);
  192. JITDUMP("\n [%2u] %3u (0x%03x) ", verCurrentState.esStackDepth, impCurOpcOffs, impCurOpcOffs);
  193. #endif
  194. DECODE_OPCODE:
  195. // Return if any previous code has caused inline to fail.
  196. if (compDonotInline())
  197. {
  198. return;
  199. }
  200. /* Get the size of additional parameters */
  201. signed int sz = opcodeSizes[opcode];
  202. #ifdef DEBUG
  203. clsHnd = NO_CLASS_HANDLE;
  204. lclTyp = TYP_COUNT;
  205. callTyp = TYP_COUNT;
  206. impCurOpcOffs = (IL_OFFSET)(codeAddr - info.compCode - 1);
  207. impCurOpcName = opcodeNames[opcode];
  208. if (verbose && (opcode != CEE_PREFIX1))
  209. {
  210. printf("%s", impCurOpcName);
  211. }
  212. /* Use assertImp() to display the opcode */
  213. op1 = op2 = nullptr;
  214. #endif
  215. /* See what kind of an opcode we have, then */
  216. unsigned mflags = 0;
  217. unsigned clsFlags = 0;
  218. switch (opcode)
  219. {
  220. unsigned lclNum;
  221. var_types type;
  222. GenTreePtr op3;
  223. genTreeOps oper;
  224. unsigned size;
  225. int val;
  226. CORINFO_SIG_INFO sig;
  227. unsigned flags;
  228. IL_OFFSET jmpAddr;
  229. bool ovfl, unordered, callNode;
  230. bool ldstruct;
  231. CORINFO_CLASS_HANDLE tokenType;
  232. union {
  233. int intVal;
  234. float fltVal;
  235. __int64 lngVal;
  236. double dblVal;
  237. } cval;
  238. case CEE_PREFIX1:
  239. opcode = (OPCODE)(getU1LittleEndian(codeAddr) + 256);
  240. codeAddr += sizeof(__int8);
  241. opcodeOffs = (IL_OFFSET)(codeAddr - info.compCode);
  242. goto DECODE_OPCODE;
  243. SPILL_APPEND:
  244. /* Append 'op1' to the list of statements */
  245. impAppendTree(op1, (unsigned)CHECK_SPILL_ALL, impCurStmtOffs);
  246. goto DONE_APPEND;
  247. APPEND:
  248. /* Append 'op1' to the list of statements */
  249. impAppendTree(op1, (unsigned)CHECK_SPILL_NONE, impCurStmtOffs);
  250. goto DONE_APPEND;
  251. DONE_APPEND:
  252. #ifdef DEBUG
  253. // Remember at which BC offset the tree was finished
  254. impNoteLastILoffs();
  255. #endif
  256. break;
  257. case CEE_LDNULL:
  258. impPushNullObjRefOnStack();
  259. break;
  260. case CEE_LDC_I4_M1:
  261. case CEE_LDC_I4_0:
  262. case CEE_LDC_I4_1:
  263. case CEE_LDC_I4_2:
  264. case CEE_LDC_I4_3:
  265. case CEE_LDC_I4_4:
  266. case CEE_LDC_I4_5:
  267. case CEE_LDC_I4_6:
  268. case CEE_LDC_I4_7:
  269. case CEE_LDC_I4_8:
  270. cval.intVal = (opcode - CEE_LDC_I4_0);
  271. assert(-1 <= cval.intVal && cval.intVal <= 8);
  272. goto PUSH_I4CON;
  273. case CEE_LDC_I4_S:
  274. cval.intVal = getI1LittleEndian(codeAddr);
  275. goto PUSH_I4CON;
  276. case CEE_LDC_I4:
  277. cval.intVal = getI4LittleEndian(codeAddr);
  278. goto PUSH_I4CON;
  279. PUSH_I4CON:
  280. JITDUMP(" %d", cval.intVal);
  281. impPushOnStack(gtNewIconNode(cval.intVal), typeInfo(TI_INT));
  282. break;
  283. case CEE_LDC_I8:
  284. cval.lngVal = getI8LittleEndian(codeAddr);
  285. JITDUMP(" 0x%016llx", cval.lngVal);
  286. impPushOnStack(gtNewLconNode(cval.lngVal), typeInfo(TI_LONG));
  287. break;
  288. case CEE_LDC_R8:
  289. cval.dblVal = getR8LittleEndian(codeAddr);
  290. JITDUMP(" %#.17g", cval.dblVal);
  291. impPushOnStack(gtNewDconNode(cval.dblVal), typeInfo(TI_DOUBLE));
  292. break;
  293. case CEE_LDC_R4:
  294. cval.dblVal = getR4LittleEndian(codeAddr);
  295. JITDUMP(" %#.17g", cval.dblVal);
  296. {
  297. GenTreePtr cnsOp = gtNewDconNode(cval.dblVal);
  298. #if !FEATURE_X87_DOUBLES
  299. // X87 stack doesn't differentiate between float/double
  300. // so R4 is treated as R8, but everybody else does
  301. cnsOp->gtType = TYP_FLOAT;
  302. #endif // FEATURE_X87_DOUBLES
  303. impPushOnStack(cnsOp, typeInfo(TI_DOUBLE));
  304. }
  305. break;
  306. case CEE_LDSTR:
  307. if (compIsForInlining())
  308. {
  309. if (impInlineInfo->inlineCandidateInfo->dwRestrictions & INLINE_NO_CALLEE_LDSTR)
  310. {
  311. compInlineResult->NoteFatal(InlineObservation::CALLSITE_HAS_LDSTR_RESTRICTION);
  312. return;
  313. }
  314. }
  315. val = getU4LittleEndian(codeAddr);
  316. JITDUMP(" %08X", val);
  317. if (tiVerificationNeeded)
  318. {
  319. Verify(info.compCompHnd->isValidStringRef(info.compScopeHnd, val), "bad string");
  320. tiRetVal = typeInfo(TI_REF, impGetStringClass());
  321. }
  322. impPushOnStack(gtNewSconNode(val, info.compScopeHnd), tiRetVal);
  323. break;
  324. case CEE_LDARG:
  325. lclNum = getU2LittleEndian(codeAddr);
  326. JITDUMP(" %u", lclNum);
  327. impLoadArg(lclNum, opcodeOffs + sz + 1);
  328. break;
  329. case CEE_LDARG_S:
  330. lclNum = getU1LittleEndian(codeAddr);
  331. JITDUMP(" %u", lclNum);
  332. impLoadArg(lclNum, opcodeOffs + sz + 1);
  333. break;
  334. case CEE_LDARG_0:
  335. case CEE_LDARG_1:
  336. case CEE_LDARG_2:
  337. case CEE_LDARG_3:
  338. lclNum = (opcode - CEE_LDARG_0);
  339. assert(lclNum >= 0 && lclNum < 4);
  340. impLoadArg(lclNum, opcodeOffs + sz + 1);
  341. break;
  342. case CEE_LDLOC:
  343. lclNum = getU2LittleEndian(codeAddr);
  344. JITDUMP(" %u", lclNum);
  345. impLoadLoc(lclNum, opcodeOffs + sz + 1);
  346. break;
  347. case CEE_LDLOC_S:
  348. lclNum = getU1LittleEndian(codeAddr);
  349. JITDUMP(" %u", lclNum);
  350. impLoadLoc(lclNum, opcodeOffs + sz + 1);
  351. break;
  352. case CEE_LDLOC_0:
  353. case CEE_LDLOC_1:
  354. case CEE_LDLOC_2:
  355. case CEE_LDLOC_3:
  356. lclNum = (opcode - CEE_LDLOC_0);
  357. assert(lclNum >= 0 && lclNum < 4);
  358. impLoadLoc(lclNum, opcodeOffs + sz + 1);
  359. break;
  360. case CEE_STARG:
  361. lclNum = getU2LittleEndian(codeAddr);
  362. goto STARG;
  363. case CEE_STARG_S:
  364. lclNum = getU1LittleEndian(codeAddr);
  365. STARG:
  366. JITDUMP(" %u", lclNum);
  367. if (tiVerificationNeeded)
  368. {
  369. Verify(lclNum < info.compILargsCount, "bad arg num");
  370. }
  371. if (compIsForInlining())
  372. {
  373. op1 = impInlineFetchArg(lclNum, impInlineInfo->inlArgInfo, impInlineInfo->lclVarInfo);
  374. noway_assert(op1->gtOper == GT_LCL_VAR);
  375. lclNum = op1->AsLclVar()->gtLclNum;
  376. goto VAR_ST_VALID;
  377. }
  378. lclNum = compMapILargNum(lclNum); // account for possible hidden param
  379. assertImp(lclNum < numArgs);
  380. if (lclNum == info.compThisArg)
  381. {
  382. lclNum = lvaArg0Var;
  383. }
  384. lvaTable[lclNum].lvArgWrite = 1;
  385. if (tiVerificationNeeded)
  386. {
  387. typeInfo& tiLclVar = lvaTable[lclNum].lvVerTypeInfo;
  388. Verify(tiCompatibleWith(impStackTop().seTypeInfo, NormaliseForStack(tiLclVar), true),
  389. "type mismatch");
  390. if (verTrackObjCtorInitState && (verCurrentState.thisInitialized != TIS_Init))
  391. {
  392. Verify(!tiLclVar.IsThisPtr(), "storing to uninit this ptr");
  393. }
  394. }
  395. goto VAR_ST;
  396. case CEE_STLOC:
  397. lclNum = getU2LittleEndian(codeAddr);
  398. JITDUMP(" %u", lclNum);
  399. goto LOC_ST;
  400. case CEE_STLOC_S:
  401. lclNum = getU1LittleEndian(codeAddr);
  402. JITDUMP(" %u", lclNum);
  403. goto LOC_ST;
  404. case CEE_STLOC_0:
  405. case CEE_STLOC_1:
  406. case CEE_STLOC_2:
  407. case CEE_STLOC_3:
  408. lclNum = (opcode - CEE_STLOC_0);
  409. assert(lclNum >= 0 && lclNum < 4);
  410. LOC_ST:
  411. if (tiVerificationNeeded)
  412. {
  413. Verify(lclNum < info.compMethodInfo->locals.numArgs, "bad local num");
  414. Verify(tiCompatibleWith(impStackTop().seTypeInfo,
  415. NormaliseForStack(lvaTable[lclNum + numArgs].lvVerTypeInfo), true),
  416. "type mismatch");
  417. }
  418. if (compIsForInlining())
  419. {
  420. lclTyp = impInlineInfo->lclVarInfo[lclNum + impInlineInfo->argCnt].lclTypeInfo;
  421. /* Have we allocated a temp for this local? */
  422. lclNum = impInlineFetchLocal(lclNum DEBUGARG("Inline stloc first use temp"));
  423. goto _PopValue;
  424. }
  425. lclNum += numArgs;
  426. VAR_ST:
  427. if (lclNum >= info.compLocalsCount && lclNum != lvaArg0Var)
  428. {
  429. assert(!tiVerificationNeeded); // We should have thrown the VerificationException before.
  430. BADCODE("Bad IL");
  431. }
  432. VAR_ST_VALID:
  433. /* if it is a struct assignment, make certain we don't overflow the buffer */
  434. assert(lclTyp != TYP_STRUCT || lvaLclSize(lclNum) >= info.compCompHnd->getClassSize(clsHnd));
  435. if (lvaTable[lclNum].lvNormalizeOnLoad())
  436. {
  437. lclTyp = lvaGetRealType(lclNum);
  438. }
  439. else
  440. {
  441. lclTyp = lvaGetActualType(lclNum);
  442. }
  443. _PopValue:
  444. /* Pop the value being assigned */
  445. {
  446. StackEntry se = impPopStack(clsHnd);
  447. op1 = se.val;
  448. tiRetVal = se.seTypeInfo;
  449. }
  450. #ifdef FEATURE_SIMD
  451. if (varTypeIsSIMD(lclTyp) && (lclTyp != op1->TypeGet()))
  452. {
  453. assert(op1->TypeGet() == TYP_STRUCT);
  454. op1->gtType = lclTyp;
  455. }
  456. #endif // FEATURE_SIMD
  457. op1 = impImplicitIorI4Cast(op1, lclTyp);
  458. #ifdef _TARGET_64BIT_
  459. // Downcast the TYP_I_IMPL into a 32-bit Int for x86 JIT compatiblity
  460. if (varTypeIsI(op1->TypeGet()) && (genActualType(lclTyp) == TYP_INT))
  461. {
  462. assert(!tiVerificationNeeded); // We should have thrown the VerificationException before.
  463. op1 = gtNewCastNode(TYP_INT, op1, TYP_INT);
  464. }
  465. #endif // _TARGET_64BIT_
  466. // We had better assign it a value of the correct type
  467. assertImp(
  468. genActualType(lclTyp) == genActualType(op1->gtType) ||
  469. genActualType(lclTyp) == TYP_I_IMPL && op1->IsVarAddr() ||
  470. (genActualType(lclTyp) == TYP_I_IMPL && (op1->gtType == TYP_BYREF || op1->gtType == TYP_REF)) ||
  471. (genActualType(op1->gtType) == TYP_I_IMPL && lclTyp == TYP_BYREF) ||
  472. (varTypeIsFloating(lclTyp) && varTypeIsFloating(op1->TypeGet())) ||
  473. ((genActualType(lclTyp) == TYP_BYREF) && genActualType(op1->TypeGet()) == TYP_REF));
  474. /* If op1 is "&var" then its type is the transient "*" and it can
  475. be used either as TYP_BYREF or TYP_I_IMPL */
  476. if (op1->IsVarAddr())
  477. {
  478. assertImp(genActualType(lclTyp) == TYP_I_IMPL || lclTyp == TYP_BYREF);
  479. /* When "&var" is created, we assume it is a byref. If it is
  480. being assigned to a TYP_I_IMPL var, change the type to
  481. prevent unnecessary GC info */
  482. if (genActualType(lclTyp) == TYP_I_IMPL)
  483. {
  484. op1->gtType = TYP_I_IMPL;
  485. }
  486. }
  487. /* Filter out simple assignments to itself */
  488. if (op1->gtOper == GT_LCL_VAR && lclNum == op1->gtLclVarCommon.gtLclNum)
  489. {
  490. if (insertLdloc)
  491. {
  492. // This is a sequence of (ldloc, dup, stloc). Can simplify
  493. // to (ldloc, stloc). Goto LDVAR to reconstruct the ldloc node.
  494. CLANG_FORMAT_COMMENT_ANCHOR;
  495. #ifdef DEBUG
  496. if (tiVerificationNeeded)
  497. {
  498. assert(
  499. typeInfo::AreEquivalent(tiRetVal, NormaliseForStack(lvaTable[lclNum].lvVerTypeInfo)));
  500. }
  501. #endif
  502. op1 = nullptr;
  503. insertLdloc = false;
  504. impLoadVar(lclNum, opcodeOffs + sz + 1);
  505. break;
  506. }
  507. else if (opts.compDbgCode)
  508. {
  509. op1 = gtNewNothingNode();
  510. goto SPILL_APPEND;
  511. }
  512. else
  513. {
  514. break;
  515. }
  516. }
  517. /* Create the assignment node */
  518. op2 = gtNewLclvNode(lclNum, lclTyp, opcodeOffs + sz + 1);
  519. /* If the local is aliased, we need to spill calls and
  520. indirections from the stack. */
  521. if ((lvaTable[lclNum].lvAddrExposed || lvaTable[lclNum].lvHasLdAddrOp) &&
  522. verCurrentState.esStackDepth > 0)
  523. {
  524. impSpillSideEffects(false, (unsigned)CHECK_SPILL_ALL DEBUGARG("Local could be aliased"));
  525. }
  526. /* Spill any refs to the local from the stack */
  527. impSpillLclRefs(lclNum);
  528. #if !FEATURE_X87_DOUBLES
  529. // We can generate an assignment to a TYP_FLOAT from a TYP_DOUBLE
  530. // We insert a cast to the dest 'op2' type
  531. //
  532. if ((op1->TypeGet() != op2->TypeGet()) && varTypeIsFloating(op1->gtType) &&
  533. varTypeIsFloating(op2->gtType))
  534. {
  535. op1 = gtNewCastNode(op2->TypeGet(), op1, op2->TypeGet());
  536. }
  537. #endif // !FEATURE_X87_DOUBLES
  538. if (varTypeIsStruct(lclTyp))
  539. {
  540. op1 = impAssignStruct(op2, op1, clsHnd, (unsigned)CHECK_SPILL_ALL);
  541. }
  542. else
  543. {
  544. // The code generator generates GC tracking information
  545. // based on the RHS of the assignment. Later the LHS (which is
  546. // is a BYREF) gets used and the emitter checks that that variable
  547. // is being tracked. It is not (since the RHS was an int and did
  548. // not need tracking). To keep this assert happy, we change the RHS
  549. if (lclTyp == TYP_BYREF && !varTypeIsGC(op1->gtType))
  550. {
  551. op1->gtType = TYP_BYREF;
  552. }
  553. op1 = gtNewAssignNode(op2, op1);
  554. }
  555. /* If insertLdloc is true, then we need to insert a ldloc following the
  556. stloc. This is done when converting a (dup, stloc) sequence into
  557. a (stloc, ldloc) sequence. */
  558. if (insertLdloc)
  559. {
  560. // From SPILL_APPEND
  561. impAppendTree(op1, (unsigned)CHECK_SPILL_ALL, impCurStmtOffs);
  562. #ifdef DEBUG
  563. // From DONE_APPEND
  564. impNoteLastILoffs();
  565. #endif
  566. op1 = nullptr;
  567. insertLdloc = false;
  568. impLoadVar(lclNum, opcodeOffs + sz + 1, tiRetVal);
  569. break;
  570. }
  571. goto SPILL_APPEND;
  572. // ... a large number of cases omitted here ...
  573. case CEE_NOP:
  574. if (opts.compDbgCode)
  575. {
  576. op1 = new (this, GT_NO_OP) GenTree(GT_NO_OP, TYP_VOID);
  577. goto SPILL_APPEND;
  578. }
  579. break;
  580. /******************************** NYI *******************************/
  581. case 0xCC:
  582. OutputDebugStringA("CLR: Invalid x86 breakpoint in IL stream\n");
  583. case CEE_ILLEGAL:
  584. case CEE_MACRO_END:
  585. default:
  586. BADCODE3("unknown opcode", ": %02X", (int)opcode);
  587. }
  588. codeAddr += sz;
  589. prevOpcode = opcode;
  590. prefixFlags = 0;
  591. assert(!insertLdloc || opcode == CEE_DUP);
  592. }
  593. assert(!insertLdloc);
  594. return;
  595. #undef _impResolveToken
  596. }
  597. #ifdef _PREFAST_
  598. #pragma warning(pop)
  599. #endif

First, codeAddr and codeEndp are the start and end addresses of the IL belonging to this block, and opcode is the byte at the current address.
Take ldloc.0 as an example: its binary encoding is 06, and 06 is the opcode CEE_LDLOC_0.
Take ldc.i4.s 100 as an example: its binary encoding is 1f 64, where 1f is the opcode CEE_LDC_I4_S and 64 is the operand, i.e. 100 in hexadecimal.
This function parses the IL instructions in the current block's IL range in a loop. Since there are many IL instructions, I can only pick a few typical ones to explain.

The IL instruction ldc.i4.s pushes a constant int onto the evaluation stack; the constant fits in 1 byte. The parsing code is as follows:

  1. case CEE_LDC_I4_S:
  2. cval.intVal = getI1LittleEndian(codeAddr);
  3. goto PUSH_I4CON;
  4. case CEE_LDC_I4:
  5. cval.intVal = getI4LittleEndian(codeAddr);
  6. goto PUSH_I4CON;
  7. PUSH_I4CON:
  8. JITDUMP(" %d", cval.intVal);
  9. impPushOnStack(gtNewIconNode(cval.intVal), typeInfo(TI_INT));
  10. break;

We can see that it reads the 1 byte following the instruction (the variant without the s suffix reads 4 bytes) and then calls impPushOnStack(gtNewIconNode(cval.intVal), typeInfo(TI_INT)).
The gtNewIconNode function (Icon is short for int constant) creates a GenTree of type GT_CNS_INT, a node representing an int constant.
After the node is created it is pushed onto the evaluation stack; the source code of impPushOnStack is as follows:

  1. /*****************************************************************************
  2. *
  3. * Pushes the given tree on the stack.
  4. */
  5. void Compiler::impPushOnStack(GenTreePtr tree, typeInfo ti)
  6. {
  7. /* Check for overflow. If inlining, we may be using a bigger stack */
  8. if ((verCurrentState.esStackDepth >= info.compMaxStack) &&
  9. (verCurrentState.esStackDepth >= impStkSize || ((compCurBB->bbFlags & BBF_IMPORTED) == 0)))
  10. {
  11. BADCODE("stack overflow");
  12. }
  13. #ifdef DEBUG
  14. // If we are pushing a struct, make certain we know the precise type!
  15. if (tree->TypeGet() == TYP_STRUCT)
  16. {
  17. assert(ti.IsType(TI_STRUCT));
  18. CORINFO_CLASS_HANDLE clsHnd = ti.GetClassHandle();
  19. assert(clsHnd != NO_CLASS_HANDLE);
  20. }
  21. if (tiVerificationNeeded && !ti.IsDead())
  22. {
  23. assert(typeInfo::AreEquivalent(NormaliseForStack(ti), ti)); // types are normalized
  24. // The ti type is consistent with the tree type.
  25. //
  26. // On 64-bit systems, nodes whose "proper" type is "native int" get labeled TYP_LONG.
  27. // In the verification type system, we always transform "native int" to "TI_INT".
  28. // Ideally, we would keep track of which nodes labeled "TYP_LONG" are really "native int", but
  29. // attempts to do that have proved too difficult. Instead, we'll assume that in checks like this,
  30. // when there's a mismatch, it's because of this reason -- the typeInfo::AreEquivalentModuloNativeInt
  31. // method used in the last disjunct allows exactly this mismatch.
  32. assert(ti.IsDead() || ti.IsByRef() && (tree->TypeGet() == TYP_I_IMPL || tree->TypeGet() == TYP_BYREF) ||
  33. ti.IsUnboxedGenericTypeVar() && tree->TypeGet() == TYP_REF ||
  34. ti.IsObjRef() && tree->TypeGet() == TYP_REF || ti.IsMethod() && tree->TypeGet() == TYP_I_IMPL ||
  35. ti.IsType(TI_STRUCT) && tree->TypeGet() != TYP_REF ||
  36. typeInfo::AreEquivalentModuloNativeInt(NormaliseForStack(ti),
  37. NormaliseForStack(typeInfo(tree->TypeGet()))));
  38. // If it is a struct type, make certain we normalized the primitive types
  39. assert(!ti.IsType(TI_STRUCT) ||
  40. info.compCompHnd->getTypeForPrimitiveValueClass(ti.GetClassHandle()) == CORINFO_TYPE_UNDEF);
  41. }
  42. #if VERBOSE_VERIFY
  43. if (VERBOSE && tiVerificationNeeded)
  44. {
  45. printf("\n");
  46. printf(TI_DUMP_PADDING);
  47. printf("About to push to stack: ");
  48. ti.Dump();
  49. }
  50. #endif // VERBOSE_VERIFY
  51. #endif // DEBUG
  52. verCurrentState.esStack[verCurrentState.esStackDepth].seTypeInfo = ti;
  53. verCurrentState.esStack[verCurrentState.esStackDepth++].val = tree;
  54. if ((tree->gtType == TYP_LONG) && (compLongUsed == false))
  55. {
  56. compLongUsed = true;
  57. }
  58. else if (((tree->gtType == TYP_FLOAT) || (tree->gtType == TYP_DOUBLE)) && (compFloatingPointUsed == false))
  59. {
  60. compFloatingPointUsed = true;
  61. }
  62. }

impPushOnStack appends the GenTree node, together with its type information, to the evaluation stack verCurrentState.esStack; here that is the GT_CNS_INT node we just created.
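
For reference, an entry on this stack is just a (tree, type) pair; a minimal sketch, simplified from compiler.h:

  1. // Simplified sketch of the importer's evaluation stack (see compiler.h):
  2. struct StackEntry
  3. {
  4.     GenTreePtr val;        // the HIR node, e.g. the GT_CNS_INT created above
  5.     typeInfo   seTypeInfo; // the verification type, e.g. TI_INT
  6. };
  7. // verCurrentState.esStack is an array of StackEntry and esStackDepth is the
  8. // current depth; the push is essentially esStack[esStackDepth++] = {ti, tree}.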

Suppose the instruction following ldc.i4.s 100 is stloc.0, which assigns 100 to local variable 0; the stloc.0 instruction then needs to consume the value pushed earlier.
Let's see how CEE_STLOC_0 is handled:

  1. case CEE_STLOC_0:
  2. case CEE_STLOC_1:
  3. case CEE_STLOC_2:
  4. case CEE_STLOC_3:
  5. lclNum = (opcode - CEE_STLOC_0);
  6. assert(lclNum >= 0 && lclNum < 4);
  7. LOC_ST:
  8. if (tiVerificationNeeded)
  9. {
  10. Verify(lclNum < info.compMethodInfo->locals.numArgs, "bad local num");
  11. Verify(tiCompatibleWith(impStackTop().seTypeInfo,
  12. NormaliseForStack(lvaTable[lclNum + numArgs].lvVerTypeInfo), true),
  13. "type mismatch");
  14. }
  15. if (compIsForInlining())
  16. {
  17. lclTyp = impInlineInfo->lclVarInfo[lclNum + impInlineInfo->argCnt].lclTypeInfo;
  18. /* Have we allocated a temp for this local? */
  19. lclNum = impInlineFetchLocal(lclNum DEBUGARG("Inline stloc first use temp"));
  20. goto _PopValue;
  21. }
  22. lclNum += numArgs;
  23. VAR_ST:
  24. if (lclNum >= info.compLocalsCount && lclNum != lvaArg0Var)
  25. {
  26. assert(!tiVerificationNeeded); // We should have thrown the VerificationException before.
  27. BADCODE("Bad IL");
  28. }
  29. VAR_ST_VALID:
  30. /* if it is a struct assignment, make certain we don't overflow the buffer */
  31. assert(lclTyp != TYP_STRUCT || lvaLclSize(lclNum) >= info.compCompHnd->getClassSize(clsHnd));
  32. if (lvaTable[lclNum].lvNormalizeOnLoad())
  33. {
  34. lclTyp = lvaGetRealType(lclNum);
  35. }
  36. else
  37. {
  38. lclTyp = lvaGetActualType(lclNum);
  39. }
  40. _PopValue:
  41. /* Pop the value being assigned */
  42. {
  43. StackEntry se = impPopStack(clsHnd);
  44. op1 = se.val;
  45. tiRetVal = se.seTypeInfo;
  46. }
  47. #ifdef FEATURE_SIMD
  48. if (varTypeIsSIMD(lclTyp) && (lclTyp != op1->TypeGet()))
  49. {
  50. assert(op1->TypeGet() == TYP_STRUCT);
  51. op1->gtType = lclTyp;
  52. }
  53. #endif // FEATURE_SIMD
  54. op1 = impImplicitIorI4Cast(op1, lclTyp);
  55. #ifdef _TARGET_64BIT_
  56. // Downcast the TYP_I_IMPL into a 32-bit Int for x86 JIT compatiblity
  57. if (varTypeIsI(op1->TypeGet()) && (genActualType(lclTyp) == TYP_INT))
  58. {
  59. assert(!tiVerificationNeeded); // We should have thrown the VerificationException before.
  60. op1 = gtNewCastNode(TYP_INT, op1, TYP_INT);
  61. }
  62. #endif // _TARGET_64BIT_
  63. // We had better assign it a value of the correct type
  64. assertImp(
  65. genActualType(lclTyp) == genActualType(op1->gtType) ||
  66. genActualType(lclTyp) == TYP_I_IMPL && op1->IsVarAddr() ||
  67. (genActualType(lclTyp) == TYP_I_IMPL && (op1->gtType == TYP_BYREF || op1->gtType == TYP_REF)) ||
  68. (genActualType(op1->gtType) == TYP_I_IMPL && lclTyp == TYP_BYREF) ||
  69. (varTypeIsFloating(lclTyp) && varTypeIsFloating(op1->TypeGet())) ||
  70. ((genActualType(lclTyp) == TYP_BYREF) && genActualType(op1->TypeGet()) == TYP_REF));
  71. /* If op1 is "&var" then its type is the transient "*" and it can
  72. be used either as TYP_BYREF or TYP_I_IMPL */
  73. if (op1->IsVarAddr())
  74. {
  75. assertImp(genActualType(lclTyp) == TYP_I_IMPL || lclTyp == TYP_BYREF);
  76. /* When "&var" is created, we assume it is a byref. If it is
  77. being assigned to a TYP_I_IMPL var, change the type to
  78. prevent unnecessary GC info */
  79. if (genActualType(lclTyp) == TYP_I_IMPL)
  80. {
  81. op1->gtType = TYP_I_IMPL;
  82. }
  83. }
  84. /* Filter out simple assignments to itself */
  85. if (op1->gtOper == GT_LCL_VAR && lclNum == op1->gtLclVarCommon.gtLclNum)
  86. {
  87. if (insertLdloc)
  88. {
  89. // This is a sequence of (ldloc, dup, stloc). Can simplify
  90. // to (ldloc, stloc). Goto LDVAR to reconstruct the ldloc node.
  91. CLANG_FORMAT_COMMENT_ANCHOR;
  92. #ifdef DEBUG
  93. if (tiVerificationNeeded)
  94. {
  95. assert(
  96. typeInfo::AreEquivalent(tiRetVal, NormaliseForStack(lvaTable[lclNum].lvVerTypeInfo)));
  97. }
  98. #endif
  99. op1 = nullptr;
  100. insertLdloc = false;
  101. impLoadVar(lclNum, opcodeOffs + sz + 1);
  102. break;
  103. }
  104. else if (opts.compDbgCode)
  105. {
  106. op1 = gtNewNothingNode();
  107. goto SPILL_APPEND;
  108. }
  109. else
  110. {
  111. break;
  112. }
  113. }
  114. /* Create the assignment node */
  115. op2 = gtNewLclvNode(lclNum, lclTyp, opcodeOffs + sz + 1);
  116. /* If the local is aliased, we need to spill calls and
  117. indirections from the stack. */
  118. if ((lvaTable[lclNum].lvAddrExposed || lvaTable[lclNum].lvHasLdAddrOp) &&
  119. verCurrentState.esStackDepth > 0)
  120. {
  121. impSpillSideEffects(false, (unsigned)CHECK_SPILL_ALL DEBUGARG("Local could be aliased"));
  122. }
  123. /* Spill any refs to the local from the stack */
  124. impSpillLclRefs(lclNum);
  125. #if !FEATURE_X87_DOUBLES
  126. // We can generate an assignment to a TYP_FLOAT from a TYP_DOUBLE
  127. // We insert a cast to the dest 'op2' type
  128. //
  129. if ((op1->TypeGet() != op2->TypeGet()) && varTypeIsFloating(op1->gtType) &&
  130. varTypeIsFloating(op2->gtType))
  131. {
  132. op1 = gtNewCastNode(op2->TypeGet(), op1, op2->TypeGet());
  133. }
  134. #endif // !FEATURE_X87_DOUBLES
  135. if (varTypeIsStruct(lclTyp))
  136. {
  137. op1 = impAssignStruct(op2, op1, clsHnd, (unsigned)CHECK_SPILL_ALL);
  138. }
  139. else
  140. {
  141. // The code generator generates GC tracking information
  142. // based on the RHS of the assignment. Later the LHS (which is
  143. // is a BYREF) gets used and the emitter checks that that variable
  144. // is being tracked. It is not (since the RHS was an int and did
  145. // not need tracking). To keep this assert happy, we change the RHS
  146. if (lclTyp == TYP_BYREF && !varTypeIsGC(op1->gtType))
  147. {
  148. op1->gtType = TYP_BYREF;
  149. }
  150. op1 = gtNewAssignNode(op2, op1);
  151. }
  152. /* If insertLdloc is true, then we need to insert a ldloc following the
  153. stloc. This is done when converting a (dup, stloc) sequence into
  154. a (stloc, ldloc) sequence. */
  155. if (insertLdloc)
  156. {
  157. // From SPILL_APPEND
  158. impAppendTree(op1, (unsigned)CHECK_SPILL_ALL, impCurStmtOffs);
  159. #ifdef DEBUG
  160. // From DONE_APPEND
  161. impNoteLastILoffs();
  162. #endif
  163. op1 = nullptr;
  164. insertLdloc = false;
  165. impLoadVar(lclNum, opcodeOffs + sz + 1, tiRetVal);
  166. break;
  167. }
  168. goto SPILL_APPEND;
  169. SPILL_APPEND:
  170. /* Append 'op1' to the list of statements */
  171. impAppendTree(op1, (unsigned)CHECK_SPILL_ALL, impCurStmtOffs);
  172. goto DONE_APPEND;
  173. DONE_APPEND:
  174. #ifdef DEBUG
  175. // Remember at which BC offset the tree was finished
  176. impNoteLastILoffs();
  177. #endif
  178. break;

The code handling CEE_STLOC_0 is a bit long, so please bear with it.
First, the instructions for locals 0~3 share the same handling: stloc.0 is 0a, stloc.1 is 0b, stloc.2 is 0c, stloc.3 is 0d.
After obtaining the local variable number, we still need its index in the local variable table lvaTable; since the table starts with the arguments, the index here is lclNum += numArgs.
Then the assignment node (GT_ASG) is created. It has two operands: the first is lclVar 0 and the second is const 100 (the types match, so no cast is needed), as follows:

  1. /--* const int 100
  2. \--* = int
  3. \--* lclVar int V01

Now we have built a GenTree; this tree forms a single statement, and we can append that statement to the BasicBlock.
Appending to the BasicBlock is done with impAppendTree(op1, (unsigned)CHECK_SPILL_ALL, impCurStmtOffs):

  1. /*****************************************************************************
  2. *
  3. * Append the given expression tree to the current block's tree list.
  4. * Return the newly created statement.
  5. */
  6. GenTreePtr Compiler::impAppendTree(GenTreePtr tree, unsigned chkLevel, IL_OFFSETX offset)
  7. {
  8. assert(tree);
  9. /* Allocate an 'expression statement' node */
  10. GenTreePtr expr = gtNewStmt(tree, offset);
  11. /* Append the statement to the current block's stmt list */
  12. impAppendStmt(expr, chkLevel);
  13. return expr;
  14. }
  15. /*****************************************************************************
  16. *
  17. * Append the given GT_STMT node to the current block's tree list.
  18. * [0..chkLevel) is the portion of the stack which we will check for
  19. * interference with stmt and spill if needed.
  20. */
  21. inline void Compiler::impAppendStmt(GenTreePtr stmt, unsigned chkLevel)
  22. {
  23. assert(stmt->gtOper == GT_STMT);
  24. noway_assert(impTreeLast != nullptr);
  25. /* If the statement being appended has any side-effects, check the stack
  26. to see if anything needs to be spilled to preserve correct ordering. */
  27. GenTreePtr expr = stmt->gtStmt.gtStmtExpr;
  28. unsigned flags = expr->gtFlags & GTF_GLOB_EFFECT;
  29. // Assignment to (unaliased) locals don't count as a side-effect as
  30. // we handle them specially using impSpillLclRefs(). Temp locals should
  31. // be fine too.
  32. // TODO-1stClassStructs: The check below should apply equally to struct assignments,
  33. // but previously the block ops were always being marked GTF_GLOB_REF, even if
  34. // the operands could not be global refs.
  35. if ((expr->gtOper == GT_ASG) && (expr->gtOp.gtOp1->gtOper == GT_LCL_VAR) &&
  36. !(expr->gtOp.gtOp1->gtFlags & GTF_GLOB_REF) && !gtHasLocalsWithAddrOp(expr->gtOp.gtOp2) &&
  37. !varTypeIsStruct(expr->gtOp.gtOp1))
  38. {
  39. unsigned op2Flags = expr->gtOp.gtOp2->gtFlags & GTF_GLOB_EFFECT;
  40. assert(flags == (op2Flags | GTF_ASG));
  41. flags = op2Flags;
  42. }
  43. if (chkLevel == (unsigned)CHECK_SPILL_ALL)
  44. {
  45. chkLevel = verCurrentState.esStackDepth;
  46. }
  47. if (chkLevel && chkLevel != (unsigned)CHECK_SPILL_NONE)
  48. {
  49. assert(chkLevel <= verCurrentState.esStackDepth);
  50. if (flags)
  51. {
  52. // If there is a call, we have to spill global refs
  53. bool spillGlobEffects = (flags & GTF_CALL) ? true : false;
  54. if (expr->gtOper == GT_ASG)
  55. {
  56. GenTree* lhs = expr->gtGetOp1();
  57. // If we are assigning to a global ref, we have to spill global refs on stack.
  58. // TODO-1stClassStructs: Previously, spillGlobEffects was set to true for
  59. // GT_INITBLK and GT_COPYBLK, but this is overly conservative, and should be
  60. // revisited. (Note that it was NOT set to true for GT_COPYOBJ.)
  61. if (!expr->OperIsBlkOp())
  62. {
  63. // If we are assigning to a global ref, we have to spill global refs on stack
  64. if ((lhs->gtFlags & GTF_GLOB_REF) != 0)
  65. {
  66. spillGlobEffects = true;
  67. }
  68. }
  69. else if ((lhs->OperIsBlk() && !lhs->AsBlk()->HasGCPtr()) ||
  70. ((lhs->OperGet() == GT_LCL_VAR) &&
  71. (lvaTable[lhs->AsLclVarCommon()->gtLclNum].lvStructGcCount == 0)))
  72. {
  73. spillGlobEffects = true;
  74. }
  75. }
  76. impSpillSideEffects(spillGlobEffects, chkLevel DEBUGARG("impAppendStmt"));
  77. }
  78. else
  79. {
  80. impSpillSpecialSideEff();
  81. }
  82. }
  83. impAppendStmtCheck(stmt, chkLevel);
  84. /* Point 'prev' at the previous node, so that we can walk backwards */
  85. stmt->gtPrev = impTreeLast;
  86. /* Append the expression statement to the list */
  87. impTreeLast->gtNext = stmt;
  88. impTreeLast = stmt;
  89. #ifdef FEATURE_SIMD
  90. impMarkContiguousSIMDFieldAssignments(stmt);
  91. #endif
  92. #ifdef DEBUGGING_SUPPORT
  93. /* Once we set impCurStmtOffs in an appended tree, we are ready to
  94. report the following offsets. So reset impCurStmtOffs */
  95. if (impTreeLast->gtStmt.gtStmtILoffsx == impCurStmtOffs)
  96. {
  97. impCurStmtOffsSet(BAD_IL_OFFSET);
  98. }
  99. #endif
  100. #ifdef DEBUG
  101. if (impLastILoffsStmt == nullptr)
  102. {
  103. impLastILoffsStmt = stmt;
  104. }
  105. if (verbose)
  106. {
  107. printf("\n\n");
  108. gtDispTree(stmt);
  109. }
  110. #endif
  111. }

This code appends a GT_STMT node to the current impTreeLast linked list; later, in impEndTreeList, this list is assigned to block->bbTreeList.
The contents of the GT_STMT node are as follows:

  1. * stmtExpr void
  2. | /--* const int 100
  3. \--* = int
  4. \--* lclVar int V01

As you can see, the original assignment node GT_ASG has been placed under the GT_STMT node.
Microsoft provides a diagram of the Compiler, BasicBlock and GenTree structures (HIR version):

881857-20171028110118680-735116901.png

Above we walked through the two simplest instructions, ldc.i4.s and stloc.0; if you are interested, you can analyze more instruction kinds yourself.
We can now see that the evaluation stack is used inside the JIT only to link the individual instructions together into GenTree trees; the actual generated code has no notion of an evaluation stack.

After the current block has been processed, the block's successors are added to the pending queue impPendingList:

  1. for (unsigned i = 0; i < block->NumSucc(); i++)
  2. {
  3. impImportBlockPending(block->GetSucc(i));
  4. }

After all blocks have been processed, each BasicBlock holds a linked list of statements (GT_STMT), with a GenTree tree under each statement.

An example of the result of fgImport:

881857-20171028110132289-576127936.jpg

PHASE_POST_IMPORT

This phase is responsible for some work needed after importing the HIR (GenTree) from IL, and consists of the following code:

  1. // Maybe the caller was not interested in generating code
  2. if (compIsForImportOnly())
  3. {
  4. compFunctionTraceEnd(nullptr, 0, false);
  5. return;
  6. }
  7. #if !FEATURE_EH
  8. // If we aren't yet supporting EH in a compiler bring-up, remove as many EH handlers as possible, so
  9. // we can pass tests that contain try/catch EH, but don't actually throw any exceptions.
  10. fgRemoveEH();
  11. #endif // !FEATURE_EH
  12. if (compileFlags->corJitFlags & CORJIT_FLG_BBINSTR)
  13. {
  14. fgInstrumentMethod();
  15. }
  16. // We could allow ESP frames. Just need to reserve space for
  17. // pushing EBP if the method becomes an EBP-frame after an edit.
  18. // Note that requiring a EBP Frame disallows double alignment. Thus if we change this
  19. // we either have to disallow double alignment for E&C some other way or handle it in EETwain.
  20. if (opts.compDbgEnC)
  21. {
  22. codeGen->setFramePointerRequired(true);
  23. // Since we need a slots for security near ebp, its not possible
  24. // to do this after an Edit without shifting all the locals.
  25. // So we just always reserve space for these slots in case an Edit adds them
  26. opts.compNeedSecurityCheck = true;
  27. // We don't care about localloc right now. If we do support it,
  28. // EECodeManager::FixContextForEnC() needs to handle it smartly
  29. // in case the localloc was actually executed.
  30. //
  31. // compLocallocUsed = true;
  32. }
  33. EndPhase(PHASE_POST_IMPORT);

This phase handles various odds and ends after import.
If the caller only wants to check whether the method's IL is valid, the compilation is invoked with CORJIT_FLG_IMPORT_ONLY and does not need to continue past the import phase.
fgInstrumentMethod inserts the statements needed by a profiler; I won't analyze it in detail here.
opts.compDbgEnC being enabled means the IL assembly was compiled with the Debug configuration; in that case the method is marked as requiring a frame pointer and a security check.
(On x64 a function is allowed not to use the rbp register for saving the stack address from before the function was entered; this frees up one more register and produces more efficient code, but makes debugging harder.)

PHASE_MORPH

The import phase only performs a straightforward translation from IL to HIR, and the resulting HIR still needs further processing.
This phase is responsible for transforming the HIR and consists of the following code:

  1. /* Initialize the BlockSet epoch */
  2. NewBasicBlockEpoch();
  3. /* Massage the trees so that we can generate code out of them */
  4. fgMorph();
  5. EndPhase(PHASE_MORPH);

NewBasicBlockEpoch updates the epoch of the current BasicBlock set (fgCurBBEpoch); this value identifies the version of the current BasicBlock set.

fgMorph contains the main processing of this phase; its source code is as follows:

  1. /*****************************************************************************
  2. *
  3. * Transform all basic blocks for codegen.
  4. */
  5. void Compiler::fgMorph()
  6. {
  7. noway_assert(!compIsForInlining()); // Inlinee's compiler should never reach here.
  8. fgOutgoingArgTemps = nullptr;
  9. #ifdef DEBUG
  10. if (verbose)
  11. {
  12. printf("*************** In fgMorph()\n");
  13. }
  14. if (verboseTrees)
  15. {
  16. fgDispBasicBlocks(true);
  17. }
  18. #endif // DEBUG
  19. // Insert call to class constructor as the first basic block if
  20. // we were asked to do so.
  21. if (info.compCompHnd->initClass(nullptr /* field */, info.compMethodHnd /* method */,
  22. impTokenLookupContextHandle /* context */) &
  23. CORINFO_INITCLASS_USE_HELPER)
  24. {
  25. fgEnsureFirstBBisScratch();
  26. fgInsertStmtAtBeg(fgFirstBB, fgInitThisClass());
  27. }
  28. #ifdef DEBUG
  29. if (opts.compGcChecks)
  30. {
  31. for (unsigned i = 0; i < info.compArgsCount; i++)
  32. {
  33. if (lvaTable[i].TypeGet() == TYP_REF)
  34. {
  35. // confirm that the argument is a GC pointer (for debugging (GC stress))
  36. GenTreePtr op = gtNewLclvNode(i, TYP_REF);
  37. GenTreeArgList* args = gtNewArgList(op);
  38. op = gtNewHelperCallNode(CORINFO_HELP_CHECK_OBJ, TYP_VOID, 0, args);
  39. fgEnsureFirstBBisScratch();
  40. fgInsertStmtAtEnd(fgFirstBB, op);
  41. }
  42. }
  43. }
  44. if (opts.compStackCheckOnRet)
  45. {
  46. lvaReturnEspCheck = lvaGrabTempWithImplicitUse(false DEBUGARG("ReturnEspCheck"));
  47. lvaTable[lvaReturnEspCheck].lvType = TYP_INT;
  48. }
  49. if (opts.compStackCheckOnCall)
  50. {
  51. lvaCallEspCheck = lvaGrabTempWithImplicitUse(false DEBUGARG("CallEspCheck"));
  52. lvaTable[lvaCallEspCheck].lvType = TYP_INT;
  53. }
  54. #endif // DEBUG
  55. /* Filter out unimported BBs */
  56. fgRemoveEmptyBlocks();
  57. /* Add any internal blocks/trees we may need */
  58. fgAddInternal();
  59. #if OPT_BOOL_OPS
  60. fgMultipleNots = false;
  61. #endif
  62. #ifdef DEBUG
  63. /* Inliner could add basic blocks. Check that the flowgraph data is up-to-date */
  64. fgDebugCheckBBlist(false, false);
  65. #endif // DEBUG
  66. /* Inline */
  67. fgInline();
  68. #if 0
  69. JITDUMP("trees after inlining\n");
  70. DBEXEC(VERBOSE, fgDispBasicBlocks(true));
  71. #endif
  72. RecordStateAtEndOfInlining(); // Record "start" values for post-inlining cycles and elapsed time.
  73. #ifdef DEBUG
  74. /* Inliner could add basic blocks. Check that the flowgraph data is up-to-date */
  75. fgDebugCheckBBlist(false, false);
  76. #endif // DEBUG
  77. /* For x64 and ARM64 we need to mark irregular parameters early so that they don't get promoted */
  78. fgMarkImplicitByRefArgs();
  79. /* Promote struct locals if necessary */
  80. fgPromoteStructs();
  81. /* Now it is the time to figure out what locals have address-taken. */
  82. fgMarkAddressExposedLocals();
  83. #ifdef DEBUG
  84. /* Now that locals have address-taken marked, we can safely apply stress. */
  85. lvaStressLclFld();
  86. fgStress64RsltMul();
  87. #endif // DEBUG
  88. /* Morph the trees in all the blocks of the method */
  89. fgMorphBlocks();
  90. #if 0
  91. JITDUMP("trees after fgMorphBlocks\n");
  92. DBEXEC(VERBOSE, fgDispBasicBlocks(true));
  93. #endif
  94. /* Decide the kind of code we want to generate */
  95. fgSetOptions();
  96. fgExpandQmarkNodes();
  97. #ifdef DEBUG
  98. compCurBB = nullptr;
  99. #endif // DEBUG
  100. }

The processing in this function is as follows:

fgInsertStmtAtBeg(fgFirstBB, fgInitThisClass());
If the type requires dynamic initialization (it is generic and has a static constructor), code calling JIT_ClassInitDynamicClass is inserted into the first block.

fgRemoveEmptyBlocks
Enumerates all blocks that were never imported (i.e. the code in the block is unreachable) and removes them;
if any were found, the block numbers and the epoch are updated.

fgAddInternal:
Adds internal BasicBlocks and GenTrees.
First, if the method is not static and the this variable may have its address passed out (ref) or be modified, an internal local variable (lvaArg0Var) is needed to store the value of this.
If the method requires a security check (compNeedSecurityCheck), a temporary variable (lvaSecurityObject) is added.
If the current platform is not x86 (32-bit), code is generated for synchronized methods: JIT_MonEnterWorker is called on entry and JIT_MonExitWorker on exit.
It then decides whether only a single return block should be generated (e.g. for methods containing pinvoke, methods calling unmanaged code, or synchronized methods).
If a single return block is needed, a merge BasicBlock and a local variable for storing the return value are added; the other return blocks are not yet redirected to the new block at this point.
If the method calls unmanaged functions, a temporary variable (lvaInlinedPInvokeFrameVar) is added.
If JustMyCode is enabled, if (*pFlag != 0) { JIT_DbgIsJustMyCode() } is added to the first block; note that the node used here is a QMARK (?:).
If tiRuntimeCalloutNeeded holds, verificationRuntimeCheck(MethodHnd) is added to the first block.

fgInline
This function is responsible for inlining the calls inside the method.
Although Microsoft's documentation and my previous article both treat inlining as a separate phase, inside CoreCLR inlining belongs to PHASE_MORPH.
First a root inline context (rootContext) is created and assigned to all current statement (stmt) nodes; an inline context records where a statement came from and organizes the statements into a tree structure.
Then all statements (stmt) are enumerated; if a statement is a call and an inline candidate (GTF_CALL_INLINE_CANDIDATE), inlining is attempted (fgMorphCallInline).

The earlier PHASE_IMPORTATION decides, while importing a call, whether it is an inline candidate (impMarkInlineCandidate). The conditions include the following.
Note that these conditions are not necessarily exact; they may vary with the CLR version or the runtime environment (the configured inline policy).

  • No inlining when optimizations are not enabled
  • No inlining if the call is a tail call
  • No inlining when the call's gtFlags & GTF_CALL_VIRT_KIND_MASK is not GTF_CALL_NONVIRT
  • No inlining for helper calls
  • No inlining for indirect calls
  • When the environment sets COMPlus_AggressiveInlining, CORINFO_FLG_FORCEINLINE is set
  • No inlining when CORINFO_FLG_FORCEINLINE is not set and the call site is in a catch or filter
  • No inlining if a previous inline attempt failed and the method was marked CORINFO_FLG_DONT_INLINE
  • Synchronized methods (CORINFO_FLG_SYNCH) are not inlined
  • Methods requiring a security check (CORINFO_FLG_SECURITYCHECK) are not inlined
  • Methods with exception handlers are not inlined
  • Empty methods (size = 0) are not inlined
  • Methods with vararg parameters are not inlined
  • No inlining when the number of locals in methodInfo is greater than MAX_INL_LCLS (32)
  • No inlining when the number of arguments in methodInfo is greater than MAX_INL_ARGS
  • The IL code size is checked:
    • If codesize <= ALWAYS_INLINE_SIZE (16), mark CALLEE_BELOW_ALWAYS_INLINE_SIZE
    • If force inline, mark CALLEE_IS_FORCE_INLINE (e.g. the method is annotated with the MethodImpl attribute)
    • If codesize <= DEFAULT_MAX_INLINE_SIZE (100), mark CALLEE_IS_DISCRETIONARY_INLINE; profitability is evaluated later
    • Otherwise mark CALLEE_TOO_MUCH_IL, meaning the code is too long to inline
  • Try to initialize the class the method belongs to:
    • If the method belongs to a generic definition, it cannot be inlined
    • If the type needs to be initialized before any field access (IsBeforeFieldInit), it cannot be inlined
    • If no other early-out condition was hit, class initialization was attempted and it failed, it cannot be inlined
  • Other checks:
    • Definition of a boundary method:
      • A method that creates a StackCrawlMark to find its caller
      • A method that calls a method satisfying the above (marked IsMdRequireSecObject)
      • A method that calls a virtual method (the virtual method may satisfy the above conditions)
    • Methods calling a boundary method are not inlined
    • If the caller's and callee's grant set or refuse set do not match, no inlining
    • Check whether the call crosses assemblies:
      • Within the same assembly, inlining is considered allowed
      • Across assemblies, any one of the following must hold:
        • The caller is full trust and its refused set is empty
        • The appdomain's IsHomogenous holds, and the refused sets of both caller and callee are empty
      • If the callee and caller are in different modules, and the callee's string pool is module-based:
        • Mark dwRestrictions |= INLINE_NO_CALLEE_LDSTR (the callee must not contain ldstr)

Once all the above conditions are satisfied, the call is marked as an inline candidate and inlining is actually attempted (fgMorphCallInline). The steps are:

  • Check for mutual inlining (a inlines b, b inlines a); if so, mark the inline as failed
  • Use the inline context to check whether there are too many levels of inlining; beyond DEFAULT_MAX_INLINE_DEPTH (20), mark the inline as failed
  • Call jitNativeCode for the callee; the imported BasicBlocks and GenTrees live in InlineeCompiler
    • The profitability analysis for the inlined method (DetermineProfitability) happens here (in fgFindJumpTargets); if inlining is judged not worthwhile, failure is returned
    • The algorithm of DetermineProfitability:
      • m_CalleeNativeSizeEstimate = DetermineNativeSizeEstimate() // machine-code size estimated with a state machine
      • m_CallsiteNativeSizeEstimate = DetermineCallsiteNativeSizeEstimate(methodInfo) // estimated machine-code size of the instructions making the call
      • m_Multiplier = DetermineMultiplier() // multiplier: the larger it is, the more likely inlining is, see DetermineMultiplier
      • threshold = (int)(m_CallsiteNativeSizeEstimate * m_Multiplier) // the threshold
      • If m_CalleeNativeSizeEstimate > threshold, inlining is rejected; i.e. the larger the callee's machine code, the less likely it is inlined, and the larger the multiplier, the more likely
    • Inlining compiles the callee at most up to PHASE_IMPORTATION; see the code of the compCompile function above

If compiling the callee succeeds and the inlining decision also passes, the callee's HIR can be embedded into the caller's HIR:

  • If InlineeCompiler contains only one BasicBlock, insert all the stmts of that BasicBlock after the original stmt and mark the original stmt as empty
  • If InlineeCompiler contains multiple BasicBlocks:
    • Split the containing BasicBlock at the original stmt's position into topBlock and bottomBlock
    • Insert the callee's BasicBlocks between topBlock and bottomBlock
    • Mark the original stmt as empty; it stays in topBlock
  • The call under the original stmt is replaced with the post-inline return expression

If compiling the callee fails, or the inlining decision does not pass, the modified state must be restored:

  • Clean up the newly created locals and restore the original local variable count (lvaCount)
  • If the call result is not void:
    • Set the expr in the stmt to empty; the original stmt is still referenced by the retExpr and will be substituted back later
  • Clear the inline candidate flag (GTF_CALL_INLINE_CANDIDATE) on the original expr (the call)

Finally, the trees in the function that reference the return result (retExpr) are walked once more; if the inline succeeded, the node is replaced with a lclVar or lclFld, roughly as sketched below.
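
A rough illustration of the substitution (the variable numbers are invented):

  1. // Hypothetical before/after of the final retExpr walk:
  2. //   before: r = GT_RET_EXPR           // placeholder for the result of the call
  3. //   after:  r = GT_LCL_VAR V05        // temp that the inlined body assigned
  4. // If the inline failed instead, the placeholder resolves back to the original
  5. // GT_CALL, so the call simply happens at runtime.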

fgMarkImplicitByRefArgs

Marks struct arguments of non-standard size as implicitly passed by reference (BY_REF); once marked, the struct can no longer be promoted.
Struct promotion, simply put, treats each field of a struct as a separate local variable.
For example, given struct X { int a; int b; int c; },
a local variable X x can be replaced by three local variables int a; int b; int c;.
On x64 the non-standard sizes are 3, 5, 6, 7 and >8 bytes; on arm64 it is >16 bytes.

fgPromoteStructs

Promotes the fields of structs into local variables.
First, the struct variables among the locals are enumerated to decide whether each should be promoted; the criteria include (and may vary with the environment):

  • No promotion if there are more than 512 locals in total
  • No promotion if the variable is used in SIMD instructions
  • No promotion if the variable is of an HFA (homogeneous floating-point aggregate) type
  • No promotion if the struct is larger than sizeof(double) * 4
  • No promotion if the struct has more than 4 fields
  • No promotion if any field addresses overlap (e.g. a union)
  • No promotion if the struct has a custom layout and is an HFA type
  • No promotion if the struct contains fields of non-primitive types
  • No promotion if the struct contains specially aligned fields ((fldOffset % fldSize) != 0)

If promotion is decided, all of the struct's members are added to the local variable table (lvaTable).
The original struct variable is still kept, and the lvParentLcl of each newly added local points back to the original struct variable. A hypothetical example:
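
  1. // Hypothetical example: struct X { int a; int b; int c; }; and a local "X x;".
  2. // After fgPromoteStructs the local variable table conceptually contains:
  3. //   V01 x     TYP_STRUCT   // the original local is kept (marked lvPromoted)
  4. //   V02 x.a   TYP_INT      // lvParentLcl = V01, field offset 0
  5. //   V03 x.b   TYP_INT      // lvParentLcl = V01, field offset 4
  6. //   V04 x.c   TYP_INT      // lvParentLcl = V01, field offset 8
  7. // so later phases can enregister x.a / x.b / x.c independently.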

fgMarkAddressExposedLocals

Marks all locals whose address escapes (passed to another function by ref, or stored into a global variable); such locals can no longer be optimized into registers.
At the same time the GenTrees are walked; if a node is a GT_FIELD and the corresponding struct variable has been promoted, the node is rewritten into a lclVar. A hypothetical example of an address-exposed local:
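
  1. // C# (hypothetical): int x = 1; Foo(ref x);
  2. // The importer builds GT_ADDR(GT_LCL_VAR x) as the call argument, so
  3. // fgMarkAddressExposedLocals sets lvaTable[x].lvAddrExposed = 1: since Foo may
  4. // read or write x through the pointer, x has to live on the stack frame and
  5. // cannot be kept in a register across the call.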

fgMorphBlocks

This function is another big one: it contains all kinds of transformations of the GenTrees. Since there are quite a lot of them, I only list a few here.
For more of them, see my JIT notes.

Assertion generation (optAssertionGen)

Assertions can be created from certain GenTree patterns; for example, after a = 1 we can assert that a's value is 1, and after b.abc() we can assert that b is not null (null has already been checked once).
Assertions can then be used to optimize the code, for example by merging nodes, removing null checks and removing bounds checks.

Assertion propagation (optAssertionProp)

Based on the created assertions, optimizations can be applied: for example, a local known to equal a constant is replaced with that constant, and an object known not to be null is marked as not needing a null check.
In the PHASE_MORPH phase, optAssertionProp can only do some simple optimizations;
later, after SSA and value numbers have been built, the PHASE_ASSERTION_PROP_MAIN phase calls this function again to do more. A hypothetical illustration:
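
  1. // Hypothetical illustration of assertions at work:
  2. //   a = 1;            // optAssertionGen records the assertion "V0a == 1"
  3. //   if (a == 1) ...   // optAssertionProp can fold the condition to true
  4. //   b.Abc();          // records the assertion "V0b != null"
  5. //   b.Def();          // the second null check on b can be dropped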

Converting operations the CPU does not support into function calls

For example, dividing a long (64-bit) on a 32-bit platform is not supported by the CPU, so it must be converted into a JIT helper call, as sketched below:
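
  1. // Sketch: on a 32-bit target,
  2. //   long r = x / y;                  // a GT_DIV with TYP_LONG operands
  3. // is morphed into a helper call, conceptually:
  4. //   r = CORINFO_HELP_LDIV(x, y);     // implemented by JIT_LDiv in the runtime
  5. // because no single x86 instruction performs a full 64-bit division.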

Adding BasicBlocks that implicitly throw exceptions

If the code needs to check for numeric overflow or out-of-range array access, a BasicBlock that throws the exception must be added.
Only one BasicBlock is added per exception kind.
Note that null checks do not add a BasicBlock: null checks are implemented through hardware exceptions, see the earlier article. A sketch:
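
  1. // Sketch: for arr[i], the bounds check jumps to a shared throw block on failure:
  2. //   if ((unsigned)i >= arr.Length) goto RngChkFailBB;
  3. //   ... use arr[i] ...
  4. //   RngChkFailBB: call CORINFO_HELP_RNGCHKFAIL   // throws IndexOutOfRangeException
  5. // RngChkFailBB is created once per method and shared by all checks of this kind.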

Converting to more efficient equivalent patterns

Some patterns, e.g. x+const1==const2, can be converted to x==const2-const1 => x==const3; the conversion reduces the number of computation steps.
Other patterns that get converted include:

  • x >= y == 0 => x < y
  • x >= 1 => x > 0 (x is an int)
  • x < 1 => x <= 0 (x is an int)
  • (x+const1)+(y+const2) => (x+y)+const3
  • x + 0 => x
  • and so on

fgSetOptions

This function sets the options used by CodeGen (the module that generates the machine code), including:

  • genInterruptible: whether to generate fully interruptible code, used for the debugger
  • setFramePointerRequired: whether a frame pointer (rbp) must be kept
  • setFramePointerRequiredEH: requires a frame pointer when the EH table is non-empty; sets the same variable as above
  • setFramePointerRequiredGCInfo: requires a frame pointer if there are too many arguments, a security check is needed, or there are variable-length arguments; same as above

fgExpandQmarkNodes

This function expands QMark nodes. A QMark is simply a ternary expression, e.g. x?123:321.
Such a condition would normally be split across three BasicBlocks, but the earlier phases used a QMark node for convenience instead of modifying the BasicBlocks.
This function finds the QMark nodes in the trees, converts them to jTrue and adds BasicBlocks, roughly as follows:
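
  1. // Roughly, a statement containing a QMARK (hypothetical block numbers):
  2. //   r = x ? 123 : 321;
  3. // becomes explicit control flow:
  4. //   BB1: GT_JTRUE(x == 0) --> BB3      // test the condition
  5. //   BB2: r = 123; goto BB4             // "then" value
  6. //   BB3: r = 321;                      // "else" value
  7. //   BB4: ...                           // code after the ternary reads r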

PHASE_GS_COOKIE

If the function contains an unsafe buffer, an internal variable (the GS Cookie) is added to detect stack overwrites.
This phase is responsible for adding that internal variable and the statement that sets its value, and consists of the following code:

  1. /* GS security checks for unsafe buffers */
  2. if (getNeedsGSSecurityCookie())
  3. {
  4. #ifdef DEBUG
  5. if (verbose)
  6. {
  7. printf("\n*************** -GS checks for unsafe buffers \n");
  8. }
  9. #endif
  10. gsGSChecksInitCookie();
  11. if (compGSReorderStackLayout)
  12. {
  13. gsCopyShadowParams();
  14. }
  15. #ifdef DEBUG
  16. if (verbose)
  17. {
  18. fgDispBasicBlocks(true);
  19. printf("\n");
  20. }
  21. #endif
  22. }
  23. EndPhase(PHASE_GS_COOKIE);

The gsGSChecksInitCookie function adds a new local variable (the GS Cookie); its value is a magic number, and on Linux it is the GetTickCount() taken at program startup.
Later, CodeGen checks the GS Cookie's value before the function returns; if it does not match the preset magic number, the CORINFO_HELP_FAIL_FAST function is called. Roughly:
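
  1. // Roughly what the generated checks do (a sketch, not the literal source):
  2. //   prolog:  [frame.gsCookie] = s_gsCookie;       // the magic value set up here
  3. //   epilog:  if ([frame.gsCookie] != s_gsCookie)
  4. //                call CORINFO_HELP_FAIL_FAST;     // frame was overwritten, abort
  5. // so a buffer overrun that smashes the frame is caught before the function returns.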

PHASE_COMPUTE_PREDS

Since the preceding morph phase may have added new BasicBlocks (from inlining or QMark expansion),
this phase renumbers the BasicBlocks and computes the preds (predecessor blocks); it consists of the following code:

  1. /* Compute bbNum, bbRefs and bbPreds */
  2. JITDUMP("\nRenumbering the basic blocks for fgComputePred\n");
  3. fgRenumberBlocks();
  4. noway_assert(!fgComputePredsDone); // This is the first time full (not cheap) preds will be computed.
  5. fgComputePreds();
  6. EndPhase(PHASE_COMPUTE_PREDS);

The fgRenumberBlocks function renumbers the blocks, with numbers increasing from 1.
The fgComputePreds function recomputes each block's preds (predecessor blocks); for preds, see the explanation of Flowgraph Analysis in the previous article.

The algorithm of fgComputePreds is as follows (a small example follows the list):

  • Enumerate the BasicBlocks:
    • block->bbRefs = 0
  • Call fgRemovePreds to delete the bbPreds of every BasicBlock
  • Set the first BasicBlock's fgFirstBB->bbRefs = 1
  • Enumerate the BasicBlocks:
    • If the kind is BBJ_LEAVE, BBJ_COND, BBJ_ALWAYS or BBJ_EHCATCHRET:
      • Call fgAddRefPred(block->bbJumpDest, block, nullptr, true) (for BBJ_COND the fall-through successor block->bbNext gets a pred edge as well)
    • If the kind is BBJ_NONE:
      • Call fgAddRefPred(block->bbNext, block, nullptr, true)
    • If the kind is BBJ_EHFILTERRET:
      • Call fgAddRefPred(block->bbJumpDest, block, nullptr, true)
    • If the kind is BBJ_EHFINALLYRET:
      • Find the blocks that call the finally funclet; for each one found (execution returns to bcall->bbNext after the call):
      • fgAddRefPred(bcall->bbNext, block, nullptr, true)
    • If the kind is BBJ_THROW or BBJ_RETURN:
      • Nothing to do
    • If the kind is BBJ_SWITCH:
      • Call fgAddRefPred(*jumpTab, block, nullptr, true) for every jump target
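
For instance, given the small hypothetical flowgraph below, the computed preds are:

  1. //   BB01 (BBJ_COND)    jumps to BB03, falls through to BB02
  2. //   BB02 (BBJ_ALWAYS)  jumps to BB04
  3. //   BB03 (BBJ_NONE)    falls through to BB04
  4. // fgComputePreds produces:
  5. //   BB02.bbPreds = { BB01 }          BB03.bbPreds = { BB01 }
  6. //   BB04.bbPreds = { BB02, BB03 }    BB04.bbRefs = 2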

PHASE_MARK_GC_POLL_BLOCKS

This phase decides which blocks need to check whether a GC is running, and consists of the following code:

  1. /* If we need to emit GC Poll calls, mark the blocks that need them now. This is conservative and can
  2. * be optimized later. */
  3. fgMarkGCPollBlocks();
  4. EndPhase(PHASE_MARK_GC_POLL_BLOCKS);

The fgMarkGCPollBlocks function enumerates the BasicBlocks;
if a block can jump back to an earlier block (e.g. a loop), or is a return block, it is marked with block->bbFlags |= BBF_NEEDS_GCPOLL.
Blocks marked BBF_NEEDS_GCPOLL later get code inserted that calls the CORINFO_HELP_POLL_GC function, used to pause the current thread while a GC is running. Roughly:
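
  1. // Roughly what the inserted poll looks like (simplified sketch):
  2. //   if (g_TrapReturningThreads != 0)    // set by the runtime when a GC is pending
  3. //       call CORINFO_HELP_POLL_GC;      // JIT_PollGC: park this thread at a safe point
  4. // so loops and returns cannot run for long without passing a GC safe point.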

PHASE_COMPUTE_EDGE_WEIGHTS

This phase computes the weights of blocks and block edges, and consists of the following code:

  1. /* From this point on the flowgraph information such as bbNum,
  2. * bbRefs or bbPreds has to be kept updated */
  3. // Compute the edge weights (if we have profile data)
  4. fgComputeEdgeWeights();
  5. EndPhase(PHASE_COMPUTE_EDGE_WEIGHTS);

A block's weight (BasicBlock::bbWeight) indicates how likely the code in the block is to be executed; the default weight is 1, and higher values mean the block is executed more readily.
An edge is a term for a jump between two blocks; the larger an edge's weight, the more readily the jump between the two blocks happens.
Edge weights are stored in the elements of the BasicBlock::bbPreds list (of type flowList), as the two values flEdgeWeightMin and flEdgeWeightMax.
The computation of edge weights is very complex; for the details see the fgAddRefPred and fgComputeEdgeWeights functions.
Rarely executed blocks are marked BBF_RUN_RARELY;
this flag is later used to analyze which blocks are hot and which are cold. Cold blocks may be laid out at the end and placed in a different heap chunk.

PHASE_CREATE_FUNCLETS

This phase creates small functions (funclets) for the exception handlers (e.g. catch and finally), and consists of the following code:

  1. #if FEATURE_EH_FUNCLETS
  2. /* Create funclets from the EH handlers. */
  3. fgCreateFunclets();
  4. EndPhase(PHASE_CREATE_FUNCLETS);
  5. #endif // FEATURE_EH_FUNCLETS

Small functions (funclets) are how exception handlers are invoked on x64 (64-bit); x86 (32-bit) does not use this approach.
For example, given this code:

  1. int x = GetX();
  2. try {
  3. Console.WriteLine(x);
  4. throw new Exception("abc");
  5. } catch (Exception ex) {
  6. Console.WriteLine(ex);
  7. Console.WriteLine(x);
  8. }

On x64 the following assembly code is generated:

  1. Generated main function
  2. 00007FFF0FEC0480 55 push rbp // save the original rbp
  3. 00007FFF0FEC0481 56 push rsi // save the original rsi
  4. 00007FFF0FEC0482 48 83 EC 38 sub rsp,38h // reserve space for locals, size 0x38
  5. 00007FFF0FEC0486 48 8D 6C 24 40 lea rbp,[rsp+40h] // rbp = rsp from just after the push rbp (0x38+0x8 above the current rsp)
  6. 00007FFF0FEC048B 48 89 65 E0 mov qword ptr [rbp-20h],rsp // save the rsp after reserving locals into local [rbp-0x20], i.e. the PSPSym
  7. 00007FFF0FEC048F E8 24 FC FF FF call 00007FFF0FEC00B8 // call GetX()
  8. 00007FFF0FEC0494 89 45 F4 mov dword ptr [rbp-0Ch],eax // store the result into local [rbp-0x0c], i.e. x
  9. 185: try {
  10. 186: Console.WriteLine(x);
  11. 00007FFF0FEC0497 8B 4D F4 mov ecx,dword ptr [rbp-0Ch] // x => first argument
  12. 00007FFF0FEC049A E8 B9 FE FF FF call 00007FFF0FEC0358 // call Console.WriteLine
  13. 187: throw new Exception("abc");
  14. 00007FFF0FEC049F 48 B9 B8 58 6C 6E FF 7F 00 00 mov rcx,7FFF6E6C58B8h // Exception's MethodTable => first argument
  15. 00007FFF0FEC04A9 E8 A2 35 B1 5F call 00007FFF6F9D3A50 // call CORINFO_HELP_NEWFAST (JIT_New, or its assembly version)
  16. 00007FFF0FEC04AE 48 8B F0 mov rsi,rax // store the exception object in rsi
  17. 00007FFF0FEC04B1 B9 12 02 00 00 mov ecx,212h // rid => first argument
  18. 00007FFF0FEC04B6 48 BA 78 4D D6 0F FF 7F 00 00 mov rdx,7FFF0FD64D78h // module handle => second argument
  19. 00007FFF0FEC04C0 E8 6B 20 AF 5F call 00007FFF6F9B2530 // call CORINFO_HELP_STRCNS (JIT_StrCns), lazily loads the string literal object
  20. 00007FFF0FEC04C5 48 8B D0 mov rdx,rax // string literal object => second argument
  21. 00007FFF0FEC04C8 48 8B CE mov rcx,rsi // exception object => first argument
  22. 00007FFF0FEC04CB E8 20 07 43 5E call 00007FFF6E2F0BF0 // call System.Exception:.ctor
  23. 00007FFF0FEC04D0 48 8B CE mov rcx,rsi // exception object => first argument
  24. 00007FFF0FEC04D3 E8 48 FC A0 5F call 00007FFF6F8D0120 // call CORINFO_HELP_THROW (IL_Throw)
  25. 00007FFF0FEC04D8 CC int 3 // unreachable
  26. 00007FFF0FEC04D9 48 8D 65 F8 lea rsp,[rbp-8] // restore rsp to the address from after rbp and rsi were pushed
  27. 00007FFF0FEC04DD 5E pop rsi // restore rsi
  28. 00007FFF0FEC04DE 5D pop rbp // restore rbp
  29. 00007FFF0FEC04DF C3 ret
  30. Generated funclet
  31. 00007FFF0FEC04E0 55 push rbp // save rbp
  32. 00007FFF0FEC04E1 56 push rsi // save rsi
  33. 00007FFF0FEC04E2 48 83 EC 28 sub rsp,28h // reserve 0x28 on the funclet's own rsp (PSP slot 0x8 + outgoing arg space 0x20 (if the funclet calls other functions))
  34. 00007FFF0FEC04E6 48 8B 69 20 mov rbp,qword ptr [rcx+20h] // rcx is the InitialSP (the rsp after locals were reserved)
  35. // the parent function's rbp and rsp differ by 0x40, so [InitialSP+20h] equals [rbp-20h], i.e. the PSPSym
  36. // in this example there is only one nesting level, so the value saved in the PSPSym equals the incoming rcx (InitialSP)
  37. 00007FFF0FEC04EA 48 89 6C 24 20 mov qword ptr [rsp+20h],rbp // copy the PSPSym into the funclet's own frame
  38. 00007FFF0FEC04EF 48 8D 6D 40 lea rbp,[rbp+40h] // the parent's rbp and rsp differ by 0x40; compute the parent function's rbp
  39. 188: } catch (Exception ex) {
  40. 189: Console.WriteLine(ex);
  41. 00007FFF0FEC04F3 48 8B CA mov rcx,rdx // rdx is the exception object, move it to the first argument
  42. 00007FFF0FEC04F6 E8 7D FE FF FF call 00007FFF0FEC0378 // call Console.WriteLine
  43. 190: Console.WriteLine(x);
  44. 00007FFF0FEC04FB 8B 4D F4 mov ecx,dword ptr [rbp-0Ch] // [rbp-0xc] is the variable x, move it to the first argument
  45. 00007FFF0FEC04FE E8 55 FE FF FF call 00007FFF0FEC0358 // call Console.WriteLine
  46. 00007FFF0FEC0503 48 8D 05 CF FF FF FF lea rax,[7FFF0FEC04D9h] // the address at which to resume execution
  47. 00007FFF0FEC050A 48 83 C4 28 add rsp,28h // free the space reserved on the funclet's rsp
  48. 00007FFF0FEC050E 5E pop rsi // restore rsi
  49. 00007FFF0FEC050F 5D pop rbp // restore rbp
  50. 00007FFF0FEC0510 C3 ret

We can see that on x64 a separate small function (00007FFF0FEC04E0~00007FFF0FEC0510) is in fact generated for the exception handler;
when an exception occurs, this small function is called to handle it, and after handling it execution returns to the main function.

fgCreateFunclets is responsible for creating the funclets; its source code is as follows:

    /*****************************************************************************
     *
     *  Function to create funclets out of all EH catch/finally/fault blocks.
     *  We only move filter and handler blocks, not try blocks.
     */
    void Compiler::fgCreateFunclets()
    {
        assert(!fgFuncletsCreated);

    #ifdef DEBUG
        if (verbose)
        {
            printf("*************** In fgCreateFunclets()\n");
        }
    #endif

        fgCreateFuncletPrologBlocks();

        unsigned           XTnum;
        EHblkDsc*          HBtab;
        const unsigned int funcCnt = ehFuncletCount() + 1;

        if (!FitsIn<unsigned short>(funcCnt))
        {
            IMPL_LIMITATION("Too many funclets");
        }

        FuncInfoDsc*   funcInfo = new (this, CMK_BasicBlock) FuncInfoDsc[funcCnt];
        unsigned short funcIdx;

        // Setup the root FuncInfoDsc and prepare to start associating
        // FuncInfoDsc's with their corresponding EH region
        memset((void*)funcInfo, 0, funcCnt * sizeof(FuncInfoDsc));
        assert(funcInfo[0].funKind == FUNC_ROOT);
        funcIdx = 1;

        // Because we iterate from the top to the bottom of the compHndBBtab array, we are iterating
        // from most nested (innermost) to least nested (outermost) EH region. It would be reasonable
        // to iterate in the opposite order, but the order of funclets shouldn't matter.
        //
        // We move every handler region to the end of the function: each handler will become a funclet.
        //
        // Note that fgRelocateEHRange() can add new entries to the EH table. However, they will always
        // be added *after* the current index, so our iteration here is not invalidated.
        // It *can* invalidate the compHndBBtab pointer itself, though, if it gets reallocated!
        for (XTnum = 0; XTnum < compHndBBtabCount; XTnum++)
        {
            HBtab = ehGetDsc(XTnum); // must re-compute this every loop, since fgRelocateEHRange changes the table
            if (HBtab->HasFilter())
            {
                assert(funcIdx < funcCnt);
                funcInfo[funcIdx].funKind    = FUNC_FILTER;
                funcInfo[funcIdx].funEHIndex = (unsigned short)XTnum;
                funcIdx++;
            }
            assert(funcIdx < funcCnt);
            funcInfo[funcIdx].funKind    = FUNC_HANDLER;
            funcInfo[funcIdx].funEHIndex = (unsigned short)XTnum;
            HBtab->ebdFuncIndex          = funcIdx;
            funcIdx++;
            fgRelocateEHRange(XTnum, FG_RELOCATE_HANDLER);
        }

        // We better have populated all of them by now
        assert(funcIdx == funcCnt);

        // Publish
        compCurrFuncIdx   = 0;
        compFuncInfos     = funcInfo;
        compFuncInfoCount = (unsigned short)funcCnt;

        fgFuncletsCreated = true;

    #if DEBUG
        if (verbose)
        {
            JITDUMP("\nAfter fgCreateFunclets()");
            fgDispBasicBlocks();
            fgDispHandlerTab();
        }

        fgVerifyHandlerTab();
        fgDebugCheckBBlist();
    #endif // DEBUG
    }

First, fgCreateFuncletPrologBlocks enumerates the EH table.
If the first block of a handler can be jumped to from other blocks inside the handler (i.e. the first block is part of a loop),
that block may run more than once, so the funclet's prolog code cannot be inserted into it; in that case a new block is inserted before the handler's first block.
Then an array of per-function info is allocated and stored in compFuncInfos; element 0 is the main function and the remaining elements are funclets.
Finally the EH table is enumerated again to fill in the elements of compFuncInfos and to call fgRelocateEHRange.

fgRelocateEHRange moves the blocks in the handler's range to the end of the BasicBlock list. CodeGen follows this layout when emitting code, placing funclets after the main function.
For example, before the move the blocks look like this:

    -------------------------------------------------------------------------------------------------------------------------------------
    BBnum descAddr ref try hnd preds weight [IL range] [jump] [EH region] [flags]
    -------------------------------------------------------------------------------------------------------------------------------------
    BB01 [000000000137BC60] 1 1 [000..006) i label target gcsafe
    BB02 [000000000137BD70] 1 0 BB01 0 [006..017) (throw ) T0 try { } keep i try rare label gcsafe newobj
    BB03 [000000000137BE80] 0 0 1 [017..024)-> BB04 ( cret ) H0 catch { } keep i label target gcsafe
    BB04 [000000000137BF90] 1 BB03 1 [024..025) (return) i label target
    -------------------------------------------------------------------------------------------------------------------------------------

After the move they become:

    -------------------------------------------------------------------------------------------------------------------------------------
    BBnum descAddr ref try hnd preds weight [IL range] [jump] [EH region] [flags]
    -------------------------------------------------------------------------------------------------------------------------------------
    BB01 [000000000137BC60] 1 1 [000..006) i label target gcsafe
    BB02 [000000000137BD70] 1 0 BB01 0 [006..017) (throw ) T0 try { } keep i try rare label gcsafe newobj
    BB04 [000000000137BF90] 1 BB03 1 [024..025) (return) i label target
    +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ funclets follow
    BB03 [000000000137BE80] 0 0 1 [017..024)-> BB04 ( cret ) H0 F catch { } keep i label target gcsafe flet
    -------------------------------------------------------------------------------------------------------------------------------------

PHASE_OPTIMIZE_LAYOUT

This phase optimizes the layout (ordering) of the BasicBlocks. It consists of the following code:

    if (!opts.MinOpts() && !opts.compDbgCode)
    {
        optOptimizeLayout();
        EndPhase(PHASE_OPTIMIZE_LAYOUT);

        // Compute reachability sets and dominators.
        fgComputeReachability();
    }

The source code of optOptimizeLayout is as follows:

    /*****************************************************************************
     *
     *  Optimize the BasicBlock layout of the method
     */
    void Compiler::optOptimizeLayout()
    {
        noway_assert(!opts.MinOpts() && !opts.compDbgCode);

    #ifdef DEBUG
        if (verbose)
        {
            printf("*************** In optOptimizeLayout()\n");
            fgDispHandlerTab();
        }

        /* Check that the flowgraph data (bbNum, bbRefs, bbPreds) is up-to-date */
        fgDebugCheckBBlist();
    #endif

        noway_assert(fgModified == false);

        for (BasicBlock* block = fgFirstBB; block; block = block->bbNext)
        {
            /* Make sure the appropriate fields are initialized */

            if (block->bbWeight == BB_ZERO_WEIGHT)
            {
                /* Zero weighted block can't have a LOOP_HEAD flag */
                noway_assert(block->isLoopHead() == false);
                continue;
            }

            assert(block->bbLoopNum == 0);

            if (compCodeOpt() != SMALL_CODE)
            {
                /* Optimize "while(cond){}" loops to "cond; do{}while(cond);" */

                fgOptWhileLoop(block);
            }
        }

        if (fgModified)
        {
            // Recompute the edge weight if we have modified the flow graph in fgOptWhileLoop
            fgComputeEdgeWeights();
        }

        fgUpdateFlowGraph(true);
        fgReorderBlocks();
        fgUpdateFlowGraph();
    }

fgOptWhileLoop optimizes the while shape. As the comment inside it says, the structure before the optimization is:

    jmp test
    loop:
        ...
        ...
    test:
        cond
        jtrue loop

After the optimization an up-front test is added:

    cond
    jfalse done
    // else fall-through
    loop:
        ...
        ...
    test:
        cond
        jtrue loop
    done:
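
At the source level this rewrite is the classic loop inversion. Below is a minimal C# sketch of the two shapes (illustrative code, not taken from the JIT): the inverted form pays only one conditional jump per iteration instead of an unconditional jump to the test plus a conditional jump back.

    using System;

    class LoopInversionDemo
    {
        // Original shape: jump to the test, then a conditional jump back.
        static int SumWhile(int[] a)
        {
            int i = 0, sum = 0;
            while (i < a.Length) { sum += a[i]; i++; }
            return sum;
        }

        // Inverted shape: one up-front test guards a do-while, so the loop
        // body needs only a single conditional jump at the bottom.
        static int SumInverted(int[] a)
        {
            int i = 0, sum = 0;
            if (i < a.Length)
            {
                do { sum += a[i]; i++; } while (i < a.Length);
            }
            return sum;
        }

        static void Main()
        {
            var data = new[] { 1, 2, 3 };
            Console.WriteLine(SumWhile(data));    // 6
            Console.WriteLine(SumInverted(data)); // 6
        }
    }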

If fgOptWhileLoop made changes, fgComputeEdgeWeights is called to recompute the weights.

fgUpdateFlowGraph removes empty blocks, unreachable blocks, and redundant jumps.
If the doTailDuplication parameter passed to fgUpdateFlowGraph is true, it also performs the following optimization.
Code before the optimization:

    block:
        jmp target
    target:
        cond
        jtrue succ
    fallthrough:
        ...
    succ:
        ...

Code after the optimization. Afterwards target may become redundant, so fgUpdateFlowGraph is run once more with the parameter set to false to remove it:

    block:
        cond
        jtrue succ
    new:
        jmp fallthrough
    target:
        cond
        jtrue succ
    fallthrough:
        ...
    succ:
        ...

fgReorderBlocks uses the previously computed weights (bbWeight) to move rarely executed blocks towards the end; later on these blocks may become cold code and be written out separately from the hot code.

fgComputeReachability

This function computes, for each block, the set of blocks that can reach it, as well as the DOM tree. It is not tagged with a phase of its own. It consists of the following code:

    // Compute reachability sets and dominators.
    fgComputeReachability();

The source code of fgComputeReachability is as follows:

    /*****************************************************************************
     *
     *  Function called to compute the dominator and reachable sets.
     *
     *  Assumes the predecessor lists are computed and correct.
     */
    void Compiler::fgComputeReachability()
    {
    #ifdef DEBUG
        if (verbose)
        {
            printf("*************** In fgComputeReachability\n");
        }

        fgVerifyHandlerTab();

        // Make sure that the predecessor lists are accurate
        assert(fgComputePredsDone);
        fgDebugCheckBBlist();
    #endif // DEBUG

        /* Create a list of all BBJ_RETURN blocks. The head of the list is 'fgReturnBlocks'. */
        fgReturnBlocks = nullptr;

        for (BasicBlock* block = fgFirstBB; block != nullptr; block = block->bbNext)
        {
            // If this is a BBJ_RETURN block, add it to our list of all BBJ_RETURN blocks. This list is only
            // used to find return blocks.
            if (block->bbJumpKind == BBJ_RETURN)
            {
                fgReturnBlocks = new (this, CMK_Reachability) BasicBlockList(block, fgReturnBlocks);
            }
        }

        // Compute reachability and then delete blocks determined to be unreachable. If we delete blocks, we
        // need to loop, as that might have caused more blocks to become unreachable. This can happen in the
        // case where a call to a finally is unreachable and deleted (maybe the call to the finally is
        // preceded by a throw or an infinite loop), making the blocks following the finally unreachable.
        // However, all EH entry blocks are considered global entry blocks, causing the blocks following the
        // call to the finally to stay rooted, until a second round of reachability is done.
        // The dominator algorithm expects that all blocks can be reached from the fgEnterBlks set.
        unsigned passNum = 1;
        bool     changed;
        do
        {
            // Just to be paranoid, avoid infinite loops; fall back to minopts.
            if (passNum > 10)
            {
                noway_assert(!"Too many unreachable block removal loops");
            }

            /* Walk the flow graph, reassign block numbers to keep them in ascending order */
            JITDUMP("\nRenumbering the basic blocks for fgComputeReachability pass #%u\n", passNum);
            passNum++;
            fgRenumberBlocks();

            //
            // Compute fgEnterBlks
            //
            fgComputeEnterBlocksSet();

            //
            // Compute bbReach
            //
            fgComputeReachabilitySets();

            //
            // Use reachability information to delete unreachable blocks.
            // Also, determine if the flow graph has loops and set 'fgHasLoops' accordingly.
            // Set the BBF_LOOP_HEAD flag on the block target of backwards branches.
            //
            changed = fgRemoveUnreachableBlocks();

        } while (changed);

    #ifdef DEBUG
        if (verbose)
        {
            printf("\nAfter computing reachability:\n");
            fgDispBasicBlocks(verboseTrees);
            printf("\n");
        }

        fgVerifyHandlerTab();
        fgDebugCheckBBlist(true);
    #endif // DEBUG

        //
        // Now, compute the dominators
        //
        fgComputeDoms();
    }

This function first adds all returning blocks to the fgReturnBlocks list,
then calls fgRenumberBlocks to renumber the blocks (the processing below requires the block numbers to be in order),
then calls fgComputeEnterBlocksSet to add the blocks through which the function or a funclet is entered (fgFirstBB and the first block of each exception handler) to the fgEnterBlks set,
then calls fgComputeReachabilitySets to compute which blocks can reach each block (the union of the block itself and the bbReach sets of all its preds), storing the result in BasicBlock::bbReach,
then calls fgRemoveUnreachableBlocks to delete the blocks that cannot be reached from the function entry (those whose bbReach has an empty intersection with fgEnterBlks),
and finally calls fgComputeDoms to compute the DOM tree.

For the DOM (dominator) tree, see the introduction to Flowgraph Analysis in the previous article.
In one sentence: if entering block B requires passing through block A, then A is a dominator of B; the nearest dominator is stored in BasicBlock::bbIDom.
The algorithm CoreCLR uses to compute the DOM tree is the same as the one in this paper.
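
To make the DOM relation concrete, here is a minimal C# sketch using the textbook fixed-point formulation (an illustration only, not CoreCLR's code; fgComputeDoms uses the faster algorithm from the paper above, but it computes the same relation):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class DominatorDemo
    {
        // dom(entry) = {entry}; for every other node n,
        // dom(n) = {n} ∪ intersection of dom(p) over all predecessors p.
        static Dictionary<int, HashSet<int>> ComputeDoms(int entry, Dictionary<int, int[]> preds)
        {
            var nodes = preds.Keys.ToList();
            var dom = nodes.ToDictionary(n => n, n => new HashSet<int>(nodes));
            dom[entry] = new HashSet<int> { entry };

            bool changed = true;
            while (changed)
            {
                changed = false;
                foreach (var n in nodes.Where(n => n != entry))
                {
                    var newDom = new HashSet<int>(nodes);
                    foreach (var p in preds[n])
                        newDom.IntersectWith(dom[p]);
                    newDom.Add(n);
                    if (!newDom.SetEquals(dom[n])) { dom[n] = newDom; changed = true; }
                }
            }
            return dom;
        }

        static void Main()
        {
            // Diamond CFG: BB1 -> BB2 -> BB4 and BB1 -> BB3 -> BB4
            var preds = new Dictionary<int, int[]>
            {
                [1] = new int[0], [2] = new[] { 1 }, [3] = new[] { 1 }, [4] = new[] { 2, 3 }
            };
            foreach (var kv in ComputeDoms(1, preds))
                Console.WriteLine($"dom(BB{kv.Key}) = {{{string.Join(",", kv.Value.OrderBy(x => x))}}}");
            // BB4 is dominated by BB1 and itself, but by neither BB2 nor BB3.
        }
    }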

PHASE_ALLOCATE_OBJECTS

This phase converts GT_ALLOCOBJ nodes into GT_CALL nodes. It consists of the following code:

    // Transform each GT_ALLOCOBJ node into either an allocation helper call or
    // local variable allocation on the stack.
    ObjectAllocator objectAllocator(this);
    objectAllocator.Run();

As mentioned in the earlier article analyzing new,
ObjectAllocator::Run converts allocobj nodes into concrete JIT helper calls.
After the conversion it behaves like an ordinary function call: it takes a MethodTable pointer as argument and returns the newly created object (constructor not yet run, all fields zero).

PHASE_OPTIMIZE_LOOPS

This phase identifies and marks the loops in the function. It consists of the following code:

    optOptimizeLoops();
    EndPhase(PHASE_OPTIMIZE_LOOPS);

optOptimizeLoops proceeds as follows:

First it calls optSetBlockWeights, which uses the DOM tree to halve the weight (bbWeight /= 2) of blocks that cannot reach a return block.
Then it calls optFindNaturalLoops, which uses the DOM tree to recognize loops and records their information in optLoopTable.

A loop consists of the following parts (from the comment in optFindNaturalLoops):

    /* We will use the following terminology:
     * HEAD    - the basic block that flows into the loop ENTRY block (Currently MUST be lexically before entry).
                 Not part of the looping of the loop.
     * FIRST   - the lexically first basic block (in bbNext order) within this loop.  (May be part of a nested loop,
     *           but not the outer loop. ???)
     * TOP     - the target of the backward edge from BOTTOM. In most cases FIRST and TOP are the same.
     * BOTTOM  - the lexically last block in the loop (i.e. the block from which we jump to the top)
     * EXIT    - the loop exit or the block right after the bottom
     * ENTRY   - the entry in the loop (not necessarly the TOP), but there must be only one entry
     *
     * We (currently) require the body of a loop to be a contiguous (in bbNext order) sequence of basic blocks.

            |
            v
          head
            |
            |  top/beg <--+
            |     |       |
            |    ...      |
            |     |       |
            |     v       |
            +---> entry   |
                  |       |
                 ...      |
                  |       |
                  v       |
           +-- exit/tail  |
           |      |       |
           |     ...      |
           |      |       |
           |      v       |
           |    bottom ---+
           |
           +------+
                  |
                  v
    */

Finally it enumerates the blocks inside each loop (top~bottom) and calls optMarkLoopBlocks.
optMarkLoopBlocks increases the weight (bbWeight) of the blocks inside the loop:
blocks that dominate the backedge block (a block with a pred that appears later than itself, e.g. the first block of a loop) have their weight multiplied by BB_LOOP_WEIGHT (8), the others by BB_LOOP_WEIGHT/2 (4).

PHASE_CLONE_LOOPS

This phase performs the loop-cloning optimization. It consists of the following code:

    // Clone loops with optimization opportunities, and
    // choose the one based on dynamic condition evaluation.
    optCloneLoops();
    EndPhase(PHASE_CLONE_LOOPS);

The previous phase PHASE_OPTIMIZE_LOOPS found the loops in the function;
optCloneLoops decides which of them can benefit from loop cloning and performs it.
Loop cloning works as follows (from the comment in optCloneLoop):

    // We're going to make
    // Head --> Entry
    // First
    // Top
    // Entry
    // Bottom ?-> Top
    // X
    //
    // become
    //
    // Head ?-> Entry2
    // Head2--> Entry    (Optional; if Entry == Top == First, let Head fall through to F/T/E)
    // First
    // Top
    // Entry
    // Bottom ?-> Top
    // X2--> X
    // First2
    // Top2
    // Entry2
    // Bottom2 ?-> Top2
    // X

A more concrete example:

    for (var x = 0; x < a.Length; ++x) {
        b[x] = a[x];
    }

Before optCloneLoop (https://github.com/dotnet/coreclr/blob/v1.1.0/src/jit/optimizer.cpp#L4420):

    if (x < a.Length) {
        do {
            var tmp = a[x];
            b[x] = tmp;
            x = x + 1;
        } while (x < a.Length);
    }

After optCloneLoop:

    if (x < a.Length) {
        if ((a != null && b != null) && (a.Length <= b.Length)) {
            do {
                var tmp = a[x]; // no bounds check
                b[x] = tmp;     // no bounds check
                x = x + 1;
            } while (x < a.Length);
        } else {
            do {
                var tmp = a[x];
                b[x] = tmp;
                x = x + 1;
            } while (x < a.Length);
        }
    }

The goal of this optimization is to elide the bounds checks inside the loop when it can be proven (at runtime) that no access will be out of bounds.

PHASE_UNROLL_LOOPS

This phase performs the loop-unrolling optimization. It consists of the following code:

    /* Unroll loops */
    optUnrollLoops();
    EndPhase(PHASE_UNROLL_LOOPS);

optUnrollLoops tries to unroll loops. The conditions for unrolling are:

  • The trip count can be determined at compile time
  • The current compilation mode is not debug, and small-code optimization is not requested
  • The code size of the loop does not exceed UNROLL_LIMIT_SZ (see the code for the value)
  • The trip count does not exceed ITER_LIMIT (see the code for the value)

When these hold, the loop body is copied once per iteration; for example for (var x = 0; x < 3; ++x) { abc(); } is optimized into abc(); abc(); abc();.
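
A small runnable C# sketch of what unrolling amounts to (illustrative only; whether the JIT actually unrolls depends on the conditions listed above):

    using System;

    class UnrollDemo
    {
        static void abc() => Console.Write("x");

        static void Main()
        {
            // Trip count is the compile-time constant 3 and the body is tiny,
            // so this loop satisfies the conditions above and may be unrolled:
            for (var x = 0; x < 3; ++x) { abc(); }
            Console.WriteLine();

            // The unrolled equivalent:
            abc(); abc(); abc();
            Console.WriteLine();
        }
    }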

PHASE_MARK_LOCAL_VARS

This phase updates the information in the local variable table lvaTable. It consists of the following code:

    /* Create the variable table (and compute variable ref counts) */
    lvaMarkLocalVars();
    EndPhase(PHASE_MARK_LOCAL_VARS);

lvaMarkLocalVars proceeds as follows:

  • Call lvaAllocOutgoingArgSpace
    • Adds the local variable lvaOutgoingArgSpaceVar
    • On x86, arguments passed on the stack use push; on other platforms values can instead be copied directly into this variable to pass stack arguments
    • See the description of FEATURE_FIXED_OUT_ARGS in this document
  • If the platform is x86 (ShadowSP slots are needed)
    • Adds the local variable lvaShadowSPslotsVar
    • x86 does not generate funclets, so the exception handling machinery needs extra variables
    • See the description of ShadowSP slots in this document
  • If the platform is not x86 (funclets are used)
    • Adds the local variable lvaPSPSym
    • PSPSym stands for Previous Stack Pointer Symbol; it is a pointer-sized value saving the stack address of the previous function
    • When an EH funclet is called, restoring rsp to the main function's rsp value lets the funclet access the original local variables
    • See the description of PSPSym in this document
    • Also see the assembly code in the funclet example above
  • If localloc (stackalloc) is used
    • Adds the local variable lvaLocAllocSPvar
    • Used to save the modified rsp address (genLclHeap)
  • In debug mode, assign sequence numbers to the local variables
    • varDsc->lvSlotNum = lclNum (incrementing from 0)
  • Enumerate the BasicBlocks and call lvaMarkLocalVars
    • Enumerates the trees in each block and updates the local variables' reference counts
  • If a local variable is used to store a by-ref parameter passed in a register, its reference count is incremented twice
  • If lvaKeepAliveAndReportThis holds (e.g. a synchronized method needs to unlock this)
    • and nothing else in the function uses this, set the reference count of this to 1
  • If lvaReportParamTypeArg holds
    • and nothing else in the function uses this variable, set its reference count to 1
    • paramTypeArg (the generic context) is the MethodDesc passed in at call time
    • e.g. new A<string>().Generic<int>(123) passes in the MethodDesc for Generic<int>
  • Call lvaSortByRefCount
    • Decides for each local variable whether it can be tracked (lvTracked) and whether it must stay out of registers (lvDoNotEnregister)
    • Sorts the local variables in descending order by lvRefCnt when optimizing for small code, otherwise by lvRefCntWtd
    • Generates a new lvaCurEpoch after sorting

PHASE_OPTIMIZE_BOOLS

This phase merges adjacent pairs of conditionally-jumping BasicBlocks. It consists of the following code:

    /* Optimize boolean conditions */
    optOptimizeBools();
    EndPhase(PHASE_OPTIMIZE_BOOLS);

    // optOptimizeBools() might have changed the number of blocks; the dominators/reachability might be bad.

optOptimizeBools performs the following optimizations:

If the blocks are shaped as follows and B2 contains only a single statement:

    B1: brtrue(t1, BX)
    B2: brtrue(t2, BX)
    B3

they are converted into:

    B1: brtrue(t1|t2, BX)
    B3:

If the blocks are shaped as follows and B2 contains only a single statement:

    B1: brtrue(t1, B3)
    B2: brtrue(t2, BX)
    B3:
    ...
    BX:

they are converted into:

    B1: brtrue((!t1)&&t2, BX)
    B3:
    ...
    BX:
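
At the source level the first rewrite corresponds to folding two branches into a single branch on a bitwise OR of the two conditions. A minimal C# sketch (illustrative code, not JIT code):

    using System;

    class BoolOptDemo
    {
        // Two adjacent blocks that branch to the same target on t1 and t2...
        static bool EitherZeroBranchy(int a, int b)
        {
            if (a == 0) return true; // B1: brtrue(t1, BX)
            if (b == 0) return true; // B2: brtrue(t2, BX)
            return false;            // B3
        }

        // ...fold into a single branch on t1|t2 (a non-short-circuit OR).
        static bool EitherZeroFolded(int a, int b)
        {
            return (a == 0) | (b == 0); // one test, one branch
        }

        static void Main()
        {
            Console.WriteLine(EitherZeroBranchy(0, 5)); // True
            Console.WriteLine(EitherZeroFolded(3, 5));  // False
        }
    }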

PHASE_FIND_OPER_ORDER

This phase determines the evaluation order of the nodes (GenTree) and sets their execution and size costs. It consists of the following code:

    /* Figure out the order in which operators are to be evaluated */
    fgFindOperOrder();
    EndPhase(PHASE_FIND_OPER_ORDER);

fgFindOperOrder calls gtSetStmtInfo on every statement in every BasicBlock.
gtSetStmtInfo recursively calls gtSetEvalOrder on the GenTree.
gtSetEvalOrder sets each GenTree's execution cost (gtCostEx) and size cost (gtCostSz),
and if the second operand of a commutative binary operator is more expensive than the first, the operation is marked so that the second operand is evaluated first.
The execution cost (gtCostEx) and size cost (gtCostSz) are later used to decide whether the CSE optimization is worthwhile.

PHASE_SET_BLOCK_ORDER

This phase links the nodes (GenTree) into a linked list in evaluation order (the LIR format). It consists of the following code:

    // Weave the tree lists. Anyone who modifies the tree shapes after
    // this point is responsible for calling fgSetStmtSeq() to keep the
    // nodes properly linked.
    // This can create GC poll calls, and create new BasicBlocks (without updating dominators/reachability).
    fgSetBlockOrder();
    EndPhase(PHASE_SET_BLOCK_ORDER);

    // IMPORTANT, after this point, every place where tree topology changes must redo evaluation
    // order (gtSetStmtInfo) and relink nodes (fgSetStmtSeq) if required.
    CLANG_FORMAT_COMMENT_ANCHOR;

fgSetBlockOrder does the following:

  • Decides whether interruptible code must be generated (e.g. when there are loops); if so it sets genInterruptible = true
  • Calls fgCreateGCPolls
    • Enumerates the BasicBlocks; for blocks marked BBF_NEEDS_GCPOLL it inserts code calling CORINFO_HELP_POLL_GC (JIT_PollGC)
    • JIT_PollGC pauses the current thread while a GC is running
  • Enumerates the BasicBlocks and calls fgSetBlockOrder
    • Enumerates the statements in each block and calls fgSetStmtSeq
      • Recursively calls fgSetTreeSeqHelper on the statement's nodes (GenTree)
        • e.g. for a + b, fgSetTreeSeqFinish is called for the three nodes a, b, +
        • each call to fgSetTreeSeqFinish increments fgTreeSeqNum and appends the node to the fgTreeSeqLst list
        • when finished, fgTreeSeqLst holds all GenTree nodes; this is the LIR structure, although several more phases run before LIR is formally in use

PHASE_BUILD_SSA

This phase tags the GenTrees that access local variables with SSA versions. It consists of the following code:

    if (doSsa)
    {
        fgSsaBuild();
        EndPhase(PHASE_BUILD_SSA);
    }

fgSsaBuild assigns SSA versions to the nodes that access local variables (e.g. lclvar).
The access kinds are USE (the variable is read), DEF (the variable is written), and USEASG (read then written, e.g. +=).
Each write to a variable increments its SSA version by one, and each reading node is tagged with the version it reads; the SSA version is stored in the node's GenTreeLclVarCommon::_gtSsaNum member.
If the value read can come from different blocks and must be determined at runtime, a phi node is added at the start of the block.
The previous article gave an example of SSA tagging:

881857-20171028110140836-1221432801.jpg

The exact algorithm of fgSsaBuild is fairly involved; see my JIT notes or the source code.

PHASE_EARLY_PROP

This phase uses SSA to track local variables and make simple optimizations. It consists of the following code:

    if (doEarlyProp)
    {
        /* Propagate array length and rewrite getType() method call */
        optEarlyProp();
        EndPhase(PHASE_EARLY_PROP);
    }

optEarlyProp proceeds as follows:

  • Enumerate the BasicBlocks and the statements inside them
    • Enumerate the trees in each statement in execution order and call optEarlyPropRewriteTree
      • For GT_ARR_LENGTH nodes (fetching an array's length), trace the array's origin through SSA; if it traces back to new array[constant], replace the node with that constant
      • For nodes that fetch the MethodTable (vtable) through GT_INDIR, trace the object's origin through SSA in the same way, replacing the node with a constant when found
      • For nodes that access an object member and require a null check, the null check can be removed when the member's offset is below a threshold (an access through null is then guaranteed to trigger a page fault); this mechanism was covered in an earlier article
      • If a node was changed, call gtSetStmtInfo to recompute the execution and size costs
      • If a node was changed, call fgSetStmtSeq to update the GenTree linked list

PHASE_VALUE_NUMBER

This phase assigns VNs (Value Numbers) to the GenTrees. It consists of the following code:

    if (doValueNum)
    {
        fgValueNumber();
        EndPhase(PHASE_VALUE_NUMBER);
    }

Where SSA assigns a unique version number to the nodes (GenTree) accessing local variables (same version implies same value),
VN assigns a unique identifier to every node (GenTree): same identifier implies same value.

fgValueNumber calls fgValueNumberBlock and fgValueNumberTree to tag each node with a VN.
There are two kinds of VN: Liberal assumes other threads modify the heap only at synchronization points, while Conservative assumes other threads may modify the heap between any two accesses.
VNs are allocated from a ValueNumStore, which contains the following VN sets:

  • m_intCnsMap: VNs of int constants
  • m_longCnsMap: VNs of long constants
  • m_handleMap: VNs of field or class handles
  • m_floatCnsMap: VNs of float constants
  • m_doubleCnsMap: VNs of double constants
  • m_byrefCnsMap: VNs of byref constants
  • m_VNFunc0Map: VNs of operators taking 0 operands
  • m_VNFunc1Map: VNs of operators taking 1 operand (unary), e.g. -x
  • m_VNFunc2Map: VNs of operators taking 2 operands (binary), e.g. a + b
  • m_VNFunc3Map: VNs of operators taking 3 operands

For example, with a = 1; b = GetNum(); c = a + b; d = a + b;:
the VN of a is the constant 1, stored in m_intCnsMap;
the VN of b cannot be determined, so VNForExpr allocates a fresh VN;
the VN of c is the combination a+b, stored in m_VNFunc2Map;
the VN of d is the same combination a+b, which was already generated, so the existing VN is fetched from m_VNFunc2Map;
at this point we can be sure that c and d hold the same value.
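
A minimal C# sketch of this bookkeeping (an illustration of the idea only; the names mirror the ValueNumStore maps but the code is not CoreCLR's):

    using System;
    using System.Collections.Generic;

    class ValueNumberDemo
    {
        static int next = 1;
        static readonly Dictionary<int, int> intCnsMap = new Dictionary<int, int>(); // like m_intCnsMap
        static readonly Dictionary<(string op, int vn1, int vn2), int> func2Map
            = new Dictionary<(string, int, int), int>();                             // like m_VNFunc2Map

        static int VNForIntCns(int value) =>
            intCnsMap.TryGetValue(value, out var vn) ? vn : intCnsMap[value] = next++;

        static int VNForFunc2(string op, int vn1, int vn2) =>
            func2Map.TryGetValue((op, vn1, vn2), out var vn) ? vn : func2Map[(op, vn1, vn2)] = next++;

        static int VNForExpr() => next++; // opaque value, e.g. a call's result

        static void Main()
        {
            int a = VNForIntCns(1);        // a = 1
            int b = VNForExpr();           // b = GetNum(), value unknown
            int c = VNForFunc2("+", a, b); // c = a + b
            int d = VNForFunc2("+", a, b); // d = a + b, same inputs -> same VN
            Console.WriteLine(c == d);     // True: c and d must hold the same value
        }
    }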

For the exact algorithm that generates VNs, see my JIT notes or the source code.

PHASE_HOIST_LOOP_CODE

This phase hoists loop-invariant expressions out of loops. It consists of the following code:

    if (doLoopHoisting)
    {
        /* Hoist invariant code out of loops */
        optHoistLoopCode();
        EndPhase(PHASE_HOIST_LOOP_CODE);
    }

optHoistLoopCode enumerates the expressions in each loop,
obtains each expression's VN, and calls optVNIsLoopInvariant to decide whether the expression's value is independent of the loop.
If it is loop-invariant, has no side effects, and its nodes carry enough cost (gtCostEx), the expression is hoisted out of the loop.

For example, before the optimization:

    var a = SomeFunction();
    for (var x = 0; x < 3; ++x) {
        Console.WriteLine(a * 3);
    }

after it, a * 3 can be hoisted out:

    var a = SomeFunction();
    var tmp = a * 3;
    for (var x = 0; x < 3; ++x) {
        Console.WriteLine(tmp);
    }

The criteria for deciding that an expression's value is loop-invariant are:

  • If the VN is a phi, the phi's sources must be outside the loop (in the example above, x * 3 would have a source inside the loop)
  • If the expression accesses a variable on the heap (a class member), it cannot be proven invariant
  • The SSA definitions of the local variables the expression accesses must be outside the loop (e.g. a above is defined outside the loop)

PHASE_VN_COPY_PROP

This phase replaces local variables that share the same VN. It consists of the following code:

    if (doCopyProp)
    {
        /* Perform VN based copy propagation */
        optVnCopyProp();
        EndPhase(PHASE_VN_COPY_PROP);
    }

optVnCopyProp enumerates all nodes that read (USE) a local variable and calls optCopyProp,
which looks for another live variable with the same VN; if one exists, the read is redirected to that variable.

For example, before the optimization:

    var a = GetNum();
    var b = a;
    var c = b + 123;

after it, b can be replaced by a:

    var a = GetNum();
    var b = a;
    var c = a + 123;

Later, if b's reference count drops to 0, the variable b can safely be deleted.
This optimization removes redundant variable copies.

PHASE_OPTIMIZE_VALNUM_CSES

This phase replaces expressions that share the same VN, commonly known as the CSE optimization. It consists of the following code:

    #if FEATURE_ANYCSE
    /* Remove common sub-expressions */
    optOptimizeCSEs();
    #endif // FEATURE_ANYCSE

optOptimizeCSEs enumerates all nodes and calls optIsCSEcandidate to decide whether a node should undergo CSE; the criteria include the expression's cost (gtCostSz when optimizing for small code, gtCostEx otherwise).
If the check passes, optValnumCSE_Index is called. For nodes sharing the same VN:
the first occurrence merely adds the node to the optCSEhash index;
at the second occurrence the node is already in optCSEhash, so the index entry is given a fresh csdIndex (an incrementing value) and the node's gtCSEnum is set to that csdIndex;
from the third occurrence onward the node is already in optCSEhash with a csdIndex assigned, so the later nodes' gtCSEnum all point to the same csdIndex.
Afterwards, if any entry in optCSEhash has a csdIndex, the CSE optimization is carried out.

For example, before the optimization:

    var a = SomeFunction();
    var b = (a + 5) * a;
    var c = (a + 5) + a;

after it, a + 5 can be extracted:

    var a = SomeFunction();
    var tmp = a + 5;
    var b = tmp * a;
    var c = tmp + a;

This optimization removes repeated computation at the cost of more local variables.
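
A minimal C# sketch of the counting idea behind optValnumCSE_Index (illustrative only; the VN strings stand in for real value numbers, and the real JIT also weighs the cost fields):

    using System;
    using System.Collections.Generic;

    class CseDemo
    {
        static void Main()
        {
            // Value numbers computed for: a = SomeFunction(); b = (a + 5) * a; c = (a + 5) + a;
            // Each string stands in for the VN of one tree in the statements.
            var exprVNs = new[] { "VN(a+5)", "VN((a+5)*a)", "VN(a+5)", "VN((a+5)+a)" };

            // Like optCSEhash: count how many trees share each VN. Any VN seen
            // more than once is a CSE candidate and would get a csdIndex.
            var seen = new Dictionary<string, int>();
            foreach (var vn in exprVNs)
                seen[vn] = seen.TryGetValue(vn, out var n) ? n + 1 : 1;

            foreach (var kv in seen)
                if (kv.Value > 1)
                    Console.WriteLine($"{kv.Key} occurs {kv.Value} times -> extract into a temp");
            // Prints: VN(a+5) occurs 2 times -> extract into a temp
        }
    }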

PHASE_ASSERTION_PROP_MAIN

This phase propagates assertions again, now based on SSA and VN. It consists of the following code:

    if (doAssertionProp)
    {
        /* Assertion propagation */
        optAssertionPropMain();
        EndPhase(PHASE_ASSERTION_PROP_MAIN);
    }

optAssertionPropMain does the following:

  • Walk the nodes calling optVNAssertionPropCurStmtVisitor
    • Call optVnNonNullPropCurStmt
      • For call nodes, if the VN proves this is not null, mark the null check as removable
      • For indir (deref) nodes, if the VN proves the variable is not null, mark the null check as removable
    • Call optVNConstantPropCurStmt
      • If a node's VN is a constant, replace the node with that constant
  • Call optAssertionGen again to create assertions from the current state
  • Call optComputeAssertionGen to create assertions from jump conditions
    • e.g. for if (a > 3) { /* block a */ } else { /* block b */ }, it can assert a > 3 in block a and a <= 3 in block b
  • Call optAssertionProp again to optimize the nodes using the propagated assertions
    • optAssertionProp_LclVar
      • If a local variable is known to equal a constant, replace it with that constant
      • If a local variable is known to equal another local variable, replace it with that variable
    • optAssertionProp_Ind
      • If the node on the left of an indir (deref) is a lclVar known to be non-null, mark the null check as removable
    • optAssertionProp_BndsChk
      • If the array index is a constant known to be in range, mark the bounds check as unnecessary
    • optAssertionProp_Comma
      • If the bounds check was marked unnecessary, remove it: (comma bound_check, expr) => (expr)
    • optAssertionProp_Cast
      • For a cast from a narrower type to a wider type, mark that it cannot overflow
      • For a cast from a wider type to a narrower type that provably does not overflow, remove the cast
    • optAssertionProp_Call
      • If this is provably non-null, mark the null check as removable
    • optAssertionProp_RelOp
      • Replace equality or inequality expressions, e.g. x == const can be replaced by true or false when x's value is known

PHASE_OPTIMIZE_INDEX_CHECKS

This phase removes redundant array bounds checks using VN and assertions. It consists of the following code:

    if (doRangeAnalysis)
    {
        /* Optimize array index range checks */
        RangeCheck rc(this);
        rc.OptimizeRangeChecks();
        EndPhase(PHASE_OPTIMIZE_INDEX_CHECKS);
    }

OptimizeRangeChecks enumerates the bounds-checking nodes (a COMMA whose left operand is ARR_BOUNDS_CHECK) and calls OptimizeRangeCheck on each;
if the VN proves the index is less than the array length, the bounds check can be removed (keeping only the side effects of the COMMA's left operand).
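A typical shape this phase targets, as a small runnable C# example (whether the check is actually removed is up to the JIT; the comment marks the intent):

    using System;

    class RangeCheckDemo
    {
        // The loop condition i < a.Length lets range analysis prove that every
        // a[i] below is within bounds, so the JIT can drop the per-access check.
        static int Sum(int[] a)
        {
            int sum = 0;
            for (int i = 0; i < a.Length; i++)
                sum += a[i]; // bounds check provably redundant here
            return sum;
        }

        static void Main() => Console.WriteLine(Sum(new[] { 1, 2, 3 })); // 6
    }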

PHASE_UPDATE_FLOW_GRAPH

If the optimization phases made changes, this phase calls fgUpdateFlowGraph once more to remove empty blocks, unreachable blocks, and redundant jumps:

    /* update the flowgraph if we modified it during the optimization phase*/
    if (fgModified)
    {
        fgUpdateFlowGraph();
        EndPhase(PHASE_UPDATE_FLOW_GRAPH);
        ...
    }

PHASE_COMPUTE_EDGE_WEIGHTS2

If the optimization phases made changes, this phase calls fgComputeEdgeWeights once more to compute the weights of the blocks and block edges;
as the phase's name suggests, it does the same work as the earlier PHASE_COMPUTE_EDGE_WEIGHTS phase.

    /* update the flowgraph if we modified it during the optimization phase*/
    if (fgModified)
    {
        ...
        // Recompute the edge weight if we have modified the flow graph
        fgComputeEdgeWeights();
        EndPhase(PHASE_COMPUTE_EDGE_WEIGHTS2);
    }

PHASE_DETERMINE_FIRST_COLD_BLOCK

This phase marks the first cold BasicBlock. It consists of the following code:

    fgDetermineFirstColdBlock();
    EndPhase(PHASE_DETERMINE_FIRST_COLD_BLOCK);

Since the earlier fgReorderBlocks already moved the low-weight blocks to the end of the list,
fgDetermineFirstColdBlock looks for the trailing run of blocks marked BBF_RUN_RARELY at the end of the BasicBlock list,
sets fgFirstColdBlock to the first of them, and marks those blocks BBF_COLD; if none are found, fgFirstColdBlock is null.

CodeGen uses fgFirstColdBlock to split the code in two, writing the hot part and the cold part to different locations.

PHASE_RATIONALIZE

This phase is the first phase of the JIT backend; it resolves the nodes in LIR that need contextual interpretation and formally switches to LIR. It consists of the following code:

    #ifndef LEGACY_BACKEND
    // rationalize trees
    Rationalizer rat(this); // PHASE_RATIONALIZE
    rat.Run();
    #endif // !LEGACY_BACKEND

Rationalizer::Run does the following:

  • Enumerate the statements (stmt) in each BasicBlock
    • If the current platform does not support the operation of a GT_INTRINSIC node (e.g. abs, round, sqrt), replace it with a helper call
    • Link the last node of the previous statement forward to the first node of the next statement
    • Link the first node of the next statement back to the last node of the previous statement
  • Mark each BasicBlock's first and last node
  • Mark the BasicBlock as already being in LIR form (BBF_IS_LIR)
  • Enumerate the statements (stmt) in each BasicBlock
    • Convert statement nodes (GT_STMT) into IL offset nodes (GT_IL_OFFSET), used to record which nodes belong to which IL statement
    • Call Rationalizer::RewriteNode on the statement's nodes
      • Convert variable-modifying GT_LCL_VAR, GT_LCL_FLD, GT_REG_VAR, GT_PHI_ARG nodes into GT_STORE_LCL_VAR and GT_STORE_LCL_FLD
      • Convert address-storing GT_IND nodes into GT_STOREIND
      • Convert class-field-modifying GT_CLS_VAR nodes into GT_CLS_VAR_ADDR + GT_STOREIND
      • Convert block-value-modifying GT_BLK, GT_OBJ, GT_DYN_BLK nodes into GT_STORE_BLK, GT_STORE_OBJ, GT_STORE_DYN_BLK
      • Remove GT_BOX nodes (they have already been converted to calls)
      • For GT_ADDR nodes
        • If the target is a local variable, rewrite the node as GT_LCL_VAR_ADDR or GT_LCL_FLD_ADDR
        • If the target is a class field, rewrite the node as GT_CLS_VAR_ADDR
        • If the operand is an indir, both the indir and the addr can be removed (&*someVar => someVar)
      • For GT_NOP nodes with an operand, replace the node with the operand and remove it
      • For GT_COMMA nodes
        • If the first operand has no side effects, remove all of the first operand's nodes
        • If the second operand has no side effects and its value is unused, remove all of the second operand's nodes
        • Remove the GT_COMMA node (its two operands are already linked in order)
      • Remove GT_ARGPLACE nodes (GT_PUTARG_REG and GT_PUTARG_STK nodes are added later)
      • Convert class-field-reading GT_CLS_VAR nodes into GT_CLS_VAR_ADDR + GT_IND
      • Verify that the current CPU supports the operation of any remaining GT_INTRINSIC node (e.g. abs, round, sqrt)
  • Set Compiler::compRationalIRForm = true to formally begin using LIR

PHASE_SIMPLE_LOWERING

This phase does some simple lowering (bringing the LIR closer to machine code). It consists of the following code:

    // Here we do "simple lowering". When the RyuJIT backend works for all
    // platforms, this will be part of the more general lowering phase. For now, though, we do a separate
    // pass of "final lowering." We must do this before (final) liveness analysis, because this creates
    // range check throw blocks, in which the liveness must be correct.
    fgSimpleLowering();
    EndPhase(PHASE_SIMPLE_LOWERING);

fgSimpleLowering does the following:

  • Enumerate the nodes in LIR order
    • If the node is GT_ARR_LENGTH, convert it to GT_IND(arr + ArrLenOffset)
      • e.g. in an x64 array object, bytes 0~8 are the MethodTable pointer and bytes 8~12 are the array length, so the node becomes indir(lclVar +(ref) const 8)
    • If the node is GT_ARR_BOUNDS_CHECK
      • Ensure the BasicBlock that throws IndexOutOfRangeException exists, adding it if needed

PHASE_LCLVARLIVENESS

This phase computes the sets of variables that are live on entry to and exit from each BasicBlock.
It runs only with the legacy JIT backend (JIT32), so an ordinary CoreCLR build does not execute it. It consists of the following code:

    #ifdef LEGACY_BACKEND
    /* Local variable liveness */
    fgLocalVarLiveness();
    EndPhase(PHASE_LCLVARLIVENESS);
    #endif // !LEGACY_BACKEND

fgLocalVarLiveness sets the following BasicBlock members:

  • bbVarUse: set of local variables used
  • bbVarDef: set of local variables modified
  • bbVarTmp: temporary variables
  • bbLiveIn: set of variables live on entry to the block
  • bbLiveOut: set of variables live on exit from the block
  • bbHeapUse: whether the global heap was used
  • bbHeapDef: whether the global heap was modified
  • bbHeapLiveIn: whether the global heap is live on entry to the block
  • bbHeapLiveOut: whether the global heap is live on exit from the block
  • bbHeapHavoc: whether the block puts the global heap into an unknown state

PHASE_LOWERING

This phase does the main lowering work (bringing the LIR closer to machine code) and determines how many registers each node needs. It consists of the following code:

    ///
    // Dominator and reachability sets are no longer valid. They haven't been
    // maintained up to here, and shouldn't be used (unless recomputed).
    ///
    fgDomsComputed = false;

    /* Create LSRA before Lowering, this way Lowering can initialize the TreeNode Map */
    m_pLinearScan = getLinearScanAllocator(this);

    /* Lower */
    Lowering lower(this, m_pLinearScan); // PHASE_LOWERING
    lower.Run();

Lowering::Run does the following:

  • Enumerate the nodes in LIR order
    • On x86 (32-bit), decompose long nodes into two int nodes (loResult => hiResult => long)
    • GT_IND: decide whether it can be replaced with an LEA node (using the CPU's LEA instruction)
      • e.g. *(((v07 << 2) + v01) + 16) can become *(lea(v01 + v07*4 + 16))
    • GT_STOREIND: decide whether it can be replaced with an LEA node, as above
    • GT_ADD: decide whether it can be replaced with an LEA node, as above
    • GT_UDIV: decide whether it can be replaced with an RSZ node
      • e.g. 16/2 can become 16>>1
    • GT_UMOD: decide whether it can be replaced with an AND node
      • e.g. 17%2 can become 17&(2-1)
    • GT_DIV, GT_MOD:
      • If the divisor is int.MinValue or long.MinValue, convert to EQ (only the value divided by itself can yield 1)
      • If the divisor is a power of 2
        • Convert DIV to RSH, e.g. 16/-2 becomes -(16>>1)
        • Convert MOD, e.g. 31%8 becomes 31-8*(31/8) becomes 31-((31>>3)<<3) becomes 31-(31& ~(8-1))
    • GT_SWITCH
      • Replace the node under the switch with a local variable
        • e.g. switch v01 - 100 becomes tmp = v01 - 100; switch tmp
      • Add nodes that test for and jump to the default case
        • e.g. if (tmp > jumpTableLength - 2) { goto jumpTable[jumpTableLength - 1]; }
      • Create a new BasicBlock and move the original BBJ_SWITCH into it
        • Layout after the move:
          • original block (BBJ_COND, jumps to the default case when the condition holds)
          • new block (contains the moved switch)
          • remaining blocks
      • If all the remaining jump targets are the same block, the switch can be elided with a direct jump
      • Otherwise, if the number of targets is below minSwitchTabJumpCnt, convert the switch into multiple jtrues (if ... else if ... else)
      • Otherwise convert the switch into a GT_SWITCH_TABLE node (a table of offsets is generated later and the jump is made by index)
    • GT_CALL
      • Add GT_PUTARG_REG or GT_PUTARG_STK nodes for the arguments
      • For delegate calls, convert to the concrete load-and-call form
        • e.g. call originalThis becomes call indir(lea(originalThis+24)) with indir(lea(originalThis+8))
        • indir(lea(originalThis+24)) is the function address
        • indir(lea(originalThis+8)) is the real this, which replaces the original this expression
      • Otherwise, for GTF_CALL_VIRT_STUB, replace with call ind(address of the function address)
      • Otherwise, for GTF_CALL_VIRT_VTABLE, replace with call ind(function address in the vtable)
        • e.g. ind(lea(ind(lea(ind(lea(this+0))+72))+32))
      • Otherwise, for GTF_CALL_NONVIRT
        • For helper calls, fetch the concrete function address (e.g. the address of JIT_New)
        • If the function address is known, emit call addr
        • If the address of the function address is known, emit call ind(addr)
        • If the address of the address of the function address is known, emit call ind(ind(addr))
    • GT_JMP, GT_RETURN
      • If unmanaged functions were called, insert the PME (pinvoke method epilog) beforehand
    • GT_CAST
      • Convert GT_CAST(small, float/double) to GT_CAST(GT_CAST(small, int), float/double)
      • Convert GT_CAST(float/double, small) to GT_CAST(GT_CAST(float/double, int), small)
    • GT_ARR_ELEM: convert to nodes that compute the element address and IND it (e.g. IND(LEA))
    • GT_STORE_BLK, GT_STORE_OBJ, GT_STORE_DYN_BLK: decide whether the address-computing node can be replaced with an LEA node, as above
  • Enumerate the nodes in LIR order
    • Compute the number of registers each node needs
    • Mark which nodes are contained (a contained node becomes part of another node's instruction)

See the Lowering example from the previous article:

881857-20171028110148461-1101877151.jpg

PHASE_LINEAR_SCAN

This phase assigns registers to the nodes using the LSRA algorithm. It consists of the following code:

    assert(lvaSortAgain == false); // We should have re-run fgLocalVarLiveness() in lower.Run()
    lvaTrackedFixed = true;        // We can not add any new tracked variables after this point.

    /* Now that lowering is completed we can proceed to perform register allocation */
    m_pLinearScan->doLinearScan();
    EndPhase(PHASE_LINEAR_SCAN);

The LSRA algorithm is described in this paper, though the algorithm used in CoreCLR is not exactly the same.
LSRA requires the following data to be built from the LIR:

Interval

An Interval represents the lifetime of one variable (local L, internal T, or other I) and contains multiple RefPositions.
Intervals for local variables are created up front; other (temporary) Intervals are created when a register is needed (e.g. for a call's return value).
An Interval can be active or inactive; inactive means the variable is not used at the current position (it does not occupy a register).

LocationInfo

A LocationInfo represents a code location; during construction each GenTree in the LIR is assigned a location, and locations always advance by 2.

RefPosition

RefPositions come in the following kinds:

  • Def: records a position where a variable is written, with an associated Interval
  • Use: records a position where a variable is read, with an associated Interval
  • Kill: records a position where register contents are clobbered, typically marking caller-saved registers at a call
  • BB: records the position of a BasicBlock
  • FixedReg: records that a fixed register is used at the current position
  • ExpUse: records a variable that is live when leaving the current block and also live when entering a successor block (exposed use)
  • ParamDef: records a parameter variable passed in (defined) at function entry
  • DummyDef: records a parameter variable not defined at function entry
  • ZeroInit: records a variable that must be zero-initialized at function entry
  • KillGCRefs: records a position where the registers must hold no GC references (pointers to objects or structs)

See the LSRA illustration from the previous article:

881857-20171028110156070-945571268.jpg

LinearScan::doLinearScan does the following:

  • Call setFrameType to decide whether a frame pointer should be used
    • Using a frame pointer means rbp holds the rsp value at function entry, so rbp must be removed from every node's register candidates
  • Call initMaxSpill to initialize the array recording spill depths
    • The maxSpill array has two elements, one recording the maximum spill depth for int and one for float
  • Call buildIntervals to build the data structures LSRA needs
    • Builds the Intervals, RefPositions, and LocationInfos
  • Call initVarRegMaps to set the registers variables occupy on block entry and exit
    • Enumerate the BasicBlocks
      • Set inVarToRegMaps[blockIndex] = new regNumber[number of tracked variables]
      • Set outVarToRegMaps[blockIndex] = new regNumber[number of tracked variables]
      • Enumerate the tracked variables
        • Set inVarToRegMaps[blockIndex][regMapIndex] = REG_STK (passed on the stack by default)
        • Set outVarToRegMap[blockIndex][regMapIndex] = REG_STK (passed on the stack by default)
    • The JIT must ensure that if a variable lives in a register, the register it occupies on leaving a block matches the register it occupies on entering the successor block
  • Call allocateRegisters to allocate the registers
    • This function contains the core of the LSRA algorithm; the flow below is simplified, see my JIT notes for the full version
    • Build a register index physRegs[register count]: register => (RefPosition of last use, whether in use)
    • Enumerate the Intervals, setting isActive = true for incoming function parameters
    • Enumerate the RefPositions
      • If the RefPosition is a read (Use)
        • If no register is currently assigned, mark it for reload (reload the value from the stack into a register)
      • If the RefPosition demands a fixed register (e.g. Kill)
        • Make the Interval owning that register give it up and become inactive
      • If the RefPosition is the last read (Use)
        • Mark the Interval to give up its register and become inactive on the next round
      • If the RefPosition is a read (Use) or write (Def) with no register assigned
        • Call tryAllocateFreeReg to allocate a register (the paper's First Pass)
        • If that fails, call allocateBusyReg to try again (the paper's Second Pass)
          • If necessary, the Interval that owns the register gives it up (its value is spilled from the register to the stack and it becomes inactive)
        • On success the Interval becomes active
    • (If a variable's Interval never gave up its register (spilled), the variable can stay in a register throughout without touching the stack)
    • (Conversely, if an Interval spilled and it is not a local variable, an internal temporary variable must be added)
  • Call resolveRegisters to resolve register differences between blocks
    • The allocation above was linear and ignored the flowgraph; this function ensures the register a variable occupies on leaving a block matches the register on entering the successor block
    • Assign registers to the nodes (GenTree) based on the earlier allocation
    • Insert GT_RELOAD nodes where a value must be re-read from the stack
    • Set inVarToRegMaps, the registers variables occupy on block entry
    • Set outVarToRegMap, the registers variables occupy on block exit
    • Call resolveEdges
      • If a block's successor has multiple predecessors, e.g. (A => B, C => B), resolution must happen in A
        • If the variable's register at block exit matches the successor's register, no resolution is needed
        • If they differ but all successors use the same register
          • Insert a GT_COPY node before the block ends, copying from the source register to the target register (or source register to stack to target register)
        • If they differ and the successors do not all use the same register
          • Insert a new block between the block and its successor, containing GT_COPY nodes that copy into the target register
      • If a block has a single predecessor, e.g. (A => B), resolution can happen in B
        • Insert GT_COPY nodes at the start of the block for the mismatched registers
    • For local variables that never spilled, mark that they need not live on the stack (lvRegister = true, lvOnFrame = false)
    • For non-local-variable spills, call tmpPreAllocateTemps to create internal temporaries according to maxSpill[int] and maxSpill[float]

After this phase, every node in the LIR that needs a register has a concrete register, and every node reading or writing a local variable knows whether its target is the stack or a particular register.
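
For intuition, here is a minimal C# sketch of a plain linear scan over intervals (a simplification in the classic Poletto/Sarkar style; CoreCLR's LSRA with RefPositions, fixed registers, and reloads is far richer):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class LinearScanDemo
    {
        static void Main()
        {
            // [start, end) lifetimes of four values, with only two registers.
            var intervals = new[]
            {
                (name: "v0", start: 0, end: 8),
                (name: "v1", start: 2, end: 4),
                (name: "v2", start: 3, end: 10),
                (name: "v3", start: 5, end: 6),
            };

            var freeRegs = new Stack<string>(new[] { "rbx", "rsi" });
            var active = new List<(string name, int end, string reg)>();

            foreach (var iv in intervals.OrderBy(i => i.start))
            {
                // Expire intervals that ended before this one starts,
                // returning their registers to the free pool.
                foreach (var done in active.Where(a => a.end <= iv.start).ToList())
                {
                    freeRegs.Push(done.reg);
                    active.Remove(done);
                }

                if (freeRegs.Count > 0)
                {
                    var reg = freeRegs.Pop();
                    active.Add((iv.name, iv.end, reg));
                    Console.WriteLine($"{iv.name} -> {reg}");
                }
                else
                {
                    Console.WriteLine($"{iv.name} -> spilled to stack");
                }
            }
            // v2 overlaps both v0 and v1 while no register is free, so it spills.
        }
    }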

PHASE_RA_ASSIGN_VARS

Because the legacy JIT backend does not support LSRA, this phase allocates registers for the legacy backend (JIT32). It consists of the following code:

    lvaTrackedFixed = true; // We cannot add any new tracked variables after this point.
    // For the classic JIT32 at this point lvaSortAgain can be set and raAssignVars() will call lvaSortOnly()
    // Now do "classic" register allocation.
    raAssignVars();
    EndPhase(PHASE_RA_ASSIGN_VARS);

Since an ordinary CoreCLR build does not execute this phase, it is not analyzed in detail here.

PHASE_GENERATE_CODE

CodeGen proper starts with this phase. The entry point of CodeGen is:

    /* Generate code */
    codeGen->genGenerateCode(methodCodePtr, methodCodeSize);

genGenerateCode covers three phases:

  • PHASE_GENERATE_CODE: generates assembly instructions from the LIR
  • PHASE_EMIT_CODE: writes executable machine code from the assembly instructions
  • PHASE_EMIT_GCEH: writes the function's auxiliary information (function header, GC info, exception info, etc.)

CodeGen uses the following data types:

  • instrDesc: the data of one assembly instruction; one instrDesc instance corresponds to one instruction
  • insGroup: a group of assembly instructions; an insGroup contains one or more instrDescs, and a jump may only target the first instruction of an IG

The illustration from the previous article:

881857-20171028110203555-1584196634.jpg

The PHASE_GENERATE_CODE phase does the following:

  • Call lvaAssignFrameOffsets to assign stack offsets to the local variables
    • The computation happens in two steps
      • The first step uses a virtual initial offset of 0 and assigns each variable's offset relative to it, positive for parameters and negative for locals
      • The second step adjusts the offsets according to whether a frame pointer is used
    • Afterwards compLclFrameSize is set, the amount to allocate on function entry (e.g. sub rsp, 0x80)
  • Call emitBegFN to reserve the IG used by the function's prolog
    • The LIR only contains the function body; the prolog needs an IG of its own
  • Call genCodeForBBlist to process the BasicBlocks
    • If a block is the first block of a funclet, reserve the IG used by the funclet's prolog
    • Enumerate the block's nodes in LIR order and call genCodeForTreeNode to emit instructions for each node
      • GT_CNS_INT: if the constant is 0, emit xor targetReg, targetReg, otherwise emit mov targetReg, imm
      • GT_NEG: if the source register differs from the target register, emit mov targetReg, sourceReg, then emit neg targetReg
      • GT_LCL_VAR: if the local variable is already in a register nothing is needed; otherwise emit a load from the stack into the register, e.g. mov targetReg, [rbp-offset]
      • GT_STORE_LCL_VAR: if the local is already in the same register nothing is needed; if it is in a different register emit a register copy; otherwise emit a store from the register to the stack
      • See my JIT notes for more node kinds
    • Handle the block's jump kind
      • BBJ_ALWAYS: emit a jmp
      • BBJ_RETURN: reserve the IG used by the function's epilog
      • BBJ_THROW: emit int 3 (this instruction is never executed)
      • BBJ_CALLFINALLY: emit mov rcx, pspsym; call finally-funclet; jmp finally-return;
      • BBJ_EHCATCHRET: move the block's target address (the return address) into rax, then reserve the IG used by the funclet's epilog
      • BBJ_EHFINALLYRET, BBJ_EHFILTERRET: reserve the IG used by the funclet's epilog
  • Call genGeneratePrologsAndEpilogs to emit the prolog and epilog instructions
    • Call genFnProlog to generate the main function's prolog
      • If a frame pointer is needed, emit push rbp; mov rbp, rsp
      • push the modified callee-saved registers
      • Emit the stack allocation, e.g. sub rsp, size, plus instructions verifying that the allocated virtual memory (every page) is accessible
      • Emit instructions zeroing the stack space (locals start out as 0)
      • If funclets are used, emit mov [lvaPSPSym], rsp
      • If the generic context parameter is used, emit instructions saving it to a local variable
      • If a GS cookie is used, emit instructions setting its value
    • Call emitGeneratePrologEpilog to generate the main function's epilog and the funclets' prologs and epilogs
      • Enumerate the previously reserved IGs
        • IGPT_PROLOG: already generated above, skipped here
        • IGPT_EPILOG: call genFnEpilog to generate the main function's epilog
          • pop the callee-saved registers pushed in the prolog
          • With a frame pointer on x86, emit mov esp, ebp; pop ebp;
          • With a frame pointer on x64, emit add rsp, size; pop rbp or lea rsp, [rsp+size]; pop rbp;
          • Without a frame pointer, emit add rsp, size or lea rsp, [rsp+size]
          • For a tail call emit call addr, for a fast tail call emit jmp rax, otherwise emit ret
        • IGPT_FUNCLET_PROLOG:
          • Emit push rbp
          • push the modified callee-saved registers
          • Emit the stack allocation, e.g. sub rsp, size
          • Emit instructions inheriting PSPSym and recovering the main function's rbp, e.g. mov rbp, [rcx+20h]; mov [rsp+20h], rbp; lea rbp,[rbp+40h];
        • IGPT_FUNCLET_EPILOG:
          • Emit the stack release, e.g. add rsp, size
          • pop the callee-saved registers pushed in the prolog
          • Emit pop rbp
          • Emit ret

PHASE_EMIT_CODE

The previous phase produced assembly instructions, but they live in instrDescs inside insGroups and are not yet executable machine code.
This phase writes the actual executable machine code from the instrDesc list.

The illustration from the previous article:

881857-20171028110210773-1401314447.jpg

The generated layout is as follows, containing the function code, the function header, and the real function header:

881857-20171028110218117-790239182.jpg

The main work of this phase is in emitEndCodeGen, which does the following:

  • Call CEEJitInfo::allocMem to allocate the memory that will hold the executable machine code
    • Calls EEJitManager::allocCode
      • Calls EEJitManager::allocCodeRaw
        • Fetches the CodeHeap (chunk) list, calling EEJitManager::NewCodeHeap to allocate a new chunk when space is insufficient
        • For a dynamic function, "function header + function code + real function header" is allocated and a pointer to "function code" is returned
        • For a non-dynamic function, "function header + function code" is allocated and a pointer to the function code is returned
      • For a non-dynamic function, the real function header is allocated with pMD->GetLoaderAllocator()->GetLowFrequencyHeap()->AllocMem
        • This region is PAGE_READWRITE only, not executable
      • Sets the pointer in the "function header" to point at the "real function header"
      • Calls NibbleMapSet to set the Nibble Map, used to locate a function's start address
        • The Nibble Map lives in the pHdrMap member of the chunk (HeapList) containing the function; it is an array of DWORDs, each packing 8 nibbles as follows
        • [ [ NIBBLE(4bit), NIBBLE, ...(8 of them) ], [ NIBBLE, NIBBLE, ...(8 of them) ], ... ]
        • e.g. if the function starts at 0x7fff7ce80078 and the chunk (HeapList) base address is 0x7fff7ce80000, the offset is 120
        • The nibble's value is ((120 % 32) / 4) + 1 = 7
        • The nibble is stored as nibble number 120 / 32 = 3 of DWORD number 120 / 32 / 8 = 0
        • i.e. the DWORD is &= 0xfff0ffff and then |= 0x00070000 (see the sketch after this list)
        • The Nibble Map allows looking up a function's start address and function header from the current PC, information both debugging and the GC require
  • Enumerate the IG (insGroup) list
    • Record which GC references (pointers to objects or structs) are on the stack and in registers at the start of the IG, adding them to the gcInfo lists
    • Enumerate the instructions (instrDesc) in the IG
      • Call emitIssue1Instr to encode the instruction
        • Calls emitOutputInstr (the x86/x64 version)
          • Determines the instruction's kind and writes it out; the kinds include
            • instructions with no operand, e.g. nop
            • instructions with one constant, e.g. jge, loop, ret
            • instructions with a jump target (label), e.g. jmp
            • instructions with a function or function pointer, e.g. call
            • instructions with a single register, e.g. inc, dec
            • instructions with two registers, e.g. mov
            • instructions whose first operand is a register and second is memory, e.g. mov
            • see my JIT notes for the rest
      • The gcInfo lists are updated while the instructions are written
        • e.g. register rax holds a GC reference from function address + x, and stops holding one from function address + x1, and so on
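
Below is the Nibble Map arithmetic from the example above as a small runnable C# sketch (an illustration of the math only, not the CLR's NibbleMapSet code):

    using System;

    class NibbleMapDemo
    {
        // Each 32-byte bucket of the code heap gets one 4-bit entry recording
        // where a method starts inside that bucket (0 = no method start).
        static void Main()
        {
            const uint methodStart = 0x78;                   // 0x7fff7ce80078 - 0x7fff7ce80000 = 120
            uint nibbleValue = ((methodStart % 32) / 4) + 1; // = 7
            uint dwordIndex  = methodStart / 32 / 8;         // = 0: which DWORD of pHdrMap
            uint nibbleIndex = (methodStart / 32) % 8;       // = 3: which nibble in that DWORD

            // Nibble 0 lives in the top 4 bits, so nibble 3 occupies bits 16..19.
            int shift = (int)(28 - nibbleIndex * 4);
            uint dword = 0;
            dword = (dword & ~(0xFu << shift)) | (nibbleValue << shift);

            Console.WriteLine($"nibble value = {nibbleValue}, dword[{dwordIndex}] = 0x{dword:X8}");
            // Prints: nibble value = 7, dword[0] = 0x00070000
        }
    }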

This completes writing the function's machine code; one last phase remains.

PHASE_EMIT_GCEH

This phase writes the function-related information, i.e. the contents of the "real function header" above.
The real function header has type _hpRealCodeHdr and contains the following:

  • phdrDebugInfo: a map from PC to IL offset
  • phdrJitEHInfo: the array of EH clauses
  • phdrJitGCInfo: the information the GC uses to scan the stack and registers
  • phdrMDesc: the function's MethodDesc
  • nUnwindInfos: the number of unwindInfos
  • unwindInfos: the unwind (stack unwinding) information

DebugInfo

phdrDebugInfo is an array of DWORDs in the Nibble Stream format, which stores numbers 4 bits at a time.
For example, 0xa9 0xa0 0x03 encodes the two numbers 80 and 19:

    0xa9 = 0b1010'1001 (the high bit 1 in a nibble means another nibble follows)
    0xa0 = 0b1010'0000 (the high bit 0 means the current number ends here)
    0x03 = 0b0000'0011
    001 010 000 => 80
    010 011     => 19
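
A minimal C# decoder for this format, assuming the reader knows how many numbers to expect, as the real DebugInfo reader does (illustrative code, not CoreCLR's NibbleReader):

    using System;

    class NibbleStreamDemo
    {
        // Nibbles are consumed low half of each byte first; the top bit of a
        // nibble means "more nibbles follow", the low 3 bits are data, most
        // significant chunk first.
        static uint[] Decode(byte[] bytes, int count)
        {
            var result = new uint[count];
            int idx = 0;
            uint current = 0;
            for (int pos = 0; idx < count; pos++)
            {
                int nibble = (pos % 2 == 0) ? (bytes[pos / 2] & 0xF) : (bytes[pos / 2] >> 4);
                current = (current << 3) | (uint)(nibble & 0x7);
                if ((nibble & 0x8) == 0) { result[idx++] = current; current = 0; }
            }
            return result;
        }

        static void Main()
        {
            // The example from the text: 0xa9 0xa0 0x03 encodes 80 and 19.
            foreach (var n in Decode(new byte[] { 0xa9, 0xa0, 0x03 }, 2))
                Console.WriteLine(n); // 80, then 19
        }
    }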

The structure of the number list is:

  • header: two numbers, the encoded length in bytes of the offset mapping and the encoded length in bytes of the native vars
  • offset mapping
    • the number of offset mapping entries
    • native offset, written as the delta from the previous record
    • il offset
    • source flags: SOURCE_TYPE_INVALID, SEQUENCE_POINT, STACK_EMPTY
  • native vars (scope information for the native variables)
    • the number of native vars
    • startOffset, the scope's start offset
    • endOffset, the scope's end offset, written as the delta from start
    • var number, the variable's index
    • var type (reg or stack)
    • the remaining fields depend on var type; see DoNativeVarInfo

With the DebugInfo, an IDE knows which memory address a breakpoint should be placed at, which memory address a step-over should stop at, and so on.

EHInfo

phdrJitEHInfo is a pointer to a CorILMethod_Sect_FatFormat structure, containing the number of EH clauses and an array of EE_ILEXCEPTION_CLAUSE.

Given the following C# code:

    var x = GetString();
    try {
        Console.WriteLine(x);
        throw new Exception("abc");
    } catch (Exception ex) {
        Console.WriteLine(ex);
        Console.WriteLine(x);
    }

the following assembly is generated:

  1. IN0016: 000000 push rbp
  2. IN0017: 000001 push rbx
  3. IN0018: 000002 sub rsp, 24
  4. IN0019: 000006 lea rbp, [rsp+20H]
  5. IN001a: 00000B mov qword ptr [V06 rbp-20H], rsp
  6. G_M21556_IG02: ; offs=00000FH, size=0009H, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref
  7. IN0001: 00000F call ConsoleApplication.Program:GetString():ref
  8. IN0002: 000014 mov gword ptr [V01 rbp-10H], rax
  9. G_M21556_IG03: ; offs=000018H, size=0043H, gcVars=0000000000000001 {V01}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref
  10. IN0003: 000018 mov rdi, gword ptr [V01 rbp-10H]
  11. IN0004: 00001C call System.Console:WriteLine(ref)
  12. IN0005: 000021 mov rdi, 0x7F78892D3CE8
  13. IN0006: 00002B call CORINFO_HELP_NEWSFAST
  14. IN0007: 000030 mov rbx, rax
  15. IN0008: 000033 mov edi, 1
  16. IN0009: 000038 mov rsi, 0x7F78881BCE70
  17. IN000a: 000042 call CORINFO_HELP_STRCNS
  18. IN000b: 000047 mov rsi, rax
  19. IN000c: 00004A mov rdi, rbx
  20. IN000d: 00004D call System.Exception:.ctor(ref):this
  21. IN000e: 000052 mov rdi, rbx
  22. IN000f: 000055 call CORINFO_HELP_THROW
  23. IN0010: 00005A int3
  24. G_M21556_IG04: ; offs=00005BH, size=0007H, gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref, epilog, nogc
  25. IN001b: 00005B lea rsp, [rbp-08H]
  26. IN001c: 00005F pop rbx
  27. IN001d: 000060 pop rbp
  28. IN001e: 000061 ret
  29. G_M21556_IG05: ; func=01, offs=000062H, size=000EH, gcrefRegs=00000040 {rsi}, byrefRegs=00000000 {}, byref, funclet prolog, nogc
  30. IN001f: 000062 push rbp
  31. IN0020: 000063 push rbx
  32. IN0021: 000064 push rax
  33. IN0022: 000065 mov rbp, qword ptr [rdi]
  34. IN0023: 000068 mov qword ptr [rsp], rbp
  35. IN0024: 00006C lea rbp, [rbp+20H]
  36. G_M21556_IG06: ; offs=000070H, size=0018H, gcVars=0000000000000001 {V01}, gcrefRegs=00000040 {rsi}, byrefRegs=00000000 {}, gcvars, byref, isz
  37. IN0011: 000070 mov rdi, rsi
  38. IN0012: 000073 call System.Console:WriteLine(ref)
  39. IN0013: 000078 mov rdi, gword ptr [V01 rbp-10H]
  40. IN0014: 00007C call System.Console:WriteLine(ref)
  41. IN0015: 000081 lea rax, G_M21556_IG04
  42. G_M21556_IG07: ; offs=000088H, size=0007H, funclet epilog, nogc, emitadd
  43. IN0025: 000088 add rsp, 8
  44. IN0026: 00008C pop rbx
  45. IN0027: 00008D pop rbp
  46. IN0028: 00008E ret

Analyzing this function's EHInfo with lldb gives:

  1. (lldb) p *codePtr
  2. (void *) $1 = 0x00007fff7ceef920
  3. (lldb) p *(CodeHeader*)(0x00007fff7ceef920-8)
  4. (CodeHeader) $2 = {
  5. pRealCodeHeader = 0x00007fff7cf35c78
  6. }
  7. (lldb) p *(_hpRealCodeHdr*)(0x00007fff7cf35c78)
  8. (_hpRealCodeHdr) $3 = {
  9. phdrDebugInfo = 0x0000000000000000
  10. phdrJitEHInfo = 0x00007fff7cf35ce0
  11. phdrJitGCInfo = 0x0000000000000000
  12. phdrMDesc = 0x00007fff7baf9200
  13. nUnwindInfos = 2
  14. unwindInfos = {}
  15. }
  16. (lldb) me re -s8 -c20 -fx 0x00007fff7cf35ce0-8
  17. 0x7fff7cf35cd8: 0x0000000000000001 0x0000000000002040
  18. 0x7fff7cf35ce8: 0x0000001800000000 0x000000620000005b
  19. 0x7fff7cf35cf8: 0x000000000000008f 0x000000000100000e
  20. 0x7fff7cf35d08: 0x0000000000000030 0x0000000000000001
  21. 0x7fff7cf35d18: 0x00007ffff628f550 0x0000000000000b4a
  22. 0x7fff7cf35d28: 0x0000000000000000 0x0000000000000000
  23. 0x7fff7cf35d38: 0x0000000000000000 0x0000000000000000
  24. 0x7fff7cf35d48: 0x0000000000000000 0x0000000000000000
  25. 0x7fff7cf35d58: 0x0000000000000000 0x0000000000000000
  26. 0x7fff7cf35d68: 0x0000000000000000 0x0000000000000000
  27. 0x0000000000000001:
  28. phdrJitEHInfo - sizeof(size_t) is num clauses, here is 1
  29. 0x0000000000002040:
  30. memeber from base class IMAGE_COR_ILMETHOD_SECT_FAT
  31. Kind = 0x40 = CorILMethod_Sect_FatFormat
  32. DataSize = 0x20 = 32 = 1 * sizeof(EE_ILEXCEPTION_CLAUSE)
  33. (lldb) p ((EE_ILEXCEPTION_CLAUSE*)(0x00007fff7cf35ce0+8))[0]
  34. (EE_ILEXCEPTION_CLAUSE) $29 = {
  35. Flags = COR_ILEXCEPTION_CLAUSE_NONE
  36. TryStartPC = 24
  37. TryEndPC = 91
  38. HandlerStartPC = 98
  39. HandlerEndPC = 143
  40. = (TypeHandle = 0x000000000100000e, ClassToken = 16777230, FilterOffset = 16777230)
  41. }
  42. (lldb) sos Token2EE * 0x000000000100000e
  43. Module: 00007fff7bc04000
  44. Assembly: System.Private.CoreLib.ni.dll
  45. <invalid module token>
  46. --------------------------------------
  47. Module: 00007fff7baf6e70
  48. Assembly: coreapp_jit.dll
  49. Token: 000000000100000E
  50. MethodTable: 00007fff7cc0dce8
  51. EEClass: 00007fff7bcb9400
  52. Name: mdToken: 0100000e (/home/ubuntu/git/coreapp_jitnew/bin/Release/netcoreapp1.1/ubuntu.16.04-x64/publish/coreapp_jit.dll)
  53. (lldb) dumpmt 00007fff7cc0dce8
  54. EEClass: 00007FFF7BCB9400
  55. Module: 00007FFF7BC04000
  56. Name: System.Exception
  57. mdToken: 0000000002000249
  58. File: /home/ubuntu/git/coreapp_jitnew/bin/Release/netcoreapp1.1/ubuntu.16.04-x64/publish/System.Private.CoreLib.ni.dll
  59. BaseSize: 0x98
  60. ComponentSize: 0x0
  61. Slots in VTable: 51
  62. Number of IFaces in IFaceMap: 2

As you can see, EE_ILEXCEPTION_CLAUSE contains the PC range of the try, the PC range of the handler, and a pointer to the caught exception type (or the filter function).
From the EHInfo the CLR knows which catch and finally to invoke when an exception is thrown.

GCInfo

phdrJitGCInfo is a bit array with a very complex encoding; here is a worked example of parsing a GCInfo.

The C# and assembly code are the same as in the EHInfo example above. Analysis with LLDB gives:

  1. (lldb) p *codePtr
  2. (void *) $1 = 0x00007fff7cee3920
  3. (lldb) p *(CodeHeader*)(0x00007fff7cee3920-8)
  4. (CodeHeader) $2 = {
  5. pRealCodeHeader = 0x00007fff7cf29c78
  6. }
  7. (lldb) p *(_hpRealCodeHdr*)(0x00007fff7cf29c78)
  8. (_hpRealCodeHdr) $3 = {
  9. phdrDebugInfo = 0x0000000000000000
  10. phdrJitEHInfo = 0x00007fff7cf29ce0
  11. phdrJitGCInfo = 0x00007fff7cf29d28 "\x91\x81G"
  12. phdrMDesc = 0x00007fff7baed200
  13. nUnwindInfos = 2
  14. unwindInfos = {}
  15. }
  16. (lldb) me re -s8 -c20 -fx 0x00007fff7cf29d28
  17. 0x7fff7cf29d28: 0x1963d80000478191 0x171f412003325ca8
  18. 0x7fff7cf29d38: 0xee92864c5ffe0280 0x1c5c1c1f09bea536
  19. 0x7fff7cf29d48: 0xed8a93e5c6872932 0x00000000000000c4
  20. 0x7fff7cf29d58: 0x000000000000002a 0x0000000000000001
  21. 0x7fff7cf29d68: 0x00007ffff628f550 0x0000000000000b2e
  22. 0x7fff7cf29d78: 0x0000000000000000 0x0000000000000000
  23. 0x7fff7cf29d88: 0x0000000000000000 0x0000000000000000
  24. 0x7fff7cf29d98: 0x0000000000000000 0x0000000000000000
  25. 0x7fff7cf29da8: 0x0000000000000000 0x0000000000000000
  26. 0x7fff7cf29db8: 0x0000000000000000 0x0000000000000000

The bit array parses as follows:

  1. 10001001
  2. 1: use fat encoding
  3. 0: no var arg
  4. 0: no security object
  5. 0: no gc cookie
  6. 1: have pspsym stack slot
  7. 0 0: no generic context parameter
  8. 1: have stack base register
  9. 1000000
  10. 1: wants report only leaf
  11. 0: no edit and continue preserved area
  12. 0: no reverse pinvoke frame
  13. 0 0 0 0: return kind is RT_Scalar
  14. 1'11100010
  15. 0 10001111: code length is 143
  16. 0000000
  17. 0 000000: pspsym stack slot is 0
  18. 0'0000000
  19. 0 000: stack base register is rbp (rbp is 5, normalize function will ^5 so it's 0)
  20. 0 000: size of stack outgoing and scratch area is 0
  21. 0'000110
  22. 0 00: 0 call sites
  23. 1 0 0 1: 2 interruptible ranges
  24. 11'11000
  25. 0 001111: interruptible range 1 begins from 15
  26. 110'10011000'000
  27. 1 001011 0 000001: interruptible range 1 finished at 91 (15 + 75 + 1)
  28. 10101'00
  29. 0 010101: interruptible range 2 begins from 112 (91 + 21)
  30. 111010'01001100
  31. 0 010111: interruptible range 2 finished at 136 (112 + 23 + 1)
  32. 1: have register slots
  33. 1 00 0 01: 4 register slots
  34. 110000
  35. 1: have stack slots
  36. 0 01: 1 tracked stack slots
  37. 0 0: 0 untracked stack slots
  38. 00'0000010
  39. 0 000: register slot 1 is rax(0)
  40. 00: register slot 1 flag is GC_SLOT_IS_REGISTER(8 & 0b11 = 0)
  41. 0 10: register slot 2 is rbx(3) (0 + 2 + 1)
  42. 0'10000
  43. 0 10: register slot 3 is rsi(6) (3 + 2 + 1)
  44. 0 00: register slot 4 is rdi(7) (6 + 0 + 1)
  45. 010'11111000
  46. 01: stack slot 1 base on GC_FRAMEREG_REL(2)
  47. 0 111110: stack slot 1 offset is -16 (-16 / 8 = -2)
  48. 00: stack slot 1 flag is GC_SLOT_BASE(0)
  49. 111 01000
  50. 111: num bits per pointer is 7
  51. 00000001
  52. 0 0000001: chunk 0's bit offset is 0 (1-1)
  53. 01000000: chunk 1's bit offset is 63 (64-1)
  54. 011111
  55. 011111: chunk 0 could be live slot list, simple format, all could live
  56. 11'111
  57. 11111: chunk 0 final state, all slot lives
  58. 1 1010'00
  59. 1 000101: transition of register slot 1(rax) at 0x14 (20 = 15 + 5), becomes live
  60. 110010'01100001
  61. 1 001001: transition of register slot 1(rax) at 0x18 (24 = 15 + 9), becomes dead
  62. 1 100001: transition of register slot 1(rax) at 0x30 (48 = 15 + 33), becomes live
  63. 01001001
  64. 0: terminator, no more transition of register slot 1(rax) in this chunk
  65. 1 100100: transition of register slot 2(rbx) at 0x33 (51 = 15 + 36), becomes live
  66. 01110111
  67. 0: terminator, no more transition of register slot 2(rbx) in this chunk
  68. 1 111110: transition of register slot 3(rsi) at 0x4d (77 = 15 + 62), becomes live
  69. 01101100
  70. 0: terminator, no more transition of register slot 3(rsi) in this chunk
  71. 1 001101: transition of register slot 4(rdi) at 0x1c (28 = 15 + 13), becomes live
  72. 1010010
  73. 1 010010: transition of register slot 4(rdi) at 0x21 (33 = 15 + 18), becomes dead
  74. 1'0111110
  75. 1 111110: transition of register slot 4(rdi) at 0x4d (77 = 15 + 62), becomes live
  76. 0: terminator, no more transition of register slot 4(rdi) in this chunk
  77. 1'1001000
  78. 1 001001: transition of stack slot 1(rbp-16) at 0x18 (24 = 15 + 9), becomes live
  79. 0: terminator, no more transition of stack slot 1(rbp-16) in this chunk
  80. 0'11111
  81. 0 11111: chunk 1 could be live slot list, simple format, all could live
  82. 000'00
  83. 00000: chunk 1 final state, all slot dead
  84. 111000'00
  85. 1 000011: transition of register slot 1(rax) at 0x52 (15 + 64 + 3), becomes dead
  86. 0: terminator, no more transition of register slot 1(rax) in this chunk
  87. 111010'00
  88. 1: 001011: transition of register slot 2(rbx) at 0x5a (15 + 64 + 11), becomes dead
  89. 0: terminator, no more transition of register slot 2(rbx) in this chunk
  90. 111000'01001100
  91. 1 000011: transition of register slot 3(rsi) at 0x52 (15 + 64 + 3), becomes dead
  92. 1 001100: transition of register slot 3(rsi) at 0x70 (0x70 + (64+12 - (0x5b-0xf))), becomes live
  93. 10010100
  94. 1 010100: transition of register slot 3(rsi) at 0x78 (0x70 + (64+20 - (0x5b-0xf))), becomes dead
  95. 0: terminator, no more transition of register slot 3(rsi) in this chunk
  96. 1110000
  97. 1: 000011: transition of register slot 4(rdi) at 0x52 (15 + 64 + 3), becomes dead
  98. 1'011000
  99. 1 000110: transition of register slot 4(rdi) at 0x55 (15 + 64 + 6), becomes live
  100. 11'10100
  101. 1 001011: transition of register slot 4(rdi) at 0x5a (15 + 64 + 11), becomes dead
  102. 111'1100
  103. 1: 001111: transition of register slot 4(rdi) at 0x73 (0x70 + (64+15 - (0x5b-0xf))), becomes live
  104. 1001'010
  105. 1 010100: transition of register slot 4(rdi) at 0x78 (0x70 + (64+20 - (0x5b-0xf))), becomes dead
  106. 10001'10
  107. 1 011000: transition of register slot 4(rdi) at 0x7c (0x70 + (64+24 - (0x5b-0xf))), becomes live
  108. 110111'00
  109. 1 011101: transition of register slot 4(rdi) at 0x81 (0x70 + (64+29 - (0x5b-0xf))), becomes dead
  110. 0: terminator, no more transition of register slot 4(rdi) in this chunk
  111. 100011'00
  112. 1 011000: transition of stack slot 1(rbp-16) at 0x7c (0x70 + (64+24 - (0x5b-0xf))), becomes dead
  113. 0: terminator, no more transition of stack slot 1(rbp-16) in this chunk

When the CLR performs a GC, it suspends the thread and takes the PC where it stopped;
from the PC and the Nibble Map it finds the function header,
and from the GCInfo in that header it learns which stack addresses and registers of the currently executing function contain root objects.

Because the GCInfo records the location and lifetime of every GC reference throughout the (interruptible parts of the) function,
CoreCLR needs an encoding this complex to keep its size down.

UnwindInfo

unwindInfos is an array of RUNTIME_FUNCTION with nUnwindInfos elements;
nUnwindInfos equals the number of funclets plus one for the main function.
Each RUNTIME_FUNCTION stores an offset to an UNWIND_INFO, which records the function's operations on the stack pointer.

Here is another worked example, using the following C# code:

    var x = GetString();
    try {
        Console.WriteLine(x);
        throw new Exception("abc");
    } catch (Exception ex) {
        Console.WriteLine(ex);
        Console.WriteLine(x);
    } finally {
        Console.WriteLine("finally");
    }

the following assembly is generated:

  1. G_M21556_IG01: ; func=00, offs=000000H, size=000FH, gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref, nogc <-- Prolog IG
  2. IN001e: 000000 push rbp
  3. IN001f: 000001 push rbx
  4. IN0020: 000002 sub rsp, 24
  5. IN0021: 000006 lea rbp, [rsp+20H]
  6. IN0022: 00000B mov qword ptr [V06 rbp-20H], rsp
  7. G_M21556_IG02: ; offs=00000FH, size=0009H, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref
  8. IN0001: 00000F call ConsoleApplication.Program:GetString():ref
  9. IN0002: 000014 mov gword ptr [V01 rbp-10H], rax
  10. G_M21556_IG03: ; offs=000018H, size=0043H, gcVars=0000000000000001 {V01}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref
  11. IN0003: 000018 mov rdi, gword ptr [V01 rbp-10H]
  12. IN0004: 00001C call System.Console:WriteLine(ref)
  13. IN0005: 000021 mov rdi, 0x7F94DDF9CCE8
  14. IN0006: 00002B call CORINFO_HELP_NEWSFAST
  15. IN0007: 000030 mov rbx, rax
  16. IN0008: 000033 mov edi, 1
  17. IN0009: 000038 mov rsi, 0x7F94DCE85E70
  18. IN000a: 000042 call CORINFO_HELP_STRCNS
  19. IN000b: 000047 mov rsi, rax
  20. IN000c: 00004A mov rdi, rbx
  21. IN000d: 00004D call System.Exception:.ctor(ref):this
  22. IN000e: 000052 mov rdi, rbx
  23. IN000f: 000055 call CORINFO_HELP_THROW
  24. IN0010: 00005A int3
  25. G_M21556_IG04: ; offs=00005BH, size=0001H, gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref
  26. IN0011: 00005B nop
  27. G_M21556_IG05: ; offs=00005CH, size=0008H, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref
  28. IN0012: 00005C mov rdi, rsp
  29. IN0013: 00005F call G_M21556_IG11
  30. G_M21556_IG06: ; offs=000064H, size=0001H, nogc, emitadd
  31. IN0014: 000064 nop
  32. G_M21556_IG07: ; offs=000065H, size=0007H, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, epilog, nogc
  33. IN0023: 000065 lea rsp, [rbp-08H]
  34. IN0024: 000069 pop rbx
  35. IN0025: 00006A pop rbp
  36. IN0026: 00006B ret
  37. G_M21556_IG08: ; func=01, offs=00006CH, size=000EH, gcVars=0000000000000001 {V01}, gcrefRegs=00000040 {rsi}, byrefRegs=00000000 {}, gcvars, byref, funclet prolog, nogc
  38. IN0027: 00006C push rbp
  39. IN0028: 00006D push rbx
  40. IN0029: 00006E push rax
  41. IN002a: 00006F mov rbp, qword ptr [rdi]
  42. IN002b: 000072 mov qword ptr [rsp], rbp
  43. IN002c: 000076 lea rbp, [rbp+20H]
  44. G_M21556_IG09: ; offs=00007AH, size=0018H, gcVars=0000000000000001 {V01}, gcrefRegs=00000040 {rsi}, byrefRegs=00000000 {}, gcvars, byref, isz
  45. IN0015: 00007A mov rdi, rsi
  46. IN0016: 00007D call System.Console:WriteLine(ref)
  47. IN0017: 000082 mov rdi, gword ptr [V01 rbp-10H]
  48. IN0018: 000086 call System.Console:WriteLine(ref)
  49. IN0019: 00008B lea rax, G_M21556_IG04
  50. G_M21556_IG10: ; offs=000092H, size=0007H, funclet epilog, nogc, emitadd
  51. IN002d: 000092 add rsp, 8
  52. IN002e: 000096 pop rbx
  53. IN002f: 000097 pop rbp
  54. IN0030: 000098 ret
  55. G_M21556_IG11: ; func=02, offs=000099H, size=000EH, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, funclet prolog, nogc
  56. IN0031: 000099 push rbp
  57. IN0032: 00009A push rbx
  58. IN0033: 00009B push rax
  59. IN0034: 00009C mov rbp, qword ptr [rdi]
  60. IN0035: 00009F mov qword ptr [rsp], rbp
  61. IN0036: 0000A3 lea rbp, [rbp+20H]
  62. G_M21556_IG12: ; offs=0000A7H, size=0013H, gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref
  63. IN001a: 0000A7 mov rdi, 0x7F94C8001068
  64. IN001b: 0000B1 mov rdi, gword ptr [rdi]
  65. IN001c: 0000B4 call System.Console:WriteLine(ref)
  66. IN001d: 0000B9 nop
  67. G_M21556_IG13: ; offs=0000BAH, size=0007H, funclet epilog, nogc, emitadd
  68. IN0037: 0000BA add rsp, 8
  69. IN0038: 0000BE pop rbx
  70. IN0039: 0000BF pop rbp
  71. IN003a: 0000C0 ret

Analysis with LLDB gives:

(lldb) p *codePtr
(void *) $0 = 0x00007fff7ceee920
(lldb) p *(CodeHeader*)(0x00007fff7ceee920-8)
(CodeHeader) $1 = {
  pRealCodeHeader = 0x00007fff7cf34c78
}
(lldb) p *(_hpRealCodeHdr*)(0x00007fff7cf34c78)
(_hpRealCodeHdr) $2 = {
  phdrDebugInfo = 0x0000000000000000
  phdrJitEHInfo = 0x0000000000000000
  phdrJitGCInfo = 0x0000000000000000
  phdrMDesc = 0x00007fff7baf8200
  nUnwindInfos = 3
  unwindInfos = {}
}
(lldb) p ((_hpRealCodeHdr*)(0x00007fff7cf34c78))->unwindInfos[0]
(RUNTIME_FUNCTION) $3 = (BeginAddress = 2304, EndAddress = 2412, UnwindData = 2500)
(lldb) p ((_hpRealCodeHdr*)(0x00007fff7cf34c78))->unwindInfos[1]
(RUNTIME_FUNCTION) $4 = (BeginAddress = 2412, EndAddress = 2457, UnwindData = 2516)
(lldb) p ((_hpRealCodeHdr*)(0x00007fff7cf34c78))->unwindInfos[2]
(RUNTIME_FUNCTION) $5 = (BeginAddress = 2457, EndAddress = 2497, UnwindData = 2532)
first unwind info:
(lldb) p (void*)(((CEEJitInfo*)compiler->info.compCompHnd)->m_moduleBase + 2304)
(void *) $13 = 0x00007fff7ceee920
(lldb) p (void*)(((CEEJitInfo*)compiler->info.compCompHnd)->m_moduleBase + 2412)
(void *) $14 = 0x00007fff7ceee98c
# range is [0, 0x6c)
(lldb) p *(UNWIND_INFO*)(((CEEJitInfo*)compiler->info.compCompHnd)->m_moduleBase + 2500)
(UNWIND_INFO) $16 = {
  Version = '\x01'
  Flags = '\x03'
  SizeOfProlog = '\x06'
  CountOfUnwindCodes = '\x03'
  FrameRegister = '\0'
  FrameOffset = '\0'
  UnwindCode = {
    [0] = {
       = (CodeOffset = '\x06', UnwindOp = '\x02', OpInfo = '\x02')
      EpilogueCode = (OffsetLow = '\x06', UnwindOp = '\x02', OffsetHigh = '\x02')
      FrameOffset = 8710
    }
  }
}
(lldb) p ((UNWIND_INFO*)(((CEEJitInfo*)compiler->info.compCompHnd)->m_moduleBase + 2500))->UnwindCode[0]
(UNWIND_CODE) $17 = {
   = (CodeOffset = '\x06', UnwindOp = '\x02', OpInfo = '\x02')
  EpilogueCode = (OffsetLow = '\x06', UnwindOp = '\x02', OffsetHigh = '\x02')
  FrameOffset = 8710
}
(lldb) p ((UNWIND_INFO*)(((CEEJitInfo*)compiler->info.compCompHnd)->m_moduleBase + 2500))->UnwindCode[1]
(UNWIND_CODE) $18 = {
   = (CodeOffset = '\x02', UnwindOp = '\0', OpInfo = '\x03')
  EpilogueCode = (OffsetLow = '\x02', UnwindOp = '\0', OffsetHigh = '\x03')
  FrameOffset = 12290
}
(lldb) p ((UNWIND_INFO*)(((CEEJitInfo*)compiler->info.compCompHnd)->m_moduleBase + 2500))->UnwindCode[2]
(UNWIND_CODE) $19 = {
   = (CodeOffset = '\x01', UnwindOp = '\0', OpInfo = '\x05')
  EpilogueCode = (OffsetLow = '\x01', UnwindOp = '\0', OffsetHigh = '\x05')
  FrameOffset = 20481
}

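The same navigation can be expressed as a short C++ sketch: the JIT writes a single CodeHeader pointer immediately before the first instruction of the method, and that pointer leads to the real header holding the unwind entries. The struct layouts below are simplified from the fields visible in the dump (the real definitions live in vm/codeman.h), and the `moduleBase` parameter stands in for the `m_moduleBase` seen in the session, so treat this as an illustration rather than the actual CoreCLR types:

#include <cstdint>
#include <cstdio>

struct RUNTIME_FUNCTION_ {            // RVA-based layout, as in the dump above
    uint32_t BeginAddress;
    uint32_t EndAddress;
    uint32_t UnwindData;
};

struct RealCodeHeader_ {              // simplified mirror of _hpRealCodeHdr
    void* phdrDebugInfo;
    void* phdrJitEHInfo;
    void* phdrJitGCInfo;
    void* phdrMDesc;
    uint32_t nUnwindInfos;
    RUNTIME_FUNCTION_ unwindInfos[1]; // really nUnwindInfos entries follow
};

// The CodeHeader is a single pointer stored immediately before the code,
// which is why the LLDB session reads it at codePtr - 8.
inline RealCodeHeader_* GetRealCodeHeader(void* codePtr) {
    return *reinterpret_cast<RealCodeHeader_**>(
        reinterpret_cast<uint8_t*>(codePtr) - sizeof(void*));
}

void DumpUnwindInfos(void* codePtr, uint8_t* moduleBase) {
    RealCodeHeader_* hdr = GetRealCodeHeader(codePtr);
    for (uint32_t i = 0; i < hdr->nUnwindInfos; i++) {
        RUNTIME_FUNCTION_& rf = hdr->unwindInfos[i];
        // BeginAddress/EndAddress/UnwindData are offsets from the module base
        printf("func %u: [%p, %p) unwind data at %p\n", i,
               (void*)(moduleBase + rf.BeginAddress),
               (void*)(moduleBase + rf.EndAddress),
               (void*)(moduleBase + rf.UnwindData));
    }
}
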
The raw UNWIND_CODE values in the LLDB output above can be hard to decipher on their own; they are easier to interpret alongside the corresponding information printed by COMPlus_JitDump:

Unwind Info:
  >> Start offset   : 0x000000 (not in unwind data)
  >>   End offset   : 0x00006c (not in unwind data)
  Version           : 1
  Flags             : 0x00
  SizeOfProlog      : 0x06
  CountOfUnwindCodes: 3
  FrameRegister     : none (0)
  FrameOffset       : N/A (no FrameRegister) (Value=0)
  UnwindCodes       :
    CodeOffset: 0x06 UnwindOp: UWOP_ALLOC_SMALL (2) OpInfo: 2 * 8 + 8 = 24 = 0x18
    CodeOffset: 0x02 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rbx (3)
    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rbp (5)
allocUnwindInfo(pHotCode=0x00007F94DE27E920, pColdCode=0x0000000000000000, startOffset=0x0, endOffset=0x6c, unwindSize=0xa, pUnwindBlock=0x0000000002029516, funKind=0 (main function))
Unwind Info:
  >> Start offset   : 0x00006c (not in unwind data)
  >>   End offset   : 0x000099 (not in unwind data)
  Version           : 1
  Flags             : 0x00
  SizeOfProlog      : 0x03
  CountOfUnwindCodes: 3
  FrameRegister     : none (0)
  FrameOffset       : N/A (no FrameRegister) (Value=0)
  UnwindCodes       :
    CodeOffset: 0x03 UnwindOp: UWOP_ALLOC_SMALL (2) OpInfo: 0 * 8 + 8 = 8 = 0x08
    CodeOffset: 0x02 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rbx (3)
    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rbp (5)
allocUnwindInfo(pHotCode=0x00007F94DE27E920, pColdCode=0x0000000000000000, startOffset=0x6c, endOffset=0x99, unwindSize=0xa, pUnwindBlock=0x0000000002029756, funKind=1 (handler))
Unwind Info:
  >> Start offset   : 0x000099 (not in unwind data)
  >>   End offset   : 0x0000c1 (not in unwind data)
  Version           : 1
  Flags             : 0x00
  SizeOfProlog      : 0x03
  CountOfUnwindCodes: 3
  FrameRegister     : none (0)
  FrameOffset       : N/A (no FrameRegister) (Value=0)
  UnwindCodes       :
    CodeOffset: 0x03 UnwindOp: UWOP_ALLOC_SMALL (2) OpInfo: 0 * 8 + 8 = 8 = 0x08
    CodeOffset: 0x02 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rbx (3)
    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rbp (5)

Take the first RUNTIME_FUNCTION (the main function) as an example: it contains 3 UnwindCodes, which record the prolog instructions below. Note that unwind codes are stored by descending code offset, i.e. in the reverse of prolog order (a small decoder sketch follows the list):

push rbp
push rbx
sub rsp, 24

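Decoding follows directly from the x64 unwind format: UWOP_PUSH_NONVOL records which non-volatile register was pushed, and UWOP_ALLOC_SMALL encodes a stack allocation of OpInfo * 8 + 8 bytes, which is exactly the "2 * 8 + 8 = 24" arithmetic in the JitDump output. A minimal C++ sketch of a decoder for just these two opcodes (the field layout matches the dump above; the full format defines many more opcodes):

#include <cstdint>
#include <cstdio>

// Only the two opcodes that appear in the dump above.
enum UnwindOpCodes { UWOP_PUSH_NONVOL = 0, UWOP_ALLOC_SMALL = 2 };

struct UnwindCode {                 // same bit layout as UNWIND_CODE
    uint8_t CodeOffset;             // offset just past the prolog instruction
    uint8_t UnwindOp : 4;
    uint8_t OpInfo   : 4;
};

static const char* kRegNames[16] = {
    "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
    "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15" };

void DecodeUnwindCode(const UnwindCode& c) {
    switch (c.UnwindOp) {
    case UWOP_PUSH_NONVOL:          // OpInfo is the register number
        printf("offset 0x%02x: push %s\n", c.CodeOffset, kRegNames[c.OpInfo]);
        break;
    case UWOP_ALLOC_SMALL:          // allocation size = OpInfo * 8 + 8
        printf("offset 0x%02x: sub rsp, %d\n", c.CodeOffset, c.OpInfo * 8 + 8);
        break;
    default:
        printf("offset 0x%02x: (op %d not handled here)\n",
               c.CodeOffset, c.UnwindOp);
        break;
    }
}

int main() {
    // The three codes of the main function, in stored (descending) order:
    UnwindCode codes[] = { {0x06, 2, 2}, {0x02, 0, 3}, {0x01, 0, 5} };
    for (const UnwindCode& c : codes) DecodeUnwindCode(c);
    // prints: sub rsp, 24 / push rbx / push rbp
}
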
When the CLR needs to recover the call chain, for example the caller of C in A => B => C,
it can use the current PC to find the top of the current frame => read the return address => use the return address to find the top of the previous frame => and loop until every caller has been found.
This process is called Stack Walking (or Stack Crawling).

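As an illustration of that loop, here is a minimal sketch of a frame-pointer-based walk. This is only the simplified idea: the real CLR walker drives the same loop through the unwind data shown above, because JIT code is not guaranteed to keep a frame-pointer chain, and __builtin_frame_address is a GCC/Clang-specific builtin:

#include <cstdio>

// With the conventional "push rbp; mov rbp, rsp" prolog, each frame starts
// with [saved rbp][return address], so the frames form a linked list that
// can be followed from the current rbp.
struct Frame {
    Frame* previous;      // saved rbp of the caller
    void* returnAddress;  // address inside the caller, just after the call
};

void WalkStack() {
    Frame* frame = static_cast<Frame*>(__builtin_frame_address(0));
    // Assumes the chain is terminated by a null rbp, as _start arranges
    // on Linux; a robust walker would also check stack bounds.
    while (frame != nullptr && frame->returnAddress != nullptr) {
        printf("return address: %p\n", frame->returnAddress);
        frame = frame->previous;
    }
}
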
The GC also relies on this unwind information when scanning for root objects, in order to find every function on the call chain.

References

https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-tutorial.md
https://github.com/dotnet/coreclr/blob/release/1.1.0/Documentation/botr/ryujit-overview.md
https://github.com/dotnet/coreclr/blob/release/1.1.0/Documentation/botr/porting-ryujit.md
https://github.com/dotnet/coreclr/blob/release/1.1.0/Documentation/building/viewing-jit-dumps.md
https://github.com/dotnet/coreclr/blob/release/1.1.0/Documentation/project-docs/clr-configuration-knobs.md
https://github.com/dotnet/coreclr/blob/release/1.1.0/Documentation/building/debugging-instructions.md
https://github.com/dotnet/coreclr/blob/release/1.1.0/Documentation/botr/clr-abi.md
https://github.com/dotnet/coreclr/blob/release/1.1.0/Documentation/design-docs/finally-optimizations.md
https://github.com/dotnet/coreclr/blob/release/1.1.0/Documentation/design-docs/jit-call-morphing.md
https://github.com/dotnet/coreclr/blob/release/1.1.0/Documentation/botr/type-system.md
https://github.com/dotnet/coreclr/blob/release/1.1.0/Documentation/botr/type-loader.md
https://github.com/dotnet/coreclr/blob/release/1.1.0/Documentation/botr/method-descriptor.md
https://github.com/dotnet/coreclr/blob/release/1.1.0/Documentation/botr/virtual-stub-dispatch.md
https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes(v=vs.110).aspx
https://www.microsoft.com/en-us/research/wp-content/uploads/2001/01/designandimplementationofgenerics.pdf
https://www.cs.rice.edu/~keith/EMBED/dom.pdf
https://www.usenix.org/legacy/events/vee05/full_papers/p132-wimmer.pdf
http://aakinshin.net/ru/blog/dotnet/typehandle/
https://en.wikipedia.org/wiki/List_of_CIL_instructions
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.arn0008c/index.html
http://wiki.osdev.org/X86-64_Instruction_Encoding
https://github.com/dotnet/coreclr/issues/12383
https://github.com/dotnet/coreclr/issues/14414
http://ref.x86asm.net/
https://www.onlinedisassembler.com/odaweb/

Final words

This article has given a more detailed analysis of the whole JIT flow in CoreCLR,
but the JIT contains far too much code to paste it all here the way I did in the GC analysis, and many details could not be covered.
If you have questions about a particular point, you can try consulting my JIT notes, and you are also welcome to leave your questions in the comments below.
