赞
踩
上面说了pin-table的多核启动方式,看似很繁琐,实际上并不复杂,无外乎主处理器唤醒从处理器到指定地址上去执行指令,说他简单是相对于功能来说的,因为他只是实现了从处理器的启动,仅此而已,所以,现在社区几乎很少使用spin-table这种方式,取而代之的是psci,他不仅可以启动从处理器,还可以关闭,挂起等其他核操作,现在基本上arm64平台上使用多核启动方式都是psci。下面我们来揭开他神秘的面纱,其实理解了spin-table的启动方式,psci并不难(说白了也是需要主处理器给从处理器一个启动地址,然后从处理器从这个地址执行指令,实际上比这要复杂的多)。
首先,我们先来看下设备树cpu节点对psci的支持:
arch/arm64/boot/dts/xxx.dtsi:
cpu0: cpu@0 {
device_type = "cpu";
compatible = "arm,armv8";
reg = <0x0>;
enable-method = "psci";
};
psci {
compatible = "arm,psci";
method = "smc";
cpu_suspend = <0xC4000001>;
cpu_off = <0x84000002>;
cpu_on = <0xC4000003>;
};
psci节点的详细说明可以参考内核文档:Documentation/devicetree/bindings/arm/psci.txt
可以看到现在enable-method 属性已经是psci,说明使用的多核启动方式是psci, 下面还有psci节点,用于psci驱动使用,method用于说明调用psci功能使用什么指令,可选有两个smc和hvc。其实smc, hvc和svc都是从低运行级别向高运行级别请求服务的指令,我们最常用的就是svc指令了,这是实现系统调用的指令。高级别的运行级别会根据传递过来的参数来决定提供什么样的服务。smc是用于陷入el3(安全), hvc用于陷入el2(虚拟化, 虚拟化场景中一般通过hvc指令陷入el2来请求唤醒vcpu), svc用于陷入el1(系统)。
注:本文只讲解smc陷入el3启动多核的情况。
下面开始分析源代码:
我们都知道armv8将异常等级分为el0 - el3,其中,el3为安全监控器,为了实现对它的支持,arm公司设计了一种firmware叫做ATF(ARM Trusted firmware),下面是atf源码readme.rst文件的一段介绍:
Trusted Firmware-A (TF-A) provides a reference implementation of
secure world software forArmv7-A and Armv8-A
, including a
Secure Monitor
executing at Exception Level 3 (EL3). It
implements various Arm interface standards, such as:
- The
Power State Coordination Interface (PSCI)
_- Trusted Board Boot Requirements (TBBR, Arm DEN0006C-1)
SMC Calling Convention
_System Control and Management Interface (SCMI)
_Software Delegated Exception Interface (SDEI)
_
ATF代码运行在EL3, 是实现安全相关的软件部分固件,其中会为其他特权级别提供服务,也就是说提供了在EL3中服务的手段,我们本文介绍的PSCI的实现就是在这里面,本文不会过多的讲解
(注:其实本文只会涉及到atf如何响应服务el1的smc发过来的psci的服务请求,仅此而已,有关ATF(Trustzone)请参考其他资料)。
下面从源代码角度分析服务的注册处理流程:
atf/bl31/aarch64/bl31_entrypoint.S: //架构相关
bl31_entrypoint
->el3_entrypoint_common
_exception_vectors=runtime_exceptions //设置el3的异常向量表
->bl bl31_early_platform_setup //跳转到平台早期设置
->bl bl31_plat_arch_setup //跳转到平台架构设置
-> bl bl31_main //跳转到bl31_main atf/bl31/aarch64/bl31_main.c:
->NOTICE("BL31: %s\n", version_string); //打印版本信息
->NOTICE("BL31: %s\n", build_message); //打印编译信息
->bl31_platform_setup //执行平台设置
-> /* Initialize the runtime services e.g. psci. */ 初始化运行时服务 如psci
INFO("BL31: Initializing runtime services\n") //打印log信息
->runtime_svc_init //调用各种运行时服务历程
...
下面的宏是用于注册运行时服务的接口,每种服务通过它来注册:
/* * Convenience macro to declare a service descriptor 定义运行时服务描述符结构的宏 */ #define DECLARE_RT_SVC(_name, _start, _end, _type, _setup, _smch) \ static const rt_svc_desc_t __svc_desc_ ## _name \ __section("rt_svc_descs") __used = { \ //结构放在rt_svc_descs段中 .start_oen = _start, \ .end_oen = _end, \ .call_type = _type, \ .name = #_name, \ .init = _setup, \ .handle = _smch } 链接脚本中: bl31/bl31.ld.S: ... .rodata . : { __RT_SVC_DESCS_START__ = .; rt_svc_descs段开始 KEEP(*(rt_svc_descs)) //rt_svc_descs段 __RT_SVC_DESCS_END__ = .; rt_svc_descs段结束 } ...
在标准的运行时服务中将服务初始化和处理函数放到rt_svc_descs段中,供调用。
services/std_svc/std_svc_setup.c:
DECLARE_RT_SVC(
std_svc,
OEN_STD_START,
OEN_STD_END,
SMC_TYPE_FAST,
std_svc_setup,//初始化
std_svc_smc_handler //处理
);
在runtime_svc_init函数中,调用每一个通过DECLARE_RT_SVC注册的服务,其中包括std_svc服务:
for (index = 0; index < RT_SVC_DECS_NUM; index++) {
rt_svc_desc_t *service = &rt_svc_descs[index];
...
rc = service->init(); //调用每一个注册的运行时服务的设置函数
...
}
std_svc_setup (主要关注设置psci操作集)
std_svc_setup //services/std_svc/std_svc_setup.c ->psci_setup //lib/psci/pegsci_setup.c ->plat_setup_psci_ops //设置平台的psci操作 调用平台的plat_setup_psci_ops函数去设置psci操作 eg:qemu平台 ->*psci_ops = &plat_qemu_psci_pm_ops; static const plat_psci_ops_t plat_qemu_psci_pm_ops = { .cpu_standby = qemu_cpu_standby, .pwr_domain_on = qemu_pwr_domain_on, .pwr_domain_off = qemu_pwr_domain_off, .pwr_domain_suspend = qemu_pwr_domain_suspend, .pwr_domain_on_finish = qemu_pwr_domain_on_finish, .pwr_domain_suspend_finish = qemu_pwr_domain_suspend_finish, .system_off = qemu_system_off, .system_reset = qemu_system_reset, .validate_power_state = qemu_validate_power_state, .validate_ns_entrypoint = qemu_validate_ns_entrypoint };
可以看到,在遍历每一个注册的运行时服务的时候,会导致std_svc_setup调用,其中会做psci操作集的设置,操作集中我们可以看到对核电源的管理的接口如:核上电,下电,挂起等,我们主要关注上电 .pwr_domain_on = qemu_pwr_domain_on,这个接口当我们主处理器boot从处理器的时候会用到。
smc指令触发进入el3异常向量表:
runtime_exceptions //el3的异常向量表 ->sync_exception_aarch64 ->handle_sync_exception ->smc_handler64 -> ¦* Populate the parameters for the SMC handler. ¦* We already have x0-x4 in place. x5 will point to a cookie (not used ¦* now). x6 will point to the context structure (SP_EL3) and x7 will ¦* contain flags we need to pass to the handler Hence save x5-x7. ¦* ¦* Note: x4 only needs to be preserved for AArch32 callers but we do it ¦* for AArch64 callers as well for convenience ¦*/ stp x4, x5, [sp, #CTX_GPREGS_OFFSET + CTX_GPREG_X4] //保存x4-x7到栈 stp x6, x7, [sp, #CTX_GPREGS_OFFSET + CTX_GPREG_X6] /* Save rest of the gpregs and sp_el0*/ save_x18_to_x29_sp_el0 mov x5, xzr //x5清零 mov x6, sp //sp保存在x6 /* Get the unique owning entity number */ //获得唯一的入口编号 ubfx x16, x0, #FUNCID_OEN_SHIFT, #FUNCID_OEN_WIDTH ubfx x15, x0, #FUNCID_TYPE_SHIFT, #FUNCID_TYPE_WIDTH orr x16, x16, x15, lsl #FUNCID_OEN_WIDTH adr x11, (__RT_SVC_DESCS_START__ + RT_SVC_DESC_HANDLE) /* Load descriptor index from array of indices */ adr x14, rt_svc_descs_indices //获得服务描述 标识数组 ldrb w15, [x14, x16] //根据唯一的入口编号 找到处理函数的 地址 /* ¦* Restore the saved C runtime stack value which will become the new ¦* SP_EL0 i.e. EL3 runtime stack. It was saved in the 'cpu_context' ¦* structure prior to the last ERET from EL3. ¦*/ ldr x12, [x6, #CTX_EL3STATE_OFFSET + CTX_RUNTIME_SP] /* ¦* Any index greater than 127 is invalid. Check bit 7 for ¦* a valid index ¦*/ tbnz w15, 7, smc_unknown /* Switch to SP_EL0 */ msr spsel, #0 /* ¦* Get the descriptor using the index ¦* x11 = (base + off), x15 = index ¦* ¦* handler = (base + off) + (index << log2(size)) ¦*/ lsl w10, w15, #RT_SVC_SIZE_LOG2 ldr x15, [x11, w10, uxtw] /* ¦* Save the SPSR_EL3, ELR_EL3, & SCR_EL3 in case there is a world ¦* switch during SMC handling. ¦* TODO: Revisit if all system registers can be saved later. ¦*/ mrs x16, spsr_el3 //spsr_el3保存在x16 mrs x17, elr_el3 //elr_el3保存在x17 mrs x18, scr_el3 //scr_el3保存在x18 stp x16, x17, [x6, #CTX_EL3STATE_OFFSET + CTX_SPSR_EL3] / x16, x17/保存在栈 str x18, [x6, #CTX_EL3STATE_OFFSET + CTX_SCR_EL3] //x18保存到栈 /* Copy SCR_EL3.NS bit to the flag to indicate caller's security */ bfi x7, x18, #0, #1 mov sp, x12 /* ¦* Call the Secure Monitor Call handler and then drop directly into ¦* el3_exit() which will program any remaining architectural state ¦* prior to issuing the ERET to the desired lower EL. ¦*/ #if DEBUG cbz x15, rt_svc_fw_critical_error #endif blr x15 //跳转到处理函数 b el3_exit //从el3退出会eret 回到el1(后面会讲到)
上面其实主要的是找到服务例程,然后跳转执行
下面是跳转的处理函数:
std_svc_smc_handler //services/std_svc/std_svc_setup.c ->ret = psci_smc_handler(smc_fid, x1, x2, x3, x4, ¦ cookie, handle, flags) ... } else { /* 64-bit PSCI function */ switch (smc_fid) { case PSCI_CPU_SUSPEND_AARCH64: ret = (u_register_t) psci_cpu_suspend((unsigned int)x1, x2, x3); break; case PSCI_CPU_ON_AARCH64: ret = (u_register_t)psci_cpu_on(x1, x2, x3); break; ... }
处理函数根据funid来决定服务,可以看到PSCI_CPU_ON_AARCH64为0xc4000003,这正是设备树中填写的cpu_on属性的id,会委托psci_cpu_on来执行核上电任务。
下面分析是重点:!!!
->psci_cpu_on() //lib/psci/psci_main.c
->psci_validate_entry_point() //验证入口地址有效性并 保存入口点到一个结构ep中
->psci_cpu_on_start(target_cpu, &ep) //ep入口地址
->psci_plat_pm_ops->pwr_domain_on(target_cpu)
->qemu_pwr_domain_on //实现核上电(平台实现)
/* Store the re-entry information for the non-secure world. */
->cm_init_context_by_index() //重点: 会通过cpu的编号找到 cpu上下文(cpu_context_t),存在cpu寄存器的值,异常返回的时候写写到对应的寄存器中,然后eret,旧返回到了el1!!!
->cm_setup_context() //设置cpu上下文
-> write_ctx_reg(state, CTX_SCR_EL3, scr_el3); //lib/el3_runtime/aarch64/context_mgmt.c
write_ctx_reg(state, CTX_ELR_EL3, ep->pc); //注: 异常返回时执行此地址 于是完成了cpu的启动!!!
write_ctx_reg(state, CTX_SPSR_EL3, ep->spsr);
psci_cpu_on主要完成开核的工作,然后会设置一些异常返回后寄存器的值(eg:从el1 -> el3 -> el1),重点关注 ep->pc写到cpu_context结构的CTX_ELR_EL3偏移处(从处理器启动后会从这个地址取指执行)。
实际上,所有的从处理器启动后都会从bl31_warm_entrypoint开始执行,在plat_setup_psci_ops中会设置(每个平台都有自己的启动地址寄存器,通过写这个寄存器来获得上电后执行的指令地址)。
大致说一下:主处理器通过smc进入el3请求开核服务,atf中会响应这种请求,通过平台的开核操作来启动从处理器并且设置从处理的一些寄存器eg:scr_el3、spsr_el3、elr_el3,然后主处理器,恢复现场,eret再次回到el1,而处理器开核之后会从bl31_warm_entrypoint开始执行,最后通过el3_exit返回到el1的elr_el3设置的地址。
分析到这atf的分析到此为止,atf中主要是响应内核的snc的请求,然后做开核处理,也就是实际的开核动作,但是从处理器最后还是要回到内核中执行,下面分析内核的处理:注意流程如下:
init/main.c
start_kernel
->boot_cpu_init //引导cpu初始化 设置引导cpu的位掩码 online active present possible都为true
->setup_arch // arch/arm64/kernel/setup.c
-> if (acpi_disabled) //不支持acpi
psci_dt_init(); //drivers/firmware/psci.c(psci主要文件) psci初始化 解析设备树 寻找psci匹配的节点
else
psci_acpi_init(); //acpi中允许使用psci情况
->rest_init
->kernel_init
->kernel_init_freeable
->smp_prepare_cpus //准备cpu,对于每个可能的cpu! cpu_ops[cpu]->cpu_prepare(cpu) 2.set_cpu_present(cpu, true) cpu处于present状态
->do_pre_smp_initcalls //多核启动之前的调用initcall回调
->smp_init //smp初始化 kernel/smp.c 会启动其他从处理器
我们主要关注两个函数:psci_dt_init和smp_init
psci_dt_init是解析设备树,设置操作函数,smp_init用于启动从处理器。
static const struct of_device_id psci_of_match[] __initconst = { { .compatible = "arm,psci", .data = psci_0_1_init}, { .compatible = "arm,psci-0.2", .data = psci_0_2_init}, { .compatible = "arm,psci-1.0", .data = psci_1_0_init}, {}, }; int __init psci_dt_init(void) { struct device_node *np; const struct of_device_id *matched_np; psci_initcall_t init_fn; int ret; np = of_find_matching_node_and_match(NULL, psci_of_match, &matched_np); if (!np || !of_device_is_available(np)) return -ENODEV; init_fn = (psci_initcall_t)matched_np->data; ret = init_fn(np); of_node_put(np); return ret; }
以设备树中compatible = "arm,psci"为例
->psci_0_1_init() //设备树中compatible = "arm,psci"为例 ->get_set_conduit_method() //根据设备树method属性设置 invoke_psci_fn = __invoke_psci_fn_smc; (method="smc") -> invoke_psci_fn = __invoke_psci_fn_smc -> if (!of_property_read_u32(np, "cpu_on", &id)) { psci_function_id[PSCI_FN_CPU_ON] = id; psci_ops.cpu_on = psci_cpu_on; //设置psci操作的开核接口 } ->psci_cpu_on() ->invoke_psci_fn() ->__invoke_psci_fn_smc() -> arm_smccc_smc(function_id, arg0, arg1, arg2, 0, 0, 0, 0, &res) //这个时候x0=function_id x1=arg0, x2=arg1, x3arg2,... ->__arm_smccc_smc() ->SMCCC smc //arch/arm64/kernel/smccc-call.S -> .macro SMCCC instr .cfi_startproc \instr #0 //即是smc #0 陷入到el3 ldr x4, [sp] stp x0, x1, [x4, #ARM_SMCCC_RES_X0_OFFS] stp x2, x3, [x4, #ARM_SMCCC_RES_X2_OFFS] ldr x4, [sp, #8] cbz x4, 1f /* no quirk structure */ ldr x9, [x4, #ARM_SMCCC_QUIRK_ID_OFFS] cmp x9, #ARM_SMCCC_QUIRK_QCOM_A6 b.ne 1f str x6, [x4, ARM_SMCCC_QUIRK_STATE_OFFS] 1: ret .cfi_endproc .endm
smp_init函数做从处理器启动:
start_kernel ->arch_call_rest_init ->rest_init ->kernel_init, ->kernel_init_freeable ->smp_prepare_cpus //arch/arm64/kernel/smp.c ->smp_init //kernel/smp.c (这是从处理器启动的函数) ->cpu_up ->do_cpu_up ->_cpu_up ->cpuhp_up_callbacks ->cpuhp_invoke_callback ->cpuhp_hp_states[CPUHP_BRINGUP_CPU] ->bringup_cpu ->__cpu_up //arch/arm64/kernel/smp.c ->boot_secondary ->cpu_ops[cpu]->cpu_boot(cpu) ->cpu_psci_ops.cpu_boot ->cpu_psci_cpu_boot //arch/arm64/kernel/psci.c static int cpu_psci_cpu_boot(unsigned int cpu) { int err = psci_ops.cpu_on(cpu_logical_map(cpu), __pa_symbol(secondary_entry)); if (err) pr_err("failed to boot CPU%d (%d)\n", cpu, err); return err; }
启动从处理的时候最终调用到psci的cpu操作集的cpu_psci_cpu_boot函数,会调用上面的psci_cpu_on,最终调用smc,传递第一个参数为cpu的id标识启动哪个cpu,第二个参数为从处理器启动后进入内核执行的地址secondary_entry(这是个物理地址)。
所以综上,最后smc调用时传递的参数为arm_smccc_smc(0xC4000003, cpuid, secondary_entry, arg2, 0, 0, 0, 0, &res)。
这样陷入el3之后,就可以启动对应的从处理器,最终从处理器回到内核(el3->el1),执行secondary_entry处指令,从处理器启动完成。
可以发现psci的方式启动从处理器的方式相当复杂,这里面涉及到了el1到安全的el3的跳转,而且涉及到大量的函数回调,很容易绕晕。
下面给出psci方式多核启动图示:
无论是spin-table还是psci,从处理器启动进入内核之后都会执行secondary_startup:
secondary_startup: /* ¦* Common entry point for secondary CPUs. ¦*/ bl __cpu_secondary_check52bitva bl __cpu_setup // initialise processor adrp x1, swapper_pg_dir //设置内核主页表 bl __enable_mmu //使能mmu ldr x8, =__secondary_switched br x8 ENDPROC(secondary_startup) || || || \/ __secondary_switched: adr_l x5, vectors //设置从处理器的异常向量表 msr vbar_el1, x5 isb //指令同步屏障 保证屏障前面的指令执行完 adr_l x0, secondary_data //获得主处理器传递过来的从处理器数据 ldr x1, [x0, #CPU_BOOT_STACK] // get secondary_data.stack 获得栈地址 mov sp, x1 //设置到从处理器的sp ldr x2, [x0, #CPU_BOOT_TASK] //获得从处理器的tsk idle进程的tsk结构, msr sp_el0, x2 //保存在sp_el0 arm64使用sp_el0保存当前进程的tsk结构 mov x29, #0 //fp清0 mov x30, #0 //lr清0 b secondary_start_kernel //跳转到c程序 继续执行从处理器初始化 ENDPROC(__secondary_switched)
__cpu_up中设置了secondary_data结构中的一些成员:
arch/arm64/kernel/smp.c: int __cpu_up(unsigned int cpu, struct task_struct *idle) { int ret; long status; /* ¦* We need to tell the secondary core where to find its stack and the ¦* page tables. ¦*/ secondary_data.task = idle; //执行的进程描述符 secondary_data.stack = task_stack_page(idle) + THREAD_SIZE; //栈地址 THREAD_SIZE=16k update_cpu_boot_status(CPU_MMU_OFF); __flush_dcache_area(&secondary_data, sizeof(secondary_data)); /* ¦* Now bring the CPU into our world. ¦*/ ret = boot_secondary(cpu, idle);
跳转到secondary_start_kernel这个C函数继续执行初始化:
/* * This is the secondary CPU boot entry. We're using this CPUs * idle thread stack, but a set of temporary page tables. */ asmlinkage notrace void secondary_start_kernel(void) { u64 mpidr = read_cpuid_mpidr() & MPIDR_HWID_BITMASK; struct mm_struct *mm = &init_mm; const struct cpu_operations *ops; unsigned int cpu; cpu = task_cpu(current); set_my_cpu_offset(per_cpu_offset(cpu)); /* * All kernel threads share the same mm context; grab a * reference and switch to it. */ mmgrab(mm); //init_mm的引用计数加1 current->active_mm = mm; //设置idle借用的mm结构 /* * TTBR0 is only used for the identity mapping at this stage. Make it * point to zero page to avoid speculatively fetching new entries. */ cpu_uninstall_idmap(); if (system_uses_irq_prio_masking()) init_gic_priority_masking(); rcu_cpu_starting(cpu); preempt_disable(); //禁止内核抢占 trace_hardirqs_off(); /* * If the system has established the capabilities, make sure * this CPU ticks all of those. If it doesn't, the CPU will * fail to come online. */ check_local_cpu_capabilities(); ops = get_cpu_ops(cpu); if (ops->cpu_postboot) ops->cpu_postboot(); /* * Log the CPU info before it is marked online and might get read. */ cpuinfo_store_cpu(); //存储cpu信息 /* * Enable GIC and timers. */ notify_cpu_starting(cpu); //使能gic和timer ipi_setup(cpu); store_cpu_topology(cpu); //保存cpu拓扑 numa_add_cpu(cpu); //numa添加cpu /* * OK, now it's safe to let the boot CPU continue. Wait for * the CPU migration code to notice that the CPU is online * before we continue. */ pr_info("CPU%u: Booted secondary processor 0x%010lx [0x%08x]\n", cpu, (unsigned long)mpidr, read_cpuid_id()); update_cpu_boot_status(CPU_BOOT_SUCCESS); set_cpu_online(cpu, true); //设置cpu状态为online complete(&cpu_running); //唤醒主处理器的 完成等待函数,继续启动下一个从处理器 local_daif_restore(DAIF_PROCCTX); //从处理器继续往下执行 /* * OK, it's off to the idle thread for us */ cpu_startup_entry(CPUHP_AP_ONLINE_IDLE); //idle进程进入idle状态 }
实际上,可以看的当从处理器启动到内核的时候,他们也需要设置异常向量表,设置mmu等,然后执行各自的idle进程(这些都是一些处理器强相关的初始化代码,一些通用的初始化都已经被主处理器初始化完),当cpu负载均衡的时候会放置一些进程到这些从处理器,然后进程就可以再这些从处理器上欢快的运行。
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。