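In the earlier posts on init and zygote, I mentioned that SystemServer is the system service module of Android; its main job is to manage Android's system services. The system_server process is the first child process that zygote creates via fork, and if system_server fails to start, zygote kills itself and restarts. Today, let's look at how SystemServer starts up, beginning with the fragment of ZygoteInit.main() that triggers the fork, followed by forkSystemServer itself.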
- if (startSystemServer) {
- Runnable r = forkSystemServer(abiList, socketName, zygoteServer);
- // {@code r == null} in the parent (zygote) process, and {@code r != null} in the
- // child (system_server) process.
- if (r != null) {
- r.run();
- return;
- }
- }
- /**
- * Prepare the arguments and forks for the system server process.
- *
- * Returns an {@code Runnable} that provides an entrypoint into system_server code in the
- * child process, and {@code null} in the parent.
- */
- private static Runnable forkSystemServer(String abiList, String socketName,
- ZygoteServer zygoteServer) {
- long capabilities = posixCapabilitiesAsBits(
- OsConstants.CAP_IPC_LOCK,
- OsConstants.CAP_KILL,
- OsConstants.CAP_NET_ADMIN,
- OsConstants.CAP_NET_RAW,
- OsConstants.CAP_SYS_MODULE,
- OsConstants.CAP_SYS_NICE,
- OsConstants.CAP_SYS_PTRACE,
- OsConstants.CAP_SYS_TIME,
- OsConstants.CAP_WAKE_ALARM
- );
- /* Containers run without some capabilities, so drop any caps that are not available. */
- StructCapUserHeader header = new StructCapUserHeader(
- OsConstants._LINUX_CAPABILITY_VERSION_3, 0);
- StructCapUserData[] data;
- try {
- data = Os.capget(header);
- } catch (ErrnoException ex) {
- throw new RuntimeException("Failed to capget()", ex);
- }
- capabilities &= ((long) data[0].effective) | (((long) data[1].effective) << 32);
- /* Hardcoded command line to start the system server */
- String args[] = {
- "--setuid=1000",
- "--setgid=1000",
- "--setgroups=1001,1002,1003,1004,1005,1006,1007,1008,1009,1010,1018,1021,1023,1024,1032,1065,3001,3002,3003,3006,3007,3009,3010",
- "--capabilities=" + capabilities + "," + capabilities,
- "--nice-name=system_server",
- "--runtime-args",
- "--target-sdk-version=" + VMRuntime.SDK_VERSION_CUR_DEVELOPMENT,
- "com.android.server.SystemServer",
- };
- ZygoteConnection.Arguments parsedArgs = null;
- int pid;
- try {
- parsedArgs = new ZygoteConnection.Arguments(args);
- ZygoteConnection.applyDebuggerSystemProperty(parsedArgs);
- ZygoteConnection.applyInvokeWithSystemProperty(parsedArgs);
- boolean profileSystemServer = SystemProperties.getBoolean(
- "dalvik.vm.profilesystemserver", false);
- if (profileSystemServer) {
- parsedArgs.runtimeFlags |= Zygote.PROFILE_SYSTEM_SERVER;
- }
- /* Request to fork the system server process */
- pid = Zygote.forkSystemServer(
- parsedArgs.uid, parsedArgs.gid,
- parsedArgs.gids,
- parsedArgs.runtimeFlags,
- null,
- parsedArgs.permittedCapabilities,
- parsedArgs.effectiveCapabilities);
- } catch (IllegalArgumentException ex) {
- throw new RuntimeException(ex);
- }
- /* For child process */
- if (pid == 0) {
- if (hasSecondZygote(abiList)) {
- waitForSecondaryZygote(socketName);
- }
- zygoteServer.closeServerSocket();
- return handleSystemServerProcess(parsedArgs);
- }
- return null;
- }
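As the comment above the method says, forkSystemServer prepares the arguments and forks the system server process. It returns a Runnable that provides the entry point into system_server code in the child process, and null in the parent.
(3) Build the hard-coded command line used to start system server.
(4) Request the fork of the system server child process: Zygote.forkSystemServer.
(5) If pid is 0, we are in the newly forked child, which means system server was forked successfully. Close the server socket and finish the remaining startup work in handleSystemServerProcess(parsedArgs).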
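The capability handling at the top of forkSystemServer can be sketched in plain Java. This is only an illustration, not the AOSP implementation: the class name, the `dropUnavailable` helper and the `CAP_LAST_CAP_ASSUMED` bound are made up for the example, the capability numbers are the usual values from the Linux `<linux/capability.h>` header, and the two 32-bit words are combined with `Integer.toUnsignedLong` rather than the plain cast used in the listing.

```java
public class CapabilityMaskSketch {
    // Linux capability numbers; only a few of the ones used above are listed.
    static final int CAP_KILL = 5;
    static final int CAP_NET_ADMIN = 12;
    static final int CAP_SYS_NICE = 23;
    static final int CAP_WAKE_ALARM = 35;
    // Assumption for this sketch: a 64-bit mask, so valid bits are 0..63.
    static final int CAP_LAST_CAP_ASSUMED = 63;

    /** Turns a list of capability numbers into a bit mask, one bit per capability. */
    static long posixCapabilitiesAsBits(int... capabilities) {
        long result = 0;
        for (int capability : capabilities) {
            if (capability < 0 || capability > CAP_LAST_CAP_ASSUMED) {
                throw new IllegalArgumentException(String.valueOf(capability));
            }
            result |= (1L << capability);
        }
        return result;
    }

    /**
     * Hypothetical helper: keeps only the wanted capabilities that are present
     * in the effective set, which is reported as two 32-bit words.
     */
    static long dropUnavailable(long wanted, int effectiveLow, int effectiveHigh) {
        long available = Integer.toUnsignedLong(effectiveLow)
                | (Integer.toUnsignedLong(effectiveHigh) << 32);
        return wanted & available;
    }

    public static void main(String[] args) {
        long wanted = posixCapabilitiesAsBits(
                CAP_KILL, CAP_NET_ADMIN, CAP_SYS_NICE, CAP_WAKE_ALARM);
        // Pretend we run in a container that lacks CAP_NET_ADMIN (bit 12).
        long kept = dropUnavailable(wanted, ~(1 << CAP_NET_ADMIN), 0xFFFFFFFF);
        System.out.println("wanted = 0x" + Long.toHexString(wanted));
        System.out.println("kept   = 0x" + Long.toHexString(kept));
        // The same value is passed twice: once as permitted, once as effective.
        System.out.println("--capabilities=" + kept + "," + kept);
    }
}
```

The resulting mask ends up in the `--capabilities=` argument twice because the argument carries the permitted set and the effective set, and system server asks for the same value in both.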
- /**
- * Special method to start the system server process. In addition to the
- * common actions performed in forkAndSpecialize, the pid of the child
- * process is recorded such that the death of the child process will cause
- * zygote to exit.
- *
- * @param uid the UNIX uid that the new process should setuid() to after
- * fork()ing and before spawning any threads.
- * @param gid the UNIX gid that the new process should setgid() to after
- * fork()ing and before spawning any threads.
- * @param gids null-ok; a list of UNIX gids that the new process should
- * setgroups() to after fork and before spawning any threads.
- * @param runtimeFlags bit flags that enable ART features.
- * @param rlimits null-ok an array of rlimit tuples, with the second
- * dimension having a length of 3 and representing
- * (resource, rlim_cur, rlim_max). These are set via the posix
- * setrlimit(2) call.
- * @param permittedCapabilities argument for setcap()
- * @param effectiveCapabilities argument for setcap()
- *
- * @return 0 if this is the child, pid of the child
- * if this is the parent, or -1 on error.
- */
- public static int forkSystemServer(int uid, int gid, int[] gids, int runtimeFlags,
- int[][] rlimits, long permittedCapabilities, long effectiveCapabilities) {
- VM_HOOKS.preFork();
- // Resets nice priority for zygote process.
- resetNicePriority();
- int pid = nativeForkSystemServer(
- uid, gid, gids, runtimeFlags, rlimits, permittedCapabilities, effectiveCapabilities);
- // Enable tracing as soon as we enter the system_server.
- if (pid == 0) {
- Trace.setTracingEnabled(true, runtimeFlags);
- }
- VM_HOOKS.postForkCommon();
- return pid;
- }
- native private static int nativeForkSystemServer(int uid, int gid, int[] gids, int runtimeFlags,
- int[][] rlimits, long permittedCapabilities, long effectiveCapabilities);
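Again, start with the method's comment: this is a special method for creating system server. In addition to the common actions performed in forkAndSpecialize, it records the PID of the child process so that zygote exits when the child dies. The method ends by calling a native method, nativeForkSystemServer: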
- static jint com_android_internal_os_Zygote_nativeForkSystemServer(
- JNIEnv* env, jclass, uid_t uid, gid_t gid, jintArray gids,
- jint runtime_flags, jobjectArray rlimits, jlong permittedCapabilities,
- jlong effectiveCapabilities) {
- pid_t pid = ForkAndSpecializeCommon(env, uid, gid, gids,
- runtime_flags, rlimits,
- permittedCapabilities, effectiveCapabilities,
- MOUNT_EXTERNAL_DEFAULT, NULL, NULL, true, NULL,
- NULL, false, NULL, NULL);
- if (pid > 0) {
- // The zygote process checks whether the child process has died or not.
- ALOGI("System server process %d has been created", pid);
- gSystemServerPid = pid;
- // There is a slight window that the system server process has crashed
- // but it went unnoticed because we haven't published its pid yet. So
- // we recheck here just to make sure that all is well.
- int status;
- if (waitpid(pid, &status, WNOHANG) == pid) {
- ALOGE("System server process %d has died. Restarting Zygote!", pid);
- RuntimeAbort(env, __LINE__, "System server process has died. Restarting Zygote!");
- }
- // Assign system_server to the correct memory cgroup.
- // Not all devices mount /dev/memcg so check for the file first
- // to avoid unnecessarily printing errors and denials in the logs.
- if (!access("/dev/memcg/system/tasks", F_OK) &&
- !WriteStringToFile(StringPrintf("%d", pid), "/dev/memcg/system/tasks")) {
- ALOGE("couldn't write %d to /dev/memcg/system/tasks", pid);
- }
- }
- return pid;
- }
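nativeForkSystemServer in turn calls ForkAndSpecializeCommon, listed below. It is a very long method, but its code splits into two branches: if (pid == 0) and else if (pid > 0). As mentioned earlier, a successful fork returns twice: it returns 0 in the child process and the child's pid in the parent. Once both branches have run, all the work of creating the system_server process is done.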
- // Utility routine to fork zygote and specialize the child process.
- static pid_t ForkAndSpecializeCommon(JNIEnv* env, uid_t uid, gid_t gid, jintArray javaGids,
- jint runtime_flags, jobjectArray javaRlimits,
- jlong permittedCapabilities, jlong effectiveCapabilities,
- jint mount_external,
- jstring java_se_info, jstring java_se_name,
- bool is_system_server, jintArray fdsToClose,
- jintArray fdsToIgnore, bool is_child_zygote,
- jstring instructionSet, jstring dataDir) {
- SetSignalHandlers();
- sigset_t sigchld;
- sigemptyset(&sigchld);
- sigaddset(&sigchld, SIGCHLD);
- auto fail_fn = [env, java_se_name, is_system_server](const std::string& msg)
- __attribute__ ((noreturn)) {
- const char* se_name_c_str = nullptr;
- std::unique_ptr<ScopedUtfChars> se_name;
- if (java_se_name != nullptr) {
- se_name.reset(new ScopedUtfChars(env, java_se_name));
- se_name_c_str = se_name->c_str();
- }
- if (se_name_c_str == nullptr && is_system_server) {
- se_name_c_str = "system_server";
- }
- const std::string& error_msg = (se_name_c_str == nullptr)
- ? msg
- : StringPrintf("(%s) %s", se_name_c_str, msg.c_str());
- env->FatalError(error_msg.c_str());
- __builtin_unreachable();
- };
- // Temporarily block SIGCHLD during forks. The SIGCHLD handler might
- // log, which would result in the logging FDs we close being reopened.
- // This would cause failures because the FDs are not whitelisted.
- //
- // Note that the zygote process is single threaded at this point.
- if (sigprocmask(SIG_BLOCK, &sigchld, nullptr) == -1) {
- fail_fn(CREATE_ERROR("sigprocmask(SIG_SETMASK, { SIGCHLD }) failed: %s", strerror(errno)));
- }
- // Close any logging related FDs before we start evaluating the list of
- // file descriptors.
- __android_log_close();
- std::string error_msg;
- // If this is the first fork for this zygote, create the open FD table.
- // If it isn't, we just need to check whether the list of open files has
- // changed (and it shouldn't in the normal case).
- std::vector<int> fds_to_ignore;
- if (!FillFileDescriptorVector(env, fdsToIgnore, &fds_to_ignore, &error_msg)) {
- fail_fn(error_msg);
- }
- if (gOpenFdTable == NULL) {
- gOpenFdTable = FileDescriptorTable::Create(fds_to_ignore, &error_msg);
- if (gOpenFdTable == NULL) {
- fail_fn(error_msg);
- }
- } else if (!gOpenFdTable->Restat(fds_to_ignore, &error_msg)) {
- fail_fn(error_msg);
- }
- pid_t pid = fork();
- if (pid == 0) {
- PreApplicationInit();
- // Clean up any descriptors which must be closed immediately
- if (!DetachDescriptors(env, fdsToClose, &error_msg)) {
- fail_fn(error_msg);
- }
- // Re-open all remaining open file descriptors so that they aren't shared
- // with the zygote across a fork.
- if (!gOpenFdTable->ReopenOrDetach(&error_msg)) {
- fail_fn(error_msg);
- }
- if (sigprocmask(SIG_UNBLOCK, &sigchld, nullptr) == -1) {
- fail_fn(CREATE_ERROR("sigprocmask(SIG_SETMASK, { SIGCHLD }) failed: %s", strerror(errno)));
- }
- // Keep capabilities across UID change, unless we're staying root.
- if (uid != 0) {
- if (!EnableKeepCapabilities(&error_msg)) {
- fail_fn(error_msg);
- }
- }
- if (!SetInheritable(permittedCapabilities, &error_msg)) {
- fail_fn(error_msg);
- }
- if (!DropCapabilitiesBoundingSet(&error_msg)) {
- fail_fn(error_msg);
- }
- bool use_native_bridge = !is_system_server && (instructionSet != NULL)
- && android::NativeBridgeAvailable();
- if (use_native_bridge) {
- ScopedUtfChars isa_string(env, instructionSet);
- use_native_bridge = android::NeedsNativeBridge(isa_string.c_str());
- }
- if (use_native_bridge && dataDir == NULL) {
- // dataDir should never be null if we need to use a native bridge.
- // In general, dataDir will never be null for normal applications. It can only happen in
- // special cases (for isolated processes which are not associated with any app). These are
- // launched by the framework and should not be emulated anyway.
- use_native_bridge = false;
- ALOGW("Native bridge will not be used because dataDir == NULL.");
- }
- if (!MountEmulatedStorage(uid, mount_external, use_native_bridge, &error_msg)) {
- ALOGW("Failed to mount emulated storage: %s (%s)", error_msg.c_str(), strerror(errno));
- if (errno == ENOTCONN || errno == EROFS) {
- // When device is actively encrypting, we get ENOTCONN here
- // since FUSE was mounted before the framework restarted.
- // When encrypted device is booting, we get EROFS since
- // FUSE hasn't been created yet by init.
- // In either case, continue without external storage.
- } else {
- fail_fn(error_msg);
- }
- }
- // If this zygote isn't root, it won't be able to create a process group,
- // since the directory is owned by root.
- if (!is_system_server && getuid() == 0) {
- int rc = createProcessGroup(uid, getpid());
- if (rc != 0) {
- if (rc == -EROFS) {
- ALOGW("createProcessGroup failed, kernel missing CONFIG_CGROUP_CPUACCT?");
- } else {
- ALOGE("createProcessGroup(%d, %d) failed: %s", uid, pid, strerror(-rc));
- }
- }
- }
- std::string error_msg;
- if (!SetGids(env, javaGids, &error_msg)) {
- fail_fn(error_msg);
- }
- if (!SetRLimits(env, javaRlimits, &error_msg)) {
- fail_fn(error_msg);
- }
- if (use_native_bridge) {
- ScopedUtfChars isa_string(env, instructionSet);
- ScopedUtfChars data_dir(env, dataDir);
- android::PreInitializeNativeBridge(data_dir.c_str(), isa_string.c_str());
- }
- int rc = setresgid(gid, gid, gid);
- if (rc == -1) {
- fail_fn(CREATE_ERROR("setresgid(%d) failed: %s", gid, strerror(errno)));
- }
- // Must be called when the new process still has CAP_SYS_ADMIN, in this case, before changing
- // uid from 0, which clears capabilities. The other alternative is to call
- // prctl(PR_SET_NO_NEW_PRIVS, 1) afterward, but that breaks SELinux domain transition (see
- // b/71859146). As the result, privileged syscalls used below still need to be accessible in
- // app process.
- SetUpSeccompFilter(uid);
- rc = setresuid(uid, uid, uid);
- if (rc == -1) {
- fail_fn(CREATE_ERROR("setresuid(%d) failed: %s", uid, strerror(errno)));
- }
- if (NeedsNoRandomizeWorkaround()) {
- // Work around ARM kernel ASLR lossage (http://b/5817320).
- int old_personality = personality(0xffffffff);
- int new_personality = personality(old_personality | ADDR_NO_RANDOMIZE);
- if (new_personality == -1) {
- ALOGW("personality(%d) failed: %s", new_personality, strerror(errno));
- }
- }
- if (!SetCapabilities(permittedCapabilities, effectiveCapabilities, permittedCapabilities,
- &error_msg)) {
- fail_fn(error_msg);
- }
- if (!SetSchedulerPolicy(&error_msg)) {
- fail_fn(error_msg);
- }
- const char* se_info_c_str = NULL;
- ScopedUtfChars* se_info = NULL;
- if (java_se_info != NULL) {
- se_info = new ScopedUtfChars(env, java_se_info);
- se_info_c_str = se_info->c_str();
- if (se_info_c_str == NULL) {
- fail_fn("se_info_c_str == NULL");
- }
- }
- const char* se_name_c_str = NULL;
- ScopedUtfChars* se_name = NULL;
- if (java_se_name != NULL) {
- se_name = new ScopedUtfChars(env, java_se_name);
- se_name_c_str = se_name->c_str();
- if (se_name_c_str == NULL) {
- fail_fn("se_name_c_str == NULL");
- }
- }
- rc = selinux_android_setcontext(uid, is_system_server, se_info_c_str, se_name_c_str);
- if (rc == -1) {
- fail_fn(CREATE_ERROR("selinux_android_setcontext(%d, %d, \"%s\", \"%s\") failed", uid,
- is_system_server, se_info_c_str, se_name_c_str));
- }
- // Make it easier to debug audit logs by setting the main thread's name to the
- // nice name rather than "app_process".
- if (se_name_c_str == NULL && is_system_server) {
- se_name_c_str = "system_server";
- }
- if (se_name_c_str != NULL) {
- SetThreadName(se_name_c_str);
- }
- delete se_info;
- delete se_name;
- // Unset the SIGCHLD handler, but keep ignoring SIGHUP (rationale in SetSignalHandlers).
- UnsetChldSignalHandler();
- env->CallStaticVoidMethod(gZygoteClass, gCallPostForkChildHooks, runtime_flags,
- is_system_server, is_child_zygote, instructionSet);
- if (env->ExceptionCheck()) {
- fail_fn("Error calling post fork hooks.");
- }
- } else if (pid > 0) {
- // the parent process
- // We blocked SIGCHLD prior to a fork, we unblock it here.
- if (sigprocmask(SIG_UNBLOCK, &sigchld, nullptr) == -1) {
- fail_fn(CREATE_ERROR("sigprocmask(SIG_SETMASK, { SIGCHLD }) failed: %s", strerror(errno)));
- }
- }
- return pid;
- }
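As noted above, once system server has been forked successfully there is still some remaining work to do. The flow so far has created the system_server process; the remaining work is done in handleSystemServerProcess(parsedArgs). Here is its source: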
- /**
- * Finish remaining work for the newly forked system server process.
- */
- private static Runnable handleSystemServerProcess(ZygoteConnection.Arguments parsedArgs) {
- // set umask to 0077 so new files and directories will default to owner-only permissions.
- Os.umask(S_IRWXG | S_IRWXO);
- if (parsedArgs.niceName != null) {
- Process.setArgV0(parsedArgs.niceName);
- }
- final String systemServerClasspath = Os.getenv("SYSTEMSERVERCLASSPATH");
- if (systemServerClasspath != null) {
- performSystemServerDexOpt(systemServerClasspath);
- // Capturing profiles is only supported for debug or eng builds since selinux normally
- // prevents it.
- boolean profileSystemServer = SystemProperties.getBoolean(
- "dalvik.vm.profilesystemserver", false);
- if (profileSystemServer && (Build.IS_USERDEBUG || Build.IS_ENG)) {
- try {
- prepareSystemServerProfile(systemServerClasspath);
- } catch (Exception e) {
- Log.wtf(TAG, "Failed to set up system server profile", e);
- }
- }
- }
- if (parsedArgs.invokeWith != null) {
- String[] args = parsedArgs.remainingArgs;
- // If we have a non-null system server class path, we'll have to duplicate the
- // existing arguments and append the classpath to it. ART will handle the classpath
- // correctly when we exec a new process.
- if (systemServerClasspath != null) {
- String[] amendedArgs = new String[args.length + 2];
- amendedArgs[0] = "-cp";
- amendedArgs[1] = systemServerClasspath;
- System.arraycopy(args, 0, amendedArgs, 2, args.length);
- args = amendedArgs;
- }
- WrapperInit.execApplication(parsedArgs.invokeWith,
- parsedArgs.niceName, parsedArgs.targetSdkVersion,
- VMRuntime.getCurrentInstructionSet(), null, args);
- throw new IllegalStateException("Unexpected return from WrapperInit.execApplication");
- } else {
- ClassLoader cl = null;
- if (systemServerClasspath != null) {
- cl = createPathClassLoader(systemServerClasspath, parsedArgs.targetSdkVersion);
- Thread.currentThread().setContextClassLoader(cl);
- }
- /*
- * Pass the remaining arguments to SystemServer.
- */
- return ZygoteInit.zygoteInit(parsedArgs.targetSdkVersion, parsedArgs.remainingArgs, cl);
- }
- /* should never reach here */
- }
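(2) Prepare the system server profile (the profile captured when dalvik.vm.profilesystemserver is set, not a configuration file). SELinux normally prevents capturing profiles, so this is only done on userdebug or eng builds.
(3) Pass the remaining arguments to system server through ZygoteInit.zygoteInit().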
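Before any of these steps, the listing above sets the umask to `S_IRWXG | S_IRWXO`. A minimal sketch of what that mask does to the permissions of newly created files follows; the class and the `applyUmask` helper are hypothetical names for the example, and the two constants simply carry the standard POSIX values.

```java
public class UmaskSketch {
    // Same numeric values as the POSIX S_IRWXG and S_IRWXO constants.
    static final int S_IRWXG = 0070; // read/write/execute for the group
    static final int S_IRWXO = 0007; // read/write/execute for others

    /** Hypothetical helper: the mode a file actually gets for a requested mode. */
    static int applyUmask(int requestedMode, int umask) {
        return requestedMode & ~umask;
    }

    public static void main(String[] args) {
        int umask = S_IRWXG | S_IRWXO;
        System.out.println(Integer.toOctalString(umask));                   // 77
        System.out.println(Integer.toOctalString(applyUmask(0666, umask))); // 600
        System.out.println(Integer.toOctalString(applyUmask(0777, umask))); // 700
    }
}
```

With this mask, anything system_server creates defaults to owner-only access, which matches the comment in the source.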
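The last step of handleSystemServerProcess passes the remaining arguments to system server through ZygoteInit.zygoteInit. So what does zygoteInit actually do? Here is the source, together with RuntimeInit.applicationInit, which it calls at the end: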
- public static final Runnable zygoteInit(int targetSdkVersion, String[] argv, ClassLoader classLoader) {
- if (RuntimeInit.DEBUG) {
- Slog.d(RuntimeInit.TAG, "RuntimeInit: Starting application from zygote");
- }
- Trace.traceBegin(Trace.TRACE_TAG_ACTIVITY_MANAGER, "ZygoteInit");
- RuntimeInit.redirectLogStreams();
- RuntimeInit.commonInit();
- ZygoteInit.nativeZygoteInit();
- return RuntimeInit.applicationInit(targetSdkVersion, argv, classLoader);
- }
- protected static Runnable applicationInit(int targetSdkVersion, String[] argv,
- ClassLoader classLoader) {
- // If the application calls System.exit(), terminate the process
- // immediately without running any shutdown hooks. It is not possible to
- // shutdown an Android application gracefully. Among other things, the
- // Android runtime shutdown hooks close the Binder driver, which can cause
- // leftover running threads to crash before the process actually exits.
- nativeSetExitWithoutCleanup(true);
- // We want to be fairly aggressive about heap utilization, to avoid
- // holding on to a lot of memory that isn't needed.
- VMRuntime.getRuntime().setTargetHeapUtilization(0.75f);
- VMRuntime.getRuntime().setTargetSdkVersion(targetSdkVersion);
- final Arguments args = new Arguments(argv);
- // The end of the RuntimeInit event (see #zygoteInit).
- // Remaining arguments are passed to the start class's static main
- return findStaticMain(args.startClass, args.startArgs, classLoader);
- }
Finally, findStaticMain passes the remaining arguments to the static main method of a class. Which class's static main is it? Let's keep tracing the source.
- protected static Runnable findStaticMain(String className, String[] argv,
- ClassLoader classLoader) {
- Class<?> cl;
- try {
- cl = Class.forName(className, true, classLoader);
- } catch (ClassNotFoundException ex) {
- throw new RuntimeException(
- "Missing class when invoking static main " + className,
- ex);
- }
- Method m;
- try {
- m = cl.getMethod("main", new Class[] { String[].class });
- } catch (NoSuchMethodException ex) {
- throw new RuntimeException(
- "Missing static main on " + className, ex);
- } catch (SecurityException ex) {
- throw new RuntimeException(
- "Problem getting static main on " + className, ex);
- }
- int modifiers = m.getModifiers();
- if (! (Modifier.isStatic(modifiers) && Modifier.isPublic(modifiers))) {
- throw new RuntimeException(
- "Main method is not public and static on " + className);
- }
- /*
- * This throw gets caught in ZygoteInit.main(), which responds
- * by invoking the exception's run() method. This arrangement
- * clears up all the stack frames that were required in setting
- * up the process.
- */
- return new MethodAndArgsCaller(m, argv);
- }
- static class MethodAndArgsCaller implements Runnable {
- /** method to call */
- private final Method mMethod;
- /** argument array */
- private final String[] mArgs;
- public MethodAndArgsCaller(Method method, String[] args) {
- mMethod = method;
- mArgs = args;
- }
- public void run() {
- try {
- mMethod.invoke(null, new Object[] { mArgs });
- } catch (IllegalAccessException ex) {
- throw new RuntimeException(ex);
- } catch (InvocationTargetException ex) {
- Throwable cause = ex.getCause();
- if (cause instanceof RuntimeException) {
- throw (RuntimeException) cause;
- } else if (cause instanceof Error) {
- throw (Error) cause;
- }
- throw new RuntimeException(ex);
- }
- }
- }
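The pattern in the two listings above (look up a public static main by reflection, wrap it in a Runnable, and let the caller run it later) can be tried outside Android. The sketch below is a simplified stand-in, not the framework code: `ReflectiveMainSketch` and its nested `Target` class are hypothetical, with `Target` playing the role of com.android.server.SystemServer.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class ReflectiveMainSketch {
    /** Hypothetical stand-in for the class whose main() should eventually run. */
    public static class Target {
        static String lastArg = null;

        public static void main(String[] args) {
            lastArg = args.length > 0 ? args[0] : "";
        }
    }

    /** Resolves className.main(String[]) and returns a Runnable that invokes it. */
    static Runnable findStaticMain(String className, String[] argv, ClassLoader loader) {
        final Method m;
        try {
            Class<?> cl = Class.forName(className, true, loader);
            m = cl.getMethod("main", String[].class);
        } catch (ClassNotFoundException | NoSuchMethodException ex) {
            throw new RuntimeException("Cannot resolve static main on " + className, ex);
        }
        int modifiers = m.getModifiers();
        if (!(Modifier.isStatic(modifiers) && Modifier.isPublic(modifiers))) {
            throw new RuntimeException("Main method is not public and static on " + className);
        }
        // Nothing is invoked here; the caller decides when to run it.
        return () -> {
            try {
                m.invoke(null, new Object[] { argv });
            } catch (IllegalAccessException | InvocationTargetException ex) {
                throw new RuntimeException(ex);
            }
        };
    }

    public static void main(String[] args) {
        Runnable r = findStaticMain(Target.class.getName(), new String[] { "hello" },
                ReflectiveMainSketch.class.getClassLoader());
        System.out.println("before run: " + Target.lastArg); // before run: null
        r.run();
        System.out.println("after run: " + Target.lastArg);  // after run: hello
    }
}
```

Returning the Runnable instead of invoking main directly means the frames of the setup methods have already been popped by the time main starts, which is the stack clean-up that the comment in findStaticMain describes.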
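Next, here is a simple summary of the system server startup flow, written as class followed by method. The flow is roughly:
ZygoteInit.main → ZygoteInit.forkSystemServer → Zygote.forkSystemServer → Zygote.nativeForkSystemServer → ForkAndSpecializeCommon (native) → ZygoteInit.handleSystemServerProcess → ZygoteInit.zygoteInit → RuntimeInit.applicationInit → RuntimeInit.findStaticMain → MethodAndArgsCaller.run → SystemServer.main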
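To wrap up: today's post walked through how system server is started. Most of it went smoothly, but the last part, where the code enters SystemServer's main method through reflection and (according to the comment) a thrown exception, was not clear to me at first. In the version shown here, findStaticMain returns a Runnable rather than throwing, although its comment still speaks of a throw. After reading the code comments and looking up related material, I understood the intent: running main from ZygoteInit.main clears all the stack frames that were needed to set up the process. The next post will continue by tracing SystemServer's main method.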
