赞
踩
在前面的Android ANR 问题第二弹一文中,我们简诉了Android Input超时的原因,我们了解到系统Input系统分发Input的事件时如果有5s超时会触发应用ANR。在实际开发测试中,我们也会经常遇到Input超时导致的ANR问题,那么现在我们就以实际问题分析一下Input超时的ANR问题,描述一下实际开发测试中我们应该如何定位问题的原因。本文基于AndroidO(8.1)系统。
实战一:
Input dispatching timed out (Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.),手动模拟复现无窗口,有应用类型的ANR问题。直接上Log
- 05-28 11:24:49.798 1668 3141 I ActivityManager: START u0 {act=android.intent.action.MAIN cat=[android.intent.category.LAUNCHER] flg=0x10200000 cmp=com.example.anrtestdemo/.MainActivity} from uid 10007 on display 0 //从手机Launcher界面点击启动测试demo
- //我们可以从11:24:55.536 1668 2070 I InputDispatcher中推算这里应该发生一次Input事件,而实际测试中,我却是在这个时间左右多次点击back键
- 05-28 11:24:49.849 23293 23293 D abm : onStart //测试demo MainActivity执行onStart方法
- 05-28 11:24:55.536 1668 2070 I InputDispatcher: Application is not responding: AppWindowToken{d262c54 token=Token{cca1066 ActivityRecord{2f3c0c1 u0 com.example.anrtestdemo/.MainActivity t18}}}. It has been 5018.6ms since event, 5006.2ms since wait started. Reason: Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.//Input ANR超时第一时间点
- 05-28 11:24:55.642 1668 1713 I am_anr : [0,23293,com.example.anrtestdemo,952680262,Input dispatching timed out (Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.)]//Android ActivityManagerService ANR Log打印,本例am_anr Log和InputDispatcher之间相差时间较少,有时也会出现很大的时间差
- 05-28 11:24:59.849 23293 23293 D abm : onStart return time is: 10000 //模拟耗时的操作
- 05-28 11:24:59.852 23293 23293 D abm : onResume //测试demo MainActivity执行onResume方法
- 05-28 11:24:59.852 23293 23293 I am_on_resume_called: [0,com.example.anrtestdemo.MainActivity,LAUNCH_ACTIVITY]
- 05-28 11:25:00.420 1668 1713 E ActivityManager: ANR in com.example.anrtestdemo (com.example.anrtestdemo/.MainActivity)
- 05-28 11:25:00.420 1668 1713 E ActivityManager: PID: 23293
- 05-28 11:25:00.420 1668 1713 E ActivityManager: Reason: Input dispatching timed out (Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.)
- 05-28 11:25:00.420 1668 1713 E ActivityManager: Load: 5.72 / 5.5 / 5.29
- 05-28 11:25:00.420 1668 1713 E ActivityManager: CPU usage from 98992ms to 0ms ago (2018-05-28 11:23:16.593 to 2018-05-28 11:24:55.585):
- 05-28 11:25:00.420 1668 1713 E ActivityManager: 14% 1668/system_server: 8.8% user + 5.2% kernel / faults: 42792 minor 28 major
- 05-28 11:25:00.420 1668 1713 E ActivityManager: 9.1% 558/surfaceflinger: 3.5% user + 5.6% kernel / faults: 1609 minor 2 major
- 05-28 11:25:00.420 1668 1713 E ActivityManager: 5.7% 878/adbd: 0.5% user + 5.2% kernel / faults: 2680 minor
- 05-28 11:25:00.420 1668 1713 E ActivityManager: 5.4% 2490/com.android.systemui: 3.6% user + 1.8% kernel / faults: 15618 minor 99 major
- 05-28 11:25:00.420 1668 1713 E ActivityManager: 3.7% 8331/com.blackberry.blackberrylauncher: 2.8% user + 0.8% kernel / faults: 37990 minor 9 major
从adb log中,我们可以知道ANR类型,并且依据该类型的ANR,具体分析trace Log
- DALVIK THREADS (12):
- "main" prio=5 tid=1 Runnable
- | group="main" sCount=0 dsCount=0 obj=0x757f91f8 self=0x4832e3fa00
- | sysTid=23293 nice=-10 cgrp=default sched=0/0 handle=0x483710aa98
- | state=R schedstat=( 16106974966 17948853 313 ) utm=1450 stm=160 core=5 HZ=100
- | stack=0x5c73865000-0x5c73867000 stackSize=8MB
- | held mutexes= "mutator lock"(shared held)
- at com.example.anrtestdemo.MainActivity.onStart(MainActivity.java:22)
- at android.app.Instrumentation.callActivityOnStart(Instrumentation.java:1249)
- at android.app.Activity.performStart(Activity.java:6872)
- at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2666)
- at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:2764)
- at android.app.ActivityThread.-wrap12(ActivityThread.java:-1)
- at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1510)
- at android.os.Handler.dispatchMessage(Handler.java:102)
- at android.os.Looper.loop(Looper.java:154)
- at android.app.ActivityThread.main(ActivityThread.java:6157)
- at java.lang.reflect.Method.invoke!(Native method)
- at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:926)
- at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:788)
我们一般查看tracelog的时候,一般都是查看main线程的堆栈,但不并是意味着其他的线程就不据参考性,只是本例中无需查看,实际遇到的其他问题,我们还是要仔细查看tarce文件中显示的各个线程的堆栈,因为我们可以从trace文件中看出,ANR发生时的某一时刻系统的状态,并且我们可以依据那一刻系统的状态,并结合adb log大胆的去想像anr发生时手机在做什么操作。本例可以从trace log中看出anr发生时,我的这个测试应用在调用at com.example.anrtestdemo.MainActivity.onStart(MainActivity.java:22)方法时存在耗时。搜索代码,查看是否存在耗时。
- @Override
- protected void onStart() {
- super.onStart();
- Log.d("abm", "onStart");
- long currentTimeMillis = System.currentTimeMillis();//记录系统当前时间
- while (true){//此处循环便是模拟的耗时操作,不断获取时间10s之后结束。
- long timeMillis = System.currentTimeMillis();
- if ((timeMillis - currentTimeMillis) >= 10000) {
- Log.d("abm", "onStart return time is: " + (timeMillis - currentTimeMillis));
- return;
- }
- }
- }
问题原因找到了,我们的MainActivity复写了MainActivity的onStart方法并且做了耗时操作,导致Input事件分发超时。那么问题来了,为什么我这边想要给大家介绍无窗口,有应用类型的ANR,为什么测试demo在onStart或者onCreate方法中做了耗时操作就能复现了呢该类型的ANR问题?那是因为,Android应用启动的时候都是先创建Application,创建号Application之后,才会执行应用启动的主Activity,再执行该Activity的onCreate,onStart,onResume方法,然后我们的应用窗口便是在onResume中才去向WindowManager添加注册的。因此在注册添加窗口之前,application或者启动的Activity的生命周期onCreate,onStart的任意方法,做了耗时操作,或者他们加载一起的执行时间过长,都是能够导致无窗口,有应用类型的Input ANR问题发生的。这样一来,我们实际开发应用的时候,就要尽可能的把耗时的操作,异步处理。具体异步实现思路你可以使用new thread + handler,Asynctask,HandlerThread等等,这里推荐使用HandlerThread,因为google封装的接口,使用起来简单。分析完本例演示的问题,突然发现依据adb log trace可以轻轻松松的定位问题的原因,但是实际遇到的问题,却并不如此的简单,要是遇到如下的trace文件,就比较尴尬了。
- "main" prio=5 tid=1 Native
- | group="main" sCount=1 dsCount=0 flags=1 obj=0x74374ee8 self=0x149d8c3a00
- | sysTid=8720 nice=0 cgrp=default sched=0/0 handle=0x15226569a8
- | state=S schedstat=( 0 0 0 ) utm=7673 stm=891 core=0 HZ=100
- | stack=0x46803c3000-0x46803c5000 stackSize=8MB
- | held mutexes=
- kernel: (couldn't read /proc/self/task/8720/stack)
- native: #00 pc 000000000006a660 /system/lib64/libc.so (__epoll_pwait+8)
- native: #01 pc 000000000001fca4 /system/lib64/libc.so (epoll_pwait+52)
- native: #02 pc 0000000000015d08 /system/lib64/libutils.so (android::Looper::pollInner(int)+144)
- native: #03 pc 0000000000015bf0 /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+108)
- native: #04 pc 00000000001112e0 /system/lib64/libandroid_runtime.so (???)
- native: #05 pc 00000000001e166c /system/framework/arm64/boot-framework.oat (Java_android_os_MessageQueue_nativePollOnce__JI+140)
- at android.os.MessageQueue.nativePollOnce(Native method)
- at android.os.MessageQueue.next(MessageQueue.java:325)
- at android.os.Looper.loop(Looper.java:142)
- at android.app.ActivityThread.main(ActivityThread.java:6558)
- at java.lang.reflect.Method.invoke(Native method)
- at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:469)
- at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:826)
你会发现trace文件都看得自己怀疑人生了,根据trace信息,咋一看,都要怀疑系统的looper messagequeue有问题,其实这是完全不可能的,要是系统looper都出现问题了,那整个系统都估计要挂了。但是这个从这个trace Log中我们也是可以获取的一些有用信息的,如果我们熟悉系统looper messagequeue的机制的话,我们不难知道当应用主线程的looper messagequeue中没有消息的时候,我们会休眠于MessageQueue.nativePollOnce方法,实际是执行底层的Looper的pollInner进入epoll_wait等待。当有消息来临的时候,此looper便会被wake,执行消息。因此我们可以大胆的猜测,ANR问题发生时,应用主线程中无可执行的Message,那么什么时候没有消息呢?一般执行完onResume完之后,并且界面没有更新,应用UI主线程的looper会进入MessageQueue.nativePollOnce。那么此时我们又应该如何继续分析呢?前面我们所说trace文件,我们一般看main thread的堆栈,那么此时是时候看一波应用其他的线程执行情况了。先搜索关键字“Binder:8720_”(这个8720是当前ANR进程的主进程号,系统一般都是按Binder:进程号_,进行拼接binger线程名)至于原因,系统代码,就是这么定的,代码如下:
- String8 ProcessState::makeBinderThreadName() {
- int32_t s = android_atomic_add(1, &mThreadPoolSeq);
- pid_t pid = getpid();
- String8 name;
- name.appendFormat("Binder:%d_%X", pid, s);
- return name;
- }
搜索binder线程,看应用是否存在binder调用,然后更具trace,结合代码,猜想ANR发生时,应用到底在做什么操作。实际遇到此问题中,说实在的我们并不能从trace中发现什么问题,此时需要我们多看一些adb log,去看看ANR问题发生之前,系统都在干嘛,做了哪些操作,或者查出系统的运行状态等等的一些有用信息,我们要养成多看Log的习惯,这样在分析ANR问题的时候,才能更加准确的推演ANR问题发生时的整个场景。
实战二:手动复现Input事件超时,waitqueue不为空问题
实体按键:Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 6. Wait queue head age: 5507.1ms.
没什么说的,直接上Log
- 05-28 15:16:36.753 1668 1702 I ActivityManager: Start proc 27398:com.example.anrtestdemo/u0a120 for activity com.example.anrtestdemo/.MainActivity
- 05-28 15:16:36.873 1668 11856 I am_proc_bound: [0,27398,com.example.anrtestdemo]
- 05-28 15:16:36.880 1668 11856 I am_restart_activity: [0,223351388,31,com.example.anrtestdemo/.MainActivity]
- 05-28 15:16:36.953 27398 27398 D abm : onResume
- 05-28 15:16:36.955 27398 27398 I am_on_resume_called: [0,com.example.anrtestdemo.MainActivity,LAUNCH_ACTIVITY]
- 05-28 15:16:37.090 1668 1720 I am_activity_launch_time: [0,223351388,com.example.anrtestdemo/.MainActivity,365,365]
- 05-28 15:16:37.090 1668 1720 I ActivityManager: Displayed com.example.anrtestdemo/.MainActivity: +365ms
- 05-28 15:16:45.279 1668 2070 I InputDispatcher: Application is not responding: Window{ea02cdb u0 com.example.anrtestdemo/com.example.anrtestdemo.MainActivity}. It has been 5006.4ms since event, 5005.8ms since wait started. Reason: Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 6. Wait queue head age: 5507.1ms.
- 05-28 15:16:45.343 1668 1713 I am_anr : [0,27398,com.example.anrtestdemo,952680262,Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 6. Wait queue head age: 5507.1ms.)]
- 05-28 15:16:48.456 1668 1713 E ActivityManager: ANR in com.example.anrtestdemo (com.example.anrtestdemo/.MainActivity)
- 05-28 15:16:48.456 1668 1713 E ActivityManager: PID: 27398
- 05-28 15:16:48.456 1668 1713 E ActivityManager: Reason: Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 6. Wait queue head age: 5507.1ms.)
- 05-28 15:16:48.456 1668 1713 E ActivityManager: Load: 6.11 / 5.38 / 4.77
- 05-28 15:16:48.456 1668 1713 E ActivityManager: CPU usage from 36255ms to 0ms ago (2018-05-28 15:16:09.036 to 2018-05-28 15:16:45.291):
- 05-28 15:16:48.456 1668 1713 E ActivityManager: 10% 1668/system_server: 4.6% user + 6% kernel / faults: 5194 minor 7 major
- 05-28 15:16:48.456 1668 1713 E ActivityManager: 6.2% 558/surfaceflinger: 2.7% user + 3.4% kernel / faults: 480 minor
- 05-28 15:16:48.456 1668 1713 E ActivityManager: 3.2% 434/logd: 2.3% user + 0.9% kernel / faults: 3 minor
- 05-28 15:16:48.456 1668 1713 E ActivityManager: 3.1% 23788/adbd: 0.4% user + 2.6% kernel / faults: 561 minor
- 05-28 15:16:48.456 1668 1713 E ActivityManager: 2.7% 2490/com.android.systemui: 1.5% user + 1.1% kernel / faults: 588 minor
- 05-28 15:16:48.456 1668 1713 E ActivityManager: 1.2% 26217/kworker/u16:2: 0% user + 1.2% kernel
实战一中,已经详细介绍了如何看Log,如何看trace文件了,这边就不再赘述,直接贴取关键Log,
trace log
- DALVIK THREADS (13):
- "main" prio=5 tid=1 Runnable
- | group="main" sCount=0 dsCount=0 obj=0x757f91f8 self=0x4832e3fa00
- | sysTid=27398 nice=-10 cgrp=default sched=0/0 handle=0x483710aa98
- | state=R schedstat=( 6436596993 5748639 166 ) utm=581 stm=62 core=4 HZ=100
- | stack=0x5c73865000-0x5c73867000 stackSize=8MB
- | held mutexes= "mutator lock"(shared held)
- native: #00 pc 0000000000479994 /system/lib64/libart.so (_ZN3art15DumpNativeStackERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEEiP12BacktraceMapPKcPNS_9ArtMethodEPv+220)
- native: #01 pc 0000000000479990 /system/lib64/libart.so (_ZN3art15DumpNativeStackERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEEiP12BacktraceMapPKcPNS_9ArtMethodEPv+216)
- native: #02 pc 000000000044dfa0 /system/lib64/libart.so (_ZNK3art6Thread9DumpStackERNSt3__113basic_ostreamIcNS1_11char_traitsIcEEEEbP12BacktraceMap+472)
- native: #03 pc 00000000004658fc /system/lib64/libart.so (_ZN3art14DumpCheckpoint3RunEPNS_6ThreadE+820)
- native: #04 pc 000000000044eeac /system/lib64/libart.so (_ZN3art6Thread21RunCheckpointFunctionEv+192)
- native: #05 pc 0000000000547928 /system/lib64/libart.so (_ZN3art14JniMethodStartEPNS_6ThreadE+168)
- native: #06 pc 00000000000b22f4 /system/framework/arm64/boot.oat (Java_java_lang_System_currentTimeMillis__+96)
- at java.lang.System.currentTimeMillis(Native method)
- at com.example.anrtestdemo.MainActivity$MyHandler$1.run(MainActivity.java:70)
- at android.os.Handler.handleCallback(Handler.java:751)
- at android.os.Handler.dispatchMessage(Handler.java:95)
- at android.os.Looper.loop(Looper.java:154)
- at android.app.ActivityThread.main(ActivityThread.java:6157)
- at java.lang.reflect.Method.invoke!(Native method)
- at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:926)
- at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:788)
定位问题代码
- @Override
- protected void onResume() {
- super.onResume();
- Log.d("abm", "onResume");
- final MyHandler handler = new MyHandler();
- new Thread(new Runnable() {
- @Override
- public void run() {
- try {
- Thread.sleep(2000);
- handler.obtainMessage(666).sendToTarget();
- } catch (InterruptedException e) {
- e.printStackTrace();
- }
- }
- }).start();
-
- }
-
- class MyHandler extends Handler {
- @Override
- public void handleMessage(Message msg) {
- super.handleMessage(msg);
- if (msg.what == 666) {
- new Handler().post(new Runnable() {
- @Override
- public void run() {
- long currentTimeMillis = System.currentTimeMillis();//问题代码,这边模拟10s耗时操作
- while (true){
- long timeMillis = System.currentTimeMillis();
- if ((timeMillis - currentTimeMillis) >= 10000) {
- Log.d("abm", "onResume return time is: " + (timeMillis - currentTimeMillis));
- return;
- }
- }
- }
- });
- }
- }
- }
这边写的代码时,当应用界面加载好的时候,使用Handler向UI主线程的looper messagequeue post一个runnable callback,这个runnable里面便是执行的耗时操作,因为这个耗时操作是在UI主线程中执行的,此时当我们再点击手机实体按键或者屏幕时,如果5s超时,则触发ANR发生。当然实际操作中,我们并不会直接这么做如此显而易见的耗时操作,这里只是用类似这种简单粗暴的方式告诉大家,把耗时的操作移到UI线程之外,因为UI线程负责与用户交互界面更新,UI线程的卡吨直接影响用户体验,更会造成ANR问题的发生。
通过上面的两个小例子,我们已经将最常见到的Input超时导致的ANR问题介绍了。由于篇幅原因,计划将input各种类型的ANR问题分为两个部分来写,本篇先给大家介绍最为常见的两种类型的Input超时ANR,下篇再介绍剩下的一些类型比较少见的ANR问题。
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。