记录一些遇到的ANR问题。
个人认为,ANR问题的难点在于系统出现问题时、或者其他app出现问题时的分析,自己app主线程耗时这种问题还是比较容易发现和处理的。

主线程进行耗时操作导致

主线程查询contentProvider导致ANR

先看anr

----- pid 12827 at 2022-11-10 09:37:31.569550930+0800 -----
Cmd line: com.xiaomi.aiasst.service

"main" prio=5 tid=1 Native
  | group="main" sCount=1 ucsCount=0 flags=1 obj=0x72915c48 self=0xb400007c90af5c00
  | sysTid=12827 nice=0 cgrp=foreground sched=0/0 handle=0x7c9221d4f8
  | state=S schedstat=( 1638490620 898223745 4661 ) utm=123 stm=40 core=5 HZ=100
  | stack=0x7fd6f80000-0x7fd6f82000 stackSize=8188KB
  | held mutexes=
  native: #00 pc 00000000000e0b3c  /apex/com.android.runtime/lib64/bionic/libc.so (__ioctl+12) (BuildId: 2bb0d7188c0db2e8beecb24658ba9d71)
  native: #01 pc 00000000000979bc  /apex/com.android.runtime/lib64/bionic/libc.so (ioctl+160) (BuildId: 2bb0d7188c0db2e8beecb24658ba9d71)
  native: #02 pc 000000000005c384  /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+284) (BuildId: bc667c8603bf96311650a89a3f982968)
  native: #03 pc 000000000005d5e0  /system/lib64/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+76) (BuildId: bc667c8603bf96311650a89a3f982968)
  native: #04 pc 000000000005d31c  /system/lib64/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+224) (BuildId: bc667c8603bf96311650a89a3f982968)
  native: #05 pc 0000000000054a6c  /system/lib64/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+232) (BuildId: bc667c8603bf96311650a89a3f982968)
  native: #06 pc 000000000017a24c  /system/lib64/libandroid_runtime.so (android_os_BinderProxy_transact(_JNIEnv*, _jobject*, int, _jobject*, _jobject*, int)+156) (BuildId: 7464f58195659636f1f83f255ba1ba12)
  at android.os.BinderProxy.transactNative(Native method)
  at android.os.BinderProxy.transact(BinderProxy.java:643)
  at android.database.BulkCursorProxy.getWindow(BulkCursorNative.java:165)
  at android.database.BulkCursorToCursorAdaptor.onMove(BulkCursorToCursorAdaptor.java:82)
  at android.database.AbstractCursor.moveToPosition(AbstractCursor.java:255)
  at android.database.AbstractCursor.moveToNext(AbstractCursor.java:287)
  at android.database.CursorWrapper.moveToNext(CursorWrapper.java:206)
  at c6.a.a(ContactLoader.java:6)
  ······

告诉我们是通过BpBinder进行数据交互时ANR了。
进一步看是由于客户端调用了cursor#moveToNext()

此为主线程进行耗时操作,应该放到IO线程,然后此问题就可以解决。

为了我们加深对ANR的学习,我又去调查了一下耗时的原因。以下是拓展内容:

查找PerfMonitor binderTransact,主要是看我们进程和线程对应的信息

11-10 09:37:25.092  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=290ms interface=android.content.IContentProvider code=1
11-10 09:37:25.340  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=237ms interface=android.content.IBulkCursor code=1
11-10 09:37:25.582  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=238ms interface=android.content.IBulkCursor code=1
11-10 09:37:25.831  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=244ms interface=android.content.IBulkCursor code=1
11-10 09:37:26.075  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=240ms interface=android.content.IBulkCursor code=1
11-10 09:37:26.319  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=240ms interface=android.content.IBulkCursor code=1
11-10 09:37:26.567  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=245ms interface=android.content.IBulkCursor code=1
11-10 09:37:26.821  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=250ms interface=android.content.IBulkCursor code=1
11-10 09:37:27.086  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=260ms interface=android.content.IBulkCursor code=1
11-10 09:37:27.371  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=279ms interface=android.content.IBulkCursor code=1
11-10 09:37:27.668  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=292ms interface=android.content.IBulkCursor code=1
11-10 09:37:27.930  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=259ms interface=android.content.IBulkCursor code=1
11-10 09:37:28.192  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=257ms interface=android.content.IBulkCursor code=1
11-10 09:37:28.450  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=254ms interface=android.content.IBulkCursor code=1
11-10 09:37:28.709  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=254ms interface=android.content.IBulkCursor code=1
11-10 09:37:28.967  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=254ms interface=android.content.IBulkCursor code=1
11-10 09:37:29.224  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=253ms interface=android.content.IBulkCursor code=1
11-10 09:37:29.484  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=257ms interface=android.content.IBulkCursor code=1
11-10 09:37:29.753  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=264ms interface=android.content.IBulkCursor code=1
11-10 09:37:30.042  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=284ms interface=android.content.IBulkCursor code=1
11-10 09:37:30.315  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=268ms interface=android.content.IBulkCursor code=1
11-10 09:37:30.596  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=275ms interface=android.content.IBulkCursor code=1
11-10 09:37:30.878  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=277ms interface=android.content.IBulkCursor code=1
11-10 09:37:31.151  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=270ms interface=android.content.IBulkCursor code=1
11-10 09:37:31.429  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=273ms interface=android.content.IBulkCursor code=1
11-10 09:37:31.704  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=270ms interface=android.content.IBulkCursor code=1
11-10 09:37:31.999  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=287ms interface=android.content.IBulkCursor code=1
11-10 09:37:32.280  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=276ms interface=android.content.IBulkCursor code=1
11-10 09:37:32.558  1000 12827 12827 W BpBinder: PerfMonitor binderTransact :  time=274ms interface=android.content.IBulkCursor code=1

可以看到有大量耗时操作,时间在200ms以上,且连续,所以判断这是个for循环。
然后看下对端接口情况,看下对端是否有异常。可以搜outgoing transaction

11-10 09:37:30.138  1000  2764 15665 W MIUIScout ANR: Binder Info:MIUI    outgoing transaction: from 12827:12827 to 15462:25110 context:binder code:  1 duration:     0.09 s
11-10 09:37:30.144  1000  2764 15665 W MIUIScout ANR: Pid 12827 Binder Info:
11-10 09:37:30.146  1000  2764 15665 W MIUIScout ANR: MIUI    outgoing transaction: from 12827:12827 to 15462:25110 context:binder code:  1 duration:     0.09 s

可以发现对端线程是25110,是15462/android.process.acore的子线程。

"binder:15462_9" prio=5 tid=3 Native
  | group="main" sCount=1 ucsCount=0 flags=1 obj=0x13040b30 self=0xb400007b6ed5f800
  | sysTid=25110 nice=0 cgrp=foreground sched=0/0 handle=0x7bcfa9bcb0
  | state=S schedstat=( 34871263159 1182034033 40493 ) utm=2403 stm=1083 core=5 HZ=100
  | stack=0x7bcf9a4000-0x7bcf9a6000 stackSize=991KB
  | held mutexes=
  native: #00 pc 00000000000e0b3c  /apex/com.android.runtime/lib64/bionic/libc.so (__ioctl+12) (BuildId: 2bb0d7188c0db2e8beecb24658ba9d71)
  native: #01 pc 00000000000979bc  /apex/com.android.runtime/lib64/bionic/libc.so (ioctl+160) (BuildId: 2bb0d7188c0db2e8beecb24658ba9d71)
  native: #02 pc 000000000005c384  /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+284) (BuildId: bc667c8603bf96311650a89a3f982968)
  native: #03 pc 000000000005c678  /system/lib64/libbinder.so (android::IPCThreadState::getAndExecuteCommand()+28) (BuildId: bc667c8603bf96311650a89a3f982968)
  native: #04 pc 000000000005cfc4  /system/lib64/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+72) (BuildId: bc667c8603bf96311650a89a3f982968)
  native: #05 pc 000000000008cecc  /system/lib64/libbinder.so (android::PoolThread::threadLoop()+28) (BuildId: bc667c8603bf96311650a89a3f982968)
  native: #06 pc 0000000000013454  /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+424) (BuildId: 6cc789a5db76fed354200c8693268976)
  native: #07 pc 00000000000cdaa8  /system/lib64/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+144) (BuildId: 7464f58195659636f1f83f255ba1ba12)
  native: #08 pc 00000000000f5548  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+208) (BuildId: 2bb0d7188c0db2e8beecb24658ba9d71)
  native: #09 pc 000000000008ef3c  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68) (BuildId: 2bb0d7188c0db2e8beecb24658ba9d71)
  (no managed stack frames)

也是再做talkWithDriver操作,并无异常,所以判断对端线程正常,anr原因就是系统性能问题。

回过来,在通过bugreport日志证实我们的想法,搜索ANR IN

11-10 09:23:12.742  1000  2764 28969 E ActivityManager: CPU usage from 322ms to 918ms later (2022-11-10 09:23:07.840 to 2022-11-10 09:23:08.436):
11-10 09:23:12.742  1000  2764 28969 E ActivityManager:   73% 2764/system_server: 34% user + 38% kernel / faults: 1521 minor
11-10 09:23:12.742  1000  2764 28969 E ActivityManager:     51% 28969/AnrConsumer: 17% user + 34% kernel
11-10 09:23:12.742  1000  2764 28969 E ActivityManager:     12% 2776/HeapTaskDaemon: 10% user + 2.1% kernel
11-10 09:23:12.742  1000  2764 28969 E ActivityManager:     4.3% 6112/WifiHandlerThre: 2.1% user + 2.1% kernel
11-10 09:23:12.742  1000  2764 28969 E ActivityManager:     2.1% 2777/ReferenceQueueD: 2.1% user + 0% kernel
11-10 09:23:12.742  1000  2764 28969 E ActivityManager:   99% 15462/android.process.acore: 70% user + 28% kernel
11-10 09:23:12.742  1000  2764 28969 E ActivityManager:     99% 25110/binder:15462_9: 70% user + 28% kernel
11-10 09:23:12.742  1000  2764 28969 E ActivityManager:   34% 1553/surfaceflinger: 21% user + 13% kernel / faults: 7 minor
11-10 09:23:12.742  1000  2764 28969 E ActivityManager:     13% 1553/surfaceflinger: 9.5% user + 3.8% kernel
11-10 09:23:12.742  1000  2764 28969 E ActivityManager:     7.6% 2172/RenderEngine: 5.7% user + 1.9% kernel
11-10 09:23:12.742  1000  2764 28969 E ActivityManager:     3.8% 2848/surfaceflinger: 1.9% user + 1.9% kernel
11-10 09:23:12.742  1000  2764 28969 E ActivityManager:   24% 1398/vendor.qti.hardware.display.composer-service: 14% user + 9.3% kernel
11-10 09:23:12.742  1000  2764 28969 E ActivityManager:     16% 10747/HwBinder:1398_3: 11% user + 5.6% kernel
11-10 09:23:12.742  1000  2764 28969 E ActivityManager:     7.4% 1579/composer-servic: 3.7% user + 3.7% kernel
11-10 09:23:12.742  1000  2764 28969 E ActivityManager:   44% 28308/com.xiaomi.aiasst.service: 37% user + 7.4% kernel / faults: 663 minor
11-10 09:23:12.742  1000  2764 28969 E ActivityManager:     7.4% 28893/asr-work-thread: 7.4% user + 0% kernel
11-10 09:23:12.742  1000  2764 28969 E ActivityManager:     7.4% 28950/Voice-Mixer-Thr: 7.4% user + 0% kernel
11-10 09:23:12.742  1000  2764 28969 E ActivityManager:     3.7% 28308/.aiasst.service: 3.7% user + 0% kernel
11-10 09:23:12.742  1000  2764 28969 E ActivityManager:     3.7% 28849/ListenInCall: 3.7% user + 0% kernel
11-10 09:23:12.742  1000  2764 28969 E ActivityManager: 52% TOTAL: 26% user + 22% kernel + 2.1% irq + 0.6% softirq

其中有两个进程占比较高

11-10 09:23:12.742  1000  2764 28969 E ActivityManager:   99% 15462/android.process.acore: 70% user + 28% kernel
11-10 09:23:12.742  1000  2764 28969 E ActivityManager:     99% 25110/binder:15462_9: 70% user + 28% kernel

这里android.process.acore为系统联系人进程,binder:15462_9则是与我们APP进行binder交互的线程。

窗口绘制慢到ANR

12-07 18:23:28.203 19559 19559 I wm_on_resume_called: [0,com.miui.extraphoto.motionphoto.MotionPhotoActivity,RESUME_ACTIVITY,1]
12-07 18:23:28.203 19559 19559 I wm_on_top_resumed_gained_called: [40246013,com.miui.extraphoto.motionphoto.MotionPhotoActivity,topWhenResuming]
12-07 18:23:35.209 2307 19883 I am_anr : [0,19559,com.miui.extraphoto,551075397,Input dispatching timed out (Application does not have a focused window)]

// mDrawState 状态不是HAS_DRAWN,窗口未绘制完成,无法响应input key事件

mDisplayId=0 rootTaskId=3250 mSession=Session{8d1dcab 19559:u0a10082} mClient=android.os.BinderProxy@f1c0c6
    mOwnerUid=10082 showForAllUsers=false package=com.miui.extraphoto appop=NONE
    mAttrs={(0,0)(fillxfill) sim=

{adjust=pan forwardNavigation}
layoutInDisplayCutoutMode=shortEdges ty=BASE_APPLICATION fmt=TRANSPARENT wanim=0x10302fe
      fl=LAYOUT_IN_SCREEN FULLSCREEN LAYOUT_INSET_DECOR SPLIT_TOUCH HARDWARE_ACCELERATED TRANSLUCENT_STATUS DRAWS_SYSTEM_BAR_BACKGROUNDS
      pfl=NO_MOVE_ANIMATION FORCE_DRAW_STATUS_BAR_BACKGROUND USE_BLAST FIT_INSETS_CONTROLLED
      vsysui=LAYOUT_STABLE LAYOUT_FULLSCREEN
      bhv=SHOW_TRANSIENT_BARS_BY_SWIPE
      fitSides=}
  ......
    WindowStateAnimator{6f89c4e com.miui.extraphoto/com.miui.extraphoto.motionphoto.MotionPhotoActivity}:
      mSurface=Surface(name=com.miui.extraphoto/com.miui.extraphoto.motionphoto.MotionPhotoActivity)/@0x9ce816f
      Surface: shown=false layer=0 alpha=1.0 rect=(0.0,0.0)  transform=(1.0, 0.0, 0.0, 1.0)
      mDrawState=DRAW_PENDING      mLastHidden=true // mDrawState 状态不是HAS_DRAWN,窗口未绘制完成
      mEnterAnimationPending=false      mSystemDecorRect=[0,0][0,0]
    mForceSeamlesslyRotate=false seamlesslyRotate: pending=null finishedFrameNumber=0
    isOnScreen=true
    isVisible=true

此时可以去看下systrace,看看到底是啥方法耗时这么久。

主线程正常状态时的ANR

cpu高温导致降频

"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x74b38080 self=0x7ad9014c00
  | sysTid=23081 nice=0 cgrp=default sched=0/0 handle=0x7b5fdc5548
  | state=S schedstat=( 284838633 166738594 505 ) utm=21 stm=7 core=1 HZ=100
  | stack=0x7fc95da000-0x7fc95dc000 stackSize=8MB
  | held mutexes=
  kernel: __switch_to+0xb0/0xbc
  kernel: SyS_epoll_wait+0x288/0x364
  kernel: SyS_epoll_pwait+0xb0/0x124
  kernel: cpu_switch_to+0x38c/0x2258
  native: #00 pc 000000000007cd8c  /system/lib64/libc.so (__epoll_pwait+8)
  native: #01 pc 0000000000014d48  /system/lib64/libutils.so (android::Looper::pollInner(int)+148)
  native: #02 pc 0000000000014c18  /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+60)
  native: #03 pc 00000000001275f4  /system/lib64/libandroid_runtime.so (android::android_os_MessageQueue_nativePollOnce(_JNIEnv*, _jobject*, long, int)+44)
  at android.os.MessageQueue.nativePollOnce(Native method)
  at android.os.MessageQueue.next(MessageQueue.java:330)
  at android.os.Looper.loop(Looper.java:169)
  at android.app.ActivityThread.main(ActivityThread.java:7073)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:536)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:876)

上述主线程堆栈就是一个很正常的空闲堆栈,表明主线程正在等待新的消息。

如果ANR日志里主线程是这样一个状态,那可能有两个原因:

  • 该ANR是CPU抢占、内存紧张、CPU降频等其他因素引起
  • 这份ANR日志抓取的时候,主线程已经恢复正常

分析思路:
去看anr发生时的cpu负载情况,搜索bugreport的ANR IN

10-17 12:04:19.449  1000  2963  5409 E ActivityManager: CPU usage from 41031ms to 0ms ago (2022-10-17 12:03:36.269 to 2022-10-17 12:04:17.300):
10-17 12:04:19.449  1000  2963  5409 E ActivityManager:   97% 1647/media.hwcodec: 9.2% user + 87% kernel
10-17 12:04:19.449  1000  2963  5409 E ActivityManager:   97% 9382/poll_avcD_137: 2.3% user + 94% kernel
10-17 12:04:19.449  1000  2963  5409 E ActivityManager:   64% 2963/system_server: 35% user + 29% kernel / faults: 296910 minor 154 major

10-17 12:04:19.449  1000  2963  5409 E ActivityManager: 91% TOTAL: 34% user + 52% kernel + 0.1% iowait + 4.5% irq + 0.5% softirq

40多秒的时间内,CPU占用91%。需要注意的是:TOTAL 的占用率,八核跑满是100%,不是800%,这个和单个进程的CPU占用计算方式不一样。

然后观察到BoardSensorTemp: 44.612,说明板子温度过高,此种情况下,可能触发cpu降频策略。此为非问题。

一些anr关键字

PerfMonitor binderTransact //此log一般在binder transact超过200ms时才有。
Slow Operation
activity resume慢/焦点窗口切换慢
09-18 18:22:46.380 system server端已经post app执行TransparentAliuserGuideActivity启动流程(onCreate、onStart、onResume)
关键字:wm_create_activity、wm_restart_activity、wm_set_resumed_activity、wm_pause_activity、wm_on_paused_called


本博客所有文章除特别声明外,均采用 CC BY-SA 4.0 协议 ,转载请注明出处!

代码质量评估量化-sonarQube 上一篇
2022新的征程 下一篇