记录一些遇到的ANR问题。
个人认为,ANR问题的难点在于系统出现问题时、或者其他app出现问题时的分析,自己app主线程耗时这种问题还是比较容易发现和处理的。
主线程进行耗时操作导致
主线程查询contentProvider导致ANR
先看anr
----- pid 12827 at 2022-11-10 09:37:31.569550930+0800 -----
Cmd line: com.xiaomi.aiasst.service
"main" prio=5 tid=1 Native
| group="main" sCount=1 ucsCount=0 flags=1 obj=0x72915c48 self=0xb400007c90af5c00
| sysTid=12827 nice=0 cgrp=foreground sched=0/0 handle=0x7c9221d4f8
| state=S schedstat=( 1638490620 898223745 4661 ) utm=123 stm=40 core=5 HZ=100
| stack=0x7fd6f80000-0x7fd6f82000 stackSize=8188KB
| held mutexes=
native: #00 pc 00000000000e0b3c /apex/com.android.runtime/lib64/bionic/libc.so (__ioctl+12) (BuildId: 2bb0d7188c0db2e8beecb24658ba9d71)
native: #01 pc 00000000000979bc /apex/com.android.runtime/lib64/bionic/libc.so (ioctl+160) (BuildId: 2bb0d7188c0db2e8beecb24658ba9d71)
native: #02 pc 000000000005c384 /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+284) (BuildId: bc667c8603bf96311650a89a3f982968)
native: #03 pc 000000000005d5e0 /system/lib64/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+76) (BuildId: bc667c8603bf96311650a89a3f982968)
native: #04 pc 000000000005d31c /system/lib64/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+224) (BuildId: bc667c8603bf96311650a89a3f982968)
native: #05 pc 0000000000054a6c /system/lib64/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+232) (BuildId: bc667c8603bf96311650a89a3f982968)
native: #06 pc 000000000017a24c /system/lib64/libandroid_runtime.so (android_os_BinderProxy_transact(_JNIEnv*, _jobject*, int, _jobject*, _jobject*, int)+156) (BuildId: 7464f58195659636f1f83f255ba1ba12)
at android.os.BinderProxy.transactNative(Native method)
at android.os.BinderProxy.transact(BinderProxy.java:643)
at android.database.BulkCursorProxy.getWindow(BulkCursorNative.java:165)
at android.database.BulkCursorToCursorAdaptor.onMove(BulkCursorToCursorAdaptor.java:82)
at android.database.AbstractCursor.moveToPosition(AbstractCursor.java:255)
at android.database.AbstractCursor.moveToNext(AbstractCursor.java:287)
at android.database.CursorWrapper.moveToNext(CursorWrapper.java:206)
at c6.a.a(ContactLoader.java:6)
······告诉我们是通过BpBinder进行数据交互时ANR了。
进一步看是由于客户端调用了cursor#moveToNext()
此为主线程进行耗时操作,应该放到IO线程,然后此问题就可以解决。
为了我们加深对ANR的学习,我又去调查了一下耗时的原因。以下是拓展内容:
查找PerfMonitor binderTransact,主要是看我们进程和线程对应的信息
11-10 09:37:25.092 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=290ms interface=android.content.IContentProvider code=1
11-10 09:37:25.340 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=237ms interface=android.content.IBulkCursor code=1
11-10 09:37:25.582 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=238ms interface=android.content.IBulkCursor code=1
11-10 09:37:25.831 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=244ms interface=android.content.IBulkCursor code=1
11-10 09:37:26.075 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=240ms interface=android.content.IBulkCursor code=1
11-10 09:37:26.319 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=240ms interface=android.content.IBulkCursor code=1
11-10 09:37:26.567 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=245ms interface=android.content.IBulkCursor code=1
11-10 09:37:26.821 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=250ms interface=android.content.IBulkCursor code=1
11-10 09:37:27.086 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=260ms interface=android.content.IBulkCursor code=1
11-10 09:37:27.371 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=279ms interface=android.content.IBulkCursor code=1
11-10 09:37:27.668 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=292ms interface=android.content.IBulkCursor code=1
11-10 09:37:27.930 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=259ms interface=android.content.IBulkCursor code=1
11-10 09:37:28.192 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=257ms interface=android.content.IBulkCursor code=1
11-10 09:37:28.450 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=254ms interface=android.content.IBulkCursor code=1
11-10 09:37:28.709 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=254ms interface=android.content.IBulkCursor code=1
11-10 09:37:28.967 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=254ms interface=android.content.IBulkCursor code=1
11-10 09:37:29.224 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=253ms interface=android.content.IBulkCursor code=1
11-10 09:37:29.484 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=257ms interface=android.content.IBulkCursor code=1
11-10 09:37:29.753 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=264ms interface=android.content.IBulkCursor code=1
11-10 09:37:30.042 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=284ms interface=android.content.IBulkCursor code=1
11-10 09:37:30.315 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=268ms interface=android.content.IBulkCursor code=1
11-10 09:37:30.596 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=275ms interface=android.content.IBulkCursor code=1
11-10 09:37:30.878 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=277ms interface=android.content.IBulkCursor code=1
11-10 09:37:31.151 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=270ms interface=android.content.IBulkCursor code=1
11-10 09:37:31.429 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=273ms interface=android.content.IBulkCursor code=1
11-10 09:37:31.704 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=270ms interface=android.content.IBulkCursor code=1
11-10 09:37:31.999 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=287ms interface=android.content.IBulkCursor code=1
11-10 09:37:32.280 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=276ms interface=android.content.IBulkCursor code=1
11-10 09:37:32.558 1000 12827 12827 W BpBinder: PerfMonitor binderTransact : time=274ms interface=android.content.IBulkCursor code=1可以看到有大量耗时操作,时间在200ms以上,且连续,所以判断这是个for循环。
然后看下对端接口情况,看下对端是否有异常。可以搜outgoing transaction
11-10 09:37:30.138 1000 2764 15665 W MIUIScout ANR: Binder Info:MIUI outgoing transaction: from 12827:12827 to 15462:25110 context:binder code: 1 duration: 0.09 s
11-10 09:37:30.144 1000 2764 15665 W MIUIScout ANR: Pid 12827 Binder Info:
11-10 09:37:30.146 1000 2764 15665 W MIUIScout ANR: MIUI outgoing transaction: from 12827:12827 to 15462:25110 context:binder code: 1 duration: 0.09 s可以发现对端线程是25110,是15462/android.process.acore的子线程。
"binder:15462_9" prio=5 tid=3 Native
| group="main" sCount=1 ucsCount=0 flags=1 obj=0x13040b30 self=0xb400007b6ed5f800
| sysTid=25110 nice=0 cgrp=foreground sched=0/0 handle=0x7bcfa9bcb0
| state=S schedstat=( 34871263159 1182034033 40493 ) utm=2403 stm=1083 core=5 HZ=100
| stack=0x7bcf9a4000-0x7bcf9a6000 stackSize=991KB
| held mutexes=
native: #00 pc 00000000000e0b3c /apex/com.android.runtime/lib64/bionic/libc.so (__ioctl+12) (BuildId: 2bb0d7188c0db2e8beecb24658ba9d71)
native: #01 pc 00000000000979bc /apex/com.android.runtime/lib64/bionic/libc.so (ioctl+160) (BuildId: 2bb0d7188c0db2e8beecb24658ba9d71)
native: #02 pc 000000000005c384 /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+284) (BuildId: bc667c8603bf96311650a89a3f982968)
native: #03 pc 000000000005c678 /system/lib64/libbinder.so (android::IPCThreadState::getAndExecuteCommand()+28) (BuildId: bc667c8603bf96311650a89a3f982968)
native: #04 pc 000000000005cfc4 /system/lib64/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+72) (BuildId: bc667c8603bf96311650a89a3f982968)
native: #05 pc 000000000008cecc /system/lib64/libbinder.so (android::PoolThread::threadLoop()+28) (BuildId: bc667c8603bf96311650a89a3f982968)
native: #06 pc 0000000000013454 /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+424) (BuildId: 6cc789a5db76fed354200c8693268976)
native: #07 pc 00000000000cdaa8 /system/lib64/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+144) (BuildId: 7464f58195659636f1f83f255ba1ba12)
native: #08 pc 00000000000f5548 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+208) (BuildId: 2bb0d7188c0db2e8beecb24658ba9d71)
native: #09 pc 000000000008ef3c /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68) (BuildId: 2bb0d7188c0db2e8beecb24658ba9d71)
(no managed stack frames)也是再做talkWithDriver操作,并无异常,所以判断对端线程正常,anr原因就是系统性能问题。
回过来,在通过bugreport日志证实我们的想法,搜索ANR IN
11-10 09:23:12.742 1000 2764 28969 E ActivityManager: CPU usage from 322ms to 918ms later (2022-11-10 09:23:07.840 to 2022-11-10 09:23:08.436):
11-10 09:23:12.742 1000 2764 28969 E ActivityManager: 73% 2764/system_server: 34% user + 38% kernel / faults: 1521 minor
11-10 09:23:12.742 1000 2764 28969 E ActivityManager: 51% 28969/AnrConsumer: 17% user + 34% kernel
11-10 09:23:12.742 1000 2764 28969 E ActivityManager: 12% 2776/HeapTaskDaemon: 10% user + 2.1% kernel
11-10 09:23:12.742 1000 2764 28969 E ActivityManager: 4.3% 6112/WifiHandlerThre: 2.1% user + 2.1% kernel
11-10 09:23:12.742 1000 2764 28969 E ActivityManager: 2.1% 2777/ReferenceQueueD: 2.1% user + 0% kernel
11-10 09:23:12.742 1000 2764 28969 E ActivityManager: 99% 15462/android.process.acore: 70% user + 28% kernel
11-10 09:23:12.742 1000 2764 28969 E ActivityManager: 99% 25110/binder:15462_9: 70% user + 28% kernel
11-10 09:23:12.742 1000 2764 28969 E ActivityManager: 34% 1553/surfaceflinger: 21% user + 13% kernel / faults: 7 minor
11-10 09:23:12.742 1000 2764 28969 E ActivityManager: 13% 1553/surfaceflinger: 9.5% user + 3.8% kernel
11-10 09:23:12.742 1000 2764 28969 E ActivityManager: 7.6% 2172/RenderEngine: 5.7% user + 1.9% kernel
11-10 09:23:12.742 1000 2764 28969 E ActivityManager: 3.8% 2848/surfaceflinger: 1.9% user + 1.9% kernel
11-10 09:23:12.742 1000 2764 28969 E ActivityManager: 24% 1398/vendor.qti.hardware.display.composer-service: 14% user + 9.3% kernel
11-10 09:23:12.742 1000 2764 28969 E ActivityManager: 16% 10747/HwBinder:1398_3: 11% user + 5.6% kernel
11-10 09:23:12.742 1000 2764 28969 E ActivityManager: 7.4% 1579/composer-servic: 3.7% user + 3.7% kernel
11-10 09:23:12.742 1000 2764 28969 E ActivityManager: 44% 28308/com.xiaomi.aiasst.service: 37% user + 7.4% kernel / faults: 663 minor
11-10 09:23:12.742 1000 2764 28969 E ActivityManager: 7.4% 28893/asr-work-thread: 7.4% user + 0% kernel
11-10 09:23:12.742 1000 2764 28969 E ActivityManager: 7.4% 28950/Voice-Mixer-Thr: 7.4% user + 0% kernel
11-10 09:23:12.742 1000 2764 28969 E ActivityManager: 3.7% 28308/.aiasst.service: 3.7% user + 0% kernel
11-10 09:23:12.742 1000 2764 28969 E ActivityManager: 3.7% 28849/ListenInCall: 3.7% user + 0% kernel
11-10 09:23:12.742 1000 2764 28969 E ActivityManager: 52% TOTAL: 26% user + 22% kernel + 2.1% irq + 0.6% softirq其中有两个进程占比较高
11-10 09:23:12.742 1000 2764 28969 E ActivityManager: 99% 15462/android.process.acore: 70% user + 28% kernel
11-10 09:23:12.742 1000 2764 28969 E ActivityManager: 99% 25110/binder:15462_9: 70% user + 28% kernel这里android.process.acore为系统联系人进程,binder:15462_9则是与我们APP进行binder交互的线程。
窗口绘制慢到ANR
12-07 18:23:28.203 19559 19559 I wm_on_resume_called: [0,com.miui.extraphoto.motionphoto.MotionPhotoActivity,RESUME_ACTIVITY,1]
12-07 18:23:28.203 19559 19559 I wm_on_top_resumed_gained_called: [40246013,com.miui.extraphoto.motionphoto.MotionPhotoActivity,topWhenResuming]
12-07 18:23:35.209 2307 19883 I am_anr : [0,19559,com.miui.extraphoto,551075397,Input dispatching timed out (Application does not have a focused window)]// mDrawState 状态不是HAS_DRAWN,窗口未绘制完成,无法响应input key事件
mDisplayId=0 rootTaskId=3250 mSession=Session{8d1dcab 19559:u0a10082} mClient=android.os.BinderProxy@f1c0c6
mOwnerUid=10082 showForAllUsers=false package=com.miui.extraphoto appop=NONE
mAttrs={(0,0)(fillxfill) sim=
{adjust=pan forwardNavigation}
layoutInDisplayCutoutMode=shortEdges ty=BASE_APPLICATION fmt=TRANSPARENT wanim=0x10302fe
fl=LAYOUT_IN_SCREEN FULLSCREEN LAYOUT_INSET_DECOR SPLIT_TOUCH HARDWARE_ACCELERATED TRANSLUCENT_STATUS DRAWS_SYSTEM_BAR_BACKGROUNDS
pfl=NO_MOVE_ANIMATION FORCE_DRAW_STATUS_BAR_BACKGROUND USE_BLAST FIT_INSETS_CONTROLLED
vsysui=LAYOUT_STABLE LAYOUT_FULLSCREEN
bhv=SHOW_TRANSIENT_BARS_BY_SWIPE
fitSides=}
......
WindowStateAnimator{6f89c4e com.miui.extraphoto/com.miui.extraphoto.motionphoto.MotionPhotoActivity}:
mSurface=Surface(name=com.miui.extraphoto/com.miui.extraphoto.motionphoto.MotionPhotoActivity)/@0x9ce816f
Surface: shown=false layer=0 alpha=1.0 rect=(0.0,0.0) transform=(1.0, 0.0, 0.0, 1.0)
mDrawState=DRAW_PENDING mLastHidden=true // mDrawState 状态不是HAS_DRAWN,窗口未绘制完成
mEnterAnimationPending=false mSystemDecorRect=[0,0][0,0]
mForceSeamlesslyRotate=false seamlesslyRotate: pending=null finishedFrameNumber=0
isOnScreen=true
isVisible=true此时可以去看下systrace,看看到底是啥方法耗时这么久。
主线程正常状态时的ANR
cpu高温导致降频
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x74b38080 self=0x7ad9014c00
| sysTid=23081 nice=0 cgrp=default sched=0/0 handle=0x7b5fdc5548
| state=S schedstat=( 284838633 166738594 505 ) utm=21 stm=7 core=1 HZ=100
| stack=0x7fc95da000-0x7fc95dc000 stackSize=8MB
| held mutexes=
kernel: __switch_to+0xb0/0xbc
kernel: SyS_epoll_wait+0x288/0x364
kernel: SyS_epoll_pwait+0xb0/0x124
kernel: cpu_switch_to+0x38c/0x2258
native: #00 pc 000000000007cd8c /system/lib64/libc.so (__epoll_pwait+8)
native: #01 pc 0000000000014d48 /system/lib64/libutils.so (android::Looper::pollInner(int)+148)
native: #02 pc 0000000000014c18 /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+60)
native: #03 pc 00000000001275f4 /system/lib64/libandroid_runtime.so (android::android_os_MessageQueue_nativePollOnce(_JNIEnv*, _jobject*, long, int)+44)
at android.os.MessageQueue.nativePollOnce(Native method)
at android.os.MessageQueue.next(MessageQueue.java:330)
at android.os.Looper.loop(Looper.java:169)
at android.app.ActivityThread.main(ActivityThread.java:7073)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:536)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:876)上述主线程堆栈就是一个很正常的空闲堆栈,表明主线程正在等待新的消息。
如果ANR日志里主线程是这样一个状态,那可能有两个原因:
- 该ANR是CPU抢占、内存紧张、CPU降频等其他因素引起
- 这份ANR日志抓取的时候,主线程已经恢复正常
分析思路:
去看anr发生时的cpu负载情况,搜索bugreport的ANR IN
10-17 12:04:19.449 1000 2963 5409 E ActivityManager: CPU usage from 41031ms to 0ms ago (2022-10-17 12:03:36.269 to 2022-10-17 12:04:17.300):
10-17 12:04:19.449 1000 2963 5409 E ActivityManager: 97% 1647/media.hwcodec: 9.2% user + 87% kernel
10-17 12:04:19.449 1000 2963 5409 E ActivityManager: 97% 9382/poll_avcD_137: 2.3% user + 94% kernel
10-17 12:04:19.449 1000 2963 5409 E ActivityManager: 64% 2963/system_server: 35% user + 29% kernel / faults: 296910 minor 154 major
10-17 12:04:19.449 1000 2963 5409 E ActivityManager: 91% TOTAL: 34% user + 52% kernel + 0.1% iowait + 4.5% irq + 0.5% softirq40多秒的时间内,CPU占用91%。需要注意的是:TOTAL 的占用率,八核跑满是100%,不是800%,这个和单个进程的CPU占用计算方式不一样。
然后观察到BoardSensorTemp: 44.612,说明板子温度过高,此种情况下,可能触发cpu降频策略。此为非问题。
一些anr关键字
PerfMonitor binderTransact //此log一般在binder transact超过200ms时才有。
Slow Operation
activity resume慢/焦点窗口切换慢
09-18 18:22:46.380 system server端已经post app执行TransparentAliuserGuideActivity启动流程(onCreate、onStart、onResume)
关键字:wm_create_activity、wm_restart_activity、wm_set_resumed_activity、wm_pause_activity、wm_on_paused_called
本博客所有文章除特别声明外,均采用 CC BY-SA 4.0 协议 ,转载请注明出处!