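It is commonly said that a process's binder thread pool holds at most 16 threads. As far as I know, libbinder's default lets the driver spawn up to 15 threads on top of the main binder thread.
Let's verify that.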

Verifying the 16-thread maximum

Write the AIDL:

package com.ericcode.bindertest;

interface ICalculator {

    int add(int a,int b);

    int sub(int a,int b);
}

Write the service:

class RemoteService : Service() {
    companion object {
        const val TAG = "RemoteService"
    }

    override fun onBind(intent: Intent): IBinder {
        return object: ICalculator.Stub() {
            override fun add(a: Int, b: Int): Int {
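                // Logger is the author's own logging helper; judging by the logs below, it prefixes the thread name
                Logger.i(TAG, "add:$a,$b")
                Thread.sleep(1000) // block this binder thread for 1 s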
                return a + b
            }

            override fun sub(a: Int, b: Int): Int {
                return a - b
            }
        }
    }
}

Write the client:


private fun testBinderThreadPool() {
    val intent = Intent(baseContext, RemoteService::class.java)
    bindService(intent, object : ServiceConnection {
        override fun onServiceConnected(name: ComponentName?, service: IBinder?) {
            Logger.i(TAG, "onServiceConnected")
            ICalculator.Stub.asInterface(service).apply {
                for (i in 0..30) { // 31 cross-process calls (0..30 is an inclusive range)
                    thread {
                        val startTime = System.currentTimeMillis()
                        val result = add(i, i)
                        Logger.i(
                            TAG,
                            "result:$result index:$i use:${System.currentTimeMillis() - startTime}ms"
                        )
                    }
                }
            }
        }
        override fun onServiceDisconnected(name: ComponentName?) {
            // Called only if the service process dies; log it instead of throwing
            Logger.i(TAG, "onServiceDisconnected")
        }
    }, BIND_AUTO_CREATE)
}

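With this setup, the client calls the server's add method 31 times (0..30 is an inclusive range), and each add call takes 1 second. The server cannot keep up, so all of its binder threads end up busy.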
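To see how the 16-thread limit stretches the total time, here is a rough JVM analogy. It is not Binder itself. A `java.util.concurrent` fixed pool of 16 threads stands in for the server's binder pool, and 31 blocking tasks stand in for the add calls. The names `simulatePool` and `rounds` are hypothetical, and the sleep is shortened to 200 ms so the example runs quickly. With 31 calls and 16 threads, ceil(31 / 16) = 2 rounds are needed. The batch should therefore take about two task durations, and no more than 16 tasks should ever run at once.

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger

// Rounds needed when every call takes the same time: ceil(calls / poolSize).
fun rounds(calls: Int, poolSize: Int): Int = (calls + poolSize - 1) / poolSize

// JVM analogy (not Binder): a fixed pool of poolSize threads serves `calls`
// requests that each block for taskMs. Returns the peak number of requests
// running at the same time and the total wall-clock time in ms.
fun simulatePool(poolSize: Int, calls: Int, taskMs: Long): Pair<Int, Long> {
    val pool = Executors.newFixedThreadPool(poolSize)
    val running = AtomicInteger(0)
    val peak = AtomicInteger(0)
    val done = CountDownLatch(calls)
    val start = System.nanoTime()
    repeat(calls) {
        pool.execute {
            val now = running.incrementAndGet()
            peak.accumulateAndGet(now) { a, b -> maxOf(a, b) }
            Thread.sleep(taskMs) // plays the role of Thread.sleep(1000) in add()
            running.decrementAndGet()
            done.countDown()
        }
    }
    done.await()
    val elapsedMs = (System.nanoTime() - start) / 1_000_000
    pool.shutdown()
    pool.awaitTermination(1, TimeUnit.SECONDS)
    return peak.get() to elapsedMs
}

fun main() {
    // 31 calls on 16 threads: rounds(31, 16) == 2, so roughly 2 x taskMs overall.
    val (peak, elapsed) = simulatePool(poolSize = 16, calls = 31, taskMs = 200)
    println("peak concurrency=$peak, elapsed=${elapsed}ms, rounds=${rounds(31, 16)}")
}
```

The extra 15 calls do not fail. They simply wait for a free thread, which is why the later results in the log arrive about one second after the first ones.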
Here is the log:

2023-10-26 18:07:10.094 22945-22945 zsm:MainActivity                    com...ode.bindertest  I  main:onServiceConnected
2023-10-26 18:07:10.097 22978-22994 zsm:RemoteService                   com...ode.bindertest  I  binder:22978_4:add:0,0
2023-10-26 18:07:10.098 22978-22992 zsm:RemoteService                   com...ode.bindertest  I  binder:22978_2:add:3,3
2023-10-26 18:07:10.099 22978-22993 zsm:RemoteService                   com...ode.bindertest  I  binder:22978_3:add:1,1
2023-10-26 18:07:10.100 22978-22991 zsm:RemoteService                   com...ode.bindertest  I  binder:22978_1:add:11,11
2023-10-26 18:07:10.101 22978-23019 zsm:RemoteService                   com...ode.bindertest  I  binder:22978_5:add:7,7
2023-10-26 18:07:10.105 22978-23024 zsm:RemoteService                   com...ode.bindertest  I  binder:22978_6:add:9,9
2023-10-26 18:07:10.105 22978-23037 zsm:RemoteService                   com...ode.bindertest  I  binder:22978_7:add:10,10
2023-10-26 18:07:10.106 22978-23038 zsm:RemoteService                   com...ode.bindertest  I  binder:22978_8:add:2,2
2023-10-26 18:07:10.108 22978-23040 zsm:RemoteService                   com...ode.bindertest  I  binder:22978_9:add:6,6
2023-10-26 18:07:10.110 22978-23041 zsm:RemoteService                   com...ode.bindertest  I  binder:22978_A:add:4,4
2023-10-26 18:07:10.111 22978-23042 zsm:RemoteService                   com...ode.bindertest  I  binder:22978_B:add:8,8
2023-10-26 18:07:10.112 22978-23043 zsm:RemoteService                   com...ode.bindertest  I  binder:22978_C:add:22,22
2023-10-26 18:07:10.114 22978-23044 zsm:RemoteService                   com...ode.bindertest  I  binder:22978_D:add:23,23
2023-10-26 18:07:10.115 22978-23045 zsm:RemoteService                   com...ode.bindertest  I  binder:22978_E:add:16,16
2023-10-26 18:07:10.116 22978-23046 zsm:RemoteService                   com...ode.bindertest  I  binder:22978_F:add:14,14
2023-10-26 18:07:10.117 22978-23048 zsm:RemoteService                   com...ode.bindertest  I  binder:22978_10:add:25,25  // the server has taken many calls but stops at the 16th thread (_10 is hex for 16), because the pool size is 16
2023-10-26 18:07:11.099 22978-22994 zsm:RemoteService                   com...ode.bindertest  I  binder:22978_4:add:26,26  // a new call arrives on an existing thread ID (22994): this is a pool, and its threads are reused
2023-10-26 18:07:11.100 22945-23005 zsm:MainActivity                    com...ode.bindertest  I  Thread-6:result:0 index:0 use:1003ms  // the client receives its result
2023-10-26 18:07:11.101 22978-22992 zsm:RemoteService                   com...ode.bindertest  I  binder:22978_2:add:5,5
2023-10-26 18:07:11.102 22945-23008 zsm:MainActivity                    com...ode.bindertest  I  Thread-9:result:6 index:3 use:1005ms
2023-10-26 18:07:11.102 22945-23006 zsm:MainActivity                    com...ode.bindertest  I  Thread-7:result:2 index:1 use:1003ms
2023-10-26 18:07:11.103 22945-23016 zsm:MainActivity                    com...ode.bindertest  I  Thread-17:result:22 index:11 use:1004ms
2023-10-26 18:07:11.103 22978-22993 zsm:RemoteService                   com...ode.bindertest  I  binder:22978_3:add:28,28
2023-10-26 18:07:11.104 22978-23019 zsm:RemoteService                   com...ode.bindertest  I  binder:22978_5:add:13,13
2023-10-26 18:07:11.104 22978-22991 zsm:RemoteService                   com...ode.bindertest  I  binder:22978_1:add:29,29
2023-10-26 18:07:11.104 22945-23012 zsm:MainActivity                    com...ode.bindertest  I  Thread-13:result:14 index:7 use:1004ms
2023-10-26 18:07:11.107 22978-23024 zsm:RemoteService                   com...ode.bindertest  I  binder:22978_6:add:20,20
2023-10-26 18:07:11.108 22978-23037 zsm:RemoteService                   com...ode.bindertest  I  binder:22978_7:add:17,17
2023-10-26 18:07:11.108 22978-23038 zsm:RemoteService                   com...ode.bindertest  I  binder:22978_8:add:21,21
.........
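Counting the thread names in a log like this is an easy way to read off the pool size. The suffix after the underscore is hexadecimal: the log goes `_9`, `_A`, ..., `_F`, `_10`, so `_10` is the 16th thread. Below is a small hypothetical helper, `binderThreadNumbers`, that extracts and decodes these suffixes. It is a sketch for reading logs, not part of any Android API.

```kotlin
// Matches thread names such as "binder:22978_A"; group 2 is the hex suffix.
val binderThreadName = Regex("""binder:(\d+)_([0-9A-F]+)""")

// Collects the distinct binder thread numbers that appear in the given log lines.
fun binderThreadNumbers(logLines: List<String>): Set<Int> =
    logLines.mapNotNull { binderThreadName.find(it) }
        .map { it.groupValues[2].toInt(16) } // "_A" -> 10, "_10" -> 16
        .toSet()

fun main() {
    val sample = listOf(
        "binder:22978_4:add:0,0",
        "binder:22978_F:add:14,14",
        "binder:22978_10:add:25,25",
        "binder:22978_4:add:26,26", // reused thread, counted once
        "main:onServiceConnected",  // not a binder thread, ignored
    )
    val ids = binderThreadNumbers(sample)
    println("distinct binder threads: ${ids.sorted()}, highest: ${ids.maxOrNull()}")
}
```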

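ANR caused by a saturated binder thread pool

Threads and their states:
Client:
The main thread is blocked in talkWithDriver.
Many other threads are also talking to the server (according to the call stacks in the trace file), and they are all blocked in talkWithDriver as well.

Server:
There are many binder:xxx_x threads, 16 in total, which means the binder thread pool is full. The server cannot take on new IPC calls, and the calls it cannot handle yet are queued. The client's main thread is waiting for a reply to one of those queued calls, so it stays blocked, and that is what triggers the ANR.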
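Here is a rough way to see how a saturated pool turns into an ANR. As a simplifying assumption, not how the driver actually schedules work, suppose calls ahead of ours are served in rounds of 16, and each round takes as long as one call (1 s here). `estimatedBlockingMs` is a hypothetical helper. The 5 s threshold is Android's default input-dispatch ANR timeout.

```kotlin
// Simplified model: calls ahead of ours are served in rounds of poolSize, each
// round lasting serviceMs. Our synchronous call gets a thread after
// callsAhead / poolSize full rounds, then needs one more serviceMs itself.
fun estimatedBlockingMs(callsAhead: Int, poolSize: Int, serviceMs: Long): Long {
    require(callsAhead >= 0 && poolSize > 0)
    val roundsToWait = callsAhead / poolSize
    return (roundsToWait + 1) * serviceMs
}

fun main() {
    val anrTimeoutMs = 5_000L // default input-dispatch ANR timeout
    for (ahead in listOf(0, 15, 16, 79)) {
        val ms = estimatedBlockingMs(ahead, poolSize = 16, serviceMs = 1_000)
        println("callsAhead=$ahead -> main thread blocked ~${ms}ms, ANR risk=${ms >= anrTimeoutMs}")
    }
}
```

Under this model, 79 calls ahead of the main thread's call already mean about 5 s of blocking. This is why a synchronous IPC on the main thread is risky whenever the server can be saturated.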

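Unreliable calls caused by oneway

With a oneway method, the client does not wait for the server's result during the IPC, which makes the client more efficient. Because no result comes back, oneway can only be applied to methods that return void, and their parameters cannot be marked inout or out.

However, while testing the maximum thread count above, I found that some oneway calls failed.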

AIDL:

interface ICalculator {

    int add(int a,int b);

    oneway void set(int i);
}

Client code:

private fun testBinderThreadPoolOneway() {
    val intent = Intent(baseContext, RemoteService::class.java)
    bindService(intent, object : ServiceConnection {
        override fun onServiceConnected(name: ComponentName?, service: IBinder?) {
            Logger.i(TAG, "onServiceConnected")
            ICalculator.Stub.asInterface(service).apply {
                for (i in 0..5000) {
                    thread(name = "ipc_client_$i") {
                        val startTime = System.currentTimeMillis()
                        set(i) // oneway cross-process call; returns without waiting for the server
                        Logger.i(
                            TAG,
                            "index:$i use:${System.currentTimeMillis() - startTime}ms"
                        )
                    }
                }

            }

        }

        override fun onServiceDisconnected(name: ComponentName?) {
            // Called only if the service process dies; log it instead of throwing
            Logger.i(TAG, "onServiceDisconnected")
        }
    }, BIND_AUTO_CREATE)
}

Server implementation:

class RemoteService : Service() {
    companion object {
        const val TAG = "RemoteService"
    }

    override fun onBind(intent: Intent): IBinder {
        return object: ICalculator.Stub() {
            var sum = 0
            override fun add(a: Int, b: Int): Int {
                Logger.i(TAG, "add:$a,$b")
                Thread.sleep(1000)
                return a + b
            }

            @Synchronized
            override fun set(i: Int) {
                Logger.i(TAG, "set:$i")
                sum += i
                Thread.sleep(10)
                Logger.i(TAG, "set end, sum:$sum")
            }
        }
    }
}

The client crashed:

FATAL EXCEPTION: ipc_client_2780
Process: com.ericcode.bindertest, PID: 8638
android.os.DeadObjectException: Transaction failed on small parcel; remote process probably died, but this could also be caused by running out of binder buffer space
    at android.os.BinderProxy.transactNative(Native Method)
    at android.os.BinderProxy.transact(BinderProxy.java:639)
    at com.ericcode.bindertest.ICalculator$Stub$Proxy.set(ICalculator.java:130)
    at com.ericcode.bindertest.MainActivity$testBinderThreadPoolOneway$1$onServiceConnected$1$1.invoke(MainActivity.kt:42)
    at com.ericcode.bindertest.MainActivity$testBinderThreadPoolOneway$1$onServiceConnected$1$1.invoke(MainActivity.kt:40)
    at kotlin.concurrent.ThreadsKt$thread$thread$1.run(Thread.kt:30)

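The remote process had certainly not died, so the problem must be buffer space. Because set is oneway, each call returns immediately, and the client fires its 5001 transactions in a burst. The server, by contrast, handles them one at a time (set is @Synchronized) and spends at least 10 ms on each, about 50 s in total. The transactions it has not processed yet pile up in the server's binder buffer until the buffer runs out of space.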

Add a try/catch:

try {
    set(i)
} catch (e: RemoteException) { // DeadObjectException is a subclass of RemoteException
    Log.e(TAG, "failed in $i", e)
}

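The exception still occurs, and the server never receives the calls that failed, but later calls do get through.
For example, call 3000 may fail while call 3001 succeeds.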
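A toy model makes the "fail, then succeed again" pattern easier to see. It is not the driver's real buffer accounting. Here the buffer is modeled as a fixed number of slots. The client sends several oneway calls per time step without waiting, and the server drains a smaller number per step. `simulateOnewayBurst` and its parameters are hypothetical. A call that finds the buffer full fails. Once the server frees slots, later calls get in again, just as call 3001 can succeed after call 3000 failed.

```kotlin
// Toy model (hypothetical): at most bufferSlots undelivered oneway calls fit in
// the receiver's buffer; a call that finds it full fails immediately, the
// analogue of the DeadObjectException above.
data class BurstResult(val delivered: Int, val failed: List<Int>)

fun simulateOnewayBurst(calls: Int, sendPerTick: Int, handlePerTick: Int, bufferSlots: Int): BurstResult {
    require(sendPerTick > 0 && handlePerTick > 0) { "both sides must make progress" }
    var pending = 0
    var delivered = 0
    var next = 0
    val failed = mutableListOf<Int>()
    while (next < calls || pending > 0) {
        // Client: issue up to sendPerTick calls without waiting for the server.
        repeat(sendPerTick) {
            if (next < calls) {
                if (pending < bufferSlots) pending++ else failed += next
                next++
            }
        }
        // Server: handle up to handlePerTick queued calls, freeing their slots.
        val handled = minOf(pending, handlePerTick)
        pending -= handled
        delivered += handled
    }
    return BurstResult(delivered, failed)
}

fun main() {
    val r = simulateOnewayBurst(calls = 100, sendPerTick = 10, handlePerTick = 1, bufferSlots = 20)
    println("delivered=${r.delivered} failed=${r.failed.size} firstFailed=${r.failed.first()}")
    // A call issued after the first failure can still get through once a slot frees up.
    println("call 30 failed? ${30 in r.failed}")
}
```

In the model, failures stop only when the buffer is large enough for the backlog or the client sends no faster than the server handles calls. That matches the advice below to avoid firing oneway calls in fast bursts.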

So oneway is unreliable when its methods are called rapidly in large batches, and we should avoid oneway in that situation.

A case on Stack Overflow

| https://stackoverflow.com/questions/45432647/android-throw-deadobjectexception-with-log-transaction-failed-on-small-parcel

According to it, the error can be triggered as follows:

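  1. Process1 sends a large payload (e.g. 980 kB) to Process2. Process2 sleeps for 30 seconds, so that large binder buffer is not released.
  2. Process1 then sends Process2 a broadcast carrying, e.g., 50 kB of data. This exceeds the 1016 kB buffer capacity, because 980 kB + 50 kB is larger than the capacity.
  3. BroadcastQueue throws DeadObjectException and then delivers scheduleCrash to ActivityThread on the app side.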
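The arithmetic in step 2 can be checked in a few lines. To my knowledge, the 1016 kB figure matches libbinder mapping 1 MB minus two pages per process (BINDER_VM_SIZE). With 4 KB pages, that gives 1016 kB. Treat the formula as an assumption. `binderBufferBytes` and `fitsInBuffer` are hypothetical helpers.

```kotlin
// Assumption: each process's binder mapping is 1 MB minus two pages.
fun binderBufferBytes(pageSize: Int): Int = 1024 * 1024 - 2 * pageSize

// Does a new transaction of newKb still fit while heldKb is occupied by an
// earlier transaction whose buffer has not been released yet?
fun fitsInBuffer(heldKb: Int, newKb: Int, capacityKb: Int): Boolean =
    heldKb + newKb <= capacityKb

fun main() {
    val capacityKb = binderBufferBytes(pageSize = 4096) / 1024
    println("capacity = $capacityKb kB")
    println("980 kB held + 50 kB new fits? ${fitsInBuffer(980, 50, capacityKb)}")
    println("980 kB held + 30 kB new fits? ${fitsInBuffer(980, 30, capacityKb)}")
}
```

The 50 kB broadcast is small on its own. It fails only because the unreleased 980 kB transaction is still occupying the buffer.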

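In other words, when this error occurs, the current call may not be at fault. Another call that has not returned yet may still be holding a large part of the buffer.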

| References: https://www.jianshu.com/p/4c8d346185cb
| https://stackoverflow.com/questions/45432647/android-throw-deadobjectexception-with-log-transaction-failed-on-small-parcel