
rpc基础

这一章对应的书籍部分(第八章)讲了非常多IPC设计的演化,推荐都去看看。

概念演化和设计思路

  • 简单IPC
  • 数据传递
    • 基于共享内存的request/response模型,zero copy(列表后附一个最小示意)
    • 基于OS内核: 两次拷贝
    • L4 微内核系统:内存重映射(memory remapping)优化,两次缩到一次(以及它的问题)
  • 通知机制:控制流转移
    • 单向/双向通信:共享内存,管道,信号,套接字,…
    • 同步/异步通信:阻塞还是不阻塞;阻塞的话要考虑DoS风险和超时时间的设置
    • 双方/多方通信
    • 直接/间接通信:
      • 权限管理和安全问题(linux的结合fs权限检查,现代微内核的capability)
      • 接收方的选择(进程?线程?如何保证命名服务,依赖文件系统还是全局标识符,全局标识符如何避免攻击?)
    • linux的管道
    • 基于共享内存的IPC
    • L4 IPC
      • 寄存器与虚拟寄存器传递的短消息
      • 从内存重映射到共享内存的长消息
      • 惰性调度的设计取舍
        • 优点:在IPC只是短暂阻塞的情况下,只修改TCB以减少队列操作,降低TLB miss等开销
        • 缺点:增加调度系统复杂度,调度时间与当前ipc密度耦合,实时系统不适用
      • 直接进程切换的设计取舍:缓存命中,加速控制流转移,代价是可能的优先级失效
      • 通信连接:直接线程通信带来全局ID的安全性问题,后续系统更倾向于间接通信,采用capability权限系统
    • LRPC的设计:与其传输数据不如迁移控制流(迁移线程模型)。通过让client线程直接使用IPC服务进程的函数和页表,调用在本地进程、本地核上即可完成,不需要核间通信,把IPI变成了syscall
    • Chcore的设计:类似L4和LRPC,server client注册服务后通过capability子系统完成conn的建立,之后进程间通信演变成syscall sys_ipc_call和sys_ipc_ret
    • Android的设计 Binder IPC: 强依赖用户态服务
      • context manager设计
      • 线程池模型
      • binder 句柄的传输设计:特殊数据 + offset避免扫描,可以传递文件描述符等
      • 匿名共享内存Ashmem:解决mmap和System V IPC的key机制不够灵活的问题
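
针对上面第一条"基于共享内存的request/response模型",这里给出一个与ChCore无关的最小示意:用POSIX的匿名共享映射演示"双方直接读写同一块内存、内核不做数据拷贝"的思路。同步用volatile忙等简化并省略了错误处理,真实系统会配合futex/信号量等通知机制和内存屏障:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

struct shm_msg {
        volatile int state; /* 0: 空闲, 1: 请求就绪, 2: 响应就绪 */
        char buf[128];      /* 双方直接读写同一块内存, 无需内核拷贝 */
};

int main(void)
{
        /* MAP_SHARED + MAP_ANONYMOUS: fork后父子进程共享同一物理页 */
        struct shm_msg *msg = mmap(NULL, sizeof(*msg),
                                   PROT_READ | PROT_WRITE,
                                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        msg->state = 0;

        if (fork() == 0) { /* server */
                while (msg->state != 1)
                        ; /* 忙等请求到来 */
                printf("server got: %s\n", msg->buf);
                strcpy(msg->buf, "pong"); /* 原地写回响应, 零拷贝 */
                msg->state = 2;
                _exit(0);
        }

        /* client */
        strcpy(msg->buf, "ping");
        msg->state = 1;
        while (msg->state != 2)
                ; /* 忙等响应 */
        printf("client got: %s\n", msg->buf);
        wait(NULL);
        return 0;
}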

chcore的ipc源码分析

ipc

先来看头文件里面是如何定义server_handler的

可以看到,在用于通信的共享内存之中,头部的8个字节用于存储ipc response的header(实际上只有返回的capability个数),而server_handler会接收共享内存块的地址、长度、client发来的capability个数,以及标识client的badge(可理解为能力组ID)

(忘了能力组了?参考 https://sjtu-ipads.github.io/OS-Course-Lab/Lab3/RTFSC.html)

// build/chcore-libc/include/uapi/ipc.h
// clang-format off
/**
 * This structure would be placed at the front of the shared memory
 * of an IPC connection. So it should be 8 bytes aligned, so that any
 * kind of data structure following it can be properly aligned.
 *
 * This structure is written by the IPC server, and read by the client.
 * The IPC server should never read from it.
 *
 * Layout of shared memory is shown as follows:
 * ┌───┬──────────────────────────────────────────────┐
 * │   │                                              │
 * │   │                                              │
 * │   │     custom data (defined by IPC protocol)    │
 * │   │                                              │
 * │   │                                              │
 * └─┬─┴──────────────────────────────────────────────┘
 *   │
 *   │
 *   ▼
 * struct ipc_response_hdr
 */
// clang-format on
struct ipc_response_hdr {
        unsigned int return_cap_num;
} __attribute__((aligned(8)));

#define SHM_PTR_TO_CUSTOM_DATA_PTR(shm_ptr) ((void *)((char *)(shm_ptr) + sizeof(struct ipc_response_hdr)))

/**
 * @brief This type specifies the function signature that an IPC server
 * should follow to be properly called by the kernel.
 *
 * @param shm_ptr: pointer to the start address of the IPC shared memory.
 * Use the SHM_PTR_TO_CUSTOM_DATA_PTR macro to convert it to a concrete
 * custom data pointer.
 * @param max_data_len: length of the IPC shared memory.
 * @param send_cap_num: number of capabilities sent by the client in this
 * request.
 * @param client_badge: badge of the client.
 */
typedef void (*server_handler)(void *shm_ptr, unsigned int max_data_len,
                               unsigned int send_cap_num,
                               badge_t client_badge);
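
结合上面的定义,一个server_handler的骨架大致如下。这是示意代码:my_req这个协议结构体和my_server_handler都是虚构的,真实的返回路径还要经过sys_ipc_ret一类的系统调用(见上文概念部分),这里只演示共享内存布局和头部的用法:

// 假设已包含上面的 uapi/ipc.h
struct my_req { /* 虚构的IPC协议结构体 */
        int op;
        int arg;
};

void my_server_handler(void *shm_ptr, unsigned int max_data_len,
                       unsigned int send_cap_num, badge_t client_badge)
{
        /* 跳过头部8字节, 得到自定义协议数据 */
        struct my_req *req = SHM_PTR_TO_CUSTOM_DATA_PTR(shm_ptr);

        /* ... 根据req->op处理请求, 结果原地写回共享内存 ... */

        /* 头部由server写入, 告知client本次返回的capability个数 */
        struct ipc_response_hdr *hdr = shm_ptr;
        hdr->return_cap_num = 0;

        /* 实际实现最后会经由sys_ipc_ret路径把控制流还给client */
}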

然后我们来看如何register server

// user/system-services/chcore-libc/libchcore/porting/overrides/src/chcore-port/ipc.c
/*
 * Currently, a server thread can only invoke this interface once.
 * But, a server can use another thread to register a new service.
 */
int ipc_register_server_with_destructor(server_handler server_handler,
                                        void *(*client_register_handler)(void *),
                                        server_destructor server_destructor)
{
        cap_t register_cb_thread_cap;
        int ret;

        /*
         * Create a passive thread for handling IPC registration.
         * - run after a client wants to register
         * - be responsible for initializing the ipc connection
         */
#define ARG_SET_BY_KERNEL 0
        pthread_t handler_tid;
        register_cb_thread_cap =
                chcore_pthread_create_register_cb(&handler_tid,
                                                  NULL,
                                                  client_register_handler,
                                                  (void *)ARG_SET_BY_KERNEL);
        BUG_ON(register_cb_thread_cap < 0);
        /*
         * Kernel will pass server_handler as the argument for the
         * register_cb_thread.
         */
        ret = usys_register_server((unsigned long)server_handler,
                                   (unsigned long)register_cb_thread_cap,
                                   (unsigned long)server_destructor);
        if (ret != 0) {
                printf("%s failed (retval is %d)\n", __func__, ret);
        }
        return ret;
}

核心是两个函数:

一是chcore_pthread_create_register_cb:创建用于执行注册回调的pthread线程,将其插入kernel调度器的ready queue之中,并返回该线程的capability

这个函数里还处理了很多dirty work,例如按照体系结构ABI在线程入口点模拟一次函数调用、设置内核栈和TLS等等,大意见下面的示意
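
下面用一段虚构的示意代码说明"模拟函数调用"在aarch64上大概意味着什么(fake_thread_ctx等名字均为虚构,真实逻辑分散在chcore_pthread_create_register_cb和内核的线程创建路径中):

/* 虚构示意: 让一个被动线程第一次被调度时, 看起来像刚call进入口函数 */
struct fake_thread_ctx {
        unsigned long sp;        /* 栈指针 */
        unsigned long pc;        /* 下一条执行的指令 = 函数入口 */
        unsigned long x0;        /* AAPCS64: 第一个参数经x0传递 */
        unsigned long tpidr_el0; /* TLS基址寄存器 */
};

static void fake_call_abi_aarch64(struct fake_thread_ctx *ctx,
                                  unsigned long entry, unsigned long arg,
                                  unsigned long stack_top, unsigned long tls)
{
        ctx->sp = stack_top & ~0xfUL; /* AAPCS64要求SP 16字节对齐 */
        ctx->pc = entry;
        ctx->x0 = arg; /* 对register_cb线程, 内核会把server_handler填在这里 */
        ctx->tpidr_el0 = tls;
}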

二是usys_register_server,它只是对syscall的一个简单包装

int usys_register_server(unsigned long callback,
                         cap_t register_thread_cap,
                         unsigned long destructor)
{
        return chcore_syscall3(CHCORE_SYS_register_server,
                               callback,
                               register_thread_cap,
                               destructor);
}

int sys_register_server(unsigned long ipc_routine, cap_t register_thread_cap,
                        unsigned long destructor)
{
        return register_server(
                current_thread, ipc_routine, register_thread_cap, destructor);
}

这个syscall实际调用的函数如下

其中server就是current_thread(注册由server线程自己发起);后续client调用时,控制流会直接切到server提供的处理函数上执行,也就是类似LRPC"传代码"的优化设计

随后调用ChCore提供的系统调用sys_register_server,其实现在kernel/ipc/connection.c当中。该系统调用会分配并初始化一个struct ipc_server_config和一个struct ipc_server_register_cb_config;之后将调用者线程(即主线程)的general_ipc_config字段设置为前者,其中记录了注册回调线程和IPC服务线程的入口函数(即ipc_dispatcher);再将注册回调线程的general_ipc_config字段设置为后者,其中记录了注册回调线程的入口函数和用户态栈地址等信息。

/*
 * Overall, a server thread that declares a service with this interface
 * should specify:
 * @ipc_routine (the real ipc service routine entry),
 * @register_thread_cap (another server thread for handling client
 * registration), and
 * @destructor (one routine invoked when some connection is closed).
 */
static int register_server(struct thread *server, unsigned long ipc_routine,
                           cap_t register_thread_cap, unsigned long destructor)
{
        struct ipc_server_config *config;
        struct thread *register_cb_thread;
        struct ipc_server_register_cb_config *register_cb_config;

        BUG_ON(server == NULL);
        if (server->general_ipc_config != NULL) {
                kdebug("A server thread can only invoke **register_server** once!\n");
                return -EINVAL;
        }

        /*
         * Check the passive thread in server for handling
         * client registration.
         */
        register_cb_thread =
                obj_get(current_cap_group, register_thread_cap, TYPE_THREAD);
        if (!register_cb_thread) {
                kdebug("A register_cb_thread is required.\n");
                return -ECAPBILITY;
        }

        if (register_cb_thread->thread_ctx->type != TYPE_REGISTER) {
                kdebug("The register_cb_thread should be TYPE_REGISTER!\n");
                obj_put(register_cb_thread);
                return -EINVAL;
        }

        config = kmalloc(sizeof(*config));
        if (!config) {
                obj_put(register_cb_thread);
                return -ENOMEM;
        }

        /*
         * @ipc_routine will be the real ipc_routine_entry.
         * No need to validate such address because the server just
         * kills itself if the address is illegal.
         */
        config->declared_ipc_routine_entry = ipc_routine;

        /* Record the registration cb thread */
        config->register_cb_thread = register_cb_thread;

        register_cb_config = kmalloc(sizeof(*register_cb_config));
        if (!register_cb_config) {
                kfree(config);
                obj_put(register_cb_thread);
                return -ENOMEM;
        }
        register_cb_thread->general_ipc_config = register_cb_config;

        /*
         * This lock will be used to prevent concurrent client threads
         * from registering.
         * In other words, a register_cb_thread can only serve
         * registration requests one-by-one.
         */
        lock_init(&register_cb_config->register_lock);

        /* Record PC as well as the thread's initial stack (SP). */
        register_cb_config->register_cb_entry =
                arch_get_thread_next_ip(register_cb_thread);
        register_cb_config->register_cb_stack =
                arch_get_thread_stack(register_cb_thread);
        register_cb_config->destructor = destructor;
        obj_put(register_cb_thread);

#if defined(CHCORE_ARCH_AARCH64)
        /* The following fence can ensure: the config related data,
         * e.g., the register_lock, can be seen when
         * server->general_ipc_config is set.
         */
        smp_mb();
#else
        /* TSO: the fence is not required. */
#endif

        /*
         * The last step: fill the general_ipc_config.
         * This field is also treated as the sign of whether the server
         * thread declares an IPC service (or makes the service ready).
         */
        server->general_ipc_config = config;

        return 0;
}

之后是客户端建立连接

客户端创建对应的共享内存,并在注册时共享给server

这里用PMO_DATA而不用PMO_SHM的原因是:PMO_DATA没有lazy alloc;对于ipc register这种只有一页的小内存场景,我们不需要lazy alloc

usys_create_pmo、usys_yield同理,只是syscall的简单包装,大致形如下面的示意
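
(示意代码:按照前面usys_register_server的写法推测,假设存在与chcore_syscall3同系列的chcore_syscall2;实际定义以chcore-libc源码为准)

cap_t usys_create_pmo(unsigned long size, unsigned long type)
{
        /* 直接陷入内核, 由内核创建PMO并返回其capability */
        return chcore_syscall2(CHCORE_SYS_create_pmo, size, type);
}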

然后客户端尝试发起注册系统调用

/*
 * A client thread can register itself for multiple times.
 *
 * The returned ipc_struct_t is from the heap,
 * so the callee needs to free it.
 */
ipc_struct_t *ipc_register_client(cap_t server_thread_cap)
{
        cap_t conn_cap;
        ipc_struct_t *client_ipc_struct;

        struct client_shm_config shm_config;
        cap_t shm_cap;

        client_ipc_struct = malloc(sizeof(ipc_struct_t));
        if (client_ipc_struct == NULL) {
                return NULL;
        }

        /*
         * Before registering the client on the server,
         * the client allocates the shm (and shares it with
         * the server later).
         *
         * Now we use PMO_DATA instead of PMO_SHM because:
         * - SHM (IPC_PER_SHM_SIZE) only contains one page and
         *   PMO_DATA is thus more efficient.
         *
         * If the SHM becomes larger, we can use PMO_SHM instead.
         * Both types are tested and can work well.
         */

        // shm_cap = usys_create_pmo(IPC_PER_SHM_SIZE, PMO_SHM);
        shm_cap = usys_create_pmo(IPC_PER_SHM_SIZE, PMO_DATA);
        if (shm_cap < 0) {
                printf("usys_create_pmo ret %d\n", shm_cap);
                goto out_free_client_ipc_struct;
        }

        shm_config.shm_cap = shm_cap;
        shm_config.shm_addr = chcore_alloc_vaddr(IPC_PER_SHM_SIZE); // 0x1000

        // printf("%s: register_client with shm_addr 0x%lx\n",
        //        __func__, shm_config.shm_addr);

        while (1) {
                conn_cap = usys_register_client(server_thread_cap,
                                                (unsigned long)&shm_config);

                if (conn_cap == -EIPCRETRY) {
                        // printf("client: Try to connect again ...\n");
                        /* The server IPC may be not ready. */
                        usys_yield();
                } else if (conn_cap < 0) {
                        printf("client: %s failed (return %d), server_thread_cap is %d\n",
                               __func__,
                               conn_cap,
                               server_thread_cap);
                        goto out_free_vaddr;
                } else {
                        /* Success */
                        break;
                }
        }

        client_ipc_struct->lock = 0;
        client_ipc_struct->shared_buf = shm_config.shm_addr;
        client_ipc_struct->shared_buf_len = IPC_PER_SHM_SIZE;
        client_ipc_struct->conn_cap = conn_cap;

        return client_ipc_struct;

out_free_vaddr:
        usys_revoke_cap(shm_cap, false);
        chcore_free_vaddr(shm_config.shm_addr, IPC_PER_SHM_SIZE);

out_free_client_ipc_struct:
        free(client_ipc_struct);

        return NULL;
}

int ipc_client_close_connection(ipc_struct_t *ipc_struct)
{
        int ret;
        while (1) {
                ret = usys_ipc_close_connection(ipc_struct->conn_cap);

                if (ret == -EAGAIN) {
                        usys_yield();
                } else if (ret < 0) {
                        goto out;
                } else {
                        break;
                }
        }

        chcore_free_vaddr(ipc_struct->shared_buf, ipc_struct->shared_buf_len);
        free(ipc_struct);
out:
        return ret;
}

注册系统调用对应的实际函数如下

大体流程是

  • 从当前thread 的cap_group里面找到传入的server_cap对应的slot,进而得到server线程控制块
  • 从server获取它的ipc config,拿锁避免并发问题
  • 校验client传入的shm_config用户地址,将其拷贝到内核态,再把client的共享内存实际map到client的地址空间
  • 创建connection对象,并把cap给到server和client
  • 设置好register_cb线程的调用参数、栈指针(SP)和入口(PC,经由异常返回路径生效)
  • 然后调用sched切换控制权给注册的回调函数
  • (也可以看到整个流程不涉及到IPI)
cap_t sys_register_client(cap_t server_cap, unsigned long shm_config_ptr)
{
        struct thread *client;
        struct thread *server;

        /*
         * No need to initialize actually.
         * However, fbinfer will complain without zeroing because
         * it cannot tell copy_from_user.
         */
        struct client_shm_config shm_config = {0};
        int r;
        struct client_connection_result res;

        struct ipc_server_config *server_config;
        struct thread *register_cb_thread;
        struct ipc_server_register_cb_config *register_cb_config;

        client = current_thread;

        server = obj_get(current_cap_group, server_cap, TYPE_THREAD);
        if (!server) {
                r = -ECAPBILITY;
                goto out_fail;
        }

        server_config =
                (struct ipc_server_config *)(server->general_ipc_config);
        if (!server_config) {
                r = -EIPCRETRY;
                goto out_fail;
        }

        /*
         * Locate the register_cb_thread first.
         * And later, directly transfer the control flow to it
         * for finishing the registration.
         *
         * The whole registration procedure:
         * client thread -> server register_cb_thread -> client thread
         */
        register_cb_thread = server_config->register_cb_thread;
        register_cb_config =
                (struct ipc_server_register_cb_config
                         *)(register_cb_thread->general_ipc_config);

        /* Acquiring register_lock: avoid concurrent client registration.
         *
         * Use try_lock instead of lock since the unlock operation is done by
         * another thread and ChCore does not support mutex.
         * Otherwise, deadlock may happen.
         */
        if (try_lock(&register_cb_config->register_lock) != 0) {
                r = -EIPCRETRY;
                goto out_fail;
        }

        /* Validate the user addresses before accessing them */
        if (check_user_addr_range(shm_config_ptr, sizeof(shm_config)) != 0) {
                r = -EINVAL;
                goto out_fail_unlock;
        }

        r = copy_from_user((void *)&shm_config,
                           (void *)shm_config_ptr,
                           sizeof(shm_config));
        if (r) {
                r = -EINVAL;
                goto out_fail_unlock;
        }

        /* Map the pmo of the shared memory */
        r = map_pmo_in_current_cap_group(
                shm_config.shm_cap, shm_config.shm_addr, VMR_READ | VMR_WRITE);
        if (r != 0) {
                goto out_fail_unlock;
        }

        /* Create the ipc_connection object */
        r = create_connection(
                client, server, shm_config.shm_cap, shm_config.shm_addr, &res);

        if (r != 0) {
                goto out_fail_unlock;
        }

        /* Record the connection cap of the client process */
        register_cb_config->conn_cap_in_client = res.client_conn_cap;
        register_cb_config->conn_cap_in_server = res.server_conn_cap;
        /* Record the server_shm_cap for the current connection */
        register_cb_config->shm_cap_in_server = res.server_shm_cap;

        /* Mark current_thread as TS_BLOCKING */
        thread_set_ts_blocking(current_thread);

        /* Set target thread SP/IP/arg */
        arch_set_thread_stack(register_cb_thread,
                              register_cb_config->register_cb_stack);
        arch_set_thread_next_ip(register_cb_thread,
                                register_cb_config->register_cb_entry);
        arch_set_thread_arg0(register_cb_thread,
                             server_config->declared_ipc_routine_entry);
        obj_put(server);

        /* Pass the scheduling context */
        register_cb_thread->thread_ctx->sc = current_thread->thread_ctx->sc;

        /* On success: switch to the cb_thread of the server */
        sched_to_thread(register_cb_thread);

        /* Never return */
        BUG_ON(1);

out_fail_unlock:
        unlock(&register_cb_config->register_lock);
out_fail: /* Maybe EAGAIN */
        if (server)
                obj_put(server);
        return r;
}

这个注册回调函数就是server注册时设置的client_register_handler,也就是server线程传递的那段"代码";一般而言,它采用默认值register_cb

#define DEFAULT_CLIENT_REGISTER_HANDLER register_cb

register_cb函数如下

该函数首先分配一个用来映射共享内存的虚拟地址,随后创建一个被动的服务线程(shadow thread)。

随后调用sys_ipc_register_cb_return系统调用进入内核:该系统调用将共享内存映射到刚才分配的虚拟地址上,补全struct ipc_connection内核对象中的一些元数据,之后切换回客户端线程继续运行;客户端线程从ipc_register_client返回,完成IPC建立连接的过程。

/* A register_callback thread uses this to finish a registration */
void ipc_register_cb_return(cap_t server_thread_cap,
                            unsigned long server_thread_exit_routine,
                            unsigned long server_shm_addr)
{
        usys_ipc_register_cb_return(
                server_thread_cap, server_thread_exit_routine, server_shm_addr);
}

/* A register_callback thread is passive (never proactively run) */
void *register_cb(void *ipc_handler)
{
        cap_t server_thread_cap = 0;
        unsigned long shm_addr;

        shm_addr = chcore_alloc_vaddr(IPC_PER_SHM_SIZE);

        // printf("[server]: A new client comes in! ipc_handler: 0x%lx\n",
        //        ipc_handler);

        /*
         * Create a passive thread for serving IPC requests.
         * Besides, reusing an existing thread is also supported.
         */
        pthread_t handler_tid;
        server_thread_cap = chcore_pthread_create_shadow(
                &handler_tid, NULL, ipc_handler, (void *)NO_ARG);
        BUG_ON(server_thread_cap < 0);
#ifndef CHCORE_ARCH_X86_64
        ipc_register_cb_return(server_thread_cap,
                               (unsigned long)ipc_shadow_thread_exit_routine,
                               shm_addr);
#else
        ipc_register_cb_return(
                server_thread_cap,
                (unsigned long)ipc_shadow_thread_exit_routine_naked,
                shm_addr);
#endif

        return NULL;
}

int sys_ipc_register_cb_return(cap_t server_handler_thread_cap,
                               unsigned long server_thread_exit_routine,
                               unsigned long server_shm_addr)
{
        struct ipc_server_register_cb_config *config;
        struct ipc_connection *conn;
        struct thread *client_thread;

        struct thread *ipc_server_handler_thread;
        struct ipc_server_handler_config *handler_config;
        int r = -ECAPBILITY;

        config = (struct ipc_server_register_cb_config *)
                         current_thread->general_ipc_config;

        if (!config)
                goto out_fail;

        conn = obj_get(
                current_cap_group, config->conn_cap_in_server, TYPE_CONNECTION);

        if (!conn)
                goto out_fail;

        /*
         * @server_handler_thread_cap from server.
         * The server uses this handler_thread to serve ipc requests.
         */
        ipc_server_handler_thread = (struct thread *)obj_get(
                current_cap_group, server_handler_thread_cap, TYPE_THREAD);

        if (!ipc_server_handler_thread)
                goto out_fail_put_conn;

        /* Map the shm of the connection in the server */
        r = map_pmo_in_current_cap_group(config->shm_cap_in_server,
                                         server_shm_addr,
                                         VMR_READ | VMR_WRITE);
        if (r != 0)
                goto out_fail_put_thread;

        /* Get the client_thread that issues this registration */
        client_thread = conn->current_client_thread;
        /*
         * Set the return value (conn_cap) for the client here
         * because the server has approved the registration.
         */
        arch_set_thread_return(client_thread, config->conn_cap_in_client);

        /*
         * Initialize the ipc configuration for the handler_thread (begin)
         *
         * When the handler_config isn't NULL, it means this server handler
         * thread has been initialized before. If so, skip the initialization.
         * This will happen when a server uses one server handler thread for
         * serving multiple client threads.
         */
        if (!ipc_server_handler_thread->general_ipc_config) {
                handler_config = (struct ipc_server_handler_config *)kmalloc(
                        sizeof(*handler_config));
                if (!handler_config) {
                        r = -ENOMEM;
                        goto out_fail_put_thread;
                }
                ipc_server_handler_thread->general_ipc_config = handler_config;
                lock_init(&handler_config->ipc_lock);

                /*
                 * Record the initial PC & SP for the handler_thread.
                 * For serving each IPC, the handler_thread starts from the
                 * same PC and SP.
                 */
                handler_config->ipc_routine_entry =
                        arch_get_thread_next_ip(ipc_server_handler_thread);
                handler_config->ipc_routine_stack =
                        arch_get_thread_stack(ipc_server_handler_thread);
                handler_config->ipc_exit_routine_entry =
                        server_thread_exit_routine;
                handler_config->destructor = config->destructor;
        }
        obj_put(ipc_server_handler_thread);
        /* Initialize the ipc configuration for the handler_thread (end) */

        /* Fill the server information in the IPC connection. */
        conn->shm.server_shm_uaddr = server_shm_addr;
        conn->server_handler_thread = ipc_server_handler_thread;
        conn->state = CONN_VALID;
        conn->current_client_thread = NULL;
        conn->conn_cap_in_client = config->conn_cap_in_client;
        conn->conn_cap_in_server = config->conn_cap_in_server;
        obj_put(conn);

        /*
         * Return the control flow (sched-context) back later.
         * Set current_thread state to TS_WAITING again.
         */
        thread_set_ts_waiting(current_thread);

        unlock(&config->register_lock);

        /* The register thread should no longer use the client's scheduling
         * context. */
        current_thread->thread_ctx->sc = NULL;

        /* Finish the registration: switch to the original client_thread */
        sched_to_thread(client_thread);
        /* Never return */

out_fail_put_thread:
        obj_put(ipc_server_handler_thread);
out_fail_put_conn:
        obj_put(conn);
out_fail:
        return r;
}

总结以上的流程:

  1. server端注册服务:指明当client端连接时应由函数f(注册回调)处理,并创建与f相关的内核对象(capability、上下文等)。f实质上运行在一个不会被主动调度到的被动线程上(因为没有sc调度上下文)
  2. client端发起注册:申请共享内存,设置调用参数,然后通过syscall陷入内核;内核校验后把控制流(连同调度上下文)移交给f
  3. f完成连接初始化并返回,连接建立完成;之后的进程间通信就变成sys_ipc_call和sys_ipc_ret两个syscall
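
最后给出一个把上述注册流程串起来的最小使用示意(省略错误处理;ipc_call/ipc_create_msg等客户端数据面接口不在本文分析范围内,具体签名以chcore-libc实际头文件为准):

/* ---- server侧 ---- */
void serve(void *shm_ptr, unsigned int max_data_len,
           unsigned int send_cap_num, badge_t client_badge)
{
        /* 读共享内存、处理请求, 最后经由sys_ipc_ret路径返回client */
}

void server_main(void)
{
        /* 1. 注册服务: 创建被动的register_cb线程, 陷入内核登记config */
        ipc_register_server(serve, DEFAULT_CLIENT_REGISTER_HANDLER);
}

/* ---- client侧 ---- */
void client_main(cap_t server_thread_cap)
{
        /* 2. 建立连接: 分配共享内存, sys_register_client -> register_cb
         *    -> sys_ipc_register_cb_return, 成功后拿到conn_cap */
        ipc_struct_t *conn = ipc_register_client(server_thread_cap);

        /* 3. 之后每次通信都是一对syscall: sys_ipc_call / sys_ipc_ret */
        ipc_msg_t *msg = ipc_create_msg(conn, 64);
        ipc_call(conn, msg);
        ipc_destroy_msg(msg);

        ipc_client_close_connection(conn);
}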