The Linux arm64 instruction-patching function aarch64_insn_patch_text_nosync

Published: 2024-07-09

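Preface

An earlier article covered text_poke_smp/bp, the instruction-patching functions on Linux x86_64.

This article covers the arm64 counterpart, aarch64_insn_patch_text_nosync, using kernel version 5.4.18 as the example.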

1. Overview

1.1 aarch64_insn_patch_text_nosync

// linux-5.4.18/arch/arm64/kernel/insn.c

int __kprobes aarch64_insn_patch_text_nosync(void *addr, u32 insn)
{
	u32 *tp = addr;
	int ret;

	/* A64 instructions must be word aligned */
	if ((uintptr_t)tp & 0x3)
		return -EINVAL;

	ret = aarch64_insn_write(tp, insn);
	if (ret == 0)
		__flush_icache_range((uintptr_t)tp,
				     (uintptr_t)tp + AARCH64_INSN_SIZE);

	return ret;
}

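This function patches a single instruction on the AArch64 architecture. As the _nosync suffix suggests, it does nothing to synchronize other CPUs; callers that need that, such as kprobes in section 2.2, go through aarch64_insn_patch_text() and stop_machine.

Its parameters are:

addr: the address of the instruction to patch.
insn: the new instruction to write at that address.

It works as follows:
(1) Cast addr to u32 * and store it in tp, the pointer to the instruction.
(2) Check that tp is word aligned (4-byte aligned), i.e. that its low two bits are zero. If it is not, return -EINVAL.
(3) Call aarch64_insn_write() to write the new instruction to the address tp points to, and store the return value in ret.
(4) If ret is 0, the write succeeded. Call __flush_icache_range() over the range [tp, tp + AARCH64_INSN_SIZE) so that instruction fetches see the new instruction rather than a stale cached copy.
(5) Return ret, the result of the patch operation.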
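
The control flow above can be modelled in user space. The following is a minimal sketch, not kernel code: model_patch_insn() is a hypothetical name, the target is an ordinary writable buffer, and the cache maintenance step appears only as a comment because plain user-space data needs none.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Size of one A64 instruction in bytes (AARCH64_INSN_SIZE in the kernel). */
#define MODEL_INSN_SIZE 4

/*
 * Hypothetical user-space model of the patch routine:
 * reject unaligned addresses, then overwrite one 32-bit slot.
 */
static int model_patch_insn(void *addr, uint32_t insn)
{
	/* A64 instructions must be word aligned: low two bits zero. */
	if ((uintptr_t)addr & 0x3)
		return -EINVAL;

	memcpy(addr, &insn, MODEL_INSN_SIZE);

	/* The kernel would call __flush_icache_range() here. */
	return 0;
}
```

Passing an address that is one byte past an aligned slot makes the model return -EINVAL and leaves the buffer untouched, which mirrors the early return in step (2).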

1.2 aarch64_insn_write

static int __kprobes __aarch64_insn_write(void *addr, __le32 insn)
{
	void *waddr = addr;
	unsigned long flags = 0;
	int ret;

	raw_spin_lock_irqsave(&patch_lock, flags);
	waddr = patch_map(addr, FIX_TEXT_POKE0);

	ret = probe_kernel_write(waddr, &insn, AARCH64_INSN_SIZE);

	patch_unmap(FIX_TEXT_POKE0);
	raw_spin_unlock_irqrestore(&patch_lock, flags);

	return ret;
}

int __kprobes aarch64_insn_write(void *addr, u32 insn)
{
	return __aarch64_insn_write(addr, cpu_to_le32(insn));
}

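aarch64_insn_write() converts the instruction to little-endian with cpu_to_le32() (A64 instructions are always stored little-endian) and hands it to __aarch64_insn_write(), which performs the actual write.

Its parameters are:

addr: the address to write the instruction to.
insn: the instruction to write.

__aarch64_insn_write() works as follows:
(1) Take patch_lock with raw_spin_lock_irqsave(), which also disables local interrupts, so that only one writer at a time uses the FIX_TEXT_POKE0 fixmap slot.
(2) Call patch_map() to map addr through the fixmap slot FIX_TEXT_POKE0 and store the address of the resulting alias in waddr.
(3) Call probe_kernel_write() to copy the AARCH64_INSN_SIZE bytes of insn to waddr, and store the result in ret.
(4) Call patch_unmap() to tear down the FIX_TEXT_POKE0 mapping.
(5) Release the lock with raw_spin_unlock_irqrestore() and return ret.

The write goes through a temporary alias because kernel text is normally mapped read-only; the fixmap slot provides a writable mapping of the same physical page for the duration of the write.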
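
To make the effect of cpu_to_le32() concrete, here is a small sketch of the byte layout it guarantees. insn_to_le_bytes() is a hypothetical helper written for this note; it produces the same four bytes regardless of the byte order of the machine it runs on.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: lay out a 32-bit A64 instruction as the four
 * little-endian bytes that end up in memory, independent of host
 * byte order.
 */
static void insn_to_le_bytes(uint32_t insn, uint8_t out[4])
{
	out[0] = (uint8_t)(insn & 0xff);          /* least significant byte first */
	out[1] = (uint8_t)((insn >> 8) & 0xff);
	out[2] = (uint8_t)((insn >> 16) & 0xff);
	out[3] = (uint8_t)((insn >> 24) & 0xff);  /* most significant byte last */
}
```

For the NOP encoding 0xd503201f the bytes in memory are 1f 20 03 d5.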

1.3 patch_map

static void __kprobes *patch_map(void *addr, int fixmap)
{
	unsigned long uintaddr = (uintptr_t) addr;
	bool module = !core_kernel_text(uintaddr);
	struct page *page;

	if (module && IS_ENABLED(CONFIG_STRICT_MODULE_RWX))
		page = vmalloc_to_page(addr);
	else if (!module)
		page = phys_to_page(__pa_symbol(addr));
	else
		return addr;

	BUG_ON(!page);
	return (void *)set_fixmap_offset(fixmap, page_to_phys(page) +
			(uintaddr & ~PAGE_MASK));
}

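patch_map() maps the page containing a given address through a fixmap slot and returns the address of the alias.

Its parameters are:

addr: the address to map.
fixmap: the index of the fixmap slot.

It works as follows:
(1) Cast addr to uintptr_t and store it in uintaddr, so that the address can be handled as an unsigned integer.
(2) Use core_kernel_text() to check whether the address lies in the core kernel text. If it does not, the address is treated as module text (module is true).
(3) If the address is module text and CONFIG_STRICT_MODULE_RWX is enabled, call vmalloc_to_page() to look up the struct page behind the virtual address.
(4) If the address is core kernel text, call __pa_symbol() to get its physical address and phys_to_page() to get the corresponding struct page.
(5) Otherwise, i.e. the address is module text and CONFIG_STRICT_MODULE_RWX is disabled, return the original address unchanged: no alias is set up and the write goes to the module text directly.
(6) BUG_ON(!page) catches a failed lookup. Then set_fixmap_offset() maps the physical address of the page plus the in-page offset of addr (uintaddr & ~PAGE_MASK) through the fixmap slot, and the result is returned as a pointer.

In short, the function picks the page lookup that matches the kind of address (core kernel text or module text) and then uses set_fixmap_offset() to create the alias. The returned pointer is the virtual address of the fixmap slot plus the offset of addr within its page.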
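
The address arithmetic behind the alias can be sketched on its own. This is an illustration under assumptions: 4 KiB pages are assumed, all SK_* names and functions are made up for this note, and the slot address passed in is an arbitrary example value rather than a real fixmap address.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SHIFT 12                                /* assumed 4 KiB pages */
#define SK_PAGE_SIZE  (UINT64_C(1) << SK_PAGE_SHIFT)
#define SK_PAGE_MASK  (~(SK_PAGE_SIZE - 1))

/* Start of the page that contains addr. */
static uint64_t sk_page_base(uint64_t addr)
{
	return addr & SK_PAGE_MASK;
}

/* Offset of addr within its page; same as addr & ~PAGE_MASK. */
static uint64_t sk_page_offset(uint64_t addr)
{
	return addr & ~SK_PAGE_MASK;
}

/* Alias address: the slot's page plus the original in-page offset. */
static uint64_t sk_alias_address(uint64_t slot_vaddr, uint64_t addr)
{
	assert(sk_page_offset(slot_vaddr) == 0);  /* slot must be page aligned */
	return slot_vaddr + sk_page_offset(addr);
}
```

The alias and the original address differ only in their page; the low 12 bits are identical, which is why the write lands on the intended instruction.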

1.4 set_fixmap_offset

/* Return a pointer with offset calculated */
#define __set_fixmap_offset(idx, phys, flags)				\
({									\
	unsigned long ________addr;					\
	__set_fixmap(idx, phys, flags);					\
	________addr = fix_to_virt(idx) + ((phys) & (PAGE_SIZE - 1));	\
	________addr;							\
})

#define set_fixmap_offset(idx, phys) \
	__set_fixmap_offset(idx, phys, FIXMAP_PAGE_NORMAL)

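__set_fixmap_offset() installs a fixmap mapping and returns the virtual address within the slot that corresponds to phys. Its parameters are:

idx: the index of the fixmap slot.
phys: the physical address to map; it need not be page aligned.
flags: the page protection to use for the mapping.

It works as follows:
(1) Declare a local variable ________addr to hold the computed address.
(2) Call __set_fixmap() to point the slot at the page containing phys, with the given flags.
(3) Add the in-page offset of phys, (phys) & (PAGE_SIZE - 1), to the slot's virtual address fix_to_virt(idx), and store the sum in ________addr.
(4) Yield ________addr as the value of the statement expression.

set_fixmap_offset() is a wrapper around __set_fixmap_offset() that passes FIXMAP_PAGE_NORMAL as the flags.

Together, the two macros let a caller that holds only a physical address obtain a usable virtual address for it: the page is mapped at the slot's fixed virtual address and the in-page offset is carried over.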
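
The index-to-address conversion can be sketched as follows. The formula follows the generic fixmap definition, in which slots grow downwards from FIXADDR_TOP, one page per index. The SK_* names are hypothetical and SK_FIXADDR_TOP is a made-up value chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SHIFT  12                               /* assumed 4 KiB pages */
#define SK_PAGE_SIZE   (UINT64_C(1) << SK_PAGE_SHIFT)
#define SK_FIXADDR_TOP UINT64_C(0xfffffdfffea00000)     /* made-up, page aligned */

/* Slot index -> virtual address of the slot's page. */
static uint64_t sk_fix_to_virt(unsigned int idx)
{
	return SK_FIXADDR_TOP - ((uint64_t)idx << SK_PAGE_SHIFT);
}

/* Virtual address inside a slot -> slot index (inverse of the above). */
static unsigned int sk_virt_to_fix(uint64_t vaddr)
{
	uint64_t page = vaddr & ~(SK_PAGE_SIZE - 1);

	return (unsigned int)((SK_FIXADDR_TOP - page) >> SK_PAGE_SHIFT);
}

/* Address returned by the offset variant: slot page + offset of phys. */
static uint64_t sk_fixmap_offset_addr(unsigned int idx, uint64_t phys)
{
	return sk_fix_to_virt(idx) + (phys & (SK_PAGE_SIZE - 1));
}
```

Because the slot address depends only on the index, converting the returned address back with sk_virt_to_fix() always recovers the slot, whatever the offset of phys.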

Other uses of this macro (the arm64 page-table fixmap helpers):

#define pte_set_fixmap(addr)		((pte_t *)set_fixmap_offset(FIX_PTE, addr))
#define pte_set_fixmap_offset(pmd, addr)	pte_set_fixmap(pte_offset_phys(pmd, addr))
#define pte_clear_fixmap()		clear_fixmap(FIX_PTE)
#define pmd_set_fixmap(addr)		((pmd_t *)set_fixmap_offset(FIX_PMD, addr))
#define pmd_set_fixmap_offset(pud, addr)	pmd_set_fixmap(pmd_offset_phys(pud, addr))
#define pmd_clear_fixmap()		clear_fixmap(FIX_PMD)
#define pud_set_fixmap(addr)		((pud_t *)set_fixmap_offset(FIX_PUD, addr))
#define pud_set_fixmap_offset(pgd, addr)	pud_set_fixmap(pud_offset_phys(pgd, addr))
#define pud_clear_fixmap()		clear_fixmap(FIX_PUD)
#define pgd_set_fixmap(addr)	((pgd_t *)set_fixmap_offset(FIX_PGD, addr))
#define pgd_clear_fixmap()	clear_fixmap(FIX_PGD)

1.5 __set_fixmap

// arch/arm64/mm/mmu.c

static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;

static inline pte_t * fixmap_pte(unsigned long addr)
{
	return &bm_pte[pte_index(addr)];
}

/*
 * Unusually, this is also called in IRQ context (ghes_iounmap_irq) so if we
 * ever need to use IPIs for TLB broadcasting, then we're in trouble here.
 */
void __set_fixmap(enum fixed_addresses idx,
			       phys_addr_t phys, pgprot_t flags)
{
	unsigned long addr = __fix_to_virt(idx);
	pte_t *ptep;

	BUG_ON(idx <= FIX_HOLE || idx >= __end_of_fixed_addresses);

	ptep = fixmap_pte(addr);

	if (pgprot_val(flags)) {
		set_pte(ptep, pfn_pte(phys >> PAGE_SHIFT, flags));
	} else {
		pte_clear(&init_mm, addr, ptep);
		flush_tlb_kernel_range(addr, addr+PAGE_SIZE);
	}
}

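__set_fixmap() installs or clears the page-table entry of a fixmap slot.

Its parameters are:

idx: the index of the fixmap slot.
phys: the physical address to map.
flags: the page protection for the mapping; a value of zero requests that the mapping be cleared.

It works as follows:
(1) Declare a local variable addr to hold the virtual address of the slot.
(2) Use __fix_to_virt() to convert the slot index to that virtual address.
(3) Use BUG_ON() to check the index: idx <= FIX_HOLE or idx >= __end_of_fixed_addresses triggers a BUG.
(4) Call fixmap_pte() to get ptep, a pointer to the slot's entry in bm_pte.
(5) If pgprot_val(flags) is non-zero, call set_pte() to set the entry to pfn_pte(phys >> PAGE_SHIFT, flags), i.e. the page frame number of phys combined with the protection bits.
(6) If pgprot_val(flags) is zero, call pte_clear() to clear the entry, then flush_tlb_kernel_range() to invalidate any TLB entry for the slot's page.

So the same function serves both directions: a non-zero protection value installs a mapping, and a zero value (which is what clear_fixmap() passes) removes it and invalidates the TLB.

The comment above the function notes that it is also called in IRQ context (from ghes_iounmap_irq), so it would be in trouble if TLB invalidation ever required IPIs (inter-processor interrupts). In other words, the function has to remain safe to call with interrupts disabled.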
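
The set-or-clear logic can be modelled with a toy table. This is a sketch under loud assumptions: every toy_* name is hypothetical, the entry layout (frame address in the high bits, protection in the low 12 bits) is a simplification and not the real arm64 descriptor format, and the TLB flush is represented only by a comment.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12   /* assumed 4 KiB pages */
#define TOY_NR_SLOTS   8    /* slot 0 plays the role of FIX_HOLE */

typedef uint64_t toy_pte_t;

static toy_pte_t toy_table[TOY_NR_SLOTS];

/* Toy entry: frame number shifted up, protection bits in the low 12 bits. */
static toy_pte_t toy_pfn_pte(uint64_t pfn, uint64_t prot)
{
	return (pfn << TOY_PAGE_SHIFT) | prot;
}

/* Install a mapping when prot is non-zero, clear the slot when it is zero. */
static void toy_set_fixmap(unsigned int idx, uint64_t phys, uint64_t prot)
{
	assert(idx > 0 && idx < TOY_NR_SLOTS);          /* stands in for BUG_ON() */
	assert(prot < (UINT64_C(1) << TOY_PAGE_SHIFT));

	if (prot) {
		toy_table[idx] = toy_pfn_pte(phys >> TOY_PAGE_SHIFT, prot);
	} else {
		toy_table[idx] = 0;
		/* The kernel flushes the TLB for the slot's page here. */
	}
}
```

Note that the in-page offset of phys is dropped when the entry is built; it is the caller (set_fixmap_offset) that adds it back to the returned virtual address.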

2. Uses

2.1 jump label

void arch_jump_label_transform(struct jump_entry *entry,
			       enum jump_label_type type)
{
	void *addr = (void *)jump_entry_code(entry);
	u32 insn;

	if (type == JUMP_LABEL_JMP) {
		insn = aarch64_insn_gen_branch_imm(jump_entry_code(entry),
						   jump_entry_target(entry),
						   AARCH64_INSN_BRANCH_NOLINK);
	} else {
		insn = aarch64_insn_gen_nop();
	}

	aarch64_insn_patch_text_nosync(addr, insn);
}
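
For reference, the two instructions that a jump-label site toggles between can be encoded by hand. The sketch below is not the kernel's aarch64_insn_gen_branch_imm(); the a64_* helpers are hypothetical names, and error handling is reduced to a -1 return. The encodings themselves are the architectural ones: NOP is 0xd503201f, and an unconditional B is 0x14000000 with a 26-bit signed word offset in the low bits.

```c
#include <assert.h>
#include <stdint.h>

#define A64_NOP      0xd503201fu
#define A64_B_OPCODE 0x14000000u

/*
 * Encode "B target" for an instruction located at pc.
 * Returns 0 on success, -1 if an address is misaligned or the target
 * is outside the +/-128 MiB range of the 26-bit word offset.
 */
static int a64_encode_b(uint64_t pc, uint64_t target, uint32_t *insn)
{
	int64_t offset = (int64_t)(target - pc);

	if ((pc | target) & 0x3)
		return -1;
	if (offset < -(INT64_C(1) << 27) || offset >= (INT64_C(1) << 27))
		return -1;

	/* imm26 is the byte offset divided by 4, truncated to 26 bits. */
	*insn = A64_B_OPCODE | (uint32_t)(((uint64_t)offset >> 2) & 0x03ffffff);
	return 0;
}

/* Instruction for a jump-label site: a branch when enabled, else a NOP. */
static int a64_jump_label_insn(uint64_t pc, uint64_t target, int jmp,
			       uint32_t *insn)
{
	if (!jmp) {
		*insn = A64_NOP;
		return 0;
	}
	return a64_encode_b(pc, target, insn);
}
```

A forward branch of 8 bytes encodes as 0x14000002, and a backward branch of 4 bytes as 0x17ffffff.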

2.2 ftrace

(1) kprobe trace events
Reference: trace series 4 - kprobe study notes (trace系列4 - kprobe学习笔记)

// linux-5.4.18/kernel/trace/trace_events.c

static const struct file_operations ftrace_enable_fops = {
	.open = tracing_open_generic,
	.read = event_enable_read,
	.write = event_enable_write,
	.llseek = default_llseek,
};
// kernel/trace/trace_events.c
event_enable_write()
	-->ftrace_event_enable_disable()
		-->__ftrace_event_enable_disable(){
			-->struct trace_event_call *call = file->event_call;
			//kernel/trace/trace.c
			-->trace_buffered_event_enable()
			}
// kernel/trace/trace_kprobe.c

static inline void init_trace_event_call(struct trace_kprobe *tk)
{
	struct trace_event_call *call = trace_probe_event_call(&tk->tp);
	......
	call->class->reg = kprobe_register;
}

static int register_kprobe_event(struct trace_kprobe *tk)
{
	init_trace_event_call(tk);

	return trace_probe_register_event_call(&tk->tp);
}
// kernel/trace/trace_kprobe.c

kprobe_register()
	-->enable_trace_kprobe(){
	list_for_each_entry(pos, trace_probe_probe_list(tp), list) {
			tk = container_of(pos, struct trace_kprobe, tp);
			if (trace_kprobe_has_gone(tk))
				continue;
			ret = __enable_trace_kprobe(tk);
			if (ret)
				break;
			enabled = true;
		}
	}
// kernel/trace/trace_kprobe.c

static inline int __enable_trace_kprobe(struct trace_kprobe *tk)
{
	int ret = 0;

	if (trace_kprobe_is_registered(tk) && !trace_kprobe_has_gone(tk)) {
		if (trace_kprobe_is_return(tk))
			ret = enable_kretprobe(&tk->rp);
		else
			ret = enable_kprobe(&tk->rp.kp);
	}

	return ret;
}
// kernel/kprobes.c
/* Enable one kprobe */
enable_kprobe()
	-->arm_kprobe()
		-->__arm_kprobe()
			-->arch_arm_kprobe()

On arm64 (a BRK breakpoint instruction is written):

// arch/arm64/kernel/probes/kprobes.c

/* arm kprobe: install breakpoint in text */
void __kprobes arch_arm_kprobe(struct kprobe *p)
{
	patch_text(p->addr, BRK64_OPCODE_KPROBES);
}
// arch/arm64/kernel/probes/kprobes.c

static int __kprobes patch_text(kprobe_opcode_t *addr, u32 opcode)
{
	void *addrs[1];
	u32 insns[1];

	addrs[0] = addr;
	insns[0] = opcode;

	return aarch64_insn_patch_text(addrs, insns, 1);
}
// arch/arm64/kernel/insn.c

int __kprobes aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt)
{
	struct aarch64_insn_patch patch = {
		.text_addrs = addrs,
		.new_insns = insns,
		.insn_cnt = cnt,
		.cpu_count = ATOMIC_INIT(0),
	};

	if (cnt <= 0)
		return -EINVAL;

	return stop_machine_cpuslocked(aarch64_insn_patch_text_cb, &patch,
				       cpu_online_mask);
}
// arch/arm64/kernel/insn.c

aarch64_insn_patch_text_cb()
	-->aarch64_insn_patch_text_nosync()
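
The breakpoint opcode written here is an ordinary BRK instruction with a 16-bit immediate. The sketch below shows the architectural encoding (base 0xd4200000, immediate in bits 20:5). The a64_* helpers are hypothetical names, and the kprobes immediate of 0x4 is stated from memory of the 5.4 headers, so treat that particular value as an assumption to check against the source tree.

```c
#include <assert.h>
#include <stdint.h>

#define A64_BRK_BASE 0xd4200000u   /* BRK #0 */
#define A64_BRK_MASK 0xffe0001fu   /* all bits except imm16 */

/* Encode BRK #imm16: the immediate sits in bits 20:5. */
static uint32_t a64_encode_brk(uint16_t imm16)
{
	return A64_BRK_BASE | ((uint32_t)imm16 << 5);
}

/* Returns 1 if insn is a BRK instruction, whatever its immediate. */
static int a64_is_brk(uint32_t insn)
{
	return (insn & A64_BRK_MASK) == A64_BRK_BASE;
}

/* Extract the 16-bit immediate from a BRK instruction. */
static uint16_t a64_brk_imm(uint32_t insn)
{
	assert(a64_is_brk(insn));
	return (uint16_t)((insn >> 5) & 0xffff);
}
```

With an immediate of 0x4 the opcode is 0xd4200080. When the breakpoint fires, the immediate is reported to the exception handler, which is how a kprobes breakpoint can be told apart from other BRK users.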

On x86 (an int3 breakpoint instruction is written):

// arch/x86/include/asm/kprobes.h
#define BREAKPOINT_INSTRUCTION	0xcc

// arch/x86/kernel/kprobes/core.c
void arch_arm_kprobe(struct kprobe *p)
{
	text_poke(p->addr, ((unsigned char []){BREAKPOINT_INSTRUCTION}), 1);
}
// arch/x86/kernel/alternative.c

__ro_after_init struct mm_struct *poking_mm;
__ro_after_init unsigned long poking_addr;

static void *__text_poke(void *addr, const void *opcode, size_t len)
{
	bool cross_page_boundary = offset_in_page(addr) + len > PAGE_SIZE;
	struct page *pages[2] = {NULL};
	temp_mm_state_t prev;
	unsigned long flags;
	pte_t pte, *ptep;
	spinlock_t *ptl;
	pgprot_t pgprot;

	/*
	 * While boot memory allocator is running we cannot use struct pages as
	 * they are not yet initialized. There is no way to recover.
	 */
	BUG_ON(!after_bootmem);

	if (!core_kernel_text((unsigned long)addr)) {
		pages[0] = vmalloc_to_page(addr);
		if (cross_page_boundary)
			pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
	} else {
		pages[0] = virt_to_page(addr);
		WARN_ON(!PageReserved(pages[0]));
		if (cross_page_boundary)
			pages[1] = virt_to_page(addr + PAGE_SIZE);
	}
	/*
	 * If something went wrong, crash and burn since recovery paths are not
	 * implemented.
	 */
	BUG_ON(!pages[0] || (cross_page_boundary && !pages[1]));

	local_irq_save(flags);

	/*
	 * Map the page without the global bit, as TLB flushing is done with
	 * flush_tlb_mm_range(), which is intended for non-global PTEs.
	 */
	pgprot = __pgprot(pgprot_val(PAGE_KERNEL) & ~_PAGE_GLOBAL);

	/*
	 * The lock is not really needed, but this allows to avoid open-coding.
	 */
	ptep = get_locked_pte(poking_mm, poking_addr, &ptl);

	/*
	 * This must not fail; preallocated in poking_init().
	 */
	VM_BUG_ON(!ptep);

	pte = mk_pte(pages[0], pgprot);
	set_pte_at(poking_mm, poking_addr, ptep, pte);

	if (cross_page_boundary) {
		pte = mk_pte(pages[1], pgprot);
		set_pte_at(poking_mm, poking_addr + PAGE_SIZE, ptep + 1, pte);
	}

	/*
	 * Loading the temporary mm behaves as a compiler barrier, which
	 * guarantees that the PTE will be set at the time memcpy() is done.
	 */
	prev = use_temporary_mm(poking_mm);

	kasan_disable_current();
	memcpy((u8 *)poking_addr + offset_in_page(addr), opcode, len);
	kasan_enable_current();

	/*
	 * Ensure that the PTE is only cleared after the instructions of memcpy
	 * were issued by using a compiler barrier.
	 */
	barrier();

	pte_clear(poking_mm, poking_addr, ptep);
	if (cross_page_boundary)
		pte_clear(poking_mm, poking_addr + PAGE_SIZE, ptep + 1);

	/*
	 * Loading the previous page-table hierarchy requires a serializing
	 * instruction that already allows the core to see the updated version.
	 * Xen-PV is assumed to serialize execution in a similar manner.
	 */
	unuse_temporary_mm(prev);

	/*
	 * Flushing the TLB might involve IPIs, which would require enabled
	 * IRQs, but not if the mm is not used, as it is in this point.
	 */
	flush_tlb_mm_range(poking_mm, poking_addr, poking_addr +
			   (cross_page_boundary ? 2 : 1) * PAGE_SIZE,
			   PAGE_SHIFT, false);

	/*
	 * If the text does not match what we just wrote then something is
	 * fundamentally screwy; there's nothing we can really do about that.
	 */
	BUG_ON(memcmp(addr, opcode, len));

	pte_unmap_unlock(ptep, ptl);
	local_irq_restore(flags);
	return addr;
}

/**
 * text_poke - Update instructions on a live kernel
 * @addr: address to modify
 * @opcode: source of the copy
 * @len: length to copy
 *
 * Only atomic text poke/set should be allowed when not doing early patching.
 * It means the size must be writable atomically and the address must be aligned
 * in a way that permits an atomic write. It also makes sure we fit on a single
 * page.
 *
 * Note that the caller must ensure that if the modified code is part of a
 * module, the module would not be removed during poking. This can be achieved
 * by registering a module notifier, and ordering module removal and patching
 * trough a mutex.
 */
void *text_poke(void *addr, const void *opcode, size_t len)
{
	lockdep_assert_held(&text_mutex);

	return __text_poke(addr, opcode, len);
}
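
The cross_page_boundary test at the top of __text_poke() is plain arithmetic and easy to check in isolation. The sketch below assumes 4 KiB pages; poke_crosses_page() and the POKE_* macros are hypothetical names for this note.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POKE_PAGE_SIZE UINT64_C(4096)   /* assumed 4 KiB pages */

/* Returns 1 if the len bytes starting at addr extend into the next page. */
static int poke_crosses_page(uint64_t addr, size_t len)
{
	uint64_t offset = addr & (POKE_PAGE_SIZE - 1);  /* offset_in_page() */

	return offset + len > POKE_PAGE_SIZE;
}
```

A one-byte int3 can never cross a page, so arch_arm_kprobe() always takes the single-page path. On arm64 the question does not arise for a single instruction, since a 4-byte aligned 4-byte write always stays within one page.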

(2) Dynamic ftrace

// arch/arm64/kernel/ftrace.c

#ifdef CONFIG_DYNAMIC_FTRACE
/*
 * Replace a single instruction, which may be a branch or NOP.
 * If @validate == true, a replaced instruction is checked against 'old'.
 */
static int ftrace_modify_code(unsigned long pc, u32 old, u32 new,
			      bool validate)
{
	u32 replaced;

	/*
	 * Note:
	 * We are paranoid about modifying text, as if a bug were to happen, it
	 * could cause us to read or write to someplace that could cause harm.
	 * Carefully read and modify the code with aarch64_insn_*() which uses
	 * probe_kernel_*(), and make sure what we read is what we expected it
	 * to be before modifying it.
	 */
	if (validate) {
		if (aarch64_insn_read((void *)pc, &replaced))
			return -EFAULT;

		if (replaced != old)
			return -EINVAL;
	}
	if (aarch64_insn_patch_text_nosync((void *)pc, new))
		return -EPERM;

	return 0;
}

/*
 * Replace tracer function in ftrace_caller()
 */
int ftrace_update_ftrace_func(ftrace_func_t func)
{
	unsigned long pc;
	u32 new;

	pc = (unsigned long)&ftrace_call;
	new = aarch64_insn_gen_branch_imm(pc, (unsigned long)func,
					  AARCH64_INSN_BRANCH_LINK);

	return ftrace_modify_code(pc, 0, new, false);
}

References

Linux 5.4.18

trace series 4 - kprobe study notes (trace系列4 - kprobe学习笔记)

