An Analysis of the Interaction Between Kernel Memory Locking and User-Space Memory Locking

Published: 2025-05-12

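On Linux, the mlock/mlockall system calls guarantee that user-space memory stays resident in physical RAM. When an application triggers a kernel-side allocation through a system call such as ioctl, however, whether that kernel-allocated memory is locked has to be examined on the following four levels: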

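I. The Memory-Management Boundary Between User Space and Kernel Space

1. Address-space isolation

Linux splits each virtual address space into a user part and a kernel part (on 32-bit x86 with the default split, user space is 0-3 GB and kernel space is 3-4 GB). Each process has its own page tables, selected through the CR3 register, which isolates processes from one another; the kernel mappings are shared by all of them. When a process enters kernel mode through a system call, the CPU changes privilege level rather than address space (with kernel page-table isolation enabled, the entry code additionally switches to the full kernel page tables). The kernel memory accessed at that point belongs to the global kernel address space and is not part of any process's user mappings.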
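
The split can be made concrete with a small address classifier. This is only a sketch: the 32-bit boundary is the 3 GB figure quoted above, while the x86-64 constants assume 4-level paging with 48-bit canonical addresses, and the function names are hypothetical.

```python
# Boundary of the default 3G/1G split on 32-bit x86 (the figure used in the text).
X86_32_KERNEL_BASE = 0xC0000000

# Assumption: x86-64 with 4-level paging (48-bit canonical addresses).
X86_64_USER_TOP = 0x00007FFFFFFFFFFF      # top of the lower canonical half
X86_64_KERNEL_BASE = 0xFFFF800000000000   # bottom of the upper canonical half


def classify_x86_32(addr):
    """Return 'user' or 'kernel' for a 32-bit virtual address."""
    return 'kernel' if addr >= X86_32_KERNEL_BASE else 'user'


def classify_x86_64(addr):
    """Return 'user', 'kernel', or 'non-canonical' for a 64-bit virtual address."""
    if addr <= X86_64_USER_TOP:
        return 'user'
    if addr >= X86_64_KERNEL_BASE:
        return 'kernel'
    return 'non-canonical'


if __name__ == '__main__':
    print(classify_x86_32(0xBFFFF000))          # an address just below 3 GB
    print(classify_x86_64(0x7F8E6C000000))      # a typical mmap region address
    print(classify_x86_64(0xFFFF888000000000))  # an address in the upper half
```

Addresses that show up in /proc/<pid>/maps always fall in the user part; an address returned by kmalloc inside a driver falls in the kernel part.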

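2. Differences in allocation paths

  • User-space allocation: malloc → brk/mmap → page fault → the kernel allocates a physical page → a user page-table mapping is established
  • Kernel-space allocation: kmalloc takes objects from the slab allocator, which is backed by the buddy system and already covered by the kernel's direct mapping; vmalloc takes pages from the buddy system and creates new kernel page-table mappings for them

3. Scope of the locking mechanism

mlockall(MCL_CURRENT) locks every page currently mapped into the calling process's address space, faulting in pages that are not yet resident. The kernel tracks a process's memory through struct mm_struct, and locking is implemented by setting the VM_LOCKED flag on each virtual memory area (VMA). That flag exists only on user VMAs.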
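
To illustrate that locking operates on a user mapping, here is a minimal sketch that tries to mlock a single anonymous page from user space through ctypes. It assumes a Linux (or other POSIX) system where ctypes.CDLL(None) exposes libc's mlock and munlock; the helper name try_mlock_one_page is hypothetical, and the call may legitimately fail if RLIMIT_MEMLOCK is too low.

```python
import ctypes
import mmap


def try_mlock_one_page():
    """Try to mlock one anonymous page; return (succeeded, errno)."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.mlock.restype = ctypes.c_int
    libc.munlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.munlock.restype = ctypes.c_int

    size = mmap.PAGESIZE
    buf = mmap.mmap(-1, size)              # anonymous mapping: an ordinary user VMA
    view = ctypes.c_char.from_buffer(buf)  # needed only to obtain the address
    addr = ctypes.addressof(view)

    rc = libc.mlock(addr, size)
    err = ctypes.get_errno() if rc != 0 else 0
    if rc == 0:
        libc.munlock(addr, size)

    del view                               # release the buffer export before closing
    buf.close()
    return rc == 0, err


if __name__ == '__main__':
    ok, err = try_mlock_one_page()
    print('locked' if ok else f'mlock failed, errno={err}')
```

Nothing in this call refers to memory a driver allocated on the process's behalf; the only input is a range of user virtual addresses.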

II. Kernel Memory Allocation Scenarios

1. Direct kernel allocation

When a driver calls kmalloc(GFP_KERNEL) from its ioctl handler:

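// Typical driver snippet (BUF_SIZE is a driver-defined constant)
static long my_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
    void __user *user_buf = (void __user *)arg;
    void *kernel_buf = kmalloc(BUF_SIZE, GFP_KERNEL);

    if (!kernel_buf)
        return -ENOMEM;
    if (copy_from_user(kernel_buf, user_buf, BUF_SIZE)) {
        kfree(kernel_buf);
        return -EFAULT;
    }
    // Process the data
    kfree(kernel_buf);
    return 0;
}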

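Memory of this kind:

  • Comes from directly mapped low memory (ZONE_NORMAL or ZONE_DMA), never from ZONE_HIGHMEM, because kmalloc returns a direct-mapped kernel address
  • Is not mapped by any user page table
  • Is served by the slab allocator, which obtains its backing pages from the buddy system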

2. DMA buffer allocation

When the dma_alloc_coherent interface is used:

void *dma_buf = dma_alloc_coherent(dev, size, &dma_handle, GFP_KERNEL);

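In this case:

  • The memory may come from a DMA-capable zone (ZONE_DMA, ZONE_DMA32) or another area, depending on the device's DMA mask and the platform
  • The returned CPU address can be used directly for the lifetime of the buffer; no kmap call is needed
  • No /proc/iomem entry is created; that file lists physical address ranges such as System RAM and device MMIO, not individual allocations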

3. Kernel memory accessed directly from user space

Direct user-space access is implemented through mmap:

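// Driver mmap implementation (pfn is the page frame number of the driver's buffer)
static int my_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long size = vma->vm_end - vma->vm_start;

    // Propagate the error instead of reporting success unconditionally
    return remap_pfn_range(vma, vma->vm_start, pfn, size, vma->vm_page_prot);
}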

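In this case:

  • The user page tables map directly to physical pages owned by the kernel
  • The memory is still managed by the kernel
  • remap_pfn_range marks the VMA VM_IO | VM_PFNMAP, and mlock skips such VMAs; the pages are already pinned because they are never placed on the LRU lists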
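
How a particular driver mapping is treated can be checked rather than assumed by reading its VmFlags line in /proc/<pid>/smaps. The sketch below decodes the two-letter codes that matter here (pf for VM_PFNMAP, io for VM_IO, lo for VM_LOCKED); the function names and the sample lines are hypothetical.

```python
def parse_vmflags(line):
    """Return the set of two-letter flag codes from a 'VmFlags:' smaps line."""
    key, _, rest = line.partition(':')
    if key.strip() != 'VmFlags':
        raise ValueError('not a VmFlags line: %r' % line)
    return set(rest.split())


def describe_mapping(line):
    """Summarize whether a VMA is a raw PFN mapping and whether it is locked."""
    flags = parse_vmflags(line)
    return {
        'pfn_mapped': 'pf' in flags,   # VM_PFNMAP, set by remap_pfn_range
        'io': 'io' in flags,           # VM_IO
        'locked': 'lo' in flags,       # VM_LOCKED, set by mlock/mlockall
    }


if __name__ == '__main__':
    # Hypothetical VmFlags lines for a driver mapping and an mlocked heap region.
    print(describe_mapping('VmFlags: rd wr sh mr mw me ms pf io de dd'))
    print(describe_mapping('VmFlags: rd wr mr mw me lo ac'))
```

Running this against the real smaps entry of a driver mapping after calling mlockall shows directly whether the lo flag was applied to it.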

III. Comparing the Locking Implementations

1. User-space locking flow

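// mlockall system call path (simplified from mm/mlock.c; details vary by kernel version)
SYSCALL_DEFINE1(mlockall, int, flags)
{
    // Validate flags, then check RLIMIT_MEMLOCK and CAP_IPC_LOCK
    ret = apply_mlockall_flags(flags);   // mlock_fixup() sets VM_LOCKED on each VMA
    if (!ret && (flags & MCL_CURRENT))
        mm_populate(0, TASK_SIZE);       // fault in pages that are not yet resident
    return ret;
}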

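Key steps:

  • Walk all of the process's VMAs
  • Set the VM_LOCKED flag on each one through mlock_fixup
  • Fault in the existing pages right away through mm_populate, so they are resident and locked when the call returns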

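2. Locking properties of kernel memory

Kernel pages have the following properties by default:

  • Their _PAGE_PRESENT page-table bit normally stays set in the kernel direct mapping
  • They are never added to the LRU lists (PG_lru is not set), so page reclaim does not scan them and they cannot be swapped out
  • Access tracking through mark_page_accessed applies only to LRU pages (page cache and anonymous memory), not to these allocations
  • Some critical pages are marked PG_reserved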

3. Observing the effect of locking

This can be observed through /proc/<pid>/smaps:

7f8e6c000000-7f8e6c021000 rw-p 00000000 00:00 0 
Size:                132 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB  # kernel-allocated pages are not counted here
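
Reading the Locked field for every VMA by hand gets tedious, so a small parser helps. The following is a sketch that works on the text of an smaps file; the function name locked_kb_by_vma is hypothetical, and the header detection assumes the usual "start-end perms offset dev inode" layout.

```python
def locked_kb_by_vma(smaps_text):
    """Map each VMA address range to the value of its 'Locked:' field in kB."""
    result = {}
    current = None
    for line in smaps_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        first = fields[0]
        # A VMA header starts with "start-end"; field lines start with "Name:".
        if '-' in first and not first.endswith(':'):
            current = first
        elif first == 'Locked:' and current is not None:
            result[current] = int(fields[1])
    return result


if __name__ == '__main__':
    with open('/proc/self/smaps') as f:
        locked = locked_kb_by_vma(f.read())
    print(sum(locked.values()), 'kB locked across', len(locked), 'VMAs')
```

Only user VMAs appear in this output at all, which is the point: memory a driver obtained with kmalloc or alloc_pages has no entry to inspect.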

IV. Testing and Performance Impact

1. Test design

The following module is used for verification:

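// Test driver module
static long test_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
    struct page *page = alloc_pages(GFP_KERNEL, 0);

    if (!page)
        return -ENOMEM;
    // Record the physical address for later inspection
    pr_info("test: allocated pfn %lx\n", page_to_pfn(page));
    // The page is kept for inspection; a real module must save the
    // pointer and release it with __free_pages()
    return 0;
}

// User program
mlockall(MCL_CURRENT);
ioctl(fd, ALLOC_CMD);
// Read /proc/self/pagemap to verify the page state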

2. Result analysis

Parsing with a pagemap script:

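# pagemap parsing script (pid and vpn, the virtual page number, are supplied by the caller)
import struct

with open(f'/proc/{pid}/pagemap', 'rb') as f:
    f.seek(vpn * 8)                      # one 64-bit entry per virtual page
    entry = struct.unpack('Q', f.read(8))[0]
    pfn = entry & 0x7fffffffffffff       # bits 0-54; reads as 0 without CAP_SYS_ADMIN
    swapped = (entry >> 62) & 1          # bit 62: page is in swap
    present = (entry >> 63) & 1          # bit 63: page is present in RAM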
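
The script leaves open how vpn is derived from an address. A sketch of that step, together with a decoder for the 8-byte entry, might look as follows; the helper names are hypothetical, and the 4096-byte default page size is an assumption that should be replaced by the system's actual page size.

```python
import struct

PAGEMAP_ENTRY_BYTES = 8
PFN_MASK = (1 << 55) - 1   # bits 0-54 of a pagemap entry


def pagemap_offset(vaddr, page_size=4096):
    """File offset in /proc/<pid>/pagemap of the entry covering vaddr."""
    return (vaddr // page_size) * PAGEMAP_ENTRY_BYTES


def decode_pagemap_entry(raw):
    """Decode one 8-byte pagemap entry read in native byte order."""
    (entry,) = struct.unpack('Q', raw)
    present = bool((entry >> 63) & 1)
    swapped = bool((entry >> 62) & 1)
    # The low bits hold a PFN only when the page is present in RAM.
    pfn = entry & PFN_MASK if present else None
    return {'present': present, 'swapped': swapped, 'pfn': pfn}


if __name__ == '__main__':
    # A synthetic entry: present, not swapped, PFN 0x1234.
    raw = struct.pack('Q', (1 << 63) | 0x1234)
    print(decode_pagemap_entry(raw))
```

pagemap is indexed by user virtual address, so a page the driver allocated but never mapped into the process has no entry to look up.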

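The test showed:

  • The kernel-allocated page does not appear in any user-space VMA
  • pagemap holds no valid PFN for a corresponding virtual address, because no user mapping of the page exists
  • The nr_mlock counter in /proc/vmstat does not change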
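
The nr_mlock comparison can be scripted by sampling /proc/vmstat before and after the ioctl. This is a sketch with hypothetical helper names; the counter is system-wide, so other processes locking memory at the same time will disturb the result.

```python
def read_counter(vmstat_text, name):
    """Return the integer value of a 'name value' line, or None if absent."""
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == name:
            return int(parts[1])
    return None


def counter_delta(before_text, after_text, name='nr_mlock'):
    """Difference in a counter between two /proc/vmstat snapshots, in pages."""
    before = read_counter(before_text, name)
    after = read_counter(after_text, name)
    if before is None or after is None:
        raise KeyError(name)
    return after - before


if __name__ == '__main__':
    with open('/proc/vmstat') as f:
        before = f.read()
    # ... issue the ioctl here ...
    with open('/proc/vmstat') as f:
        after = f.read()
    print('nr_mlock delta (pages):', counter_delta(before, after))
```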

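3. Performance impact

When heavy kernel allocation puts the system under memory pressure:

  • Locked user-space memory is not reclaimed or swapped out; RLIMIT_MEMLOCK only caps how much an unprivileged process may lock
  • Allocations that fall below the zone watermarks trigger direct reclaim; PSI (pressure stall information) reports the resulting stalls but does not trigger reclaim itself
  • Contention on mmap_lock may add scheduling latency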
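
Because RLIMIT_MEMLOCK bounds how much a process can lock, it is worth checking before relying on mlock. A minimal sketch using Python's resource module follows; the helper names are hypothetical, and processes holding CAP_IPC_LOCK are not bound by the limit, which this check does not account for.

```python
import resource


def memlock_limit_bytes():
    """Soft RLIMIT_MEMLOCK in bytes, or None when it is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return None if soft == resource.RLIM_INFINITY else soft


def fits_under_limit(nbytes, limit):
    """True if locking nbytes would stay within the given limit."""
    return limit is None or nbytes <= limit


if __name__ == '__main__':
    limit = memlock_limit_bytes()
    want = 64 * 1024 * 1024   # example working-set size to lock
    print('limit:', 'unlimited' if limit is None else limit)
    print('64 MiB fits:', fits_under_limit(want, limit))
```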

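V. Conclusions and Best Practices

The analysis above leads to these conclusions:

  1. Scope isolation: mlockall affects only pages mapped by user-space VMAs; kernel-allocated memory is outside its control
  2. Lifetime: kernel memory is managed by the slab allocator and the buddy system, independently of the process lifetime
  3. Security boundary: user space cannot use memory locking to interfere with kernel memory management

For scenarios that depend on kernel memory staying resident, the recommendations are:

  • Kernel allocations made with kmalloc, vmalloc, or alloc_pages are not swappable, so no extra locking is needed; in I/O and filesystem paths, use GFP_NOIO/GFP_NOFS so that reclaim does not recurse into I/O
  • Critical data structures shared with user space can be allocated with vmalloc and mapped into the process, which then uses mlock for its own ordinary memory
  • For DMA, dma_alloc_attrs with DMA_ATTR_NO_KERNEL_MAPPING avoids creating a kernel virtual mapping when the driver never accesses the buffer through one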

The overall architecture is sketched below:

+-------------------+     +-------------------+
|  User Space       |     |  Kernel Space     |
|                   |     |                   |
|  mlock() regions  |     |  kmalloc pools    |
|  (VM_LOCKED)      |     |  (no lock flag)   |
+--------+----------+     +---------+---------+
         |                          |
         |          Page Table      |
         +--------------------------> PFN management
                                     |
                              +------v------+
                              | Physical    |
                              | memory      |
                              | (DRAM)      |
                              +-------------+
