Using Huge Pages in Linux for Big Data Processing
Huge pages can significantly improve performance for big data processing by reducing TLB (Translation Lookaside Buffer) misses and memory management overhead. Here’s how to use them in Linux with C/C++ examples.
1. Configuring Huge Pages in Linux
First, configure huge pages on your system:
# Check current huge page settings
cat /proc/meminfo | grep Huge
# Set number of huge pages (e.g., 1024 pages of 2MB each = 2GB)
sudo sysctl vm.nr_hugepages=1024
# Make it persistent by adding to /etc/sysctl.conf
echo "vm.nr_hugepages=1024" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
2. C/C++ Example Using Huge Pages
Here’s a complete example demonstrating huge page allocation:
#define _GNU_SOURCE  // exposes MAP_ANONYMOUS and MAP_HUGETLB in glibc headers
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

#define HUGE_PAGE_SIZE (2UL * 1024 * 1024)     // 2MB, the default on x86_64
#define ARRAY_SIZE     (1024UL * 1024 * 1024)  // 1GB, a multiple of HUGE_PAGE_SIZE

// Records how the buffer was obtained so it can be released the right way
enum alloc_kind { ALLOC_HUGETLB, ALLOC_HUGETLBFS, ALLOC_MALLOC };

// Method 1: anonymous mapping with the MAP_HUGETLB flag
void* allocate_huge_pages_mmap(size_t size) {
    void* ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
                     -1, 0);
    if (ptr == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");
        return NULL;
    }
    printf("Allocated %zu bytes using mmap+MAP_HUGETLB at %p\n", size, ptr);
    return ptr;
}

// Method 2: map a file on a hugetlbfs mount (assumed to be /dev/hugepages)
void* allocate_huge_pages_hugetlbfs(size_t size) {
    const char* path = "/dev/hugepages/hugepagefile";
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0) {
        perror("open");
        return NULL;
    }
    void* ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    // The mapping stays valid after close(). Unlinking the file means its
    // huge pages return to the pool once the mapping is removed, instead of
    // staying allocated after the program exits.
    close(fd);
    unlink(path);
    if (ptr == MAP_FAILED) {
        perror("mmap(hugetlbfs)");
        return NULL;
    }
    printf("Allocated %zu bytes using hugetlbfs at %p\n", size, ptr);
    return ptr;
}

void process_large_data(double* data, size_t size) {
    size_t n = size / sizeof(double);
    // Simulate big data processing; the first write to each page faults it in
    for (size_t i = 0; i < n; i++) {
        data[i] = i * 0.5;
    }
    // Do some computation
    double sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += data[i];
    }
    printf("Processing completed. Sum: %f\n", sum);
}

int main(void) {
    // Try MAP_HUGETLB first, then hugetlbfs, then ordinary pages
    enum alloc_kind kind = ALLOC_HUGETLB;
    double* huge_data = (double*)allocate_huge_pages_mmap(ARRAY_SIZE);
    if (!huge_data) {
        fprintf(stderr, "Failed to allocate using mmap+MAP_HUGETLB. Trying hugetlbfs...\n");
        kind = ALLOC_HUGETLBFS;
        huge_data = (double*)allocate_huge_pages_hugetlbfs(ARRAY_SIZE);
    }
    if (!huge_data) {
        fprintf(stderr, "Failed to allocate huge pages. Falling back to regular pages.\n");
        kind = ALLOC_MALLOC;
        huge_data = (double*)malloc(ARRAY_SIZE);
        if (!huge_data) {
            perror("malloc");
            return 1;
        }
    }

    // Process data
    process_large_data(huge_data, ARRAY_SIZE);

    // Free memory: munmap() only for mmap'd memory, free() for malloc'd memory
    if (kind == ALLOC_MALLOC) {
        free(huge_data);
    } else if (munmap(huge_data, ARRAY_SIZE) != 0) {
        perror("munmap");
    }
    return 0;
}
3. Compiling and Running
Compile the program with:
gcc -o hugepage_demo hugepage_demo.c
Run it with:
./hugepage_demo
4. Verifying Huge Page Usage
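Check huge page usage while your program is running. HugePages_Rsvd rises when the mapping is created and HugePages_Free drops as its pages are first touched; once the program exits, MAP_HUGETLB pages return to the pool and the counters go back to their previous values: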
cat /proc/meminfo | grep Huge
5. Important Notes
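- Permissions: Anonymous MAP_HUGETLB mappings normally need no special privileges once the pool is configured. Files on a hugetlbfs mount are governed by that mount's ownership and mode (its uid, gid, and mode mount options), and SysV shared memory created with SHM_HUGETLB requires membership in the group set in vm.hugetlb_shm_group or the CAP_IPC_LOCK capability.
- Page Size: On x86_64 the default huge page size is 2MB, reported as Hugepagesize in /proc/meminfo. 1GB pages are available on CPUs that advertise the pdpe1gb flag and are usually reserved at boot, for example with hugepagesz=1G hugepages=4 on the kernel command line. Other architectures support different sizes.
- Allocation: Each huge page must be a physically contiguous, aligned block of memory. On a long-running system memory becomes fragmented, so the kernel may reserve fewer pages than vm.nr_hugepages requests; check HugePages_Total afterwards, or reserve the pages at boot.
- Transparent Huge Pages (THP): Linux also supports THP, which backs anonymous memory with huge pages automatically, either when it is first touched or later through the khugepaged background thread, without a reserved pool or code changes. The setting takes effect immediately but does not persist across reboots, and the "madvise" mode limits THP to regions that request it with madvise(MADV_HUGEPAGE). Enable it for all anonymous memory with: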
echo "always" | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
6. When to Use Huge Pages
Huge pages are particularly beneficial for:
- Large in-memory databases
- Scientific computing applications
- Big data processing frameworks
- Any memory-intensive application processing large datasets
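The performance improvement comes from reduced TLB pressure and fewer page faults: one 2MB page covers as much memory as 512 4KB pages, so each TLB entry maps 512 times more data, and populating a large array takes 512 times fewer page faults.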