1. Basic CPU / Mem / Disk Info
1.1 CPU Cores: number of physical CPU cores
Shell equivalent: cat /proc/cpuinfo | grep "cpu cores" | uniq
type:Singlestat
Unit: short
metrics:
count(count(node_cpu_seconds_total{instance=~"$node:$port",job=~"$job"}) by (cpu))
1.2 Total RAM: total memory size
Shell equivalent: cat /proc/meminfo | grep MemTotal
type:Singlestat
Unit: bytes
metrics:
node_memory_MemTotal_bytes{instance=~"$node:$port",job=~"$job"}
1.3 Total SWAP: total swap size
Shell equivalent: cat /proc/swaps
type:Singlestat
Unit: bytes
metrics:
node_memory_SwapTotal_bytes{instance=~"$node:$port",job=~"$job"}
1.4 Total RootFS: total size of the root filesystem
type:Singlestat
Unit: bytes
metrics:
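node_filesystem_size_bytes{instance=~"$node:$port",job=~"$job",mountpoint="/",fstype!="rootfs"}
1.5 System Load (1m avg): system load over the last minute
Shell equivalent: first column of cat /proc/loadavg. On a single-core CPU, a load below 1 means no tasks are waiting, a load of 1 means the system has no spare capacity to run more processes, and a load above 1 means processes are queuing for resources.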
type:Singlestat
Unit: short
metrics:
node_load1{instance=~"$node:$port",job=~"$job"}
1.6 Uptime: how long the system has been running
type:Singlestat
Unit: seconds (s)
metrics:
node_time_seconds{instance=~"$node:$port",job=~"$job"} - node_boot_time_seconds{instance=~"$node:$port",job=~"$job"}
node_time_seconds: current system time
node_boot_time_seconds: system boot time
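The subtraction in the Uptime query can be sketched in Python. This is only an illustration of the arithmetic; the function names and the sample timestamps are assumptions made for this example, not part of the dashboard.

```python
def uptime_seconds(node_time_seconds: float, node_boot_time_seconds: float) -> float:
    """Uptime is the current system time minus the boot time (both Unix timestamps)."""
    return node_time_seconds - node_boot_time_seconds


def format_uptime(seconds: float) -> str:
    """Render a duration in seconds as days, hours, and minutes."""
    total_minutes = int(seconds) // 60
    days, rem_minutes = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rem_minutes, 60)
    return f"{days}d {hours}h {minutes}m"


# Assumed sample: booted at t=1_000_000 and scraped at t=1_093_784.
print(format_uptime(uptime_seconds(1_093_784, 1_000_000)))  # prints "1d 2h 3m"
```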
2. Basic CPU / Mem / Disk Gauge
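2.1 CPU Busy: share of time that all CPU cores spend busy
type: Singlestat
Unit: percent (0-100)
Formula: (number of CPU cores - summed idle rate of all cores within the 5-minute window) / number of CPU cores. irate() uses only the last two samples in that window.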
metrics:
(((count(count(node_cpu_seconds_total{instance=~"$node:$port",job=~"$job"}) by (cpu))) - avg(sum by (mode)(irate(node_cpu_seconds_total{mode='idle',instance=~"$node:$port",job=~"$job"}[5m])))) * 100) / count(count(node_cpu_seconds_total{instance=~"$node:$port",job=~"$job"}) by (cpu))
Maximum: 100%
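As a rough sketch of the CPU Busy formula, the Python function below takes one idle rate per core (seconds idle per second, so a value between 0 and 1) and returns the busy percentage. The function name and the sample values are assumptions for illustration.

```python
from typing import List


def cpu_busy_percent(idle_rate_per_core: List[float]) -> float:
    """Busy % = (cores - sum of per-core idle rates) * 100 / cores."""
    cores = len(idle_rate_per_core)
    if cores == 0:
        raise ValueError("at least one core is required")
    idle_total = sum(idle_rate_per_core)
    return (cores - idle_total) * 100 / cores


# Assumed sample: four cores, one fully idle and three partly busy.
print(cpu_busy_percent([1.0, 0.5, 0.75, 0.25]))  # prints 37.5
```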
2.2 Used RAM: memory usage percentage
Shell equivalent: free -m
type: Singlestat
Unit: percent (0-100)
Used memory percentage (including Buffers and Cached)
metrics:
((node_memory_MemTotal_bytes{instance=~"$node:$port",job=~"$job"} - node_memory_MemFree_bytes{instance=~"$node:$port",job=~"$job"}) / (node_memory_MemTotal_bytes{instance=~"$node:$port",job=~"$job"} )) * 100
node_memory_MemFree_bytes: free memory
Used memory percentage (excluding Buffers and Cached)
metrics:
100 - ((node_memory_MemAvailable_bytes{instance=~"$node:$port",job=~"$job"} * 100) / node_memory_MemTotal_bytes{instance=~"$node:$port",job=~"$job"})
MemAvailable: Free + Buffers + Cached minus the part that cannot be reclaimed. The non-reclaimable part includes shared memory segments, tmpfs, ramfs, and so on.
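The two memory percentages differ only in what they count as free. A minimal Python sketch of both formulas follows; the function names and the sample sizes are assumptions chosen for illustration.

```python
GIB = 1024 ** 3


def used_percent_including_cache(mem_total: int, mem_free: int) -> float:
    """(MemTotal - MemFree) / MemTotal * 100; Buffers and Cached count as used."""
    return (mem_total - mem_free) / mem_total * 100


def used_percent_excluding_cache(mem_total: int, mem_available: int) -> float:
    """100 - MemAvailable * 100 / MemTotal; reclaimable cache counts as free."""
    return 100 - (mem_available * 100) / mem_total


# Assumed sample: 8 GiB total, 1 GiB free, 5 GiB available.
print(used_percent_including_cache(8 * GIB, 1 * GIB))  # prints 87.5
print(used_percent_excluding_cache(8 * GIB, 5 * GIB))  # prints 37.5
```

On a host with a large page cache the first value sits far above the second, which is why the second is usually the better indicator of memory pressure.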
2.3 Used SWAP: swap usage percentage
type: Singlestat
Unit: percent (0-100)
metrics:
((node_memory_SwapTotal_bytes{instance=~"$node:$port",job=~"$job"} - node_memory_SwapFree_bytes{instance=~"$node:$port",job=~"$job"}) / (node_memory_SwapTotal_bytes{instance=~"$node:$port",job=~"$job"} )) * 100
node_memory_SwapFree_bytes: free swap space
2.4 Used Root FS: root filesystem usage percentage
type: Singlestat
Unit: percent (0-100)
metrics:
100 - ((node_filesystem_avail_bytes{instance=~"$node:$port",job=~"$job",mountpoint="/",fstype!="rootfs"} * 100) / node_filesystem_size_bytes{instance=~"$node:$port",job=~"$job",mountpoint="/",fstype!="rootfs"})
node_filesystem_avail_bytes: available filesystem space
2.5 CPU System Load (1m avg): 1-minute load average as a percentage of the number of CPU cores
type: Singlestat
Unit: percent (0-100)
metrics:
avg(node_load1{instance=~"$node:$port",job=~"$job"}) / count(count(node_cpu_seconds_total{instance=~"$node:$port",job=~"$job"}) by (cpu)) * 100
node_load1: system load over the last minute
2.6 CPU System Load (5m avg): 5-minute load average as a percentage of the number of CPU cores
type: Singlestat
Unit: percent (0-100)
metrics:
avg(node_load5{instance=~"$node:$port",job=~"$job"}) / count(count(node_cpu_seconds_total{instance=~"$node:$port",job=~"$job"}) by (cpu)) * 100
node_load5: system load over the last 5 minutes
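Both load panels divide the load average by the core count. A small Python sketch of that normalization, with a hypothetical function name and assumed sample values:

```python
def load_percent(load_avg: float, cores: int) -> float:
    """Load average expressed as a percentage of the available cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_avg / cores * 100


# Assumed samples on a 4-core host.
print(load_percent(2.0, 4))  # prints 50.0
print(load_percent(6.0, 4))  # prints 150.0
```

A result above 100 means more runnable tasks than cores, so the value is not capped at 100 the way a utilization percentage is.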
3. Basic CPU / Mem Graph
3.1 CPU Basic: basic CPU information (/proc/stat)
type: Graph
Unit: short
Busy System: share of CPU time spent in kernel mode
metrics:
sum by (instance)(rate(node_cpu_seconds_total{mode="system",instance=~"$node:$port",job=~"$job"}[5m])) * 100
Busy User: share of CPU time spent in user mode
metrics:
sum by (instance)(rate(node_cpu_seconds_total{mode='user',instance=~"$node:$port",job=~"$job"}[5m])) * 100
Busy Iowait: share of CPU time spent waiting for I/O
metrics:
sum by (instance)(rate(node_cpu_seconds_total{mode='iowait',instance=~"$node:$port",job=~"$job"}[5m])) * 100
Busy IRQs: share of CPU time spent servicing interrupts (irq and softirq)
metrics:
sum by (instance)(rate(node_cpu_seconds_total{mode=~".*irq",instance=~"$node:$port",job=~"$job"}[5m])) * 100
Idle: share of CPU time spent idle
metrics:
sum by (mode)(rate(node_cpu_seconds_total{mode='idle',instance=~"$node:$port",job=~"$job"}[5m])) * 100
Busy Other: share of CPU time spent in any other state (not system, user, iowait, idle, or interrupt)
metrics:
sum (rate(node_cpu_seconds_total{mode!='idle',mode!='user',mode!='system',mode!='iowait',mode!='irq',mode!='softirq',instance=~"$node:$port",job=~"$job"}[5m])) * 100
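The per-mode queries above all follow one pattern: take the increase of a cumulative seconds counter, divide by the elapsed time, and multiply by 100. The Python sketch below applies that pattern to two hand-written samples; the function name and the numbers are assumptions, and the counters are taken as already summed over all cores.

```python
from typing import Dict


def mode_percentages(before: Dict[str, float], after: Dict[str, float],
                     elapsed: float) -> Dict[str, float]:
    """Percent of one CPU-second spent in each mode between two samples."""
    if elapsed <= 0:
        raise ValueError("elapsed must be positive")
    return {mode: (after[mode] - before[mode]) * 100 / elapsed for mode in after}


# Assumed sample: a 4-core host observed for 50 seconds.
before = {"user": 100.0, "system": 50.0, "idle": 850.0}
after = {"user": 130.0, "system": 60.0, "idle": 1010.0}
result = mode_percentages(before, after, 50.0)
print(result)  # {'user': 60.0, 'system': 20.0, 'idle': 320.0}
```

Because the counters are summed across cores, the modes add up to 100 times the number of cores (400 here) rather than to 100.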
3.2 Memory Basic: basic memory information
type: Graph
Unit: short
3.2.1 RAM Total: total memory size
metrics:
node_memory_MemTotal_bytes{instance=~"$node:$port",job=~"$job"}
3.2.2 RAM Used: used memory (total memory - free memory - memory held by Buffers and Cached)
metrics:
node_memory_MemTotal_bytes{instance=~"$node:$port",job=~"$job"} - node_memory_MemFree_bytes{instance=~"$node:$port",job=~"$job"} - (node_memory_Cached_bytes{instance=~"$node:$port",job=~"$job"} + node_memory_Buffers_bytes{instance=~"$node:$port",job=~"$job"})
3.2.3 RAM Cache + Buffer: memory held by Cached and Buffers
metrics:
node_memory_Cached_bytes{instance=~"$node:$port",job=~"$job"} + node_memory_Buffers_bytes{instance=~"$node:$port",job=~"$job"}
3.2.4 RAM Free: free memory
metrics:
node_memory_MemFree_bytes{instance=~"$node:$port",job=~"$job"}
3.2.5 SWAP Used: used swap space
metrics:
(node_memory_SwapTotal_bytes{instance=~"$node:$port",job=~"$job"} - node_memory_SwapFree_bytes{instance=~"$node:$port",job=~"$job"})
Total swap size minus free swap size.
4. Basic Net / Disk Info
4.1 Network Traffic Basic: basic network information per interface
type: Graph
Unit: bytes
recv {{device}}: received traffic per network interface
recv lo: local loopback interface
recv eth0: Ethernet interface
recv docker0: docker0 network interface
metrics:
rate(node_network_receive_bytes_total{instance=~"$node:$port",job=~"$job"}[5m])
trans {{device}}: transmitted traffic per network interface
metrics:
rate(node_network_transmit_bytes_total{instance=~"$node:$port",job=~"$job"}[5m])
4.2 Disk Space Used Basic: used disk space percentage for all mounted filesystems
type: Graph
Unit: percent (0-100)
metrics:
100 - ((node_filesystem_avail_bytes{instance=~"$node:$port",job=~"$job",device!~'rootfs'} * 100) / node_filesystem_size_bytes{instance=~"$node:$port",job=~"$job",device!~'rootfs'})
5. CPU Memory Net Disk
5.1 CPU
type: Graph
Unit: short
max: "100"
min: "0"
Label: Percentage
5.1.1 System - share of CPU time spent running processes in kernel mode
metrics:
sum by (mode)(irate(node_cpu_seconds_total{mode="system",instance=~"$node:$port",job=~"$job"}[5m])) * 100
5.1.2 User - share of CPU time spent running normal processes in user mode
metrics:
sum by (mode)(irate(node_cpu_seconds_total{mode='user',instance=~"$node:$port",job=~"$job"}[5m])) * 100
5.1.3 Nice - share of CPU time spent running niced processes in user mode
metrics:
sum by (mode)(irate(node_cpu_seconds_total{mode='nice',instance=~"$node:$port",job=~"$job"}[5m])) * 100
5.1.4 Idle - share of CPU time spent idle
metrics:
sum by (mode)(irate(node_cpu_seconds_total{mode='idle',instance=~"$node:$port",job=~"$job"}[5m])) * 100
5.1.5 Iowait - share of CPU time spent waiting for I/O
metrics:
sum by (mode)(irate(node_cpu_seconds_total{mode='iowait',instance=~"$node:$port",job=~"$job"}[5m])) * 100
5.1.6 Irq - share of CPU time spent servicing hardware interrupts
metrics:
sum by (mode)(irate(node_cpu_seconds_total{mode='irq',instance=~"$node:$port",job=~"$job"}[5m])) * 100
5.1.7 Softirq - share of CPU time spent servicing software interrupts
metrics:
sum by (mode)(irate(node_cpu_seconds_total{mode='softirq',instance=~"$node:$port",job=~"$job"}[5m])) * 100
5.1.8 Steal - when running in a VM, share of this VM's CPU time taken by other VMs
metrics:
sum by (mode)(irate(node_cpu_seconds_total{mode='steal',instance=~"$node:$port",job=~"$job"}[5m])) * 100
5.1.9 Guest - share of CPU time spent running virtual machines
metrics:
sum by (mode)(irate(node_cpu_seconds_total{mode='guest',instance=~"$node:$port",job=~"$job"}[5m])) * 100
5.2 Memory Stack: stacked memory breakdown (/proc/meminfo)
type: Graph
Unit: bytes
min: "0"
Label: Bytes
5.2.1 Apps - memory used by user-space applications
metrics:
node_memory_MemTotal_bytes{instance=~"$node:$port",job=~"$job"} - node_memory_MemFree_bytes{instance=~"$node:$port",job=~"$job"}
- node_memory_Buffers_bytes{instance=~"$node:$port",job=~"$job"} - node_memory_Cached_bytes{instance=~"$node:$port",job=~"$job"}
- node_memory_Slab_bytes{instance=~"$node:$port",job=~"$job"} - node_memory_PageTables_bytes{instance=~"$node:$port",job=~"$job"}
- node_memory_SwapCached_bytes{instance=~"$node:$port",job=~"$job"}
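The Apps series is whatever remains after the kernel-side categories are subtracted from the total. A minimal Python sketch of that subtraction, using /proc/meminfo field names as dictionary keys; the function name and the sample sizes are assumptions.

```python
from typing import Dict

MIB = 1024 ** 2

# Categories subtracted from MemTotal, matching the query above.
NON_APP_FIELDS = ("MemFree", "Buffers", "Cached", "Slab", "PageTables", "SwapCached")


def apps_bytes(meminfo: Dict[str, int]) -> int:
    """Memory attributed to user-space applications, in bytes."""
    return meminfo["MemTotal"] - sum(meminfo[field] for field in NON_APP_FIELDS)


# Assumed sample values for an 8 GiB host.
sample = {
    "MemTotal": 8192 * MIB,
    "MemFree": 1024 * MIB,
    "Buffers": 256 * MIB,
    "Cached": 2048 * MIB,
    "Slab": 512 * MIB,
    "PageTables": 64 * MIB,
    "SwapCached": 0,
}
print(apps_bytes(sample) // MIB)  # prints 4288
```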
5.2.2 PageTables - memory used to map between virtual and physical memory addresses
metrics:
node_memory_PageTables_bytes{instance=~"$node:$port",job=~"$job"}
5.2.3 SwapCache - memory that tracks pages fetched from swap but not yet modified
metrics:
node_memory_SwapCached_bytes{instance=~"$node:$port",job=~"$job"}
5.2.4 Slab - memory the kernel uses to cache data structures for its own use (such as inode and dentry caches)
metrics:
node_memory_Slab_bytes{instance=~"$node:$port",job=~"$job"}
5.2.5 Cache - cache of frequently accessed file data or content
metrics:
node_memory_Cached_bytes{instance=~"$node:$port",job=~"$job"}
5.2.6 Buffers - block device (for example, hard disk) cache
metrics:
node_memory_Buffers_bytes{instance=~"$node:$port",job=~"$job"}
5.2.7 Unused - unused memory
metrics:
node_memory_MemFree_bytes{instance=~"$node:$port",job=~"$job"}
5.2.8 Swap - swap space in use
metrics:
(node_memory_SwapTotal_bytes{instance=~"$node:$port",job=~"$job"} - node_memory_SwapFree_bytes{instance=~"$node:$port",job=~"$job"})
5.2.9 Hardware Corrupted - amount of memory the kernel has identified as corrupted or not working
metrics:
node_memory_HardwareCorrupted_bytes{instance=~"$node:$port",job=~"$job"}
5.3 Network Traffic: transfer rate per network interface
type: Graph
Unit: bytes/sec
Label: Bytes out(-)/in(+)
5.3.1 {{device}} - Receive: receive rate per network interface
metrics:
irate(node_network_receive_bytes_total{instance=~"$node:$port",job=~"$job"}[5m])
5.3.2 {{device}} - Transmit: transmit rate per network interface
metrics:
irate(node_network_transmit_bytes_total{instance=~"$node:$port",job=~"$job"}[5m])
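The network panels in 4.1 use rate() while the ones here use irate() over the same 5-minute window. The Python sketch below shows the difference on a hand-written counter series. It is a simplification: the function names are made up, and it ignores the counter-reset handling and window extrapolation that Prometheus applies.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Average per-second increase between the first and last sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def simple_irate(samples: List[Sample]) -> float:
    """Per-second increase between the last two samples only."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)


# Assumed byte counter scraped every 15 seconds, with a burst at the end.
samples = [(0.0, 0.0), (15.0, 1500.0), (30.0, 3000.0), (45.0, 9000.0)]
print(simple_rate(samples))   # prints 200.0
print(simple_irate(samples))  # prints 400.0
```

The averaged form smooths the final burst, while the instantaneous form reports it in full, which makes irate() graphs spikier but more responsive.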
5.4 Disk Space Used: used disk space for all mounted filesystems
type: Graph
Unit: bytes
min: "0"
Label: Bytes
metrics:
node_filesystem_size_bytes{instance=~"$node:$port",job=~"$job",device!~'rootfs'} - node_filesystem_avail_bytes{instance=~"$node:$port",job=~"$job",device!~'rootfs'}
5.5 Disk IOps: disk reads and writes
type: Graph
Unit: I/O ops/sec (iops)
Label: IO read(-)/write(+)
5.5.1 {{device}} - Reads completed: disk read rate (5-minute window)
metrics:
irate(node_disk_reads_completed_total{instance=~"$node:$port",job=~"$job",device=~"[a-z]*[a-z]"}[5m])
5.5.2 {{device}} - Writes completed: disk write rate (5-minute window)
metrics:
irate(node_disk_writes_completed_total{instance=~"$node:$port",job=~"$job",device=~"[a-z]*[a-z]"}[5m])
5.6 I/O Usage Read / Write
type: Graph
Unit: bytes
Label: Bytes read(-)/write(+)
5.6.1 Bytes read successfully (5-minute window)
metrics:
irate(node_disk_read_bytes_total{instance=~"$node:$port",job=~"$job",device=~"[a-z]*[a-z]"}[5m])
5.6.2 Bytes written successfully (5-minute window)
metrics:
irate(node_disk_written_bytes_total{instance=~"$node:$port",job=~"$job",device=~"[a-z]*[a-z]"}[5m])
5.7 I/O Usage Times: time spent doing I/O
type: Graph
Unit: ms
Label: Milliseconds
metrics:
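irate(node_disk_io_time_seconds_total{instance=~"$node:$port",job=~"$job",device=~"[a-z]*[a-z]"}[5m])
Note: node_disk_io_time_seconds_total counts seconds, so this query yields seconds of I/O time per second; multiply by 1000 to match the ms unit.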
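Since the I/O time counter is cumulative seconds, its per-second rate is a fraction between 0 and 1. The Python sketch below converts two counter readings into a busy percentage and into milliseconds per second; the function names and sample readings are assumptions for illustration.

```python
def io_busy_percent(io_seconds_before: float, io_seconds_after: float,
                    elapsed: float) -> float:
    """Share of wall-clock time the device spent doing I/O, as a percentage."""
    return (io_seconds_after - io_seconds_before) * 100 / elapsed


def io_ms_per_second(io_seconds_before: float, io_seconds_after: float,
                     elapsed: float) -> float:
    """Milliseconds of I/O time per second of wall-clock time."""
    return (io_seconds_after - io_seconds_before) * 1000 / elapsed


# Assumed sample: the counter moved from 12 s to 15 s across a 15-second scrape interval.
print(io_busy_percent(12.0, 15.0, 15.0))   # prints 20.0
print(io_ms_per_second(12.0, 15.0, 15.0))  # prints 200.0
```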
6. Memory Detail Meminfo /proc/meminfo
- Memory Active / Inactive
type: Graph
Unit: bytes
Label: Bytes
6.1 Inactive - memory used less recently, reclaimed first (/proc/meminfo Inactive)
metrics:
node_memory_Inactive_bytes{instance=~"$node:$port",job=~"$job"}
6.2 Active - memory used frequently and recently; usually not reclaimed unless absolutely necessary (/proc/meminfo Active)
metrics:
node_memory_Active_bytes{instance=~"$node:$port",job=~"$job"}
6.2 Memory Committed
type: Graph
Unit: bytes
Label: Bytes
6.2.1 Committed_AS - memory currently allocated by the system, including memory allocated but not yet used (/proc/meminfo Committed_AS)
metrics:
node_memory_Committed_AS_bytes{instance=~"$node:$port",job=~"$job"}
6.2.2 CommitLimit - amount of memory the system can currently allocate (/proc/meminfo CommitLimit)
metrics:
node_memory_CommitLimit_bytes{instance=~"$node:$port",job=~"$job"}
6.3 Memory Active / Inactive Detail
type: Graph
Unit: bytes
Label: Bytes
6.3.1 Inactive_file - file-backed pages on the LRU list that have not been accessed for a long time (/proc/meminfo Inactive(file), LRU_INACTIVE_FILE)
metrics:
node_memory_Inactive_file_bytes{instance=~"$node:$port",job=~"$job"}
6.3.2 Inactive_anon - anonymous pages and swap cache (including tmpfs) that have not been accessed for a long time (/proc/meminfo Inactive(anon), LRU_INACTIVE_ANON)
metrics:
node_memory_Inactive_anon_bytes{instance=~"$node:$port",job=~"$job"}
6.3.3 Active_file - file-backed pages on the LRU list that were accessed recently (/proc/meminfo Active(file), LRU_ACTIVE_FILE)
metrics:
node_memory_Active_file_bytes{instance=~"$node:$port",job=~"$job"}
6.3.4 Active_anon - anonymous pages and swap cache (including tmpfs) that were accessed recently (/proc/meminfo Active(anon), LRU_ACTIVE_ANON)
metrics:
node_memory_Active_anon_bytes{instance=~"$node:$port",job=~"$job"}
6.4 Memory Writeback and Dirty
type: Graph
Unit: bytes
Label: Bytes
6.4.1 Writeback - cached pages that are actively being written back to disk (/proc/meminfo Writeback)
metrics:
node_memory_Writeback_bytes{instance=~"$node:$port",job=~"$job"}
6.4.2 WritebackTmp - memory used by FUSE for temporary writeback buffers (/proc/meminfo WritebackTmp)
metrics:
node_memory_WritebackTmp_bytes{instance=~"$node:$port",job=~"$job"}
6.4.3 Dirty - amount of data waiting to be written back to disk (/proc/meminfo Dirty)
metrics:
node_memory_Dirty_bytes{instance=~"$node:$port",job=~"$job"}
6.5 Memory Shared and Mapped
type: Graph
Unit: bytes
Label: Bytes
6.5.1 Mapped - memory occupied by mapped cache pages (/proc/meminfo Mapped)
metrics:
node_memory_Mapped_bytes{instance=~"$node:$port",job=~"$job"}
6.5.2 Shmem - shared memory (/proc/meminfo Shmem)
metrics:
node_memory_Shmem_bytes{instance=~"$node:$port",job=~"$job"}
6.6 Memory Slab
type: Graph
Unit: bytes
Label: Bytes
6.6.1 SUnreclaim - part of slab-allocated memory that cannot be reclaimed (/proc/meminfo SUnreclaim)
metrics:
node_memory_SUnreclaim_bytes{instance=~"$node:$port",job=~"$job"}
6.6.2 SReclaimable - part of slab-allocated memory that can be reclaimed (/proc/meminfo SReclaimable)
metrics:
node_memory_SReclaimable_bytes{instance=~"$node:$port",job=~"$job"}
6.7 Memory Vmalloc
type: Graph
Unit: bytes
Label: Bytes
6.7.1 VmallocChunk - largest logically contiguous block that vmalloc can allocate (/proc/meminfo VmallocChunk)
metrics:
node_memory_VmallocChunk_bytes{instance=~"$node:$port",job=~"$job"}
6.7.2 VmallocTotal - total memory available to vmalloc (/proc/meminfo VmallocTotal)
metrics:
node_memory_VmallocTotal_bytes{instance=~"$node:$port",job=~"$job"}
6.7.3 VmallocUsed - total memory used by vmalloc (/proc/meminfo VmallocUsed)
metrics:
node_memory_VmallocUsed_bytes{instance=~"$node:$port",job=~"$job"}
6.8 Memory Bounce (/proc/meminfo Bounce)
type: Graph
Unit: bytes
Label: Bytes
6.8.1 Bounce - memory used by bounce buffers
metrics:
node_memory_Bounce_bytes{instance=~"$node:$port",job=~"$job"}
6.9 Memory Anonymous
type: Graph
Unit: bytes
Label: Bytes
6.9.1 AnonHugePages - memory used by anonymous huge pages (/proc/meminfo AnonHugePages)
metrics:
node_memory_AnonHugePages_bytes{instance=~"$node:$port",job=~"$job"}
6.9.2 AnonPages - anonymous memory pages in user processes (/proc/meminfo AnonPages)
metrics:
node_memory_AnonPages_bytes{instance=~"$node:$port",job=~"$job"}
6.10 Memory Kernel (/proc/meminfo KernelStack)
type: Graph
Unit: bytes
Label: Bytes
KernelStack - kernel stack size (resident memory, cannot be reclaimed)
metrics:
node_memory_KernelStack_bytes{instance=~"$node:$port",job=~"$job"}
6.11 Memory HugePages Counter
type: Graph
Unit: short
Label: Pages
6.11.1 HugePages_Free - number of HugePages currently free in the system (/proc/meminfo HugePages_Free)
metrics:
node_memory_HugePages_Free{instance=~"$node:$port",job=~"$job"}
6.11.2 HugePages_Rsvd - number of HugePages currently reserved: pages a program has requested but that the system has not yet actually allocated, because the program has not read or written them (/proc/meminfo HugePages_Rsvd)
metrics:
node_memory_HugePages_Rsvd{instance=~"$node:$port",job=~"$job"}
6.11.3 HugePages_Surp - number of HugePages in excess of the configured persistent HugePages count (/proc/meminfo HugePages_Surp)
metrics:
node_memory_HugePages_Surp{instance=~"$node:$port",job=~"$job"}
6.12 Memory HugePages Size
type: Graph
Unit: bytes
Label: Bytes
6.12.1 HugePages - total number of HugePages in the system (/proc/meminfo HugePages_Total)
metrics:
node_memory_HugePages_Total{instance=~"$node:$port",job=~"$job"}
6.12.2 Hugepagesize - size of each HugePage (/proc/meminfo Hugepagesize)
metrics:
node_memory_Hugepagesize_bytes{instance=~"$node:$port",job=~"$job"}
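HugePages_Total is a page count while Hugepagesize is a size in bytes, so the memory set aside for HugePages is their product. A short Python sketch of that multiplication; the function name and sample values are assumptions, and the dashboard itself plots the two series separately.

```python
MIB = 1024 ** 2


def hugepages_total_bytes(hugepages_total: int, hugepagesize_bytes: int) -> int:
    """Memory set aside for HugePages: page count times page size."""
    return hugepages_total * hugepagesize_bytes


# Assumed sample: 512 pages of 2 MiB each.
print(hugepages_total_bytes(512, 2 * MIB))  # prints 1073741824
```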
6.13 Memory DirectMap (/proc/meminfo DirectMap)
type: Graph
Unit: bytes
Label: Bytes
6.13.1 DirectMap1G - amount of memory mapped with 1G pages
metrics:
node_memory_DirectMap1G_bytes{instance=~"$node:$port",job=~"$job"}
6.13.2 DirectMap2M - amount of memory mapped with 2M pages
metrics:
node_memory_DirectMap2M_bytes{instance=~"$node:$port",job=~"$job"}
6.13.3 DirectMap4K - amount of memory mapped with 4kB pages
metrics:
node_memory_DirectMap4k_bytes{instance=~"$node:$port",job=~"$job"}
6.14 Memory Unevictable and MLocked
type: Graph
Unit: bytes
Label: Bytes
6.14.1 Unevictable - memory that cannot be reclaimed (/proc/meminfo Unevictable)
metrics:
node_memory_Unevictable_bytes{instance=~"$node:$port",job=~"$job"}
6.14.2 MLocked - memory locked with the mlock() system call (/proc/meminfo Mlocked)
metrics:
node_memory_Mlocked_bytes{instance=~"$node:$port",job=~"$job"}
6.15 Memory NFS (/proc/meminfo NFS_Unstable)
type: Graph
Unit: bytes
Label: Bytes
6.15.1 NFS Unstable - cached pages sent to the NFS server but not yet written to disk
metrics:
node_memory_NFS_Unstable_bytes{instance=~"$node:$port",job=~"$job"}
7. Memory Detail Vmstat
7.1 Memory Pages In / Out
type: Graph
Unit: short
Label: Pages
7.1.1 Pagesin - rate at which data is read from disk into physical memory (5-minute window) (/proc/vmstat pgpgin)
metrics:
irate(node_vmstat_pgpgin{instance=~"$node:$port",job=~"$job"}[5m])
7.1.2 Pagesout - rate at which data is written from physical memory to disk (5-minute window) (/proc/vmstat pgpgout)
metrics:
irate(node_vmstat_pgpgout{instance=~"$node:$port",job=~"$job"}[5m])
7.2 Memory Pages Swap In / Out
type: Graph
Unit: short
Label: Pages
7.2.1 Pswpin - rate at which data is loaded from swap on disk into memory (5-minute window) (/proc/vmstat pswpin)
metrics:
irate(node_vmstat_pswpin{instance=~"$node:$port",job=~"$job"}[5m])
7.2.2 Pswpout - rate at which data is moved from memory to swap on disk (5-minute window) (/proc/vmstat pswpout)
metrics:
irate(node_vmstat_pswpout{instance=~"$node:$port",job=~"$job"}[5m])
7.3 Memory Page Operations
type: Graph
Unit: short
Label: Pages
7.3.1 Pgdeactivate - rate of pages deactivated (5-minute window) (/proc/vmstat pgdeactivate)
metrics:
irate(node_vmstat_pgdeactivate{instance=~"$node:$port",job=~"$job"}[5m])
7.3.2 Pgfree - rate of pages freed (5-minute window) (/proc/vmstat pgfree)
metrics:
irate(node_vmstat_pgfree{instance=~"$node:$port",job=~"$job"}[5m])
7.3.3 Pgactivate - rate of pages activated (5-minute window) (/proc/vmstat pgactivate)
metrics:
irate(node_vmstat_pgactivate{instance=~"$node:$port",job=~"$job"}[5m])
7.4 Memory Page Faults
type: Graph
Unit: short
Label: Faults
7.4.1 Pgfault - rate of page faults, major and minor combined (5-minute window) (/proc/vmstat pgfault)
metrics:
irate(node_vmstat_pgfault{instance=~"$node:$port",job=~"$job"}[5m])
7.4.2 Pgmajfault - rate of major page faults (5-minute window) (/proc/vmstat pgmajfault)
metrics:
irate(node_vmstat_pgmajfault{instance=~"$node:$port",job=~"$job"}[5m])
7.4.3 Pgminfault - rate of minor page faults (5-minute window), computed as pgfault minus pgmajfault
metrics:
irate(node_vmstat_pgfault{instance=~"$node:$port",job=~"$job"}[5m]) - irate(node_vmstat_pgmajfault{instance=~"$node:$port",job=~"$job"}[5m])
7.5 Memory Pages Reclaimed
type: Graph
Unit: short
Label: Pages
7.5.1 Kswapd_inodesteal - rate of pages reclaimed by kswapd via inode freeing (5-minute window) (/proc/vmstat kswapd_inodesteal)
metrics:
irate(node_vmstat_kswapd_inodesteal{instance=~"$node:$port",job=~"$job"}[5m])
7.5.2 Pginodesteal - rate of pages reclaimed via inode freeing (5-minute window) (/proc/vmstat pginodesteal)
metrics:
irate(node_vmstat_pginodesteal{instance=~"$node:$port",job=~"$job"}[5m])
7.6 Memory Calls Reclaimed
type: Graph
Unit: short
Label: Calls
7.6.1 Pageoutrun - rate of kswapd calls to page reclaim (5-minute window) (/proc/vmstat pageoutrun)
metrics:
irate(node_vmstat_pageoutrun{instance=~"$node:$port",job=~"$job"}[5m])
7.6.2 Allocstall - rate of direct reclaim calls (5-minute window) (/proc/vmstat allocstall)
metrics:
irate(node_vmstat_allocstall{instance=~"$node:$port",job=~"$job"}[5m])
7.6.3 Zone_reclaim_failed - rate of zone reclaim failures (5-minute window) (/proc/vmstat zone_reclaim_failed)
metrics:
irate(node_vmstat_zone_reclaim_failed{instance=~"$node:$port",job=~"$job"}[5m])
7.7 Memory Page Rotate (/proc/vmstat pgrotated)
type: Graph
Unit: short
Label: Pages
7.7.1 Pgrotated - rate of pages rotated (5-minute window)
metrics:
irate(node_vmstat_pgrotated{instance=~"$node:$port",job=~"$job"}[5m])
7.8 Memory Page Drop
type: Graph
Unit: short
Label: Calls
7.8.1 Drop_pagecache - number of calls to drop the page cache (/proc/vmstat drop_pagecache). The query plots the raw counter without a rate function.
metrics:
node_vmstat_drop_pagecache{instance=~"$node:$port",job=~"$job"}
7.8.2 Drop_slab - number of calls to drop the slab cache (/proc/vmstat drop_slab). The query plots the raw counter without a rate function.
metrics:
node_vmstat_drop_slab{instance=~"$node:$port",job=~"$job"}
7.9 Memory Scan Slab (/proc/vmstat slabs_scanned)
type: Graph
Unit: short
Slabs_scanned - rate of slab objects scanned (5-minute window)
metrics:
irate(node_vmstat_slabs_scanned{instance=~"$node:$port",job=~"$job"}[5m])
7.10 Memory Unevictable Pages
type: Graph
Unit: short
Label: Pages
7.10.1 Unevictable_pgs_cleared - Unevictable pages cleared
metrics:
irate(node_vmstat_unevictable_pgs_cleared{instance=~"$node:$port",job=~"$job"}[5m])
7.10.2 Unevictable_pgs_culled - Unevictable pages culled
metrics:
irate(node_vmstat_unevictable_pgs_culled{instance=~"$node:$port",job=~"$job"}[5m])
7.10.3 Unevictable_pgs_mlocked - Unevictable pages mlocked
metrics:
irate(node_vmstat_unevictable_pgs_mlocked{instance=~"$node:$port",job=~"$job"}[5m])
7.10.4 Unevictable_pgs_munlocked - Unevictable pages munlocked
metrics:
irate(node_vmstat_unevictable_pgs_munlocked{instance=~"$node:$port",job=~"$job"}[5m])
7.10.5 Unevictable_pgs_rescued - Unevictable pages rescued
metrics:
irate(node_vmstat_unevictable_pgs_rescued{instance=~"$node:$port",job=~"$job"}[5m])
7.10.6 Unevictable_pgs_scanned - Unevictable pages scanned
metrics:
irate(node_vmstat_unevictable_pgs_scanned{instance=~"$node:$port",job=~"$job"}[5m])
7.10.7 Unevictable_pgs_stranded - Unevictable pages stranded
metrics:
irate(node_vmstat_unevictable_pgs_stranded{instance=~"$node:$port",job=~"$job"}[5m])
7.11 Memory Page Allocation
type: Graph
Unit: short
Label: Pages
7.11.1 Pgalloc_dma - rate of pages allocated in the DMA zone (5-minute window) (/proc/vmstat pgalloc_dma)
metrics:
irate(node_vmstat_pgalloc_dma{instance=~"$node:$port",job=~"$job"}[5m])
7.11.2 Pgalloc_dma32 - rate of pages allocated in the DMA32 zone (5-minute window) (/proc/vmstat pgalloc_dma32)
metrics:
irate(node_vmstat_pgalloc_dma32{instance=~"$node:$port",job=~"$job"}[5m])
7.11.3 Pgalloc_movable - rate of pages allocated in the movable zone (5-minute window) (/proc/vmstat pgalloc_movable)
metrics:
irate(node_vmstat_pgalloc_movable{instance=~"$node:$port",job=~"$job"}[5m])
7.11.4 Pgalloc_normal - rate of pages allocated in the normal zone (5-minute window) (/proc/vmstat pgalloc_normal)
metrics:
irate(node_vmstat_pgalloc_normal{instance=~"$node:$port",job=~"$job"}[5m])
7.12 Memory Page Refill
type: Graph
Unit: short
Label: Pages
7.12.1 Pgrefill_dma - rate of pages refilled in the DMA zone (5-minute window) (/proc/vmstat pgrefill_dma)
metrics:
irate(node_vmstat_pgrefill_dma{instance=~"$node:$port",job=~"$job"}[5m])
7.12.2 Pgrefill_dma32 - rate of pages refilled in the DMA32 zone (5-minute window) (/proc/vmstat pgrefill_dma32)
metrics:
irate(node_vmstat_pgrefill_dma32{instance=~"$node:$port",job=~"$job"}[5m])
7.12.3 Pgrefill_movable - rate of pages refilled in the movable zone (5-minute window) (/proc/vmstat pgrefill_movable)
metrics:
irate(node_vmstat_pgrefill_movable{instance=~"$node:$port",job=~"$job"}[5m])
7.12.4 Pgrefill_normal - rate of pages refilled in the normal zone (5-minute window) (/proc/vmstat pgrefill_normal)
metrics:
irate(node_vmstat_pgrefill_normal{instance=~"$node:$port",job=~"$job"}[5m])
7.13 Memory Page Steal Direct (direct page reclaim)
type: Graph
Unit: short
Label: Pages
7.13.1 Pgsteal_direct_dma - rate of DMA-zone pages reclaimed directly for other uses (5-minute window) (/proc/vmstat pgsteal_direct_dma)
metrics:
irate(node_vmstat_pgsteal_direct_dma{instance=~"$node:$port",job=~"$job"}[5m])
7.13.2 Pgsteal_direct_dma32 - rate of DMA32-zone pages reclaimed directly for other uses (5-minute window) (/proc/vmstat pgsteal_direct_dma32)
metrics:
irate(node_vmstat_pgsteal_direct_dma32{instance=~"$node:$port",job=~"$job"}[5m])
7.13.3 Pgsteal_direct_movable - rate of movable-zone pages reclaimed directly for other uses (5-minute window) (/proc/vmstat pgsteal_direct_movable)
metrics:
irate(node_vmstat_pgsteal_direct_movable{instance=~"$node:$port",job=~"$job"}[5m])
7.13.4 Pgsteal_direct_normal - rate of normal-zone pages reclaimed directly for other uses (5-minute window) (/proc/vmstat pgsteal_direct_normal)
metrics:
irate(node_vmstat_pgsteal_direct_normal{instance=~"$node:$port",job=~"$job"}[5m])
7.14 Memory Page Steal Kswapd
type: Graph
Unit: short
Label: Pages
7.14.1 Pgsteal_kswapd_dma - rate of DMA-zone pages reclaimed by the kswapd background process for other uses (5-minute window) (/proc/vmstat pgsteal_kswapd_dma)
metrics:
irate(node_vmstat_pgsteal_kswapd_dma{instance=~"$node:$port",job=~"$job"}[5m])
7.14.2 Pgsteal_kswapd_dma32 - rate of DMA32-zone pages reclaimed by the kswapd background process for other uses (5-minute window) (/proc/vmstat pgsteal_kswapd_dma32)
metrics:
irate(node_vmstat_pgsteal_kswapd_dma32{instance=~"$node:$port",job=~"$job"}[5m])
7.14.3 Pgsteal_kswapd_movable - rate of movable-zone pages reclaimed by the kswapd background process for other uses (5-minute window) (/proc/vmstat pgsteal_kswapd_movable)
metrics:
irate(node_vmstat_pgsteal_kswapd_movable{instance=~"$node:$port",job=~"$job"}[5m])
7.14.4 Pgsteal_kswapd_normal - rate of normal-zone pages reclaimed by the kswapd background process for other uses (5-minute window) (/proc/vmstat pgsteal_kswapd_normal)
metrics:
irate(node_vmstat_pgsteal_kswapd_normal{instance=~"$node:$port",job=~"$job"}[5m])
7.15 Memory Scan Direct
type: Graph
Unit: short
Label: Pages
Pgscan_direct_dma - rate of DMA-zone pages scanned by direct reclaim (5-minute window) (/proc/vmstat pgscan_direct_dma)
metrics:
irate(node_vmstat_pgscan_direct_dma{instance=~"$node:$port",job=~"$job"}[5m])
Pgscan_direct_dma32 - rate of DMA32-zone pages scanned by direct reclaim (5-minute window) (/proc/vmstat pgscan_direct_dma32)
metrics:
irate(node_vmstat_pgscan_direct_dma32{instance=~"$node:$port",job=~"$job"}[5m])
Pgscan_direct_movable - rate of movable-zone pages scanned by direct reclaim (5-minute window) (/proc/vmstat pgscan_direct_movable)
metrics:
irate(node_vmstat_pgscan_direct_movable{instance=~"$node:$port",job=~"$job"}[5m])
Pgscan_direct_normal - rate of normal-zone pages scanned by direct reclaim (5-minute window) (/proc/vmstat pgscan_direct_normal)
metrics:
irate(node_vmstat_pgscan_direct_normal{instance=~"$node:$port",job=~"$job"}[5m])
Pgscan_direct_throttle - rate at which direct reclaim was throttled (5-minute window); this counter tracks throttling events, not a memory zone (/proc/vmstat pgscan_direct_throttle)
metrics:
irate(node_vmstat_pgscan_direct_throttle{instance=~"$node:$port",job=~"$job"}[5m])
7.16 Memory Scan Kswapd
type: Graph
Unit: short
Label: Pages
Pgscan_kswapd_dma - rate of DMA-zone pages scanned by the kswapd background process (5-minute window) (/proc/vmstat pgscan_kswapd_dma)
metrics:
irate(node_vmstat_pgscan_kswapd_dma{instance=~"$node:$port",job=~"$job"}[5m])
Pgscan_kswapd_dma32 - rate of DMA32-zone pages scanned by the kswapd background process (5-minute window) (/proc/vmstat pgscan_kswapd_dma32)
metrics:
irate(node_vmstat_pgscan_kswapd_dma32{instance=~"$node:$port",job=~"$job"}[5m])
Pgscan_kswapd_movable - rate of movable-zone pages scanned by the kswapd background process (5-minute window) (/proc/vmstat pgscan_kswapd_movable)
metrics:
irate(node_vmstat_pgscan_kswapd_movable{instance=~"$node:$port",job=~"$job"}[5m])
Pgscan_kswapd_normal - rate of normal-zone pages scanned by the kswapd background process (5-minute window) (/proc/vmstat pgscan_kswapd_normal)
metrics:
irate(node_vmstat_pgscan_kswapd_normal{instance=~"$node:$port",job=~"$job"}[5m])
7.17 Memory Page Compact
type: Graph
Unit: short
Label: Pages
Compact_free_scanned - pages scanned for freeing by the compaction daemon (/proc/vmstat compact_free_scanned)
metrics:
irate(node_vmstat_compact_free_scanned{instance=~"$node:$port",job=~"$job"}[5m])
Compact_isolated - pages isolated for memory compaction (/proc/vmstat compact_isolated)
metrics:
irate(node_vmstat_compact_isolated{instance=~"$node:$port",job=~"$job"}[5m])
Compact_migrate_scanned - pages scanned for migration by the compaction daemon (/proc/vmstat compact_migrate_scanned)
metrics:
irate(node_vmstat_compact_migrate_scanned{instance=~"$node:$port",job=~"$job"}[5m])
7.18 Memory Compactions
type: Graph
Unit: short
Label: Compactions
Compact_fail - rate of failed compactions for high-order allocations (5-minute window) (/proc/vmstat compact_fail)
metrics:
irate(node_vmstat_compact_fail{instance=~"$node:$port",job=~"$job"}[5m])
Compact_stall - rate at which processes stalled to run memory compaction (5-minute window) (/proc/vmstat compact_stall)
metrics:
irate(node_vmstat_compact_stall{instance=~"$node:$port",job=~"$job"}[5m])
Compact_success - rate of successful compactions for high-order allocations (5-minute window) (/proc/vmstat compact_success)
metrics:
irate(node_vmstat_compact_success{instance=~"$node:$port",job=~"$job"}[5m])
7.19 Memory Kswapd Watermark
type: Graph
Unit: short
Label: Counter
Kswapd_high_wmark_hit_quickly - number of times free memory reached the high watermark quickly (/proc/vmstat kswapd_high_wmark_hit_quickly)
metrics:
node_vmstat_kswapd_high_wmark_hit_quickly{instance=~"$node:$port",job=~"$job"}
Kswapd_low_wmark_hit_quickly - number of times free memory reached the low watermark quickly (/proc/vmstat kswapd_low_wmark_hit_quickly)
metrics:
node_vmstat_kswapd_low_wmark_hit_quickly{instance=~"$node:$port",job=~"$job"}
7.20 Memory Buddy Alloc
type: Graph
Unit: short
Label: Allocations
Htlb_buddy_alloc_fail - number of failed hugetlb allocations from the buddy allocator (/proc/vmstat htlb_buddy_alloc_fail)
metrics:
node_vmstat_htlb_buddy_alloc_fail{instance=~"$node:$port",job=~"$job"}
Htlb_buddy_alloc_success - number of successful hugetlb allocations from the buddy allocator (/proc/vmstat htlb_buddy_alloc_success)
metrics:
node_vmstat_htlb_buddy_alloc_success{instance=~"$node:$port",job=~"$job"}
7.21 Memory Numa Allocations
type: Graph
Unit: short
Label: Allocations
Numa_foreign - allocations intended for this node that were satisfied from another node (/proc/vmstat numa_foreign)
metrics:
irate(node_vmstat_numa_foreign{instance=~"$node:$port",job=~"$job"}[5m])
Numa_hit - allocations intended for this node that were satisfied from this node (/proc/vmstat numa_hit)
metrics:
irate(node_vmstat_numa_hit{instance=~"$node:$port",job=~"$job"}[5m])
Numa_interleave - interleave-policy allocations that were satisfied from this node (/proc/vmstat numa_interleave)
metrics:
irate(node_vmstat_numa_interleave{instance=~"$node:$port",job=~"$job"}[5m])
Numa_local - allocations from this node by processes running on this node (/proc/vmstat numa_local)
metrics:
irate(node_vmstat_numa_local{instance=~"$node:$port",job=~"$job"}[5m])
Numa_miss - allocations intended for another node that were satisfied from this node (/proc/vmstat numa_miss)
metrics:
irate(node_vmstat_numa_miss{instance=~"$node:$port",job=~"$job"}[5m])
Numa_other - allocations from this node by processes running on other nodes (/proc/vmstat numa_other)
metrics:
irate(node_vmstat_numa_other{instance=~"$node:$port",job=~"$job"}[5m])
7.22 Memory Numa Page Migrations
type: Graph
Unit: short
Label: Pages
Numa_pages_migrated - number of NUMA pages migrated (/proc/vmstat numa_pages_migrated)
metrics:
irate(node_vmstat_numa_pages_migrated{instance=~"$node:$port",job=~"$job"}[5m])
Pgmigrate_fail - number of pages that failed to migrate /proc/vmstat pgmigrate_fail
metrics:
irate(node_vmstat_pgmigrate_fail{instance=~"$node:$port",job=~"$job"}[5m])
Pgmigrate_success - number of pages migrated successfully /proc/vmstat pgmigrate_success
metrics:
irate(node_vmstat_pgmigrate_success{instance=~"$node:$port",job=~"$job"}[5m])
7.23 Memory Numa Hints
23. Memory Numa Hints
type: Graph
Unit: short
Label: Hints
Numa_hint_faults - NUMA hint faults trapped
metrics:
irate(node_vmstat_numa_hint_faults{instance=~"$node:$port",job=~"$job"}[5m])
Numa_hint_faults_local - Hinting faults to local nodes
metrics:
irate(node_vmstat_numa_hint_faults_local{instance=~"$node:$port",job=~"$job"}[5m])
7.24 Memory Numa Table Updates
24. Memory Numa Table Updates
type: Graph
Unit: short
Label: Updates
Numa_pte_updates - NUMA page table entry updates
metrics:
irate(node_vmstat_numa_pte_updates{instance=~"$node:$port",job=~"$job"}[5m])
Numa_huge_pte_updates - NUMA huge page table entry updates
metrics:
irate(node_vmstat_numa_huge_pte_updates{instance=~"$node:$port",job=~"$job"}[5m])
7.25 Memory THP Splits
25. Memory THP Splits
type: Graph
Unit: short
Label: Splits
Thp_split - number of times a huge page was split into regular pages /proc/vmstat thp_split
metrics:
irate(node_vmstat_thp_split{instance=~"$node:$port",job=~"$job"}[5m])
7.26 Memory Workingset
26. Memory Workingset
type: Graph
Unit: short
Label: Counter
Workingset_activate - Page activations to form the working set
metrics:
irate(node_vmstat_workingset_activate{instance=~"$node:$port",job=~"$job"}[5m])
Workingset_nodereclaim - NUMA node working set page reclaims
metrics:
irate(node_vmstat_workingset_nodereclaim{instance=~"$node:$port",job=~"$job"}[5m])
Workingset_refault - Refaults of previously evicted pages
metrics:
irate(node_vmstat_workingset_refault{instance=~"$node:$port",job=~"$job"}[5m])
7.27 Memory THP Allocations
27. Memory THP Allocations
type: Graph
Unit: short
Label: Allocations
Thp_collapse_alloc - Transparent huge page collapse allocations
metrics:
irate(node_vmstat_thp_collapse_alloc{instance=~"$node:$port",job=~"$job"}[5m])
Thp_collapse_alloc_failed - Transparent huge page collapse allocation failures
metrics:
irate(node_vmstat_thp_collapse_alloc_failed{instance=~"$node:$port",job=~"$job"}[5m])
Thp_zero_page_alloc - Transparent huge page zeroed page allocations
metrics:
irate(node_vmstat_thp_zero_page_alloc{instance=~"$node:$port",job=~"$job"}[5m])
Thp_zero_page_alloc_failed - Transparent huge page zeroed page allocation failures
metrics:
irate(node_vmstat_thp_zero_page_alloc_failed{instance=~"$node:$port",job=~"$job"}[5m])
Thp_fault_alloc - Transparent huge page fault allocations
metrics:
irate(node_vmstat_thp_fault_alloc{instance=~"$node:$port",job=~"$job"}[5m])
Thp_fault_fallback - Transparent huge page fault fallbacks
metrics:
irate(node_vmstat_thp_fault_fallback{instance=~"$node:$port",job=~"$job"}[5m])
8. Memory Detail Vmstat Counters
8.1 Memory Page Active
1. Memory Page Active
type: Graph
Unit: short
Label: Pages
Active_anon - recently used anonymous memory pages /proc/vmstat nr_active_anon
metrics:
node_vmstat_nr_active_anon{instance=~"$node:$port",job=~"$job"}
Active_file - recently used file-backed memory pages /proc/vmstat nr_active_file
metrics:
node_vmstat_nr_active_file{instance=~"$node:$port",job=~"$job"}
8.2 Memory Page Reclaimed / Unreclaimed
2. Memory Page Reclaimed / Unreclaimed
type: Graph
Unit: short
Label: Pages
Reclaimable - reclaimable slab pages /proc/vmstat nr_slab_reclaimable
metrics:
node_vmstat_nr_slab_reclaimable{instance=~"$node:$port",job=~"$job"}
Unreclaimable - unreclaimable slab pages /proc/vmstat nr_slab_unreclaimable
metrics:
node_vmstat_nr_slab_unreclaimable{instance=~"$node:$port",job=~"$job"}
8.3 Memory Page Inactive
3. Memory Page Inactive
type: Graph
Unit: short
Label: Pages
Inactive_anon - anonymous memory pages that have not been accessed for a long time (per zone of each NUMA node) /proc/vmstat nr_inactive_anon
metrics:
node_vmstat_nr_inactive_anon{instance=~"$node:$port",job=~"$job"}
Inactive_file - file-backed memory pages that have not been accessed for a long time (per zone of each NUMA node) /proc/vmstat nr_inactive_file
metrics:
node_vmstat_nr_inactive_file{instance=~"$node:$port",job=~"$job"}
8.4 Memory Page Dirty / Bounce
4. Memory Page Dirty / Bounce
type: Graph
Unit: short
Label: Pages
Dirty - number of dirty pages /proc/vmstat nr_dirty
metrics:
node_vmstat_nr_dirty{instance=~"$node:$port",job=~"$job"}
Bounce - number of bounce buffer pages /proc/vmstat nr_bounce
metrics:
node_vmstat_nr_bounce{instance=~"$node:$port",job=~"$job"}
8.5 Memory Page Free / Written
5. Memory Page Free / Written
type: Graph
Unit: short
Label: Pages
Free_pages - number of free pages /proc/vmstat nr_free_pages
metrics:
node_vmstat_nr_free_pages{instance=~"$node:$port",job=~"$job"}
Written - pages written out (per zone of each NUMA node) /proc/vmstat nr_written
metrics:
node_vmstat_nr_written{instance=~"$node:$port",job=~"$job"}
8.6 Memory Page Shmem / Mapped
6. Memory Page Shmem / Mapped
type: Graph
Unit: short
Label: Pages
Shmem - number of shared memory pages /proc/vmstat nr_shmem
metrics:
node_vmstat_nr_shmem{instance=~"$node:$port",job=~"$job"}
Mapped - number of mapped page-cache pages (per zone of each NUMA node) /proc/vmstat nr_mapped
metrics:
node_vmstat_nr_mapped{instance=~"$node:$port",job=~"$job"}
8.7 Memory Page Unevictable / MLock
7. Memory Page Unevictable / MLock
type: Graph
Unit: short
Label: Pages
Unevictable - number of pages that cannot be evicted /proc/vmstat nr_unevictable
metrics:
node_vmstat_nr_unevictable{instance=~"$node:$port",job=~"$job"}
Mlock - number of pages locked by the mlock() system call /proc/vmstat nr_mlock
metrics:
node_vmstat_nr_mlock{instance=~"$node:$port",job=~"$job"}
8.8 Memory Page Writeback
8. Memory Page Writeback
type: Graph
Unit: short
Label: Pages
Writeback - number of pages under writeback /proc/vmstat nr_writeback
metrics:
node_vmstat_nr_writeback{instance=~"$node:$port",job=~"$job"}
Writeback_temp - number of temporary writeback pages /proc/vmstat nr_writeback_temp
metrics:
node_vmstat_nr_writeback_temp{instance=~"$node:$port",job=~"$job"}
8.9 Memory Page Kernel_stack
9. Memory Page Kernel_stack
type: Graph
Unit: short
Label: Pages
Kernel_stack - number of pages used for kernel stacks /proc/vmstat nr_kernel_stack
metrics:
node_vmstat_nr_kernel_stack{instance=~"$node:$port",job=~"$job"}
8.10 Memory Page Dirty Threshold
10. Memory Page Dirty Threshold
type: Graph
Unit: short
Label: Pages
Dirty_background_threshold - dirty-page threshold at which background writeback starts /proc/vmstat nr_dirty_background_threshold
metrics:
node_vmstat_nr_dirty_background_threshold{instance=~"$node:$port",job=~"$job"}
Dirty_threshold - dirty-page limit threshold /proc/vmstat nr_dirty_threshold
metrics:
node_vmstat_nr_dirty_threshold{instance=~"$node:$port",job=~"$job"}
8.11 Memory Page File_pages
11. Memory Page File_pages
type: Graph
Unit: short
Label: Pages
File_pages - number of file cache pages (per zone of each NUMA node) /proc/vmstat nr_file_pages
metrics:
node_vmstat_nr_file_pages{instance=~"$node:$port",job=~"$job"}
8.12 Memory Page Page_table_pages
12. Memory Page Page_table_pages
type: Graph
Unit: short
Label: Pages
Page_table_pages - number of pages used for page tables (per zone of each NUMA node) /proc/vmstat nr_page_table_pages
metrics:
node_vmstat_nr_page_table_pages{instance=~"$node:$port",job=~"$job"}
8.13 Memory Page Unstable / Dirtied
13. Memory Page Unstable / Dirtied
type: Graph
Unit: short
Label: Pages
Unstable - number of pages in the unstable state (per zone of each NUMA node) /proc/vmstat nr_unstable
metrics:
node_vmstat_nr_unstable{instance=~"$node:$port",job=~"$job"}
Dirtied - number of pages that have been dirtied (per zone of each NUMA node) /proc/vmstat nr_dirtied
metrics:
node_vmstat_nr_dirtied{instance=~"$node:$port",job=~"$job"}
8.14 Memory Page Isolated
14. Memory Page Isolated
type: Graph
Unit: short
Label: Pages
Isolated_anon - number of isolated anonymous memory pages (per zone of each NUMA node) /proc/vmstat nr_isolated_anon
metrics:
node_vmstat_nr_isolated_anon{instance=~"$node:$port",job=~"$job"}
Isolated_file - number of isolated file-backed pages (per zone of each NUMA node) /proc/vmstat nr_isolated_file
metrics:
node_vmstat_nr_isolated_file{instance=~"$node:$port",job=~"$job"}
8.15 Memory Page Alloc_batch
15. Memory Page Alloc_batch
type: Graph
Unit: short
Label: Pages
Alloc_batch - pages allocated to other zones because of insufficient memory (per zone of each NUMA node) /proc/vmstat nr_alloc_batch
metrics:
node_vmstat_nr_alloc_batch{instance=~"$node:$port",job=~"$job"}
8.16 Memory Page Misc
16. Memory Page Misc
type: Graph
Unit: short
Label: Pages
Free_cma - free contiguous memory allocator (CMA) pages (per zone of each NUMA node) /proc/vmstat nr_free_cma
metrics:
node_vmstat_nr_free_cma{instance=~"$node:$port",job=~"$job"}
Vmscan_write - pages written out by LRU memory reclaim /proc/vmstat nr_vmscan_write
metrics:
node_vmstat_nr_vmscan_write{instance=~"$node:$port",job=~"$job"}
Immediate_reclaim - pages prioritized for reclaim as soon as writeback finishes (per zone of each NUMA node) /proc/vmstat nr_vmscan_immediate_reclaim
metrics:
node_vmstat_nr_vmscan_immediate_reclaim{instance=~"$node:$port",job=~"$job"}
8.17 Memory Page Anon
17. Memory Page Anon
type: Graph
Unit: short
Label: Pages
Anon_pages - anonymous mapped pages (per zone of each NUMA node) /proc/vmstat nr_anon_pages
metrics:
node_vmstat_nr_anon_pages{instance=~"$node:$port",job=~"$job"}
Anon_transparent_hugepages - anonymous Transparent Huge Pages (THP) (per zone of each NUMA node) /proc/vmstat nr_anon_transparent_hugepages
metrics:
node_vmstat_nr_anon_transparent_hugepages{instance=~"$node:$port",job=~"$job"}
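The nr_* series in this section are page counts, so comparing them with byte-based memory metrics needs a conversion. Here is a minimal Python sketch of that conversion; the default page size of 4096 bytes is an assumption (common on x86-64, but not universal), and the function name `pages_to_bytes` is hypothetical.

```python
def pages_to_bytes(pages: int, page_size: int = 4096) -> int:
    """Convert a page count from /proc/vmstat into bytes.

    page_size defaults to 4096 bytes, which is an assumption;
    check the real value on the target host (for example with `getconf PAGESIZE`).
    """
    if pages < 0 or page_size <= 0:
        raise ValueError("pages must be >= 0 and page_size must be > 0")
    return pages * page_size


if __name__ == "__main__":
    # 256 pages of 4 KiB each.
    print(pages_to_bytes(256))
```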
9. System Detail
9.1 Context Switches / Interrupts
1. Context Switches / Interrupts
type: Graph
Unit: short
Label: Counter
Context switches - per-second rate of CPU context switches (irate over a 5m window)
metrics:
irate(node_context_switches_total{instance=~"$node:$port",job=~"$job"}[5m])
Interrupts - per-second rate of interrupts serviced (irate over a 5m window)
metrics:
irate(node_intr_total{instance=~"$node:$port",job=~"$job"}[5m])
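Many panels in this document wrap a counter in irate(...[5m]). As a rough illustration of what that means, the Python sketch below computes an instantaneous rate from the last two samples in a window and, for contrast, a plain average over the whole window. It is a simplified model, not the Prometheus implementation: the average ignores the extrapolation and counter-reset handling that the real rate() performs, and the sample values are invented for the example.

```python
from typing import List, Optional, Tuple

# One sample is (timestamp in seconds, counter value).
Sample = Tuple[float, float]


def irate(samples: List[Sample]) -> Optional[float]:
    """Per-second rate based only on the last two samples in the window."""
    if len(samples) < 2:
        return None
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    delta = v1 - v0
    if delta < 0:
        # The counter was reset; count the new value as the increase.
        delta = v1
    return delta / (t1 - t0)


def window_average(samples: List[Sample]) -> Optional[float]:
    """Simplified average per-second increase from first to last sample."""
    if len(samples) < 2:
        return None
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


if __name__ == "__main__":
    window = [(0.0, 100.0), (15.0, 160.0), (30.0, 190.0), (45.0, 400.0)]
    print(irate(window), window_average(window))
```

The instantaneous form reacts quickly to spikes, while the averaged form is smoother; the 5m range only limits how far back the last two samples may be found.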
9.2 System Load
2. System Load
type: Graph
Unit: short
Label: Load
Load 1m - average system load over the last 1 minute
metrics:
node_load1{instance=~"$node:$port",job=~"$job"}
Load 5m - average system load over the last 5 minutes
metrics:
node_load5{instance=~"$node:$port",job=~"$job"}
Load 15m - average system load over the last 15 minutes
metrics:
node_load15{instance=~"$node:$port",job=~"$job"}
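Load averages are only comparable across hosts once they are divided by the number of CPU cores. The Python sketch below shows that normalization, assuming the load value and core count have been read from node_load1 and the CPU Cores query; the function name `load_per_core` is hypothetical.

```python
def load_per_core(load: float, cores: int) -> float:
    """Normalize a load average by the number of CPU cores.

    A result below 1.0 means there is spare CPU capacity on average;
    a result above 1.0 means tasks are waiting for a CPU.
    """
    if cores <= 0:
        raise ValueError("cores must be a positive integer")
    return load / cores


if __name__ == "__main__":
    # A load of 4.0 on an 8-core machine.
    print(load_per_core(4.0, 8))
```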
9.3 Interrupts Detail /proc/interrupts
3. Interrupts Detail /proc/interrupts
type: Graph
Unit: short
Label: Counter
{{ type }} - {{ info }} - per-second rate of each interrupt listed in /proc/interrupts, by interrupt number (irate over a 5m window)
metrics:
irate(node_interrupts_total{instance=~"$node:$port",job=~"$job"}[5m])
9.4 File Descriptors
4. File Descriptors
type: Graph
Unit: short
Label: Descriptors
Maximum open file descriptors - maximum number of file descriptors the exporter process may open
metrics:
process_max_fds{instance=~"$node:$port",job=~"$job"}
Open file descriptors - number of file descriptors the exporter process currently has open
metrics:
process_open_fds{instance=~"$node:$port",job=~"$job"}
9.5 Entropy
5. Entropy
type: Graph
Unit: short
Label: Entropy
Entropy available to random number generators
metrics:
node_entropy_available_bits{instance=~"$node:$port",job=~"$job"}
9.6 Processes State
6. Processes State
type: Graph
Unit: short
Label: Processes
Processes blocked - number of tasks currently blocked /proc/stat procs_blocked
metrics:
node_procs_blocked{instance=~"$node:$port",job=~"$job"}
Processes in runnable state - number of tasks currently in the run queue /proc/stat procs_running
metrics:
node_procs_running{instance=~"$node:$port",job=~"$job"}
9.7 Processes Forks
7. Processes Forks
type: Graph
Unit: short
Label: Forks / sec
Processes forks second - number of processes created per second
metrics:
rate(node_forks_total{instance=~"$node:$port",job=~"$job"}[5m])
9.8 Processes Memory
8. Processes Memory
type: Graph
Unit: bytes
Label: Bytes
9.8.1 Virtual memory size used by the process
metrics:
process_virtual_memory_bytes{instance=~"$node:$port",job=~"$job"}
9.8.2 Resident memory size of the process
metrics:
process_resident_memory_bytes{instance=~"$node:$port",job=~"$job"}
9.9 Time Syncronized Status
9. Time Syncronized Status
type: Graph
Unit: short
Label: Counter
Whether the clock is synchronized to a reliable server (1 = yes, 0 = no):
metrics:
node_timex_sync_status{instance=~"$node:$port",job=~"$job"}
Local clock frequency adjustment:
metrics:
node_timex_frequency_adjustment_ratio{instance=~"$node:$port",job=~"$job"}
9.10 Time Syncronized Drift
10. Time Syncronized Drift
type: Graph
Unit: seconds
Label: Seconds
Estimated error (seconds):
metrics:
node_timex_estimated_error_seconds{instance=~"$node:$port",job=~"$job"}
Time offset between the local system and the reference clock:
metrics:
node_timex_offset_seconds{instance=~"$node:$port",job=~"$job"}
Maximum error (seconds):
metrics:
node_timex_maxerror_seconds{instance=~"$node:$port",job=~"$job"}
9.11 Hardware temperature monitor
11. Hardware temperature monitor
type: Graph
Unit: Celsius
Label: Temperature
{{ chip }} {{ sensor }} temp
metrics:
node_hwmon_temp_celsius{instance=~"$node:$port",job=~"$job"}
{{ chip }} {{ sensor }} Critical Alarm
metrics:
node_hwmon_temp_crit_alarm_celsius{instance=~"$node:$port",job=~"$job"}
{{ chip }} {{ sensor }} Critical
metrics:
node_hwmon_temp_crit_celsius{instance=~"$node:$port",job=~"$job"}
{{ chip }} {{ sensor }} Critical Historical
metrics:
node_hwmon_temp_crit_hyst_celsius{instance=~"$node:$port",job=~"$job"}
{{ chip }} {{ sensor }} Max
metrics:
node_hwmon_temp_max_celsius{instance=~"$node:$port",job=~"$job"}
10. Disk Detail
10.1 Disk IOps Completed
1. Disk IOps Completed
type: Graph
Unit: I/O ops/sec(iops)
Label: IO read(-)/write(+)
{{device}} - Reads completed: read operations completed per second on each disk device
metrics:
irate(node_disk_reads_completed_total{instance=~"$node:$port",job=~"$job"}[5m])
{{device}} - Writes completed: write operations completed per second on each disk device
metrics:
irate(node_disk_writes_completed_total{instance=~"$node:$port",job=~"$job"}[5m])
10.2 Disk R/W Data
2. Disk R/W Data
type: Graph
Unit: bytes/sec
Label: Bytes read(-)/write(+)
{{device}} - Read bytes: bytes read per second from each disk device
metrics:
irate(node_disk_read_bytes_total{instance=~"$node:$port",job=~"$job"}[5m])
{{device}} - Written bytes: bytes written per second to each disk device
metrics:
irate(node_disk_written_bytes_total{instance=~"$node:$port",job=~"$job"}[5m])
10.3 Disk R/W Time
3. Disk R/W Time
type: Graph
Unit: Milliseconds(ms)
Label: Millisec. read(-)/write(+)
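{{device}} - Read time ms: time spent on reads for each disk device (the underlying metric is in seconds, so the query yields seconds spent per second)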
metrics:
irate(node_disk_read_time_seconds_total{instance=~"$node:$port",job=~"$job"}[5m])
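{{device}} - Write time ms: time spent on writes for each disk device (the underlying metric is in seconds, so the query yields seconds spent per second)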
metrics:
irate(node_disk_write_time_seconds_total{instance=~"$node:$port",job=~"$job"}[5m])
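The read and write time series become more useful when divided by the matching operation rate, which gives an average latency per operation. The Python sketch below shows that arithmetic, including the seconds-to-milliseconds conversion; it assumes the two rates come from the *_time_seconds_total and *_completed_total queries in this section, and the function name `avg_latency_ms` is hypothetical.

```python
def avg_latency_ms(time_seconds_rate: float, ops_rate: float) -> float:
    """Average latency per I/O operation in milliseconds.

    time_seconds_rate: seconds spent on I/O per second
                       (rate of a *_time_seconds_total counter).
    ops_rate:          operations completed per second
                       (rate of a *_completed_total counter).
    """
    if ops_rate == 0:
        # No operations completed in the window, so there is no latency to report.
        return 0.0
    return time_seconds_rate / ops_rate * 1000.0


if __name__ == "__main__":
    # 0.05 s of read time per second spread over 100 reads per second.
    print(avg_latency_ms(0.05, 100.0))
```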
10.4 Disk IOs Weighted
4. Disk IOs Weighted
type: Graph
Unit: Milliseconds(ms)
Label: Milliseconds
{{device}} - IO time weighted: weighted time spent on I/O operations for each disk device
metrics:
irate(node_disk_io_time_weighted_seconds_total{instance=~"$node:$port",job=~"$job"}[5m])
10.5 Disk R/W Merged
5. Disk R/W Merged
type: Graph
Unit: I/O ops/sec(iops)
Label: I/Os
{{device}} - Read merged: merged read operations per second on each disk device
metrics:
irate(node_disk_reads_merged_total{instance=~"$node:$port",job=~"$job"}[5m])
{{device}} - Write merged: merged write operations per second on each disk device
metrics:
irate(node_disk_writes_merged_total{instance=~"$node:$port",job=~"$job"}[5m])
10.6 Milliseconds Spent Doing I/Os
6. Milliseconds Spent Doing I/Os
type: Graph
Unit: Milliseconds(ms)
Label: Milliseconds
{{device}} - IO time ms: time spent on I/O operations for each disk device
metrics:
irate(node_disk_io_time_seconds_total{instance=~"$node:$port",job=~"$job"}[5m])
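Because node_disk_io_time_seconds_total counts seconds during which the device was busy, its per-second rate is a fraction between 0 and 1 that can be read as device utilization. The Python sketch below converts that fraction into a percentage; the function name `disk_utilization_percent` is hypothetical, and clamping to 100 is a presentation choice rather than something the dashboard does.

```python
def disk_utilization_percent(io_time_rate: float) -> float:
    """Convert seconds of I/O time per second into a utilization percentage.

    io_time_rate is the per-second rate of node_disk_io_time_seconds_total.
    Values slightly above 1.0 can appear because of scrape timing,
    so the result is clamped to the 0-100 range.
    """
    percent = io_time_rate * 100.0
    return max(0.0, min(percent, 100.0))


if __name__ == "__main__":
    # The device was busy for 0.25 s out of every second.
    print(disk_utilization_percent(0.25))
```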
10.7 Disk IOs Current in Progress
7. Disk IOs Current in Progress
type: Graph
Unit: I/O ops/sec(iops)
Label: I/Os
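{{device}} - IO now: number of I/O requests currently in progress on each disk device (node_disk_io_now is a gauge, so irate here shows how fast that number is changing)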
metrics:
irate(node_disk_io_now{instance=~"$node:$port",job=~"$job"}[5m])
10.8 Open Error File
8. Open Error File
type: Graph
Unit: short
Label: Errors
Textfile scrape error (1 = true): whether an error occurred while opening or reading the textfile collector files
metrics:
node_textfile_scrape_error{instance=~"$node:$port",job=~"$job"}
11. FileSystem Detail
11.1 Mounted Filesystem Space
1. Filesystem space available
type: Graph
Unit: bytes
Label: Bytes
11.1.1 Available space on mounted filesystems
{{mountpoint}} - space available to non-root users on the mounted filesystem
metrics:
node_filesystem_avail_bytes{instance=~"$node:$port",job=~"$job",device!~'rootfs'}
11.1.2 Free space on mounted filesystems
{{mountpoint}} - free space on the mounted filesystem (including space reserved for root)
metrics:
node_filesystem_free_bytes{instance=~"$node:$port",job=~"$job",device!~'rootfs'}
11.1.3 Total size of mounted filesystems
{{mountpoint}} - total size of the mounted filesystem
metrics:
node_filesystem_size_bytes{instance=~"$node:$port",job=~"$job",device!~'rootfs'}
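The three filesystem series above (size, free, avail) are usually combined into a used-space figure. The Python sketch below shows one common way to do it, computing the percentage against used plus available space, similar to what df reports; it is an illustration under that assumption, and the function name `filesystem_usage` is hypothetical.

```python
from typing import Dict


def filesystem_usage(size_bytes: float, free_bytes: float, avail_bytes: float) -> Dict[str, float]:
    """Derive used bytes and a used percentage from the three filesystem metrics.

    size_bytes:  node_filesystem_size_bytes  (total size)
    free_bytes:  node_filesystem_free_bytes  (free, including root-reserved space)
    avail_bytes: node_filesystem_avail_bytes (free space usable by non-root users)
    """
    used = size_bytes - free_bytes
    usable_total = used + avail_bytes
    percent = 0.0 if usable_total == 0 else used / usable_total * 100.0
    return {"used_bytes": used, "used_percent": percent}


if __name__ == "__main__":
    print(filesystem_usage(100.0, 30.0, 20.0))
```

Using avail rather than free in the denominator means the percentage reaches 100 when ordinary users run out of space, even though root-reserved blocks remain.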
11.2 File Nodes
2. File Nodes Free
type: Graph
Unit: short
Label: File Nodes
{{mountpoint}} - number of free file nodes (inodes) on the mounted filesystem
metrics:
node_filesystem_files_free{instance=~"$node:$port",job=~"$job",device!~'rootfs'}
11.3 File Descriptors
3. File Descriptor
type: Graph
Unit: short
Label: Files
Maximum number of open file descriptors:
metrics:
node_filefd_maximum{instance=~"$node:$port",job=~"$job"}
Number of open file descriptors:
metrics:
node_filefd_allocated{instance=~"$node:$port",job=~"$job"}
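The two file descriptor series are most useful as a ratio, since the absolute limit differs from host to host. The Python sketch below computes the percentage of the system-wide limit in use, assuming the inputs are the current values of node_filefd_allocated and node_filefd_maximum; the function name `filefd_usage_percent` is hypothetical.

```python
def filefd_usage_percent(allocated: float, maximum: float) -> float:
    """Percentage of the system-wide file descriptor limit currently in use."""
    if maximum <= 0:
        raise ValueError("maximum must be greater than zero")
    return allocated / maximum * 100.0


if __name__ == "__main__":
    # 2048 descriptors allocated out of a limit of 65536.
    print(filefd_usage_percent(2048, 65536))
```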
11.4 File Nodes Size
4. File Nodes Size
type: Graph
Unit: short
Label: File Nodes
{{mountpoint}} - File nodes total: total number of file nodes (inodes) on the mounted filesystem
metrics:
node_filesystem_files{instance=~"$node:$port",job=~"$job",device!~'rootfs'}
11.5 Filesystems Mounted Read-Only
5. Filesystem in ReadOnly
type: Graph
Unit: short
Label: Read Only
{{mountpoint}} - ReadOnly: whether the filesystem is mounted read-only
metrics:
node_filesystem_readonly{instance=~"$node:$port",job=~"$job",device!~'rootfs'}
12. Network Traffic Detail
12.1 Network Traffic by Packets
1. Network Traffic by Packets
type: Graph
Unit: packets/sec
Label: Packets out (-) / in (+)
{{device}} - Receive: packets received per second on each interface
metrics:
irate(node_network_receive_packets_total{instance=~"$node:$port",job=~"$job"}[5m])
{{device}} - Transmit: packets transmitted per second on each interface
metrics:
irate(node_network_transmit_packets_total{instance=~"$node:$port",job=~"$job"}[5m])
12.2 Network Traffic Errors
2. Network Traffic Errors
type: Graph
Unit: packets/sec
Label: Packets out (-) / in (+)
{{device}} - Receive errors: receive errors detected per second on each interface
metrics:
irate(node_network_receive_errs_total{instance=~"$node:$port",job=~"$job"}[5m])
{{device}} - Transmit errors: transmit errors detected per second on each interface
metrics:
irate(node_network_transmit_errs_total{instance=~"$node:$port",job=~"$job"}[5m])
12.3 Network Traffic Drop
3. Network Traffic Drop
type: Graph
Unit: packets/sec
Label: Packets out (-) / in (+)
{{device}} - Receive drop: received packets dropped per second on each interface
metrics:
irate(node_network_receive_drop_total{instance=~"$node:$port",job=~"$job"}[5m])
{{device}} - Transmit drop: outgoing packets dropped per second on each interface
metrics:
irate(node_network_transmit_drop_total{instance=~"$node:$port",job=~"$job"}[5m])
12.4 Network Traffic Compressed
4. Network Traffic Compressed
type: Graph
Unit: packets/sec
Label: Packets out (-) / in (+)
{{device}} - Receive compressed: compressed packets received per second on each interface
metrics:
irate(node_network_receive_compressed_total{instance=~"$node:$port",job=~"$job"}[5m])
{{device}} - Transmit compressed: compressed packets transmitted per second on each interface
metrics:
irate(node_network_transmit_compressed_total{instance=~"$node:$port",job=~"$job"}[5m])
12.5 Network Traffic Multicast
5. Network Traffic Multicast
type: Graph
Unit: packets/sec
Label: Packets out (-) / in (+)
{{device}} - Receive multicast: multicast packets received per second on each interface
metrics:
irate(node_network_receive_multicast_total{instance=~"$node:$port",job=~"$job"}[5m])
12.6 Network Traffic Fifo
6. Network Traffic Fifo
type: Graph
Unit: packets/sec
Label: Packets out (-) / in (+)
{{device}} - Receive fifo: receive FIFO events per second on each interface
metrics:
irate(node_network_receive_fifo_total{instance=~"$node:$port",job=~"$job"}[5m])
{{device}} - Transmit fifo: transmit FIFO events per second on each interface
metrics:
irate(node_network_transmit_fifo_total{instance=~"$node:$port",job=~"$job"}[5m])
12.7 Network Traffic Frame
7. Network Traffic Frame
type: Graph
Unit: packets/sec
Label: Packets out (-) / in (+)
{{device}} - Receive frame: receive frame events per second on each interface
metrics:
irate(node_network_receive_frame_total{instance=~"$node:$port",job=~"$job"}[5m])
12.8 Network Traffic Carrier
8. Network Traffic Carrier
type: Graph
Unit: short
Label: Counter
{{device}} - Statistic transmit_carrier: carrier losses detected on each interface
metrics:
irate(node_network_transmit_carrier_total{instance=~"$node:$port",job=~"$job"}[5m])
12.9 Network Traffic Colls
9. Network Traffic Colls
type: Graph
Unit: short
Label: Counter
{{device}} - Transmit colls: collisions detected on each interface
metrics:
irate(node_network_transmit_colls_total{instance=~"$node:$port",job=~"$job"}[5m])
12.10 NF Contrack
10. NF Contrack
type: Graph
Unit: short
Label: Entries
NF conntrack entries - number of connections currently tracked
metrics:
node_nf_conntrack_entries{instance=~"$node:$port",job=~"$job"}
NF conntrack limit
metrics:
node_nf_conntrack_entries_limit{instance=~"$node:$port",job=~"$job"}
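When the conntrack table fills up, new connections are dropped, so the two series above are usually watched as a ratio. The Python sketch below computes table usage and flags it against a threshold; the 80 percent default threshold and the function name `conntrack_usage` are assumptions chosen for illustration only.

```python
from typing import Tuple


def conntrack_usage(entries: float, limit: float, warn_percent: float = 80.0) -> Tuple[float, bool]:
    """Return (usage percentage, whether the usage is at or above warn_percent).

    entries: node_nf_conntrack_entries
    limit:   node_nf_conntrack_entries_limit
    warn_percent is an illustrative threshold, not a value from the dashboard.
    """
    if limit <= 0:
        raise ValueError("limit must be greater than zero")
    percent = entries / limit * 100.0
    return percent, percent >= warn_percent


if __name__ == "__main__":
    print(conntrack_usage(52429, 65536))
```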
12.11 ARP Entries
11. ARP Entries
type: Graph
Unit: short
Label: Entries
{{ device }} - ARP entries: number of ARP table entries on each interface
metrics:
node_arp_entries{instance=~"$node:$port",job=~"$job"}
13. Network Sockstat
13.1 Sockstat TCP
1. Sockstat TCP
type: Graph
Unit: short
Label: Sockets
TCP_alloc - number of allocated TCP sockets (established, with an sk_buff obtained)
metrics:
node_sockstat_TCP_alloc{instance=~"$node:$port",job=~"$job"}
TCP_inuse - number of TCP sockets in use
metrics:
node_sockstat_TCP_inuse{instance=~"$node:$port",job=~"$job"}
TCP_mem - TCP socket buffer usage (in pages)
metrics:
node_sockstat_TCP_mem{instance=~"$node:$port",job=~"$job"}
TCP_orphan - number of orphaned TCP connections that belong to no process (unused sockets waiting to be destroyed)
metrics:
node_sockstat_TCP_orphan{instance=~"$node:$port",job=~"$job"}
TCP_tw - number of TCP connections waiting to close (TIME_WAIT)
metrics:
node_sockstat_TCP_tw{instance=~"$node:$port",job=~"$job"}
13.2 Sockstat UDP
2. Sockstat UDP
type: Graph
Unit: short
Label: Sockets
UDPLITE_inuse - number of UDP-Lite sockets in use
metrics:
node_sockstat_UDPLITE_inuse{instance=~"$node:$port",job=~"$job"}
UDP_inuse - number of UDP sockets in use
metrics:
node_sockstat_UDP_inuse{instance=~"$node:$port",job=~"$job"}
UDP_mem - UDP socket buffer usage (in pages)
metrics:
node_sockstat_UDP_mem{instance=~"$node:$port",job=~"$job"}
13.3 Sockstat Used
3. Sockstat Used
type: Graph
Unit: short
Label: Sockets
Sockets_used - total number of sockets in use across all protocols
metrics:
node_sockstat_sockets_used{instance=~"$node:$port",job=~"$job"}
13.4 Sockstat Memory Size
4. Sockstat Memory Size
type: Graph
Unit: bytes
Label: Bytes
TCP_mem_bytes - TCP socket buffer usage in bytes
metrics:
node_sockstat_TCP_mem_bytes{instance=~"$node:$port",job=~"$job"}
UDP_mem_bytes - UDP socket buffer usage in bytes
metrics:
node_sockstat_UDP_mem_bytes{instance=~"$node:$port",job=~"$job"}
13.5 Sockstat FRAG / RAW
5. Sockstat FRAG / RAW
type: Graph
Unit: short
Label: Sockets
FRAG_inuse - number of FRAG (IP fragment) entries in use
metrics:
node_sockstat_FRAG_inuse{instance=~"$node:$port",job=~"$job"}
FRAG_memory - memory used by FRAG buffers
metrics:
node_sockstat_FRAG_memory{instance=~"$node:$port",job=~"$job"}
RAW_inuse - number of raw sockets in use
metrics:
node_sockstat_RAW_inuse{instance=~"$node:$port",job=~"$job"}
14. Network Netstat
14.1 Netstat IP In / Out
1. Netstat IP In / Out
type: Graph
Unit: short
Label: Datagrams out (-) / in (+)
InReceives - IP datagrams received
metrics:
irate(node_netstat_Ip_InReceives{instance=~"$node:$port",job=~"$job"}[5m])
DefaultTTL - default time-to-live (TTL) for IP datagrams (a configuration value rather than a counter)
metrics:
irate(node_netstat_Ip_DefaultTTL{instance=~"$node:$port",job=~"$job"}[5m])
InDelivers - IP datagrams delivered to upper-layer protocols
metrics:
irate(node_netstat_Ip_InDelivers{instance=~"$node:$port",job=~"$job"}[5m])
OutRequests - IP datagrams sent
metrics:
irate(node_netstat_Ip_OutRequests{instance=~"$node:$port",job=~"$job"}[5m])
14.2 Netstat IP In / Out (IP datagram octets)
2. Netstat IP In / Out
type: Graph
Unit: short
Label: Octets out (-) / in (+)
InOctets - octets of IP datagrams received
metrics:
irate(node_netstat_IpExt_InOctets{instance=~"$node:$port",job=~"$job"}[5m])
OutOctets - octets of IP datagrams sent
metrics:
irate(node_netstat_IpExt_OutOctets{instance=~"$node:$port",job=~"$job"}[5m])
14.3 Netstat IP Bcast (broadcast datagrams)
3. Netstat IP Bcast
type: Graph
Unit: short
Label: Datagrams out (-) / in (+)
InBcastPkts - IP broadcast datagrams received
metrics:
irate(node_netstat_IpExt_InBcastPkts{instance=~"$node:$port",job=~"$job"}[5m])
OutBcastPkts - IP broadcast datagrams sent
metrics:
irate(node_netstat_IpExt_OutBcastPkts{instance=~"$node:$port",job=~"$job"}[5m])
14.4 Netstat IP Bcast Octets (IP broadcast datagram octets)
4. Netstat IP Bcast Octets
type: Graph
Unit: short
Label: Octets out (-) / in (+)
InBcastOctets - octets of IP broadcast datagrams received
metrics:
irate(node_netstat_IpExt_InBcastOctets{instance=~"$node:$port",job=~"$job"}[5m])
OutBcastOctets - octets of IP broadcast datagrams sent
metrics:
irate(node_netstat_IpExt_OutBcastOctets{instance=~"$node:$port",job=~"$job"}[5m])
14.5 Netstat IP Mcast (IP multicast datagrams)
5. Netstat IP Mcast
type: Graph
Unit: short
Label: Datagrams out (-) / in (+)
InMcastPkts - IP multicast datagrams received
metrics:
irate(node_netstat_IpExt_InMcastPkts{instance=~"$node:$port",job=~"$job"}[5m])
OutMcastPkts - IP multicast datagrams sent
metrics:
irate(node_netstat_IpExt_OutMcastPkts{instance=~"$node:$port",job=~"$job"}[5m])
14.6 Netstat IP Mcast Octets (IP multicast datagram octets)
6. Netstat IP Mcast Octets
type: Graph
Unit: short
Label: Octets out (-) / in (+)
InMcastOctets - octets of IP multicast datagrams received
metrics:
irate(node_netstat_IpExt_InMcastOctets{instance=~"$node:$port",job=~"$job"}[5m])
OutMcastOctets - octets of IP multicast datagrams sent
metrics:
irate(node_netstat_IpExt_OutMcastOctets{instance=~"$node:$port",job=~"$job"}[5m])
14.7 Netstat IP Forwarding (forwarded IP datagrams)
7. Netstat IP Forwarding
type: Graph
Unit: short
Label: Datagrams
ForwDatagrams - IP datagrams forwarded
metrics:
irate(node_netstat_Ip_ForwDatagrams{instance=~"$node:$port",job=~"$job"}[5m])
Forwarding - whether IP forwarding is enabled (a configuration flag rather than a counter)
metrics:
irate(node_netstat_Ip_Forwarding{instance=~"$node:$port",job=~"$job"}[5m])
14.8 Netstat IP Fragmented (IP fragments created)
8. Netstat IP Fragmented
type: Graph
Unit: short
Label: Datagrams
FragCreates - IP fragments created
metrics:
irate(node_netstat_Ip_FragCreates{instance=~"$node:$port",job=~"$job"}[5m])
FragFails - IP datagrams that failed to be fragmented
metrics:
irate(node_netstat_Ip_FragFails{instance=~"$node:$port",job=~"$job"}[5m])
FragOKs - IP datagrams fragmented successfully
metrics:
irate(node_netstat_Ip_FragOKs{instance=~"$node:$port",job=~"$job"}[5m])
14.9 Netstat IP ECT / CEP (ECN-marked datagrams)
9. Netstat IP ECT / CEP
type: Graph
Unit: short
Label: Datagrams
InCEPkts - IP datagrams received with the Congestion Experienced (CE) code point
metrics:
irate(node_netstat_IpExt_InCEPkts{instance=~"$node:$port",job=~"$job"}[5m])
InECT0Pkts - IP datagrams received with the ECT(0) code point
metrics:
irate(node_netstat_IpExt_InECT0Pkts{instance=~"$node:$port",job=~"$job"}[5m])
InECT1Pkts - IP datagrams received with the ECT(1) code point
metrics:
irate(node_netstat_IpExt_InECT1Pkts{instance=~"$node:$port",job=~"$job"}[5m])
InNoECTPkts - IP datagrams received without an ECT code point
metrics:
irate(node_netstat_IpExt_InNoECTPkts{instance=~"$node:$port",job=~"$job"}[5m])
14.10 Netstat IP Reasambled
10. Netstat IP Reasambled
type: Graph
Unit: short
Label: Datagrams
ReasmFails - IP datagrams that failed reassembly
metrics:
irate(node_netstat_Ip_ReasmFails{instance=~"$node:$port",job=~"$job"}[5m])
ReasmOKs - IP datagrams reassembled successfully
metrics:
irate(node_netstat_Ip_ReasmOKs{instance=~"$node:$port",job=~"$job"}[5m])
ReasmReqds - IP fragments received that required reassembly
metrics:
irate(node_netstat_Ip_ReasmReqds{instance=~"$node:$port",job=~"$job"}[5m])
ReasmTimeout - IP reassembly timeouts
metrics:
irate(node_netstat_Ip_ReasmTimeout{instance=~"$node:$port",job=~"$job"}[5m])
14.11 Netstat IP Errors / Discards (discarded IP datagrams)
11. Netstat IP Errors / Discards
type: Graph
Unit: short
Label: Datagrams out (-) / in (+)
InDiscards - received IP datagrams that were discarded
metrics:
irate(node_netstat_Ip_InDiscards{instance=~"$node:$port",job=~"$job"}[5m])
InHdrErrors - received IP datagrams discarded because of header errors
metrics:
irate(node_netstat_Ip_InHdrErrors{instance=~"$node:$port",job=~"$job"}[5m])
InUnknownProtos - IP datagrams discarded because of an unknown protocol
metrics:
irate(node_netstat_Ip_InUnknownProtos{instance=~"$node:$port",job=~"$job"}[5m])
OutDiscards - outgoing IP datagrams that were discarded
metrics:
irate(node_netstat_Ip_OutDiscards{instance=~"$node:$port",job=~"$job"}[5m])
OutNoRoutes - IP datagrams discarded because no output route was found
metrics:
irate(node_netstat_Ip_OutNoRoutes{instance=~"$node:$port",job=~"$job"}[5m])
InNoRoutes - IP datagrams discarded because there was no route in the forwarding path
metrics:
irate(node_netstat_IpExt_InNoRoutes{instance=~"$node:$port",job=~"$job"}[5m])
InCsumErrors - IP datagrams with checksum errors
metrics:
irate(node_netstat_IpExt_InCsumErrors{instance=~"$node:$port",job=~"$job"}[5m])
InTruncatedPkts - IP datagrams discarded because the frame did not carry enough data
metrics:
irate(node_netstat_IpExt_InTruncatedPkts{instance=~"$node:$port",job=~"$job"}[5m])
InAddrErrors - IP datagrams discarded because of an invalid destination address
metrics:
irate(node_netstat_Ip_InAddrErrors{instance=~"$node:$port",job=~"$job"}[5m])
15. Network Netstat TCP
15.1 TCP Segments
1. TCP Segments
type: Graph
Unit: short
Label: Segments out (-) / in (+)
InCsumErrors - per-second rate of segments received with checksum errors (irate over a 5m window)
metrics:
irate(node_netstat_Tcp_InCsumErrors{instance=~"$node:$port",job=~"$job"}[5m])
InErrs - per-second rate of TCP segments received in error, for example with a bad checksum (irate over a 5m window)
metrics:
irate(node_netstat_Tcp_InErrs{instance=~"$node:$port",job=~"$job"}[5m])
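InSegs - per-second rate of all TCP segments received, including those received in error and those on currently established connections (irate over a 5m window)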
metrics:
irate(node_netstat_Tcp_InSegs{instance=~"$node:$port",job=~"$job"}[5m])
OutRsts - per-second rate of TCP segments sent with the RST flag set (irate over a 5m window)
metrics:
irate(node_netstat_Tcp_OutRsts{instance=~"$node:$port",job=~"$job"}[5m])
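OutSegs - per-second rate of TCP segments sent, including those on current connections but excluding retransmitted segments (irate over a 5m window)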
metrics:
irate(node_netstat_Tcp_OutSegs{instance=~"$node:$port",job=~"$job"}[5m])
RetransSegs - per-second rate of TCP segments retransmitted (irate over a 5m window)
metrics:
irate(node_netstat_Tcp_RetransSegs{instance=~"$node:$port",job=~"$job"}[5m])
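A frequent use of the OutSegs and RetransSegs series is a retransmission ratio, which is easier to alert on than either rate alone. The Python sketch below shows the calculation, assuming both inputs are per-second rates from the queries above; the function name `tcp_retrans_percent` is hypothetical.

```python
def tcp_retrans_percent(retrans_rate: float, out_segs_rate: float) -> float:
    """Retransmitted segments as a percentage of segments sent.

    retrans_rate:  per-second rate of node_netstat_Tcp_RetransSegs
    out_segs_rate: per-second rate of node_netstat_Tcp_OutSegs
                   (which does not include retransmitted segments)
    """
    if out_segs_rate == 0:
        # Nothing was sent in the window, so report zero rather than divide by zero.
        return 0.0
    return retrans_rate / out_segs_rate * 100.0


if __name__ == "__main__":
    # 5 retransmissions per second against 1000 segments sent per second.
    print(tcp_retrans_percent(5.0, 1000.0))
```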
15.2 TCP Connections
2. TCP Connections
type: Graph
Unit: short
Label: Connections
CurrEstab - number of TCP connections currently in the ESTABLISHED or CLOSE-WAIT state
metrics:
node_netstat_Tcp_CurrEstab{instance=~"$node:$port",job=~"$job"}
MaxConn - limit on the total number of TCP connections the entity can support
metrics:
node_netstat_Tcp_MaxConn{instance=~"$node:$port",job=~"$job"}
15.3 TCP Retransmission
3. TCP Retransmission
type: Graph
Unit: milliseconds
Label: Milliseconds
RtoAlgorithm - algorithm used to determine the TCP retransmission timeout
metrics:
node_netstat_Tcp_RtoAlgorithm{instance=~"$node:$port",job=~"$job"}
RtoMax - maximum value permitted for the TCP retransmission timeout, in milliseconds
metrics:
node_netstat_Tcp_RtoMax{instance=~"$node:$port",job=~"$job"}
RtoMin - minimum value permitted for the TCP retransmission timeout, in milliseconds
metrics:
node_netstat_Tcp_RtoMin{instance=~"$node:$port",job=~"$job"}
15.4 TCP Segments
4. TCP Segments
type: Graph
Unit: short
Label: Connections
ActiveOpens - per-second rate of TCP connections that moved directly from CLOSED to SYN-SENT (irate over a 5m window)
metrics:
irate(node_netstat_Tcp_ActiveOpens{instance=~"$node:$port",job=~"$job"}[5m])
AttemptFails - per-second rate of TCP connections that moved from SYN-SENT or SYN-RCVD to CLOSED (irate over a 5m window)
metrics:
irate(node_netstat_Tcp_AttemptFails{instance=~"$node:$port",job=~"$job"}[5m])
EstabResets - per-second rate of TCP connections that moved directly from ESTABLISHED or CLOSE-WAIT to CLOSED (irate over a 5m window)
metrics:
irate(node_netstat_Tcp_EstabResets{instance=~"$node:$port",job=~"$job"}[5m])
PassiveOpens - per-second rate of TCP connections that moved directly from LISTEN to SYN-RCVD (irate over a 5m window)
metrics:
irate(node_netstat_Tcp_PassiveOpens{instance=~"$node:$port",job=~"$job"}[5m])
16. Network Netstat TCP Linux MIP
16.1 TCP Aborts / Tiemouts
1. TCP Aborts / Tiemouts
type: Graph
Unit: short
Label: Connections
TCPAbortOnClose - connections aborted because the user closed them
metrics:
irate(node_netstat_TcpExt_TCPAbortOnClose{instance=~"$node:$port",job=~"$job"}[5m])
TCPAbortOnData - connections aborted because of unexpected data
metrics:
irate(node_netstat_TcpExt_TCPAbortOnData{instance=~"$node:$port",job=~"$job"}[5m])
TCPAbortOnLinger - connections aborted while lingering after close
metrics:
irate(node_netstat_TcpExt_TCPAbortOnLinger{instance=~"$node:$port",job=~"$job"}[5m])
TCPAbortOnMemory - connections aborted because of memory pressure
metrics:
irate(node_netstat_TcpExt_TCPAbortOnMemory{instance=~"$node:$port",job=~"$job"}[5m])
TCPAbortOnTimeout - connections aborted because of a timeout
metrics:
irate(node_netstat_TcpExt_TCPAbortOnTimeout{instance=~"$node:$port",job=~"$job"}[5m])
TCPAbortFailed - connections aborted without sending an RST because memory was insufficient
metrics:
irate(node_netstat_TcpExt_TCPAbortFailed{instance=~"$node:$port",job=~"$job"}[5m])
TCPTimeouts - other TCP connection timeouts
metrics:
irate(node_netstat_TcpExt_TCPTimeouts{instance=~"$node:$port",job=~"$job"}[5m])
16.2 TCP Delayed ACK
2. TCP Delayed ACK
type: Graph
Unit: short
Label: Counter
DelayedACKLocked - delayed ACKs that were delayed further because the socket was locked
metrics:
irate(node_netstat_TcpExt_DelayedACKLocked{instance=~"$node:$port",job=~"$job"}[5m])
DelayedACKLost - number of times quick ACK mode was activated
metrics:
irate(node_netstat_TcpExt_DelayedACKLost{instance=~"$node:$port",job=~"$job"}[5m])
DelayedACKs - number of delayed ACKs sent
metrics:
irate(node_netstat_TcpExt_DelayedACKs{instance=~"$node:$port",job=~"$job"}[5m])
16.3 TCP SynCookie / Challenge
3. TCP SynCookie / Challenge
type: Graph
Unit: short
Label: Counter out (-) / in (+)
SyncookiesFailed - number of invalid SYN cookies received
metrics:
irate(node_netstat_TcpExt_SyncookiesFailed{instance=~"$node:$port",job=~"$job"}[5m])
SyncookiesRecv - number of SYN cookies received
metrics:
irate(node_netstat_TcpExt_SyncookiesRecv{instance=~"$node:$port",job=~"$job"}[5m])
SyncookiesSent - number of SYN cookies sent
metrics:
irate(node_netstat_TcpExt_SyncookiesSent{instance=~"$node:$port",job=~"$job"}[5m])
SynChallenge - number of SYN challenges sent
metrics:
irate(node_netstat_TcpExt_TCPSYNChallenge{instance=~"$node:$port",job=~"$job"}[5m])
TCPChallengeACK - number of challenge ACKs sent
metrics:
irate(node_netstat_TcpExt_TCPChallengeACK{instance=~"$node:$port",job=~"$job"}[5m])
16.4 TCP LOSS (TCP loss-state counters)
4. TCP LOSS
type: Graph
Unit: short
Label: Counter
TCPLossFailures - number of timeouts that occurred in the Loss state
metrics:
irate(node_netstat_TcpExt_TCPLossFailures{instance=~"$node:$port",job=~"$job"}[5m])
TCPLossProbeRecovery - number of recoveries triggered by TCP loss probes
metrics:
irate(node_netstat_TcpExt_TCPLossProbeRecovery{instance=~"$node:$port",job=~"$job"}[5m])
TCPLossProbes - number of TCP loss probes sent
metrics:
irate(node_netstat_TcpExt_TCPLossProbes{instance=~"$node:$port",job=~"$job"}[5m])
TCPLossUndo - number of congestion windows recovered without slow start after a partial ACK
metrics:
irate(node_netstat_TcpExt_TCPLossUndo{instance=~"$node:$port",job=~"$job"}[5m])
TCPLostRetransmit - number of retransmitted TCP packets that were lost
metrics:
irate(node_netstat_TcpExt_TCPLostRetransmit{instance=~"$node:$port",job=~"$job"}[5m])
16.5 TCP DROPS (listen queue connection drops)
5. TCP DROPS
type: Graph
Unit: short
Label: Counter
ListenDrops - connections dropped from the listen queue
metrics:
irate(node_netstat_TcpExt_ListenDrops{instance=~"$node:$port",job=~"$job"}[5m])
LockDroppedIcmps - number of ICMP packets dropped because the socket was locked
metrics:
irate(node_netstat_TcpExt_LockDroppedIcmps{instance=~"$node:$port",job=~"$job"}[5m])
TCPDeferAcceptDrop - ACK frames received and dropped by a socket in the SYN_RECV state
metrics:
irate(node_netstat_TcpExt_TCPDeferAcceptDrop{instance=~"$node:$port",job=~"$job"}[5m])
TCPBacklogDrop - number of TCP packets dropped because the socket's receive backlog queue was full
metrics:
irate(node_netstat_TcpExt_TCPBacklogDrop{instance=~"$node:$port",job=~"$job"}[5m])
OutOfWindowIcmps - number of ICMP packets dropped because they were out-of-window
metrics:
irate(node_netstat_TcpExt_OutOfWindowIcmps{instance=~"$node:$port",job=~"$job"}[5m])
TCPMinTTLDrop - number of TCP packets dropped by the minimum-TTL check
metrics:
irate(node_netstat_TcpExt_TCPMinTTLDrop{instance=~"$node:$port",job=~"$job"}[5m])
16.6 TCP Retrans (retransmissions of lost packets)
6. TCP Retrans
type: Graph
Unit: short
Label: Counter
TCPForwardRetrans - number of lost packets retransmitted using F-RTO
metrics:
irate(node_netstat_TcpExt_TCPForwardRetrans{instance=~"$node:$port",job=~"$job"}[5m])
TCPSlowStartRetrans - number of lost packets retransmitted after a slow start
metrics:
irate(node_netstat_TcpExt_TCPSlowStartRetrans{instance=~"$node:$port",job=~"$job"}[5m])
TCPSynRetrans - SYN and SYN/ACK retransmits, counted separately so that SYN retransmissions can be told apart from fast and timeout retransmits
metrics:
irate(node_netstat_TcpExt_TCPSynRetrans{instance=~"$node:$port",job=~"$job"}[5m])
TCPSpuriousRTOs - number of spurious TCP RTOs
metrics:
irate(node_netstat_TcpExt_TCPSpuriousRTOs{instance=~"$node:$port",job=~"$job"}[5m])
TCPSpuriousRtxHostQueues - Times detected that the fast clone is not yet freed in tcp_transmit_skb()
metrics:
irate(node_netstat_TcpExt_TCPSpuriousRtxHostQueues{instance=~"$node:$port",job=~"$job"}[5m])
TCPFullUndo - number of retransmits that undid the CWND reduction
metrics:
irate(node_netstat_TcpExt_TCPFullUndo{instance=~"$node:$port",job=~"$job"}[5m])
TCPRetransFail - number of failed tcp_retransmit_skb() calls
metrics:
irate(node_netstat_TcpExt_TCPRetransFail{instance=~"$node:$port",job=~"$job"}[5m])
TCPPartialUndo - number of times the congestion window was partially recovered using the Hoe heuristic
metrics:
irate(node_netstat_TcpExt_TCPPartialUndo{instance=~"$node:$port",job=~"$job"}[5m])
16.7 TCP Pruned (packets pruned from queues)
7. TCP Pruned
type: Graph
Unit: short
Label: Counter
PruneCalled - number of packets pruned from the receive queue because of socket buffer overrun
metrics:
irate(node_netstat_TcpExt_PruneCalled{instance=~"$node:$port",job=~"$job"}[5m])
RcvPruned - number of packets pruned from the receive queue
metrics:
irate(node_netstat_TcpExt_RcvPruned{instance=~"$node:$port",job=~"$job"}[5m])
OfoPruned - number of packets pruned from the out-of-order queue because of socket buffer overrun
metrics:
irate(node_netstat_TcpExt_OfoPruned{instance=~"$node:$port",job=~"$job"}[5m])
16.8 TCP Direct Copy
8. TCP Direct Copy
type: Graph
Unit: short
Label: Counter
TCPDirectCopyFromBacklog - data received directly from the backlog queue
metrics:
irate(node_netstat_TcpExt_TCPDirectCopyFromBacklog{instance=~"$node:$port",job=~"$job"}[5m])
TCPDirectCopyFromPrequeue - data received directly from the TCP prequeue
metrics:
irate(node_netstat_TcpExt_TCPDirectCopyFromPrequeue{instance=~"$node:$port",job=~"$job"}[5m])
16.9 TCP TimeWait
9. TCP TimeWait
type: Graph
Unit: short
Label: Counter
TW - TCP sockets that finished TIME_WAIT in the fast timer
metrics:
irate(node_netstat_TcpExt_TW{instance=~"$node:$port",job=~"$job"}[5m])
TWKilled - TCP sockets that finished TIME_WAIT in the slow timer
metrics:
irate(node_netstat_TcpExt_TWKilled{instance=~"$node:$port",job=~"$job"}[5m])
TWRecycled - TIME_WAIT sockets recycled by timestamp
metrics:
irate(node_netstat_TcpExt_TWRecycled{instance=~"$node:$port",job=~"$job"}[5m])
TCPTimeWaitOverflow - number of TIME_WAIT overflows
metrics:
irate(node_netstat_TcpExt_TCPTimeWaitOverflow{instance=~"$node:$port",job=~"$job"}[5m])
16.10 TCP PAWS
10. TCP PAWS
type: Graph
Unit: short
Label: Counter
PAWSActive - active connections rejected because of the TCP timestamp (PAWS)
metrics:
irate(node_netstat_TcpExt_PAWSActive{instance=~"$node:$port",job=~"$job"}[5m])
PAWSEstab - packets rejected on established connections because of the TCP timestamp (PAWS)
metrics:
irate(node_netstat_TcpExt_PAWSEstab{instance=~"$node:$port",job=~"$job"}[5m])
PAWSPassive - passive connections rejected because of the TCP timestamp (PAWS)
metrics:
irate(node_netstat_TcpExt_PAWSPassive{instance=~"$node:$port",job=~"$job"}[5m])
16.11 TCP SACK (loss recovery using SACK)
11. TCP SACK
type: Graph
Unit: short
Label: Counter
TCPSackRecovery - lost packets recovered using SACK
metrics:
irate(node_netstat_TcpExt_TCPSackRecovery{instance=~"$node:$port",job=~"$job"}[5m])
TCPSackRecoveryFail - failed attempts to recover lost packets using SACK
metrics:
irate(node_netstat_TcpExt_TCPSackRecoveryFail{instance=~"$node:$port",job=~"$job"}[5m])
TCPSackShiftFallback
metrics:
irate(node_netstat_TcpExt_TCPSackShiftFallback{instance=~"$node:$port",job=~"$job"}[5m])
TCPSackShifted
metrics:
irate(node_netstat_TcpExt_TCPSackShifted{instance=~"$node:$port",job=~"$job"}[5m])
TCPSackDiscard
metrics:
irate(node_netstat_TcpExt_TCPSACKDiscard{instance=~"$node:$port",job=~"$job"}[5m])
TCPSackFailures
metrics:
irate(node_netstat_TcpExt_TCPSackFailures{instance=~"$node:$port",job=~"$job"}[5m])
TCPSackMerged
metrics:
irate(node_netstat_TcpExt_TCPSackMerged{instance=~"$node:$port",job=~"$job"}[5m])
TCPSACKReneging
metrics:
irate(node_netstat_TcpExt_TCPSACKReneging{instance=~"$node:$port",job=~"$job"}[5m])
TCPSACKReorder
metrics:
irate(node_netstat_TcpExt_TCPSACKReorder{instance=~"$node:$port",job=~"$job"}[5m])
16.12 TCP DSACK
12. TCP DSACK
type: Graph
Unit: short
Label: Counter
TCPDSACKIgnoredOld - packets with a duplicate SACK discarded while retransmitting
metrics:
irate(node_netstat_TcpExt_TCPDSACKIgnoredOld{instance=~"$node:$port",job=~"$job"}[5m])
TCPDSACKOfoRecv - out-of-order DSACK packets received
metrics:
irate(node_netstat_TcpExt_TCPDSACKOfoRecv{instance=~"$node:$port",job=~"$job"}[5m])
TCPDSACKOfoSent - out-of-order DSACK packets sent
metrics:
irate(node_netstat_TcpExt_TCPDSACKOfoSent{instance=~"$node:$port",job=~"$job"}[5m])
TCPDSACKOldSent - DSACKs sent for old packets
metrics:
irate(node_netstat_TcpExt_TCPDSACKOldSent{instance=~"$node:$port",job=~"$job"}[5m])
TCPDSACKRecv - DSACK packets received
metrics:
irate(node_netstat_TcpExt_TCPDSACKRecv{instance=~"$node:$port",job=~"$job"}[5m])
TCPDSACKUndo
metrics:
irate(node_netstat_TcpExt_TCPDSACKUndo{instance=~"$node:$port",job=~"$job"}[5m])
TCPDSACKIgnoredNoUndo
metrics:
irate(node_netstat_TcpExt_TCPDSACKIgnoredNoUndo{instance=~"$node:$port",job=~"$job"}[5m])
16.13 TCP FastOpen / FastRetrans
13. TCP FastOpen / FastRetrans
type: Graph
Unit: short
Label: Counter
TCPFastOpenActive - successful outbound TFO connections
metrics:
irate(node_netstat_TcpExt_TCPFastOpenActive{instance=~"$node:$port",job=~"$job"}[5m])
TCPFastOpenActiveFail - SYN-ACK packets received that did not acknowledge the data sent in the SYN packet, causing a retransmission without SYN data
metrics:
irate(node_netstat_TcpExt_TCPFastOpenActiveFail{instance=~"$node:$port",job=~"$job"}[5m])
TCPFastOpenCookieReqd - inbound SYN packets that requested TFO but carried no cookie
metrics:
irate(node_netstat_TcpExt_TCPFastOpenCookieReqd{instance=~"$node:$port",job=~"$job"}[5m])
TCPFastOpenListenOverflow - TFO listen queue overflows
metrics:
irate(node_netstat_TcpExt_TCPFastOpenListenOverflow{instance=~"$node:$port",job=~"$job"}[5m])
TCPFastOpenPassive - successful inbound TFO connections
metrics:
irate(node_netstat_TcpExt_TCPFastOpenPassive{instance=~"$node:$port",job=~"$job"}[5m])
TCPFastOpenPassiveFail - invalid inbound SYN packets carrying a TFO cookie
metrics:
irate(node_netstat_TcpExt_TCPFastOpenPassiveFail{instance=~"$node:$port",job=~"$job"}[5m])
TCPFastRetrans - lost packets resent by fast retransmit
metrics:
irate(node_netstat_TcpExt_TCPFastRetrans{instance=~"$node:$port",job=~"$job"}[5m])
16.14 TCP HP
14. TCP HP
type: Graph
Unit: short
Label: Counter
TCPHPAcks - ACKs received that carry no data
metrics:
irate(node_netstat_TcpExt_TCPHPAcks{instance=~"$node:$port",job=~"$job"}[5m])
TCPHPHits - packets handled by header prediction (HP)
metrics:
irate(node_netstat_TcpExt_TCPHPHits{instance=~"$node:$port",job=~"$job"}[5m])
TCPHPHitsToUser
metrics:
irate(node_netstat_TcpExt_TCPHPHitsToUser{instance=~"$node:$port",job=~"$job"}[5m])
16.15 TCP ZeroWindow
15. TCP ZeroWindow
type: Graph
Unit: short
Label: Counter
TCPToZeroWindowAdv
metrics:
irate(node_netstat_TcpExt_TCPToZeroWindowAdv{instance=~"$node:$port",job=~"$job"}[5m])
TCPWantZeroWindowAdv
metrics:
irate(node_netstat_TcpExt_TCPWantZeroWindowAdv{instance=~"$node:$port",job=~"$job"}[5m])
TCPFromZeroWindowAdv
metrics:
irate(node_netstat_TcpExt_TCPFromZeroWindowAdv{instance=~"$node:$port",job=~"$job"}[5m])
16.16 TCP Reorder
16. TCP Reorder
type: Graph
Unit: short
Label: Counter
TCPFACKReorder - incremented when the reordering value needs updating and FACK is in use
metrics:
irate(node_netstat_TcpExt_TCPFACKReorder{instance=~"$node:$port",job=~"$job"}[5m])
TCPTSReorder - incremented when the reordering value needs updating after a partial ACK
metrics:
irate(node_netstat_TcpExt_TCPTSReorder{instance=~"$node:$port",job=~"$job"}[5m])
16.17 TCP Reno
17. TCP Reno
type: Graph
Unit: short
Label: Counter
TCPRenoFailures - number of timeouts after Reno fast retransmit
metrics:
irate(node_netstat_TcpExt_TCPRenoFailures{instance=~"$node:$port",job=~"$job"}[5m])
TCPRenoRecovery
metrics:
irate(node_netstat_TcpExt_TCPRenoRecovery{instance=~"$node:$port",job=~"$job"}[5m])
TCPRenoRecoveryFail
metrics:
irate(node_netstat_TcpExt_TCPRenoRecoveryFail{instance=~"$node:$port",job=~"$job"}[5m])
TCPRenoReorder
metrics:
irate(node_netstat_TcpExt_TCPRenoReorder{instance=~"$node:$port",job=~"$job"}[5m])
16.18 TCP ReqQ
18. TCP ReqQ
type: Graph
Unit: short
Label: Counter
TCPReqQFullDoCookies
metrics:
irate(node_netstat_TcpExt_TCPReqQFullDoCookies{instance=~"$node:$port",job=~"$job"}[5m])
TCPReqQFullDrop
metrics:
irate(node_netstat_TcpExt_TCPReqQFullDrop{instance=~"$node:$port",job=~"$job"}[5m])
16.19 TCP Out of order
19. TCP Out of order
type: Graph
Unit: short
Label: Counter
TCPOFODrop - packets queued in the out-of-order (OFO) queue but dropped because the socket rcvbuf limit was reached
metrics:
irate(node_netstat_TcpExt_TCPOFODrop{instance=~"$node:$port",job=~"$job"}[5m])
TCPOFOMerge - packets in the OFO queue that were merged with other packets
metrics:
irate(node_netstat_TcpExt_TCPOFOMerge{instance=~"$node:$port",job=~"$job"}[5m])
TCPOFOQueue - packets placed in the OFO queue
metrics:
irate(node_netstat_TcpExt_TCPOFOQueue{instance=~"$node:$port",job=~"$job"}[5m])
16.20 TCP MD5
20. TCP MD5
type: Graph
Unit: short
Label: Counter
TCPMD5NotFound - a packet with the MD5 option was expected, but the packet carried none
metrics:
irate(node_netstat_TcpExt_TCPMD5NotFound{instance=~"$node:$port",job=~"$job"}[5m])
TCPMD5Unexpected - no MD5 option was expected, but the packet carried one
metrics:
irate(node_netstat_TcpExt_TCPMD5Unexpected{instance=~"$node:$port",job=~"$job"}[5m])
16.21 TCP Prequeued
21. TCP Prequeued
type: Graph
Unit: short
Label: Counter
TCPPrequeued
metrics:
irate(node_netstat_TcpExt_TCPPrequeued{instance=~"$node:$port",job=~"$job"}[5m])
TCPPrequeueDropped - packets dropped from the prequeue
metrics:
irate(node_netstat_TcpExt_TCPPrequeueDropped{instance=~"$node:$port",job=~"$job"}[5m])
16.22 TCP Rcv
22. TCP Rcv
type: Graph
Unit: short
Label: Counter
TCPRcvCoalesce - packets coalesced in the receive queue
metrics:
irate(node_netstat_TcpExt_TCPRcvCoalesce{instance=~"$node:$port",job=~"$job"}[5m])
TCPRcvCollapsed - packets collapsed in the receive queue because of a low socket buffer
metrics:
irate(node_netstat_TcpExt_TCPRcvCollapsed{instance=~"$node:$port",job=~"$job"}[5m])
16.23 TCP Original Data
23. TCP Original Data
type: Graph
Unit: short
Label: Counter
TCPOrigDataSent - outgoing packets carrying original data
metrics:
irate(node_netstat_TcpExt_TCPOrigDataSent{instance=~"$node:$port",job=~"$job"}[5m])
16.24 TCP Filters
24. TCP Filters
type: Graph
Unit: short
Label: Counter
ArpFilter - ARP packets filtered
metrics:
irate(node_netstat_TcpExt_ArpFilter{instance=~"$node:$port",job=~"$job"}[5m])
IPReversePathFilter - packets arriving from a network that is not directly connected
metrics:
irate(node_netstat_TcpExt_IPReversePathFilter{instance=~"$node:$port",job=~"$job"}[5m])
16.25 TCP Pure ACK
25. TCP Pure ACK
type: Graph
Unit: short
Label: Counter
TCPPureAcks - ACKs received that carry no data payload
metrics:
irate(node_netstat_TcpExt_TCPPureAcks{instance=~"$node:$port",job=~"$job"}[5m])
16.26 TCP Auto Corking
26. TCP Auto Corking
type: Graph
Unit: short
Label: Counter
TCPAutoCorking - TCP auto corking
metrics:
irate(node_netstat_TcpExt_TCPAutoCorking{instance=~"$node:$port",job=~"$job"}[5m])
16.27 TCP Issues
27. TCP Issues
type: Graph
Unit: short
Label: Counter
BusyPollRxPackets - packets received by low-latency (busy-poll) applications
metrics:
irate(node_netstat_TcpExt_BusyPollRxPackets{instance=~"$node:$port",job=~"$job"}[5m])
EmbryonicRsts - Resets received for embryonic SYN_RECV sockets
metrics:
irate(node_netstat_TcpExt_EmbryonicRsts{instance=~"$node:$port",job=~"$job"}[5m])
ListenOverflows - listen socket queue overflows
metrics:
irate(node_netstat_TcpExt_ListenOverflows{instance=~"$node:$port",job=~"$job"}[5m])
TCPSchedulerFailed
metrics:
irate(node_netstat_TcpExt_TCPSchedulerFailed{instance=~"$node:$port",job=~"$job"}[5m])
TCPMemoryPressures
metrics:
irate(node_netstat_TcpExt_TCPMemoryPressures{instance=~"$node:$port",job=~"$job"}[5m])
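All of the targets in section 16 follow one template: irate() over a node_netstat_TcpExt_* counter, filtered by the $node, $port and $job dashboard variables. The sketch below shows how such query strings could be generated rather than typed by hand. The helper name tcpext_query is hypothetical and is not part of Grafana or node exporter.

```python
def tcpext_query(counter: str, window: str = "5m") -> str:
    """Build the PromQL expression this section uses for a TcpExt counter."""
    # Label matchers copied from the queries above; $node, $port and $job are
    # Grafana template variables and are left for Grafana to substitute.
    selector = 'instance=~"$node:$port",job=~"$job"'
    return f"irate(node_netstat_TcpExt_{counter}{{{selector}}}[{window}])"


# Targets of the "TCP Issues" panel, in the order listed above.
for name in ["BusyPollRxPackets", "EmbryonicRsts", "ListenOverflows",
             "TCPSchedulerFailed", "TCPMemoryPressures"]:
    print(tcpext_query(name))
```

Generating the expressions this way keeps the label matchers identical across panels, so a change to the variable names only has to be made in one place.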
17. Node Exporter
17.1 Node Exporter Scrape Time
1. Node Exporter Scrape Time
type: Graph
Unit: seconds
Label: Seconds
{{collector}} - scrape duration of each collector
metrics:
node_scrape_collector_duration_seconds{instance=~"$node:$port",job=~"$job"}
17.2 Node Exporter Scrape Success
2. Node Exporter Scrape Success
type: Graph
Unit: short
Label: Counter
{{collector}} - whether each collector scraped successfully (1 = success, 0 = failure)
metrics:
node_scrape_collector_success{instance=~"$node:$port",job=~"$job"}
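The node_scrape_collector_success series is 1 for a collector that ran without error and 0 otherwise. As a minimal sketch of checking this outside Grafana, the code below scans text in the node exporter /metrics format and lists the collectors that failed. The function name collector_status and the sample text are hypothetical, and the regular expression assumes that collector is the only label on the line.

```python
import re
from typing import Dict

# Matches lines such as: node_scrape_collector_success{collector="cpu"} 1
_LINE = re.compile(r'^node_scrape_collector_success\{collector="([^"]+)"\}\s+(\S+)$')


def collector_status(metrics_text: str) -> Dict[str, bool]:
    """Map each collector name to True (scrape succeeded) or False (scrape failed)."""
    status: Dict[str, bool] = {}
    for line in metrics_text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            status[match.group(1)] = float(match.group(2)) == 1.0
    return status


# Hypothetical excerpt of a /metrics response.
sample = """\
# TYPE node_scrape_collector_success gauge
node_scrape_collector_success{collector="cpu"} 1
node_scrape_collector_success{collector="mdadm"} 0
node_scrape_collector_duration_seconds{collector="cpu"} 0.0012
"""

failed = sorted(name for name, ok in collector_status(sample).items() if not ok)
print(failed)  # ['mdadm']
```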