Kingbase (人大金仓) KADB Monitoring Tools and Troubleshooting

Published: 2024-07-05

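Contents

1. Kmonitor unpacked-package installation and deployment
1.1. Environment preparation
1.2. Copy and extract
1.3. kadb_exporter
1.3.1 Edit application.yml
1.3.2 Edit the connection pool
1.3.3 Edit the startup script (optional)
1.4. H2 database
1.4.1 Enter h2db and edit the startup script (optional)
1.4.2 Open the H2 console and connect
1.4.3 Start kadb_exporter
1.5. node_exporter
1.6. Prometheus
1.6.1 node_conf
1.6.2 Edit prometheus.yml and start Prometheus
1.6.3 Open the Prometheus web UI and check target status
1.7. Grafana
1.7.1 Plugin directory (optional)
1.7.2 Start the Grafana dashboard
1.7.3 Open the Grafana dashboard and check status
2. Notes
2.1. H2 database
2.2. node_exporter
2.3. kadb_exporter
2.4. KADB
2.5. Grafana panel parameters
2.6. Frequent log scanning causes heavy disk I/O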

1. Kmonitor unpacked-package installation and deployment

1.1. Environment preparation

Operating system: CentOS 7 or later

Cluster host names and IP addresses:

| Internal IP   | Internal NIC | External IP | External NIC | Host name |
|---------------|--------------|-------------|--------------|-----------|
| 172.18.35.208 | bondib0      | 10.1.35.208 | bond0        | dwabamg01 |
| 172.18.35.209 | bondib0      | 10.1.35.209 | bond0        | dwabamg02 |
| 172.18.35.211 | bondib0      | 10.1.35.211 | bond0        | dwabasg01 |
| 172.18.35.212 | bondib0      | 10.1.35.212 | bond0        | dwabasg02 |
| 172.18.35.213 | bondib0      | 10.1.35.213 | bond0        | dwabasg03 |
| 172.18.35.214 | bondib0      | 10.1.35.214 | bond0        | dwabasg04 |
| 172.18.35.215 | bondib0      | 10.1.35.215 | bond0        | dwabasg05 |
| 172.18.35.216 | bondib0      | 10.1.35.216 | bond0        | dwabasg06 |
| 172.18.35.217 | bondib0      | 10.1.35.217 | bond0        | dwabasg07 |
| 172.18.35.218 | bondib0      | 10.1.35.218 | bond0        | dwabasg08 |

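Floating IP: 10.1.35.210

Gateway: 10.1.35.254

Database: postgres

Cluster user name: xinjiang

Password: 123456 for all components. To use a different password, the encrypted password strings in the configuration files must be regenerated as well.

Create the operating-system user xinjiang on every cluster node by running the following command as root on each host: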

| Host name | Command (run as root)        |
|-----------|------------------------------|
| dwabamg01 | useradd -g mppadmin xinjiang |
| dwabamg02 | useradd -g mppadmin xinjiang |
| dwabasg01 | useradd -g mppadmin xinjiang |
| dwabasg02 | useradd -g mppadmin xinjiang |
| dwabasg03 | useradd -g mppadmin xinjiang |
| dwabasg04 | useradd -g mppadmin xinjiang |
| dwabasg05 | useradd -g mppadmin xinjiang |
| dwabasg06 | useradd -g mppadmin xinjiang |
| dwabasg07 | useradd -g mppadmin xinjiang |
| dwabasg08 | useradd -g mppadmin xinjiang |
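Running the same command by hand on ten hosts is error-prone. The following is a minimal sketch that loops over the host names from the table in 1.1. It assumes root can SSH to every node and that the group mppadmin already exists on each of them; `create_user_on_all_hosts` is a hypothetical helper, not part of the Kmonitor package.

```bash
# Hypothetical helper (assumptions: root SSH access to every node, group
# mppadmin already exists). Set DRY_RUN=1 to print the commands only.
CLUSTER_HOSTS=(dwabamg01 dwabamg02 dwabasg01 dwabasg02 dwabasg03
               dwabasg04 dwabasg05 dwabasg06 dwabasg07 dwabasg08)

create_user_on_all_hosts() {
  local user=${1:-xinjiang} group=${2:-mppadmin} host
  # "id -u" succeeds if the user already exists, so re-running is harmless.
  local cmd="id -u $user >/dev/null 2>&1 || useradd -g $group $user"
  for host in "${CLUSTER_HOSTS[@]}"; do
    if [[ ${DRY_RUN:-0} == 1 ]]; then
      printf "ssh root@%s '%s'\n" "$host" "$cmd"
    else
      ssh "root@$host" "$cmd" || echo "useradd failed on $host" >&2
    fi
  done
}
```

Run `DRY_RUN=1 create_user_on_all_hosts` first to review the commands, then `create_user_on_all_hosts` to apply them.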

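1.2. Copy and extract

Copy centos7_amd64.tar.gz into the home directory of user xinjiang on cluster node 10.1.35.209 and extract it there. The package must be extracted as the user of the monitored cluster, so if you are logged in as root, switch users first:

```bash
su - xinjiang
tar -xvf centos7_amd64.tar.gz
```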
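A minimal sketch of the extraction step that also checks for the component directories used later in this guide (kadb_exporter, h2db, node_exporter, prometheus, kadb_monitor). `extract_and_verify` is a hypothetical helper, and it assumes the archive unpacks into a top-level centos7_amd64 directory, as the paths in this guide suggest.

```bash
# Hypothetical helper: extract the package and check the expected layout.
extract_and_verify() {
  local tarball=$1 dest=${2:-$HOME} dir missing=0
  # GNU tar detects gzip compression automatically when extracting.
  tar -xf "$tarball" -C "$dest" || return 1
  # Directories referenced in sections 1.3-1.7 of this guide.
  for dir in kadb_exporter h2db node_exporter prometheus kadb_monitor; do
    if [[ ! -d "$dest/centos7_amd64/$dir" ]]; then
      echo "missing: $dest/centos7_amd64/$dir" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

As user xinjiang: `extract_and_verify ~/centos7_amd64.tar.gz`.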

1.3. kadb_exporter

1.3.1 Edit application.yml

As user xinjiang, edit application.yml in /home/xinjiang/centos7_amd64/kadb_exporter:

```bash
[xinjiang@dwabamg02 kadb_exporter]$ pwd
/home/xinjiang/centos7_amd64/kadb_exporter
[xinjiang@dwabamg02 kadb_exporter]$ vi application.yml
```

```yaml
spring:
  profiles:
    active: development
server:
  port: 10000   # kadb_exporter port
---
spring:
  profiles: development
  datasource:
    # H2 database URL
    url: jdbc:h2:tcp://10.1.35.209:10002//home/xinjiang/centos7_amd64/h2db/data/operator
    username: root
    # Encrypted form of the H2 root password (123456)
    password: ENC(rVBkqsNjKhfSrkZazEoIQzMUlEejr6qNfP6U8m66JS2nSupFuAJpeRReWH_w_y39eGI6pZwbKenptjFxD4KJiuTIncrK2h3mBCiaTOTQHKX32rXD6NrW1gmGSVmE0blBXOdLZZkEfTVEWOAHR8IdldVCkK8anzOEC7em68qJZ98)
    hikari:
      pool-name: default
      connection-test-query: select current_timestamp;
      minimum-idle: 2
      maximum-pool-size: 10
```

1.3.2 Edit the connection pool

As user xinjiang, edit jdbc_pool_default.xml in /home/xinjiang/centos7_amd64/kadb_exporter/conf:

```bash
[xinjiang@dwabamg02 conf]$ pwd
/home/xinjiang/centos7_amd64/kadb_exporter/conf
[xinjiang@dwabamg02 conf]$ vi jdbc_pool_default.xml
```

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>Default JDBC connection pool parameters for the dynamic data source pool</comment>
    <entry key="clusterName"><![CDATA[KADB_CLUSTER]]></entry>
    <entry key="driverClass"><![CDATA[org.postgresql.Driver]]></entry>
    <!-- Database URL, format: jdbc:postgresql://ip:port/db_name -->
    <entry key="jdbcUrl"><![CDATA[jdbc:postgresql://10.1.35.209:5888/postgres]]></entry>
    <entry key="database"><![CDATA[postgres]]></entry>
    <entry key="username"><![CDATA[xinjiang]]></entry>
    <!-- Encrypted form of the database user's password (123456) -->
    <entry key="password"><![CDATA[rVBkqsNjKhfSrkZazEoIQzMUlEejr6qNfP6U8m66JS2nSupFuAJpeRReWH_w_y39eGI6pZwbKenptjFxD4KJiuTIncrK2h3mBCiaTOTQHKX32rXD6NrW1gmGSVmE0blBXOdLZZkEfTVEWOAHR8IdldVCkK8anzOEC7em68qJZ98]]></entry>
    <entry key="minimumIdle"><![CDATA[2]]></entry>
    <entry key="maximumPoolSize"><![CDATA[4]]></entry>
    <entry key="testQuery"><![CDATA[select current_timestamp]]></entry>
</properties>
```

The password entry holds the encrypted database password, not the plain text. To generate the encrypted form of a password, use:

```bash
./pass_enc.sh --prikey conf/conf_encrypt.pri --dbpass $PASSWORD
```
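When the password changes, the encrypted string produced by pass_enc.sh has to be written into the password entry. A minimal sketch using GNU sed (as shipped with CentOS 7); `set_pool_password` is a hypothetical helper and assumes the encrypted string contains only characters such as letters, digits, `_` and `-` (as in the sample above), since `|`, `&` or `\` would need escaping for sed.

```bash
# Hypothetical helper: replace the CDATA value of the "password" entry.
# Requires GNU sed (-i); a backup copy is written to FILE.bak first.
set_pool_password() {
  local xml=$1 enc=$2
  # Single quotes keep "!" and "[" literal; [^]]* matches the old value.
  local pat='<entry key="password"><!\[CDATA\[[^]]*]]></entry>'
  local rep='<entry key="password"><![CDATA['"$enc"']]></entry>'
  cp -p "$xml" "$xml.bak" || return 1
  sed -i "s|$pat|$rep|" "$xml"
}
```

Example: `set_pool_password conf/jdbc_pool_default.xml "<output of pass_enc.sh>"`, run from the kadb_exporter directory.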

1.3.3 Edit the startup script (optional)

To cap the memory used by the kadb_exporter process at about 2.5 GB, do the following.

As user xinjiang, edit /home/xinjiang/centos7_amd64/kadb_exporter/start.sh:

```bash
[xinjiang@dwabamg02 kadb_exporter]$ vi start.sh
```

```bash
#!/bin/bash
source ~/.bashrc
if [ ! -d logs/prometheus ]; then
    mkdir -p logs/prometheus
fi
# Start PrometheusExporter log.
# nohup python PrometheusExporter.py > /dev/null 2>&1 &
# Start KADB monitor.
# To keep one kadb_exporter at about 2.5 GB, add -Xmx2048m -Xms2048m
# (a 2 GB Java heap; JVM non-heap memory makes up the rest).
nohup java -Xmx2048m -Xms2048m -cp ".:./lib/*:./conf/*" cn.com.kingbase.kmonitor.kadb.KMonitor &
```

After editing, run source ~/.bashrc to refresh the environment.
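Both the kadb_exporter start.sh (here) and the h2db start.sh (1.4.1) are capped by adding heap options to the `nohup java` line. A sketch of doing that without an editor; `set_java_heap` is a hypothetical helper that assumes the java command line begins with `nohup java ` as in the scripts shown in this guide, and it backs the script up first.

```bash
# Hypothetical helper: add -Xmx/-Xms to the "nohup java" line of a start script.
set_java_heap() {
  local script=$1 heap=${2:-2048m}
  # Only inspect the java command line, not comments that mention -Xmx.
  if grep -q '^nohup java .*-Xmx' "$script"; then
    echo "$script already sets -Xmx; left unchanged" >&2
    return 0
  fi
  cp -p "$script" "$script.bak" || return 1
  sed -i "s|^nohup java |nohup java -Xmx$heap -Xms$heap |" "$script"
}
```

For example: `set_java_heap /home/xinjiang/centos7_amd64/h2db/start.sh 2048m`. Running it twice leaves the file unchanged the second time.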

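1.4. H2 database

1.4.1 Enter h2db and edit the startup script (optional)

To cap the memory used by the h2db process, add -Xmx2048m -Xms2048m after java on the nohup line; otherwise no change is needed.

As user xinjiang, edit /home/xinjiang/centos7_amd64/h2db/start.sh:

```bash
[xinjiang@dwabamg02 h2db]$ pwd
/home/xinjiang/centos7_amd64/h2db
[xinjiang@dwabamg02 h2db]$ vi start.sh
```

```bash
#!/bin/sh
dir=$(dirname "$0")
# 10002 is the H2 TCP port; 10001 is the H2 web console port.
nohup java -cp "$dir/kadb_h2.jar:$H2DRIVERS:$CLASSPATH" org.h2.tools.Server -ifNotExists -tcpAllowOthers -webAllowOthers -webPort 10001 -tcpPort 10002 "$@" &
```

Start H2 and check the output:

```bash
[xinjiang@dwabamg02 h2db]$ sh start.sh
[xinjiang@dwabamg02 h2db]$ cat nohup.out
TCP server running at tcp://192.168.0.30:10002 (others can connect)
PG server running at pg://192.168.0.30:5435 (only local connections)
Web Console server running at http://192.168.0.30:10001 (others can connect)
```

The Web Console address (port 10001) is the URL for accessing the H2 database. The sample output above was captured on a host at 192.168.0.30; in this cluster H2 runs on 10.1.35.209, so the console is at http://10.1.35.209:10001.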
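nohup.out is appended to on every start, so the last matching line reflects the most recent run. A small sketch that confirms the TCP server line is present and prints the web console URL; `h2_console_url` is hypothetical, and the matched text is copied from the sample output above.

```bash
# Hypothetical helper: check nohup.out and print the H2 web console URL.
h2_console_url() {
  local log=${1:-nohup.out} url
  grep -q '^TCP server running at ' "$log" || { echo "H2 TCP server not running" >&2; return 1; }
  # Take the URL from the most recent "Web Console server" line.
  url=$(sed -n 's/^Web Console server running at \(http:[^ ]*\).*/\1/p' "$log" | tail -n 1)
  [[ -n $url ]] || { echo "H2 web console not running" >&2; return 1; }
  printf '%s\n' "$url"
}
```

Run it in /home/xinjiang/centos7_amd64/h2db after sh start.sh.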

1.4.2 Open the H2 console and connect

Account/password: root/123456

1.4.3 Start kadb_exporter

```bash
[xinjiang@dwabamg02 kadb_exporter]$ pwd
/home/xinjiang/centos7_amd64/kadb_exporter
[xinjiang@dwabamg02 kadb_exporter]$ sh start.sh
```
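kadb_exporter connects to H2 over TCP, so it helps to confirm H2 is listening before starting it, and to confirm kadb_exporter's own port afterwards. A minimal sketch using bash's built-in /dev/tcp redirection and coreutils `timeout`; `wait_for_port` is hypothetical, and the ports are the ones configured above (H2 TCP 10002, kadb_exporter 10000).

```bash
# Hypothetical helper: wait until HOST:PORT accepts TCP connections.
# Usage: wait_for_port HOST PORT [ATTEMPTS]   (one attempt per second)
wait_for_port() {
  local host=$1 port=$2 tries=${3:-30} i
  for ((i = 1; i <= tries; i++)); do
    # Opening /dev/tcp/HOST/PORT succeeds only if something is listening.
    if timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
      return 0
    fi
    if ((i < tries)); then sleep 1; fi
  done
  echo "$host:$port not reachable after $tries attempts" >&2
  return 1
}
```

For example: `wait_for_port 10.1.35.209 10002 && sh start.sh`, then `wait_for_port 10.1.35.209 10000 60`.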

1.5. node_exporter

Start node_exporter as user xinjiang on each node.

```bash
[xinjiang@dwabamg02 centos7_amd64]$ pwd
/home/xinjiang/centos7_amd64

# Copy the node_exporter directory to /home/xinjiang/ on every other cluster node
[xinjiang@dwabamg02 centos7_amd64]$ scp -r node_exporter/ xinjiang@dwabamg01:/home/xinjiang/
[xinjiang@dwabamg02 centos7_amd64]$ scp -r node_exporter/ xinjiang@dwabasg01:/home/xinjiang/
[xinjiang@dwabamg02 centos7_amd64]$ scp -r node_exporter/ xinjiang@dwabasg02:/home/xinjiang/
[xinjiang@dwabamg02 centos7_amd64]$ scp -r node_exporter/ xinjiang@dwabasg03:/home/xinjiang/
[xinjiang@dwabamg02 centos7_amd64]$ scp -r node_exporter/ xinjiang@dwabasg04:/home/xinjiang/
[xinjiang@dwabamg02 centos7_amd64]$ scp -r node_exporter/ xinjiang@dwabasg05:/home/xinjiang/
[xinjiang@dwabamg02 centos7_amd64]$ scp -r node_exporter/ xinjiang@dwabasg06:/home/xinjiang/
[xinjiang@dwabamg02 centos7_amd64]$ scp -r node_exporter/ xinjiang@dwabasg07:/home/xinjiang/
[xinjiang@dwabamg02 centos7_amd64]$ scp -r node_exporter/ xinjiang@dwabasg08:/home/xinjiang/

# Log in to each cluster node in turn and start node_exporter

# Start node_exporter on dwabamg01
[xinjiang@dwabamg02 centos7_amd64]$ ssh dwabamg01
[xinjiang@dwabamg01 ~]$ cd /home/xinjiang/node_exporter/
[xinjiang@dwabamg01 node_exporter]$ sh start.sh
[xinjiang@dwabamg01 node_exporter]$ exit

# Start node_exporter on dwabasg01
[xinjiang@dwabamg02 centos7_amd64]$ ssh dwabasg01
[xinjiang@dwabasg01 ~]$ cd /home/xinjiang/node_exporter/
[xinjiang@dwabasg01 node_exporter]$ sh start.sh
[xinjiang@dwabasg01 node_exporter]$ exit

# Start node_exporter on dwabasg02
[xinjiang@dwabamg02 centos7_amd64]$ ssh dwabasg02
[xinjiang@dwabasg02 ~]$ cd /home/xinjiang/node_exporter/
[xinjiang@dwabasg02 node_exporter]$ sh start.sh
[xinjiang@dwabasg02 node_exporter]$ exit

# Start node_exporter on dwabasg03
[xinjiang@dwabamg02 centos7_amd64]$ ssh dwabasg03
[xinjiang@dwabasg03 ~]$ cd /home/xinjiang/node_exporter/
[xinjiang@dwabasg03 node_exporter]$ sh start.sh
[xinjiang@dwabasg03 node_exporter]$ exit

# Start node_exporter on dwabasg04
[xinjiang@dwabamg02 centos7_amd64]$ ssh dwabasg04
[xinjiang@dwabasg04 ~]$ cd /home/xinjiang/node_exporter/
[xinjiang@dwabasg04 node_exporter]$ sh start.sh
[xinjiang@dwabasg04 node_exporter]$ exit

# Start node_exporter on dwabasg05
[xinjiang@dwabamg02 centos7_amd64]$ ssh dwabasg05
[xinjiang@dwabasg05 ~]$ cd /home/xinjiang/node_exporter/
[xinjiang@dwabasg05 node_exporter]$ sh start.sh
[xinjiang@dwabasg05 node_exporter]$ exit

# Start node_exporter on dwabasg06
[xinjiang@dwabamg02 centos7_amd64]$ ssh dwabasg06
[xinjiang@dwabasg06 ~]$ cd /home/xinjiang/node_exporter/
[xinjiang@dwabasg06 node_exporter]$ sh start.sh
[xinjiang@dwabasg06 node_exporter]$ exit

# Start node_exporter on dwabasg07
[xinjiang@dwabamg02 centos7_amd64]$ ssh dwabasg07
[xinjiang@dwabasg07 ~]$ cd /home/xinjiang/node_exporter/
[xinjiang@dwabasg07 node_exporter]$ sh start.sh
[xinjiang@dwabasg07 node_exporter]$ exit

# Start node_exporter on dwabasg08
[xinjiang@dwabamg02 centos7_amd64]$ ssh dwabasg08
[xinjiang@dwabasg08 ~]$ cd /home/xinjiang/node_exporter/
[xinjiang@dwabasg08 node_exporter]$ sh start.sh
[xinjiang@dwabasg08 node_exporter]$ exit
```
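The copy-and-start steps above can be scripted. A minimal sketch, assuming passwordless SSH for xinjiang from dwabamg02 to the other nodes and that it is run from /home/xinjiang/centos7_amd64; `deploy_node_exporter` and `run` are hypothetical helpers. It also starts the local copy on dwabamg02, because 10.1.35.209:10003 is one of the Prometheus targets in 1.6. Output of the remote start.sh is redirected so that ssh does not keep waiting on the backgrounded process.

```bash
# Hypothetical helpers. Set DRY_RUN=1 to print the commands instead of running them.
REMOTE_HOSTS=(dwabamg01 dwabasg01 dwabasg02 dwabasg03 dwabasg04
              dwabasg05 dwabasg06 dwabasg07 dwabasg08)

run() {  # execute the command, or only print it when DRY_RUN=1
  if [[ ${DRY_RUN:-0} == 1 ]]; then echo "$*"; else "$@"; fi
}

deploy_node_exporter() {
  local host
  for host in "${REMOTE_HOSTS[@]}"; do
    if ! run scp -r node_exporter/ "xinjiang@$host:/home/xinjiang/"; then
      echo "copy to $host failed, skipping start" >&2
      continue
    fi
    # Redirect all output so ssh returns once start.sh has backgrounded node_exporter.
    run ssh "xinjiang@$host" "cd /home/xinjiang/node_exporter && sh start.sh > start.log 2>&1 < /dev/null"
  done
  # dwabamg02 itself (10.1.35.209) is also scraped, so start the local copy.
  run sh -c 'cd node_exporter && sh start.sh'
}
```

`DRY_RUN=1 deploy_node_exporter` lists the 19 commands without touching any host.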

1.6. Prometheus

1.6.1 node_conf

As user xinjiang, create the node_conf directory, enter it, and create node_kadb_info.json:

```bash
[xinjiang@dwabamg02 prometheus]$ pwd
/home/xinjiang/centos7_amd64/prometheus
[xinjiang@dwabamg02 prometheus]$ mkdir -p node_conf
[xinjiang@dwabamg02 prometheus]$ cd node_conf/
[xinjiang@dwabamg02 node_conf]$ vim node_kadb_info.json
```

```json
[
    {
        "labels": {
            "desc": "Cluster standby node",
            "project": "新疆农信",
            "system": "xinjiang",
            "instance": "10.1.35.209",
            "hostname": "dwabamg02",
            "cluster": "kadb_cluster",
            "service": "node_exporter"
        },
        "targets": ["10.1.35.209:10003"]
    },
    {
        "labels": {
            "desc": "Cluster master node",
            "project": "新疆农信",
            "system": "xinjiang",
            "instance": "10.1.35.208",
            "hostname": "dwabamg01",
            "cluster": "kadb_cluster",
            "service": "node_exporter"
        },
        "targets": ["10.1.35.208:10003"]
    },
    {
        "labels": {
            "desc": "Cluster compute node 1",
            "project": "新疆农信",
            "system": "xinjiang",
            "instance": "10.1.35.211",
            "hostname": "dwabasg01",
            "cluster": "kadb_cluster",
            "service": "node_exporter"
        },
        "targets": ["10.1.35.211:10003"]
    },
    {
        "labels": {
            "desc": "Cluster compute node 2",
            "project": "新疆农信",
            "system": "xinjiang",
            "instance": "10.1.35.212",
            "hostname": "dwabasg02",
            "cluster": "kadb_cluster",
            "service": "node_exporter"
        },
        "targets": ["10.1.35.212:10003"]
    },
    {
        "labels": {
            "desc": "Cluster compute node 3",
            "project": "新疆农信",
            "system": "xinjiang",
            "instance": "10.1.35.213",
            "hostname": "dwabasg03",
            "cluster": "kadb_cluster",
            "service": "node_exporter"
        },
        "targets": ["10.1.35.213:10003"]
    },
    {
        "labels": {
            "desc": "Cluster compute node 4",
            "project": "新疆农信",
            "system": "xinjiang",
            "instance": "10.1.35.214",
            "hostname": "dwabasg04",
            "cluster": "kadb_cluster",
            "service": "node_exporter"
        },
        "targets": ["10.1.35.214:10003"]
    },
    {
        "labels": {
            "desc": "Cluster compute node 5",
            "project": "新疆农信",
            "system": "xinjiang",
            "instance": "10.1.35.215",
            "hostname": "dwabasg05",
            "cluster": "kadb_cluster",
            "service": "node_exporter"
        },
        "targets": ["10.1.35.215:10003"]
    },
    {
        "labels": {
            "desc": "Cluster compute node 6",
            "project": "新疆农信",
            "system": "xinjiang",
            "instance": "10.1.35.216",
            "hostname": "dwabasg06",
            "cluster": "kadb_cluster",
            "service": "node_exporter"
        },
        "targets": ["10.1.35.216:10003"]
    },
    {
        "labels": {
            "desc": "Cluster compute node 7",
            "project": "新疆农信",
            "system": "xinjiang",
            "instance": "10.1.35.217",
            "hostname": "dwabasg07",
            "cluster": "kadb_cluster",
            "service": "node_exporter"
        },
        "targets": ["10.1.35.217:10003"]
    },
    {
        "labels": {
            "desc": "Cluster compute node 8",
            "project": "新疆农信",
            "system": "xinjiang",
            "instance": "10.1.35.218",
            "hostname": "dwabasg08",
            "cluster": "kadb_cluster",
            "service": "node_exporter"
        },
        "targets": ["10.1.35.218:10003"]
    }
]
```
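Writing this file by hand makes it easy for an "instance" label to drift from its target address. A minimal sketch that generates the same structure from one host table, so each instance label and target share the same IP; `gen_node_sd_json` and the `NODES` table are hypothetical, with values taken from the JSON above.

```bash
# Hypothetical generator for node_kadb_info.json. Fields: desc|ip|hostname.
NODES=(
  "Cluster standby node|10.1.35.209|dwabamg02"
  "Cluster master node|10.1.35.208|dwabamg01"
  "Cluster compute node 1|10.1.35.211|dwabasg01"
  "Cluster compute node 2|10.1.35.212|dwabasg02"
  "Cluster compute node 3|10.1.35.213|dwabasg03"
  "Cluster compute node 4|10.1.35.214|dwabasg04"
  "Cluster compute node 5|10.1.35.215|dwabasg05"
  "Cluster compute node 6|10.1.35.216|dwabasg06"
  "Cluster compute node 7|10.1.35.217|dwabasg07"
  "Cluster compute node 8|10.1.35.218|dwabasg08"
)

gen_node_sd_json() {
  local port=${1:-10003} n=${#NODES[@]} i desc ip host sep
  echo "["
  for ((i = 0; i < n; i++)); do
    IFS='|' read -r desc ip host <<<"${NODES[i]}"
    sep=","
    if ((i == n - 1)); then sep=""; fi   # no comma after the last object
    printf '  {\n    "labels": {\n'
    printf '      "desc": "%s", "project": "新疆农信", "system": "xinjiang",\n' "$desc"
    printf '      "instance": "%s", "hostname": "%s",\n' "$ip" "$host"
    printf '      "cluster": "kadb_cluster", "service": "node_exporter"\n    },\n'
    printf '    "targets": ["%s:%s"]\n  }%s\n' "$ip" "$port" "$sep"
  done
  echo "]"
}
```

Usage: `gen_node_sd_json > /home/xinjiang/centos7_amd64/prometheus/node_conf/node_kadb_info.json`.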

1.6.2 Edit prometheus.yml and start Prometheus

```bash
[xinjiang@dwabamg02 prometheus]$ pwd
/home/xinjiang/centos7_amd64/prometheus
[xinjiang@dwabamg02 prometheus]$ vi prometheus.yml
```

```yaml
global:
  scrape_interval: 10s
  evaluation_interval: 10s
scrape_configs:
  - job_name: 'consul'
    static_configs:
      - targets: ['10.1.35.209:10000']
        labels:
          cluster: 'kadb_cluster'
          service: 'kadb_exporter'
          kadburl: 'http://10.1.35.209:10000'
      - targets: ['10.1.35.209:10003']
        labels:
          cluster: 'kadb_cluster'
          service: 'node_exporter'
          kadburl: 'http://10.1.35.209:10003'
      - targets: ['10.1.35.208:10003']
        labels:
          cluster: 'kadb_cluster'
          service: 'node_exporter'
          kadburl: 'http://10.1.35.208:10003'
      - targets: ['10.1.35.211:10003']
        labels:
          cluster: 'kadb_cluster'
          service: 'node_exporter'
          kadburl: 'http://10.1.35.211:10003'
      - targets: ['10.1.35.212:10003']
        labels:
          cluster: 'kadb_cluster'
          service: 'node_exporter'
          kadburl: 'http://10.1.35.212:10003'
      - targets: ['10.1.35.213:10003']
        labels:
          cluster: 'kadb_cluster'
          service: 'node_exporter'
          kadburl: 'http://10.1.35.213:10003'
      - targets: ['10.1.35.214:10003']
        labels:
          cluster: 'kadb_cluster'
          service: 'node_exporter'
          kadburl: 'http://10.1.35.214:10003'
      - targets: ['10.1.35.215:10003']
        labels:
          cluster: 'kadb_cluster'
          service: 'node_exporter'
          kadburl: 'http://10.1.35.215:10003'
      - targets: ['10.1.35.216:10003']
        labels:
          cluster: 'kadb_cluster'
          service: 'node_exporter'
          kadburl: 'http://10.1.35.216:10003'
      - targets: ['10.1.35.217:10003']
        labels:
          cluster: 'kadb_cluster'
          service: 'node_exporter'
          kadburl: 'http://10.1.35.217:10003'
      - targets: ['10.1.35.218:10003']
        labels:
          cluster: 'kadb_cluster'
          service: 'node_exporter'
          kadburl: 'http://10.1.35.218:10003'
    file_sd_configs:
      - files:
        - /home/xinjiang/centos7_amd64/prometheus/node_conf/node_kadb_info.json
```

```bash
[xinjiang@dwabamg02 prometheus]$ sh start.sh      # start Prometheus
```
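The node_exporter entries under static_configs differ only in the IP address, so they can be generated rather than typed, which keeps each target and its kadburl label consistent. A minimal sketch; `gen_static_targets` is a hypothetical helper that prints YAML indented to sit directly under `static_configs:`.

```bash
# Hypothetical helper: print node_exporter static_configs entries.
# Usage: gen_static_targets PORT IP...
gen_static_targets() {
  local port=$1 ip
  shift
  for ip in "$@"; do
    printf "      - targets: ['%s:%s']\n" "$ip" "$port"
    printf "        labels:\n"
    printf "          cluster: 'kadb_cluster'\n"
    printf "          service: 'node_exporter'\n"
    printf "          kadburl: 'http://%s:%s'\n" "$ip" "$port"
  done
}
```

`gen_static_targets 10003 10.1.35.209 10.1.35.208 10.1.35.{211..218}` prints the ten entries in the same order as the file above; paste them after the kadb_exporter entry.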

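1.6.3 Open the Prometheus web UI and check target status

On the Status > Targets page, every target should be in the "UP" state; that indicates normal operation.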
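The same check can be done from the command line through the Prometheus HTTP API (GET /api/v1/query). The query `up == 0` returns only targets whose last scrape failed, so an empty result means every target is UP. A sketch with hypothetical helper names; `PROM_URL` is an assumption: port 10004 is inferred from the kadb_monitor.log error described in 1.7.3 (Prometheus' own default is 9090), so use the listen address actually set in prometheus/start.sh.

```bash
# Hypothetical helpers. PROM_URL is an assumption (see the lead-in).
PROM_URL=${PROM_URL:-http://10.1.35.209:10004}

# Succeeds when a /api/v1/query response has an empty result list.
no_down_targets() {
  grep -q '"result":\[\]' <<<"$1"
}

# Returns 0 if all targets are UP, 1 if some are down, 2 if the query failed.
check_all_targets_up() {
  local resp
  # "up == 0" selects only the targets whose last scrape failed.
  resp=$(curl -fsSG --max-time 5 "$PROM_URL/api/v1/query" \
           --data-urlencode 'query=up == 0') || return 2
  if no_down_targets "$resp"; then
    echo "all targets are UP"
  else
    echo "some targets are down: $resp" >&2
    return 1
  fi
}
```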

1.7. Grafana

1.7.1 Plugin directory (optional)

If no new monitoring panels are being added, nothing here needs to change.

```bash
[xinjiang@dwabamg02 kadb_monitor]$ cd plugins/
[xinjiang@dwabamg02 plugins]$ pwd
/home/xinjiang/centos7_amd64/kadb_monitor/plugins
[xinjiang@dwabamg02 plugins]$ ll
total 20
drwxrwxr-x. 3 xinjiang xinjiang   70 Jul  5 16:01 AlertManagerPanel
drwxrwxr-x. 3 xinjiang xinjiang   38 Oct 20 16:09 ClusterVersionStatusPanel
drwxrwxr-x. 3 xinjiang xinjiang 4096 Jul  5 16:01 data-table-plugin
drwxrwxr-x. 3 xinjiang xinjiang   70 Jul  5 16:01 InstanceStatusPanel
drwxrwxr-x. 3 xinjiang xinjiang   98 Oct  9 09:44 kadb_AlertNowList_plugin
drwxrwxr-x. 3 xinjiang xinjiang   98 Sep 24 11:35 kadb_ClusterInfoTable_plugin
drwxrwxr-x. 3 xinjiang xinjiang 4096 Jul  5 16:01 kadb_piechart_panel
drwxrwxr-x. 3 xinjiang xinjiang 4096 Oct  9 11:15 kadb_TopologyTable_plugin
drwxrwxr-x. 3 xinjiang xinjiang 4096 Jul  5 16:01 selected-table-plugin
drwxrwxr-x. 3 xinjiang xinjiang   70 Jul  5 16:01 SessionListPanel
drwxrwxr-x. 3 xinjiang xinjiang 4096 Jul  5 16:01 topology-plugin
```

The plugins that display the cluster version and topology information are kadb_AlertNowList_plugin, ClusterVersionStatusPanel and kadb_TopologyTable_plugin.
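A quick way to confirm those three plugin directories are in place before starting kadb_monitor. A minimal sketch; `check_kadb_plugins` is a hypothetical helper whose default path is the plugin directory shown above.

```bash
# Hypothetical helper: verify the plugins named above exist.
check_kadb_plugins() {
  local dir=${1:-/home/xinjiang/centos7_amd64/kadb_monitor/plugins} p rc=0
  for p in kadb_AlertNowList_plugin ClusterVersionStatusPanel kadb_TopologyTable_plugin; do
    [[ -d "$dir/$p" ]] || { echo "missing plugin: $p" >&2; rc=1; }
  done
  return "$rc"
}
```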

1.7.2 Start the Grafana dashboard

```bash
[xinjiang@dwabamg02 kadb_monitor]$ pwd
/home/xinjiang/centos7_amd64/kadb_monitor
[xinjiang@dwabamg02 kadb_monitor]$ sh start.sh
```
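Before opening a browser, a command-line check can confirm that Grafana is answering. Grafana exposes GET /api/health, which returns JSON with a "database" field that reads "ok" when healthy. A sketch with hypothetical helper names; `GRAFANA_URL` assumes Grafana listens on port 3000 of dwabamg02 (10.1.35.209), where kadb_monitor was started above.

```bash
# Hypothetical helpers. GRAFANA_URL is an assumption (see the lead-in).
GRAFANA_URL=${GRAFANA_URL:-http://10.1.35.209:3000}

# Succeeds when an /api/health body reports the database as "ok".
grafana_health_ok() {
  grep -Eq '"database": *"ok"' <<<"$1"
}

# Returns 0 if healthy, 1 if unhealthy, 2 if Grafana did not answer.
check_grafana() {
  local body
  body=$(curl -fsS --max-time 5 "$GRAFANA_URL/api/health") || return 2
  grafana_health_ok "$body" || { echo "Grafana unhealthy: $body" >&2; return 1; }
  echo "Grafana is healthy"
}
```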

1.7.3 Open the Grafana dashboard and check status

Browser address: http://10.1.35.209:3000

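Google Chrome is recommended. If Grafana opens and displays correctly, no parameters need to be changed; adjust them only as the situation requires.

If the cluster topology information is not displayed correctly, click the down arrow in the panel header and choose "Edit".

In the "Visualization" menu on the right side of the panel, change the IP address to 10.1.35.209.

Note: this must be the address and port (10000) of kadb_exporter. If that is an internal address, map kadb_exporter's address to an external address reachable from the browser and enter that instead, because the browser side communicates with kadb_exporter directly.

Make the same change in the "Latest alerts" and "Cluster version" panels.

If node resource information is not displayed correctly (the monitoring pages open slowly, errors are shown, and kadb_monitor.log reports that it cannot connect to 192.168.0.30:10004), Kmonitor has not been configured with the correct Prometheus address. Click the gear icon in the left-hand menu and set the actual Prometheus address.

If curl against kadb_exporter on port 10000 returns the expected information, the browser and kadb_exporter can communicate.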
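A minimal sketch of that curl check, meant to be run from the browser machine or a host on the same network. It only confirms that kadb_exporter returns an HTTP response on port 10000; the request path "/" and the helper name are assumptions.

```bash
# Hypothetical helper; KADB_EXPORTER_URL defaults to the address used in 1.7.3.
KADB_EXPORTER_URL=${KADB_EXPORTER_URL:-http://10.1.35.209:10000}

# Returns 0 if any HTTP response comes back, 2 if no connection could be made.
check_kadb_exporter() {
  local code
  # -w prints the HTTP status; curl exits non-zero when it cannot connect.
  code=$(curl -sS -o /dev/null --max-time 5 -w '%{http_code}' "$KADB_EXPORTER_URL/") || return 2
  echo "kadb_exporter answered with HTTP $code"
}
```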

[Screenshot: monitoring panel edit page]

[Screenshot: node monitoring page]

[Screenshot: host monitoring page]

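2. Notes

2.1. H2 database

H2 must be running, and its web console must be reachable.

2.2. node_exporter

Every monitored IP address needs a node_exporter directory with node_exporter running.

2.3. kadb_exporter

Each cluster needs one kadb_exporter, running on the master node, and each kadb_exporter uses at least 2.5 GB of physical memory.

2.4. KADB

The KADB database must be up and running.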

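2.5. Grafana panel parameters (changing the defaults)

In the dashboard, open Settings, then Variables, and change request_url to the address of the Prometheus node.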

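2.6. Frequent log scanning causes heavy disk I/O

Edit the kadb_exporter configuration file /home/mppadmin/centos7_amd64/kadb_exporter/conf/schedules.xml (in this deployment: /home/xinjiang/centos7_amd64/kadb_exporter/conf/schedules.xml) and delete the following two groups of schedules:

the log collection schedules;

the schedules related to disk data distribution.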
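Because the schedule definitions are deleted by hand, keeping a copy of the original file makes the change easy to revert. A minimal sketch; `backup_config` is a hypothetical helper. Restarting kadb_exporter afterwards is presumably required for the change to take effect, but that step is an assumption and is not covered in this guide.

```bash
# Hypothetical helper: make a timestamped copy of a config file before editing.
backup_config() {
  local file=$1 copy
  copy="$file.$(date +%Y%m%d%H%M%S).bak"
  cp -p "$file" "$copy" || return 1
  echo "$copy"   # print the backup path so it can be restored later
}
```

Usage: `backup_config /home/xinjiang/centos7_amd64/kadb_exporter/conf/schedules.xml`, then edit the file.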
