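Preface: As enterprise applications grow more complex, communication and data exchange between systems sit at the core of business processes. Whether it is real-time processing of financial transactions, order flow on an e-commerce platform, or data collection from IoT devices, efficient and reliable message delivery is key to keeping systems stable. Traditional synchronous communication, however, tends to run into performance bottlenecks, tight coupling between systems, and limited scalability. Message middleware emerged to address these problems, and ActiveMQ, with its solid performance, rich feature set, and broad range of use cases, has become one of the common choices for enterprise application integration.
ActiveMQ is an open-source message broker written in Java. It implements the JMS (Java Message Service) specification and also supports several wire protocols (OpenWire, AMQP, STOMP, MQTT, and WebSocket), so clients in different languages and on different platforms can integrate with it. Beyond message delivery itself, it offers flexible deployment options, high-availability designs, and an active community, which together cover the reliability and scalability requirements of enterprise applications. Both beginners and experienced developers can use it to build an efficient, stable messaging architecture.
The following sections walk through ActiveMQ deployment, operations scripts, and related practices. Whether you want to improve an existing architecture or are evaluating a new technical solution, ActiveMQ is worth a closer look.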
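The steps below cover single-node and cluster deployment of ActiveMQ 5.18.3 (the latest stable release at the time of writing), together with the JDK installation it depends on:
I. JDK Installation
1. Download a JDK package for your system, for example from the Oracle website.
2. Upload the JDK package to a directory on the server, such as /mpjava.
3. Install the JDK with the following commands:
cd /mpjava
rpm -ivh jdk-<version>-linux-x64.rpm
Or install OpenJDK with yum:
yum install java-11-openjdk
4. Verify that the JDK was installed successfully:
java -version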
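The verification step can also be scripted. The sketch below assumes that ActiveMQ 5.18.x needs Java 11 or newer (check the release notes for your exact version); the function name java_major_version is hypothetical and only illustrates how to read the major version out of the `java -version` string.

```shell
#!/bin/bash
# Extract the major version from a Java version string.
# Handles the legacy scheme ("1.8.0_392" -> 8) and the modern one ("11.0.21" -> 11).
java_major_version() {
  local ver="$1"
  ver="${ver#1.}"          # drop the legacy "1." prefix if present
  echo "${ver%%[._-]*}"    # keep everything before the first '.', '_' or '-'
}

if command -v java >/dev/null 2>&1; then
  # `java -version` prints to stderr; the version is the first quoted string
  raw="$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
  major="$(java_major_version "$raw")"
  if [ "${major:-0}" -ge 11 ] 2>/dev/null; then
    echo "Java ${raw} looks suitable"
  else
    echo "Java ${raw} may be too old (assumed minimum: 11)"
  fi
else
  echo "java not found on PATH"
fi
```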
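II. ActiveMQ 5.18.3 Single-Node Deployment
1. Install the JDK as described above.
2. Upload the ActiveMQ archive (apache-activemq-5.18.3-bin.tar.gz) to a directory on the server (for example, /mpjava).
3. Extract the archive:
cd /mpjava
tar xvf apache-activemq-5.18.3-bin.tar.gz
4. Start ActiveMQ:
cd /mpjava/apache-activemq-5.18.3
bin/activemq start
5. Check that ActiveMQ is running:
- Check that ActiveMQ is listening on port 61616:
netstat -an | grep 61616
- From a machine that can reach the ActiveMQ server, open http://<ActiveMQ-server-IP>:8161/admin/ in a browser to verify that the web console is up. By default the console binds to 127.0.0.1, so remote access requires setting the jettyPort host to 0.0.0.0 in conf/jetty.xml.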
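The two checks above can be combined into a small probe. This is a minimal sketch that relies on bash's /dev/tcp feature and the coreutils `timeout` command; the function name port_open and the AMQ_HOST variable are hypothetical.

```shell
#!/bin/bash
# Return 0 if a TCP connection to host:port succeeds within 2 seconds.
port_open() {
  local host="$1" port="$2"
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# AMQ_HOST is a hypothetical variable; it defaults to the local machine.
host="${AMQ_HOST:-127.0.0.1}"
for port in 61616 8161; do
  if port_open "$host" "$port"; then
    echo "port ${port} on ${host} is reachable"
  else
    echo "port ${port} on ${host} is NOT reachable"
  fi
done
```

Run it on the broker itself, or set AMQ_HOST to the server IP to probe from another machine.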
6. Configure start on boot:
vim /etc/rc.d/rc.local
Add the following line:
/mpjava/apache-activemq-5.18.3/bin/activemq start
Save and exit, then make the rc.local file executable:
chmod +x /etc/rc.d/rc.local
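On systemd-based distributions, a unit file is a possible alternative to rc.local. The following is a sketch rather than a tested production unit: the function name write_activemq_unit and the unit name activemq.service are hypothetical, and Type=forking is chosen on the assumption that `bin/activemq start` detaches into the background.

```shell
#!/bin/bash
# Write a minimal systemd unit for an ActiveMQ install directory.
#   $1 = ActiveMQ home, $2 = output file
write_activemq_unit() {
  local home="$1" out="$2"
  cat > "$out" <<EOF
[Unit]
Description=Apache ActiveMQ
After=network.target

[Service]
Type=forking
ExecStart=${home}/bin/activemq start
ExecStop=${home}/bin/activemq stop
Restart=on-failure
User=root

[Install]
WantedBy=multi-user.target
EOF
}

# Example (as root):
#   write_activemq_unit /mpjava/apache-activemq-5.18.3 /etc/systemd/system/activemq.service
#   systemctl daemon-reload && systemctl enable --now activemq.service
```

Use either the unit or the rc.local entry, not both, otherwise the broker is started twice at boot.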
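III. ActiveMQ 5.18.3 Cluster Deployment
1. Cluster plan:
- ZooKeeper nodes: 172.26.223.71, 172.26.223.72, 172.26.223.73; ZooKeeper port: 2181.
- ActiveMQ nodes: 172.26.223.71, 172.26.223.72, 172.26.223.73; each node runs two instances, with ports planned as follows (cluster1 instance, cluster2 instance):
  - openwire: 61616, 61626
  - amqp: 5672, 5682
  - stomp: 61613, 61623
  - mqtt: 1883, 1884
  - ws: 61614, 61624
  - admin: 8161, 8162
- Cluster structure: mq11, mq12, and mq13 form the high-availability cluster cluster1; mq21, mq22, and mq23 form the high-availability cluster cluster2; cluster1 and cluster2 are bridged for load sharing and together form the cluster mqcluster.
- Upload apache-activemq-5.18.3-bin.tar.gz to all three servers and extract it into two directories:
  - /mpjava/activemq-cluster1/
  - /mpjava/activemq-cluster2/
2. Deploy the ZooKeeper cluster (see my separate blog post on ZooKeeper deployment).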
3. High-availability configuration:
The ActiveMQ documentation marks the LevelDB store as deprecated, and it appears to have been removed as of 5.17.0, so confirm that your distribution still ships replicatedLevelDB before relying on this configuration.
- Edit conf/activemq.xml (shown for the cluster1 instance; the cluster2 instance uses the second port of each pair from the plan). Inside XML attribute values, & must be written as &amp;:
<broker xmlns="http://activemq.apache.org/schema/core" brokerName="cluster1" dataDirectory="${activemq.data}">
  <persistenceAdapter>
    <!-- hostname must be set to the IP of the node being configured -->
    <replicatedLevelDB
      directory="${activemq.data}/leveldb"
      replicas="3"
      bind="tcp://0.0.0.0:0"
      zkAddress="172.26.223.71:2181,172.26.223.72:2181,172.26.223.73:2181"
      hostname="172.26.223.71"
      sync="local_disk"
      zkPath="/activemq/leveldb-stores/cluster1"/>
  </persistenceAdapter>
  <transportConnectors>
    <transportConnector name="openwire" uri="tcp://0.0.0.0:61616?maximumConnections=1000&amp;wireFormat.maxFrameSize=104857600"/>
    <transportConnector name="amqp" uri="amqp://0.0.0.0:5672?maximumConnections=1000&amp;wireFormat.maxFrameSize=104857600"/>
    <transportConnector name="stomp" uri="stomp://0.0.0.0:61613?maximumConnections=1000&amp;wireFormat.maxFrameSize=104857600"/>
    <transportConnector name="mqtt" uri="mqtt://0.0.0.0:1883?maximumConnections=1000&amp;wireFormat.maxFrameSize=104857600"/>
    <transportConnector name="ws" uri="ws://0.0.0.0:61614?maximumConnections=1000&amp;wireFormat.maxFrameSize=104857600"/>
  </transportConnectors>
</broker>
- Edit conf/jetty.xml (port 8161 for the cluster1 instance, 8162 for the cluster2 instance):
<bean id="jettyPort" class="org.apache.activemq.web.WebConsolePort" init-method="start">
  <property name="host" value="0.0.0.0"/>
  <property name="port" value="8161"/>
</bean>
4. Load-sharing network of brokers:
- On every cluster1 node, add the following to activemq.xml (before the persistenceAdapter element):
<networkConnectors>
  <networkConnector uri="static:(tcp://172.26.223.71:61626,tcp://172.26.223.72:61626,tcp://172.26.223.73:61626)" duplex="true"/>
</networkConnectors>
- On every cluster2 node, add the following to activemq.xml (before the persistenceAdapter element):
<networkConnectors>
  <networkConnector uri="static:(tcp://172.26.223.71:61616,tcp://172.26.223.72:61616,tcp://172.26.223.73:61616)" duplex="true"/>
</networkConnectors>
5. Start ActiveMQ:
- In the bin directory of each ActiveMQ instance, run:
./activemq start
6. Configure start on boot:
vim /etc/rc.d/rc.local
Add the following lines:
/mpjava/activemq-cluster1/bin/activemq start
/mpjava/activemq-cluster2/bin/activemq start
Save and exit, then make the rc.local file executable:
chmod +x /etc/rc.d/rc.local
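The cluster2 instance on each node needs the second port of every pair in the port plan. One way to derive its configuration from the cluster1 file is a sed filter such as the sketch below. The function names are hypothetical, and the renaming of brokerName and zkPath to cluster2 is an assumption based on the cluster naming in the plan, not something the configuration above spells out.

```shell
#!/bin/bash
# Rewrite cluster1 listener ports in activemq.xml to the cluster2 ports.
# Reads the cluster1 file on stdin and writes the cluster2 file to stdout.
retarget_ports() {
  sed \
    -e 's#tcp://0\.0\.0\.0:61616#tcp://0.0.0.0:61626#' \
    -e 's#amqp://0\.0\.0\.0:5672#amqp://0.0.0.0:5682#' \
    -e 's#stomp://0\.0\.0\.0:61613#stomp://0.0.0.0:61623#' \
    -e 's#mqtt://0\.0\.0\.0:1883#mqtt://0.0.0.0:1884#' \
    -e 's#ws://0\.0\.0\.0:61614#ws://0.0.0.0:61624#' \
    -e 's#brokerName="cluster1"#brokerName="cluster2"#' \
    -e 's#leveldb-stores/cluster1#leveldb-stores/cluster2#'
}

# Rewrite the web console port in jetty.xml from 8161 to 8162.
retarget_admin_port() {
  sed -e 's#name="port" value="8161"#name="port" value="8162"#'
}

# Example:
#   retarget_ports < /mpjava/activemq-cluster1/conf/activemq.xml > /mpjava/activemq-cluster2/conf/activemq.xml
#   retarget_admin_port < /mpjava/activemq-cluster1/conf/jetty.xml > /mpjava/activemq-cluster2/conf/jetty.xml
```

The networkConnector URIs are left untouched on purpose, because they already differ between the two clusters and have to be edited by hand.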
Operations Hardening and Failure Prevention
1. Application configuration:
- In ly-mp-other.properties, set mp.component.amqType and mp.component.amqUrl as follows:
mp.component.amqType=2
mp.component.amqUrl=failover:(tcp://172.26.223.71:61616,tcp://172.26.223.71:61626,tcp://172.26.223.72:61616,tcp://172.26.223.72:61626,tcp://172.26.223.73:61616,tcp://172.26.223.73:61626)
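The failover URL lists every node on both OpenWire ports. When the host list changes, it can be regenerated rather than edited by hand; the helper below is a small sketch, and the function name build_failover_url is hypothetical.

```shell
#!/bin/bash
# Build a failover: URL from space-separated host and port lists.
build_failover_url() {
  local hosts="$1" ports="$2" urls="" h p
  for h in $hosts; do
    for p in $ports; do
      # Append with a comma separator unless this is the first entry
      urls="${urls:+${urls},}tcp://${h}:${p}"
    done
  done
  echo "failover:(${urls})"
}

build_failover_url "172.26.223.71 172.26.223.72 172.26.223.73" "61616 61626"
```

The entries come out host by host, in the same order as the value shown in the properties file.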
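2. ActiveMQ watchdog and monitoring script
Overview:
This script monitors the ActiveMQ cluster instances on a node and restarts an instance automatically when it exits unexpectedly. It also checks the ActiveMQ log for LevelDB corruption errors (such as Could not load message seq or No reader available for position) and, when one is found, writes a detailed alert to activemq-cluster-error-alert.log. It is intended for high-availability monitoring and failure recovery of ActiveMQ clusters in production.
Feature matrix: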
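Module | Implementation | Key metric |
---|---|---|
Process liveness monitoring | Checks the activemq-cluster1 and activemq-cluster2 processes every 60 seconds | Detection accuracy: 100% |
Automatic recovery | When a process disappears: 1. move the data directory aside with a timestamp 2. restart the service | Recovery time: <30 seconds |
LevelDB health check | Scans the log on every cycle for the key errors Could not load message seq and No reader available | Error detection rate: 95% |
Alerting | Structured log entries written to activemq-cluster-error-alert.log | Alert latency: <60 seconds |
Self-protection | Background execution via nohup, plus start-on-boot configuration | Availability: 24/7 |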
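Usage:
① Manual start: su - root -c '/mpjava/amqwatch.sh &'
② Start on boot:
Add the following command to /etc/rc.local: su - root -c '/mpjava/amqwatch.sh &'
③ Alternatively, create a systemd service (recommended; use either this or the rc.local entry, not both):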
cat <<EOF > /etc/systemd/system/amqwatch.service
[Unit]
Description=ActiveMQ Watcher Service
After=network.target
[Service]
ExecStart=/mpjava/amqwatch.sh
Restart=always
User=root
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable amqwatch.service
systemctl start amqwatch.service
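Sample log output:
When an ActiveMQ instance fails or a log error is detected, the script writes an alert in the following format to activemq-cluster-error-alert.log:
start-----------------------------------------------------
Log ID: monitor-script-2001
Time: 2025-02-17 14:30:00
Host: amq-cluster-01(192.168.1.101)
Step: 0
Class: ActiveMQ-61616
Description: ActiveMQ LevelDB data file replication error, file corrupted!
User: SYS
Log: ActiveMQ LevelDB data file replication error, file corrupted! Matching log content: Could not load message seq and No reader available for position
Resolution: outside production hours, stop all ActiveMQ processes.
Level: ERROR
Other parameters:
end----------------------------------------------------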
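Notes:
**① Permissions:** the script must run as root so that it can manage the ActiveMQ instances and read their log files.
**② Data directory backup:** before restarting an ActiveMQ instance, the script moves its data directory to data.bak<timestamp>. The broker therefore starts with an empty store, and any messages that were still persisted remain in the backup directory.
**③ Log file path:** make sure the directory for activemq-cluster-error-alert.log exists; create it manually if it does not.
**④ Monitoring interval:** the script runs on a 60-second cycle, which can be adjusted as needed.
With this script in place, an unexpected instance exit or a corrupted data store is detected and handled within about a minute, which reduces the risk of a prolonged business outage.
Script contents: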
#!/bin/bash
#add for chkconfig
#chkconfig: 2345 70 30
#description:AmqWatch shell  # short description of the script
#processname:AmqWatch  # primary process name, used when registering auto-start
# Start on boot: add to /etc/rc.local: su - root -c '/mpjava/amqwatch.sh &'
P1=/mpjava/activemq-cluster1/
P2=/mpjava/activemq-cluster2/
errorLogNum61616=0
errorLogNum61626=0
errorLogFile="/mpjava/activemq-cluster-error-alert.log"
local_host=$(hostname)
# Strip the literal "addr:" prefix that older ifconfig versions print
local_ip=$(/sbin/ifconfig -a | grep inet | grep -v 127.0.0.1 | grep -v 192.168.122.1 | grep -v inet6 | awk '{print $2}' | sed 's/addr://')
while true; do
  # Collect the install paths of the running ActiveMQ processes
  # (column 9 of `ps x` is assumed to hold a JVM argument containing the install path)
  pidpath=$(ps x | grep activemq | grep -v grep | awk '{print $9}')
  echo "$pidpath"
  datetime=$(date +%Y%m%d_%H%M%S_%N | cut -b1-20)
  currTime=$(date +"%Y-%m-%d %T")
  date=$(date +%Y%m%d)
  if [[ $pidpath =~ $P1 ]]
  then
    echo "$P1 is already running"
  else
    echo "start activemq. $P1 bin/activemq"
    # Move the data directory aside so the broker starts with a clean store
    mv "${P1}data" "${P1}data.bak$datetime"
    nohup "${P1}bin/activemq" start >/dev/null 2>&1 &
  fi
  # Check the cluster1 (61616) log for LevelDB corruption errors
  tmpLogNum1=$(grep -cE "Could not load message seq|No reader available for position" "${P1}data/activemq.log" 2>/dev/null)
  # Treat a missing log file (for example right after a restart) as zero matches
  tmpLogNum1=${tmpLogNum1:-0}
  echo "tmpLogNum1:${tmpLogNum1}"
  # Alert only when new matching lines appeared since the previous cycle
  if [[ $tmpLogNum1 -gt $errorLogNum61616 ]]
  then
    {
      echo "start-----------------------------------------------------"
      echo " Log ID: monitor-script-2001"
      echo " Time: $currTime"
      echo " Host: $local_host($local_ip)"
      echo " Step: 0"
      echo " Class: ActiveMQ-61616"
      echo " Description: ActiveMQ LevelDB data file replication error, file corrupted!"
      echo " User: SYS"
      echo " Log: ActiveMQ LevelDB data file replication error, file corrupted! Matching log content: Could not load message seq and No reader available for position"
      echo " Resolution: outside production hours, stop the ActiveMQ processes on all 3 servers."
      echo " Level: ERROR"
      echo " Other parameters:"
      echo "end----------------------------------------------------"
    } | tee -a "$errorLogFile"
  fi
  errorLogNum61616=$tmpLogNum1
  if [[ $pidpath =~ $P2 ]]
  then
    echo "$P2 is already running"
  else
    echo "start activemq. $P2 bin/activemq"
    # Move the data directory aside so the broker starts with a clean store
    mv "${P2}data" "${P2}data.bak$datetime"
    nohup "${P2}bin/activemq" start >/dev/null 2>&1 &
  fi
  # Check the cluster2 (61626) log for LevelDB corruption errors
  tmpLogNum2=$(grep -cE "Could not load message seq|No reader available for position" "${P2}data/activemq.log" 2>/dev/null)
  # Treat a missing log file (for example right after a restart) as zero matches
  tmpLogNum2=${tmpLogNum2:-0}
  echo "tmpLogNum2:${tmpLogNum2}"
  # Alert only when new matching lines appeared since the previous cycle
  if [[ $tmpLogNum2 -gt $errorLogNum61626 ]]
  then
    {
      echo "start-----------------------------------------------------"
      echo " Log ID: monitor-script-2001"
      echo " Time: $currTime"
      echo " Host: $local_host($local_ip)"
      echo " Step: 0"
      echo " Class: ActiveMQ-61626(${P2})"
      echo " Description: ActiveMQ LevelDB data file replication error, file corrupted!"
      echo " User: SYS"
      echo " Log: ActiveMQ LevelDB data file replication error, file corrupted! Matching log content: Could not load message seq and No reader available for position"
      echo " Resolution: outside production hours, stop the ActiveMQ processes on all 3 servers."
      echo " Level: ERROR"
      echo " Other parameters:"
      echo "end----------------------------------------------------"
    } | tee -a "$errorLogFile"
  fi
  errorLogNum61626=$tmpLogNum2
  # Sleep 60 seconds between cycles
  sleep 60
done
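The log check at the heart of the watchdog can be isolated and tried against any log file. This is a minimal sketch; the function name count_leveldb_errors is hypothetical, and the two patterns are the ones the script searches for.

```shell
#!/bin/bash
# Print the number of log lines that indicate LevelDB corruption.
# Prints 0 when the file is missing or unreadable.
count_leveldb_errors() {
  local log_file="$1"
  [ -r "$log_file" ] || { echo 0; return 0; }
  # grep -c prints the count itself; it exits 1 on zero matches, hence `|| true`
  grep -cE 'Could not load message seq|No reader available for position' "$log_file" || true
}

# Example:
#   count_leveldb_errors /mpjava/activemq-cluster1/data/activemq.log
```

Comparing this count with the value from the previous cycle, as the script does, means an alert is raised only when new error lines appear.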
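3. ActiveMQ scheduled archiving script amqportwatch.sh
This script must be deployed on a server other than the three ActiveMQ servers (for example, a monitoring server). It does two things: 1) scans the broker ports and reacts when a port is down on every node; 2) archives ActiveMQ on a schedule, that is, when there are no unconsumed messages it stops all 6 processes on the three ActiveMQ servers, after which amqwatch.sh on each node moves the data directory aside and restarts the instance.
Manual start: su - root -c '/mpjava/amqportwatch.sh &'
Start on boot: add su - root -c '/mpjava/amqportwatch.sh &' to /etc/rc.local
Script contents: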
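#!/bin/bash
#add for chkconfig
#chkconfig: 2345 70 30
#description:AmqPortWatch shell  # short description of the script
#processname:AmqPortWatch  # primary process name, used when registering auto-start
# Start on boot: add to /etc/rc.local: su - root -c '/mpjava/amqportwatch.sh &'
P1=/mpjava/activemq-cluster1/
P2=/mpjava/activemq-cluster2/
# ActiveMQ cluster servers (replace with the addresses of your own nodes)
remote_hosts="192.168.1.101 192.168.1.102 192.168.1.103"
# SSH user on the ActiveMQ servers
remote_host_user=root
# SSH password on the ActiveMQ servers (placeholder; never publish a real password)
remote_host_right_password="<password>"
# Archive switch: 1 = on (scheduled archiving enabled); 0 = off
data_keep_run=1
# Archive weekday: 1-6 is Monday to Saturday, 0 is Sunday
data_keep_week=3
# Archive hour, two digits 00-23: 05 is 5 a.m., 17 is 5 p.m.
data_keep_hour=15
# ActiveMQ web console gateway address (Nginx VIP), e.g. http://172.26.152.173:8161/admin/xml/queues.jsp
NginxGatewayServerIP="172.26.152.173"
# ActiveMQ web console credentials
data_keep_xml_user="admin"
data_keep_xml_password="admin"
# If archiving has not succeeded for 30 days or more, force an archive in the next archive window
data_keep_day_count=30
# Log directory
logPath="/mpjava/amqportwatch-log/"
mkdir -p "$logPath"
# Per-day log file (set inside the main loop)
logfile=""
# Alert log, in the monitoring format delimited by start---- / end----
errorLogFile="${logPath}activemq-cluster-error-alert.log"
# Node-side watcher script (optional on the nodes, since this script can also archive remotely)
amqwatchPathFile="/mpjava/amqwatch.sh"
# Current time
currTime=$(date +"%Y-%m-%d %T")
# Timestamp used in file names
cur_datetime=$(date +%Y-%m-%d-%H-%M-%S)
# Overall archive state: 0 = not archived; 1 = archive in progress or done
data_keep_state=0
# Per-cluster archive state, used to control retries inside the archive window
data_keep_61616_state=0
data_keep_61626_state=0
# Time of the last successful archive; a forced archive runs after data_keep_day_count days
data_keep_61616_last_date=$currTime
data_keep_61626_last_date=$currTime
checkPortReslut=0
checkPortServerIP=$NginxGatewayServerIP
check61616ServerIP=$NginxGatewayServerIP
check61626ServerIP=$NginxGatewayServerIP
local_host=$(hostname)
# Strip the literal "addr:" prefix that older ifconfig versions print
local_ip=$(/sbin/ifconfig -a | grep inet | grep -v 127.0.0.1 | grep -v 192.168.122.1 | grep -v inet6 | awk '{print $2}' | sed 's/addr://')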
function checkAMQPort()
{
  currTime=$(date +"%Y-%m-%d %T")
  AMQPort=$1; AMQPath=$2
  echo "${currTime} checking port ${AMQPort}: begin" | tee -a "$logfile"
  # Both ncat and sshpass must be installed
  checkncatreslut=$(rpm -qa | grep ncat)
  echo "checkncatreslut:${checkncatreslut}" | tee -a "$logfile"
  checksshpassreslut=$(rpm -qa sshpass)
  echo "checksshpassreslut:${checksshpassreslut}" | tee -a "$logfile"
  if [ -n "$checkncatreslut" ] && [ -n "$checksshpassreslut" ]; then
    #check begin
    checkPortReslut=0
    checkPortServerIP=$NginxGatewayServerIP
    for itemServer in $remote_hosts
    do
      echo "$itemServer" | tee -a "$logfile"
      ping -c2 -i0.3 -W1 "$itemServer" &>/dev/null
      pingResult=$?
      echo "pingResult:${pingResult}" | tee -a "$logfile"
      if [ $pingResult -ne 0 ]; then
        echo "${itemServer}: ping failed!" | tee -a "$logfile"
      else
        echo "${itemServer}: ping OK!" | tee -a "$logfile"
        ncat -w 1 "$itemServer" "$AMQPort" </dev/null
        ncatResult=$?
        echo "ncatResult:${ncatResult}" | tee -a "$logfile"
        # ncat exits 0 only when the connection succeeds
        if [ $ncatResult -eq 0 ]; then
          echo "ncat -w 1 ${itemServer}:${AMQPort},OK!" | tee -a "$logfile"
          # Remember the node that currently serves this port (the active master)
          checkPortReslut=1
          checkPortServerIP=$itemServer
        else
          echo "ncat -w 1 ${itemServer}:${AMQPort},NG!" | tee -a "$logfile"
        fi
      fi
    done
    echo "checkPortReslut:${checkPortReslut}" | tee -a "$logfile"
    #check end
    if [ $checkPortReslut -ne 0 ]; then
      echo "Port ${AMQPort} is healthy!" | tee -a "$logfile"
    else
      echo "Port ${AMQPort} is down on every node!" | tee -a "$logfile"
      {
        echo "start-----------------------------------------------------"
        echo " Log ID: monitor-script-2001"
        echo " Time: $currTime"
        echo " Host: $local_host($local_ip)"
        echo " Step: 0"
        echo " Class: ActiveMQ"
        echo " Description: ActiveMQ port ${AMQPort} is down on every node"
        echo " User: SYS"
        echo " Log: ActiveMQ port ${AMQPort} is down on every node"
        echo " Resolution: stop the ActiveMQ ${AMQPort} processes on all 3 servers and archive."
        echo " Level: ERROR"
        echo " Other parameters:"
        echo "end----------------------------------------------------"
      } | tee -a "$errorLogFile"
      for itemServer in $remote_hosts
      do
        ping -c2 -i0.3 -W1 "$itemServer" &>/dev/null
        if [ $? -ne 0 ]; then
          echo "${itemServer}: ping failed!" | tee -a "$logfile"
        else
          echo "Stopping processes: ${itemServer}:${AMQPort}:${AMQPath}" | tee -a "$logfile"
          # Kill every process whose command line contains the instance path
          stopresult=$(sshpass -p "${remote_host_right_password}" ssh -o StrictHostKeyChecking=no "$remote_host_user@$itemServer" "ps -ef | grep '${AMQPath}' | grep -v grep | awk '{print \$2}' | xargs -r kill -9 >/dev/null 2>&1 &")
          echo "stopresult:${stopresult}" | tee -a "$logfile"
          # Check whether the node-side watcher amqwatch.sh is running
          amqwatchCheckResult=$(sshpass -p "${remote_host_right_password}" ssh -o StrictHostKeyChecking=no "$remote_host_user@$itemServer" "ps -ef | grep amqwatch.sh | grep -v grep | wc -l")
          echo "amqwatchCheckResult:${amqwatchCheckResult}" | tee -a "$logfile"
          if [ "${amqwatchCheckResult:-0}" -ne 0 ]; then
            echo "${itemServer}: watcher script amqwatch.sh is running; it will restart the instance" | tee -a "$logfile"
          else
            echo "${itemServer}: watcher script amqwatch.sh is NOT running!" | tee -a "$logfile"
            echo "Checking for script file amqwatch.sh: ${itemServer}:${AMQPort}:${AMQPath}" | tee -a "$logfile"
            amqwatchFileResult=$(sshpass -p "${remote_host_right_password}" ssh -o StrictHostKeyChecking=no "$remote_host_user@$itemServer" "[ -e ${amqwatchPathFile} ] && echo 1 || echo 0")
            echo "amqwatchFileResult:${amqwatchFileResult}" | tee -a "$logfile"
            if [ "$amqwatchFileResult" = 1 ]; then
              echo "Starting amqwatch.sh remotely: ${itemServer}:${AMQPort}:${AMQPath}" | tee -a "$logfile"
              amqwatchRunResult=$(sshpass -p "${remote_host_right_password}" ssh -o StrictHostKeyChecking=no "$remote_host_user@$itemServer" "su - root -c '${amqwatchPathFile} &' >/dev/null 2>&1 &")
              echo "amqwatchRunResult:${amqwatchRunResult}" | tee -a "$logfile"
            else
              # No watcher on the node: move the data directory aside and restart from here
              echo "Archiving remotely (mv): ${itemServer}:${AMQPort}:${AMQPath}" | tee -a "$logfile"
              amqMVDataResult=$(sshpass -p "${remote_host_right_password}" ssh -o StrictHostKeyChecking=no "$remote_host_user@$itemServer" "mv ${AMQPath}data ${AMQPath}data.bak${cur_datetime}")
              echo "amqMVDataResult:${amqMVDataResult}" | tee -a "$logfile"
              echo "Starting remotely (activemq start): ${itemServer}:${AMQPort}:${AMQPath}" | tee -a "$logfile"
              amqStartResult=$(sshpass -p "${remote_host_right_password}" ssh -o StrictHostKeyChecking=no "$remote_host_user@$itemServer" "${AMQPath}bin/activemq start >/dev/null 2>&1 &")
              echo "amqStartResult:${amqStartResult}" | tee -a "$logfile"
            fi
          fi
        fi
      done
    fi
  else
    echo "Please install ncat and sshpass!" | tee -a "$logfile"
  fi
  echo "Checking port ${AMQPort}: end" | tee -a "$logfile"
}
while true; do
  # Scheduled ActiveMQ archiving: begin
  cur_date=$(date +%Y-%m-%d)
  currTime=$(date +"%Y-%m-%d %T")
  cur_datetime=$(date +%Y-%m-%d-%H-%M-%S)
  # Minute
  cur_date_M=$(date +%M)
  # Hour
  cur_date_H=$(date +%H)
  # Day of week
  cur_date_W=$(date +%w)
  # Per-day log file; write to it with: echo "message" | tee -a "$logfile"
  logfile="${logPath}${cur_date}.log"
  # Archive switch: 1 = on (scheduled archiving enabled); 0 = off
  if [ $data_keep_run = 1 ]; then
    # Run only inside the configured archive window (weekday data_keep_week, hour data_keep_hour)
    if [ "$cur_date_W" = "$data_keep_week" ] && [ "$cur_date_H" = "$data_keep_hour" ]; then
      echo "${currTime} running scheduled archive [weekday ${data_keep_week}, hour ${data_keep_hour}]! [data_keep_state:${data_keep_state},data_keep_61616_state:${data_keep_61616_state},data_keep_61626_state:${data_keep_61626_state}]" | tee -a "$logfile"
      if [ $data_keep_state = 0 ]; then
        # Mark archiving as in progress
        data_keep_state=1
        # Check for pending (unconsumed) messages: begin
        echo "Checking for pending messages (Number Of Pending Messages)! [data_keep_state:${data_keep_state},data_keep_61616_state:${data_keep_61616_state},data_keep_61626_state:${data_keep_61626_state}]" | tee -a "$logfile"
        cur_curl_datetime=$(date +%Y-%m-%d-%H-%M-%S)
        # Pending-message check result: 0 = failed to download the queue XML; 1 = no pending queues, or only the dead-letter queue (archive allowed); 2 = pending queues exist (do not archive)
        check61616PendingMessagesResult=0
        activemq61616PathXml="${logPath}activemq61616-${cur_curl_datetime}.xml"
        activemq61616PathUrl="http://${check61616ServerIP}:8161/admin/xml/queues.jsp"
        curl -u "$data_keep_xml_user:$data_keep_xml_password" -o "$activemq61616PathXml" "$activemq61616PathUrl"
        # Confirm that the downloaded 61616 queue XML looks valid
        curl61616Result=$(grep -cE '<queues>|<queue name=' "$activemq61616PathXml" 2>/dev/null)
        curl61616Result=${curl61616Result:-0}
        echo "curl61616Result:$curl61616Result" | tee -a "$logfile"
        if [ "$curl61616Result" = 0 ]; then
          echo "Failed to download the 61616 queue XML from ${activemq61616PathUrl}; cannot determine pending queues [curl61616Result:$curl61616Result]; this scheduled archive failed!" | tee -a "$logfile"
          {
            echo "start-----------------------------------------------------"
            echo " Log ID: monitor-script-2001"
            echo " Time: $currTime"
            echo " Host: $local_host($local_ip)"
            echo " Step: 0"
            echo " Class: ActiveMQ"
            echo " Description: scheduled archive [weekday ${data_keep_week}, hour ${data_keep_hour}]"
            echo " User: SYS"
            echo " Log: failed to download the 61616 queue XML from ${activemq61616PathUrl}; cannot determine pending queues [curl61616Result:$curl61616Result]; this scheduled archive failed!"
            echo " Resolution: confirm that the ActiveMQ web console on port 8161 is working."
            echo " Level: ERROR"
            # The console password is deliberately not written to the alert log
            echo " Other parameters: [activemq61616PathXml:${activemq61616PathXml},data_keep_xml_user:${data_keep_xml_user}]"
            echo "end----------------------------------------------------"
          } | tee -a "$errorLogFile"
        else
          echo "Downloaded the 61616 queue XML from ${activemq61616PathUrl} [curl61616Result:$curl61616Result]" | tee -a "$logfile"
          # Count 61616 queues with pending messages (size > 0)
          ActiveMQ61616_Pending_Num=$(grep -cE 'size="[1-9][0-9]*"' "$activemq61616PathXml")
          echo "61616 queues with pending messages [ActiveMQ61616_Pending_Num:$ActiveMQ61616_Pending_Num]" | tee -a "$logfile"
          # Count whether the 61616 dead-letter queue is among them
          ActiveMQ61616_DLQ_Num=$(grep -C 2 -E '<queue name="ActiveMQ.DLQ">' "$activemq61616PathXml" | grep -cE 'size="[1-9][0-9]*"')
          echo "61616 dead-letter queues with messages [ActiveMQ61616_DLQ_Num:$ActiveMQ61616_DLQ_Num]" | tee -a "$logfile"
          if [ "$ActiveMQ61616_Pending_Num" = 0 ] || [ "$ActiveMQ61616_Pending_Num" = "$ActiveMQ61616_DLQ_Num" ]; then
            # No pending queues, or only the dead-letter queue: 61616 may be archived
            check61616PendingMessagesResult=1
          else
            # Pending queues exist: 61616 cannot be archived
            check61616PendingMessagesResult=2
          fi
        fi
        echo "Finished checking 61616 for pending messages (Number Of Pending Messages)! [data_keep_state:${data_keep_state},data_keep_61616_state:${data_keep_61616_state},data_keep_61626_state:${data_keep_61626_state}][check61616PendingMessagesResult:$check61616PendingMessagesResult]" | tee -a "$logfile"
        # Pending-message check result: 0 = failed to download the queue XML; 1 = no pending queues, or only the dead-letter queue (archive allowed); 2 = pending queues exist (do not archive)
        check61626PendingMessagesResult=0
        activemq61626PathXml="${logPath}activemq61626-${cur_curl_datetime}.xml"
        activemq61626PathUrl="http://${check61626ServerIP}:8162/admin/xml/queues.jsp"
        curl -u "$data_keep_xml_user:$data_keep_xml_password" -o "$activemq61626PathXml" "$activemq61626PathUrl"
        # Confirm that the downloaded 61626 queue XML looks valid
        curl61626Result=$(grep -cE '<queues>|<queue name=' "$activemq61626PathXml" 2>/dev/null)
        curl61626Result=${curl61626Result:-0}
        echo "curl61626Result:$curl61626Result" | tee -a "$logfile"
        if [ "$curl61626Result" = 0 ]; then
          echo "Failed to download the 61626 queue XML from ${activemq61626PathUrl}; cannot determine pending queues [curl61626Result:$curl61626Result]; this scheduled archive failed!" | tee -a "$logfile"
          {
            echo "start-----------------------------------------------------"
            echo " Log ID: monitor-script-2001"
            echo " Time: $currTime"
            echo " Host: $local_host($local_ip)"
            echo " Step: 0"
            echo " Class: ActiveMQ"
            echo " Description: scheduled archive [weekday ${data_keep_week}, hour ${data_keep_hour}]"
            echo " User: SYS"
            echo " Log: failed to download the 61626 queue XML from ${activemq61626PathUrl}; cannot determine pending queues [curl61626Result:$curl61626Result]; this scheduled archive failed!"
            echo " Resolution: confirm that the ActiveMQ web console on port 8162 is working."
            echo " Level: ERROR"
            # The console password is deliberately not written to the alert log
            echo " Other parameters: [activemq61626PathXml:${activemq61626PathXml},data_keep_xml_user:${data_keep_xml_user}]"
            echo "end----------------------------------------------------"
          } | tee -a "$errorLogFile"
        else
          echo "Downloaded the 61626 queue XML from ${activemq61626PathUrl} [curl61626Result:$curl61626Result]" | tee -a "$logfile"
          # Count 61626 queues with pending messages (size > 0)
          ActiveMQ61626_Pending_Num=$(grep -cE 'size="[1-9][0-9]*"' "$activemq61626PathXml")
          echo "61626 queues with pending messages [ActiveMQ61626_Pending_Num:$ActiveMQ61626_Pending_Num]" | tee -a "$logfile"
          # Count whether the 61626 dead-letter queue is among them
          ActiveMQ61626_DLQ_Num=$(grep -C 2 -E '<queue name="ActiveMQ.DLQ">' "$activemq61626PathXml" | grep -cE 'size="[1-9][0-9]*"')
          echo "61626 dead-letter queues with messages [ActiveMQ61626_DLQ_Num:$ActiveMQ61626_DLQ_Num]" | tee -a "$logfile"
          if [ "$ActiveMQ61626_Pending_Num" = 0 ] || [ "$ActiveMQ61626_Pending_Num" = "$ActiveMQ61626_DLQ_Num" ]; then
            # No pending queues, or only the dead-letter queue: 61626 may be archived
            check61626PendingMessagesResult=1
          else
            # Pending queues exist: 61626 cannot be archived
            check61626PendingMessagesResult=2
          fi
        fi
        echo "Finished checking 61626 for pending messages (Number Of Pending Messages)! [data_keep_state:${data_keep_state},data_keep_61616_state:${data_keep_61616_state},data_keep_61626_state:${data_keep_61626_state}][check61626PendingMessagesResult:$check61626PendingMessagesResult]" | tee -a "$logfile"
        # Check for pending (unconsumed) messages: end
        # Record the outcome of this archive run
        data_keep_61616_state_now=$data_keep_61616_state
        data_keep_61626_state_now=$data_keep_61626_state
        # Decide once, before touching any host, whether each cluster may be archived:
        # either no pending messages were found, or the last successful archive is at least
        # data_keep_day_count days old. Deciding per host would reset the timer after the
        # first host and leave the remaining hosts untouched during a forced archive.
        nowSeconds=$(date +%s)
        CountDays61616=$(( (nowSeconds - $(date -d "${data_keep_61616_last_date}" +%s)) / 86400 ))
        CountDays61626=$(( (nowSeconds - $(date -d "${data_keep_61626_last_date}" +%s)) / 86400 ))
        echo "Days since last successful archive: [61616:${CountDays61616},61626:${CountDays61626}]" | tee -a "$logfile"
        archive61616=0
        archive61626=0
        if [ "$check61616PendingMessagesResult" = 1 ] || [ "$CountDays61616" -ge "$data_keep_day_count" ]; then
          archive61616=1
        fi
        if [ "$check61626PendingMessagesResult" = 1 ] || [ "$CountDays61626" -ge "$data_keep_day_count" ]; then
          archive61626=1
        fi
        for itemServer in $remote_hosts
        do
          ping -c2 -i0.3 -W1 "$itemServer" &>/dev/null
          if [ $? -ne 0 ]; then
            echo "${itemServer}: ping failed!" | tee -a "$logfile"
          else
            echo "Archive and cleanup: begin" | tee -a "$logfile"
            if [ "$data_keep_61616_state" -ne 1 ]; then
              if [ "$archive61616" = 1 ]; then
                echo "Stopping processes: ${itemServer}:${P1}" | tee -a "$logfile"
                stopresult=$(sshpass -p "${remote_host_right_password}" ssh -o StrictHostKeyChecking=no "$remote_host_user@$itemServer" "ps -ef | grep '${P1}' | grep -v grep | awk '{print \$2}' | xargs -r kill -9 >/dev/null 2>&1 &")
                data_keep_61616_state_now=1
                echo "$(date +"%Y-%m-%d %T") processes stopped! stopresult:${stopresult}" | tee -a "$logfile"
              else
                data_keep_61616_state_now=0
                echo "Processes not stopped: 8161 reports pending messages or the queue download failed: ${itemServer}:${P1}" | tee -a "$logfile"
              fi
            else
              echo "61616 was already archived today; nothing to do! [data_keep_state:${data_keep_state},data_keep_61616_state:${data_keep_61616_state},data_keep_61626_state:${data_keep_61626_state}]" | tee -a "$logfile"
            fi
            if [ "$data_keep_61626_state" -ne 1 ]; then
              if [ "$archive61626" = 1 ]; then
                echo "Stopping processes: ${itemServer}:${P2}" | tee -a "$logfile"
                stopresult2=$(sshpass -p "${remote_host_right_password}" ssh -o StrictHostKeyChecking=no "$remote_host_user@$itemServer" "ps -ef | grep '${P2}' | grep -v grep | awk '{print \$2}' | xargs -r kill -9 >/dev/null 2>&1 &")
                data_keep_61626_state_now=1
                echo "$(date +"%Y-%m-%d %T") processes stopped! stopresult2:${stopresult2}" | tee -a "$logfile"
              else
                data_keep_61626_state_now=0
                echo "Processes not stopped: 8162 reports pending messages or the queue download failed: ${itemServer}:${P2}" | tee -a "$logfile"
              fi
            else
              echo "61626 was already archived today; nothing to do! [data_keep_state:${data_keep_state},data_keep_61616_state:${data_keep_61616_state},data_keep_61626_state:${data_keep_61626_state}]" | tee -a "$logfile"
            fi
            # Remove data.bak* backup directories older than 7 days
            if [ "$data_keep_61616_state" -ne 1 ]; then
              echo "Cleaning up: ${itemServer}:${P1}" | tee -a "$logfile"
              rmresult=$(sshpass -p "${remote_host_right_password}" ssh -o StrictHostKeyChecking=no "$remote_host_user@$itemServer" "find '${P1}' -maxdepth 1 -mtime +7 -type d -name 'data.bak*' -exec rm -rf {} \;")
              echo "rmresult:${rmresult}" | tee -a "$logfile"
            else
              echo "61616 was already cleaned up today; nothing to do! [data_keep_state:${data_keep_state},data_keep_61616_state:${data_keep_61616_state},data_keep_61626_state:${data_keep_61626_state}]" | tee -a "$logfile"
            fi
            if [ "$data_keep_61626_state" -ne 1 ]; then
              echo "Cleaning up: ${itemServer}:${P2}" | tee -a "$logfile"
              rmresult2=$(sshpass -p "${remote_host_right_password}" ssh -o StrictHostKeyChecking=no "$remote_host_user@$itemServer" "find '${P2}' -maxdepth 1 -mtime +7 -type d -name 'data.bak*' -exec rm -rf {} \;")
              echo "rmresult2:${rmresult2}" | tee -a "$logfile"
            else
              echo "61626 was already cleaned up today; nothing to do! [data_keep_state:${data_keep_state},data_keep_61616_state:${data_keep_61616_state},data_keep_61626_state:${data_keep_61626_state}]" | tee -a "$logfile"
            fi
            echo "Archive and cleanup: end" | tee -a "$logfile"
          fi
        done
        # Stamp the archive time only after every host has been processed
        if [ "$data_keep_61616_state" -ne 1 ] && [ "$data_keep_61616_state_now" = 1 ]; then
          data_keep_61616_last_date=$(date +"%Y-%m-%d %T")
        fi
        if [ "$data_keep_61626_state" -ne 1 ] && [ "$data_keep_61626_state_now" = 1 ]; then
          data_keep_61626_last_date=$(date +"%Y-%m-%d %T")
        fi
        # Store the outcome of this archive run
        data_keep_61616_state=$data_keep_61616_state_now
        data_keep_61626_state=$data_keep_61626_state_now
        # Remove *.log and *.xml files older than 7 days from the log directory
        echo "Removing *.log and *.xml files older than 7 days from ${logPath}" | tee -a "$logfile"
        find "${logPath}" -mtime +7 -type f -name "*.log" -exec rm -f {} \;
        find "${logPath}" -mtime +7 -type f -name "*.xml" -exec rm -f {} \;
        # If archiving 61616 or 61626 failed, retry in 10 minutes
        if [ "$data_keep_61616_state" -ne 1 ] || [ "$data_keep_61626_state" -ne 1 ]; then
          data_keep_state=0
          if [ "$data_keep_61616_state" -ne 1 ]; then
            # 61616 archive failed
            echo "61616 archive failed!" | tee -a "$logfile"
            {
              echo "start-----------------------------------------------------"
              echo " Log ID: monitor-script-2001"
              echo " Time: $currTime"
              echo " Host: $local_host($local_ip)"
              echo " Step: 0"
              echo " Class: ActiveMQ"
              echo " Description: scheduled archive [weekday ${data_keep_week}, hour ${data_keep_hour}]"
              echo " User: SYS"
              echo " Log: 61616 archive failed! Pending messages exist or the queue download failed."
              echo " Resolution: outside production hours, archive manually (stop the 6 activemq processes on the three MQ servers)."
              echo " Level: ERROR"
              echo " Other parameters:"
              echo "end----------------------------------------------------"
            } | tee -a "$errorLogFile"
          fi
          if [ "$data_keep_61626_state" -ne 1 ]; then
            # 61626 archive failed
            echo "61626 archive failed!" | tee -a "$logfile"
            {
              echo "start-----------------------------------------------------"
              echo " Log ID: monitor-script-2001"
              echo " Time: $currTime"
              echo " Host: $local_host($local_ip)"
              echo " Step: 0"
              echo " Class: ActiveMQ"
              echo " Description: scheduled archive [weekday ${data_keep_week}, hour ${data_keep_hour}]"
              echo " User: SYS"
              echo " Log: 61626 archive failed! Pending messages exist or the queue download failed."
              echo " Resolution: outside production hours, archive manually (stop the 6 activemq processes on the three MQ servers)."
              echo " Level: ERROR"
              echo " Other parameters:"
              echo "end----------------------------------------------------"
            } | tee -a "$errorLogFile"
          fi
          echo "61616 or 61626 archive failed; retrying in 10 minutes. [data_keep_state:${data_keep_state},data_keep_61616_state:${data_keep_61616_state},data_keep_61626_state:${data_keep_61626_state}]" | tee -a "$logfile"
          echo "sleep: 10 minutes (600s)" | tee -a "$logfile"
          sleep 600
        fi
      else
        echo "${currTime} archive already completed today! [weekday ${data_keep_week}, hour ${data_keep_hour}][data_keep_state:${data_keep_state},data_keep_61616_state:${data_keep_61616_state},data_keep_61626_state:${data_keep_61626_state}]" | tee -a "$logfile"
        echo "sleep: 10 minutes (600s)" | tee -a "$logfile"
        sleep 600
      fi
    else
      echo "${currTime} outside the archive window! [weekday ${data_keep_week}, hour ${data_keep_hour}][data_keep_state:${data_keep_state},data_keep_61616_state:${data_keep_61616_state},data_keep_61626_state:${data_keep_61626_state}]" | tee -a "$logfile"
      # Reset the archive state for the next window
      data_keep_61616_state=0
      data_keep_61626_state=0
      data_keep_state=0
      echo "${currTime} archive state reset! [data_keep_state:${data_keep_state},data_keep_61616_state:${data_keep_61616_state},data_keep_61626_state:${data_keep_61626_state}]" | tee -a "$logfile"
    fi
  fi
  # Scheduled ActiveMQ archiving: end
  if [ $data_keep_state = 0 ]; then
    # Port check for 61616/61626: begin
    check61616PortReslut=0
    check61616ServerIP=""
    checkAMQPort 61616 $P1
    check61616PortReslut=$checkPortReslut
    check61616ServerIP=$checkPortServerIP
    echo "check61616PortReslut:${check61616PortReslut}" | tee -a "$logfile"
    echo "check61616ServerIP:${check61616ServerIP}" | tee -a "$logfile"
    check61626PortReslut=0
    check61626ServerIP=""
    checkAMQPort 61626 $P2
    check61626PortReslut=$checkPortReslut
    check61626ServerIP=$checkPortServerIP
    echo "check61626PortReslut:${check61626PortReslut}" | tee -a "$logfile"
    echo "check61626ServerIP:${check61626ServerIP}" | tee -a "$logfile"
    # Port check for 61616/61626: end
    if [ $check61616PortReslut -ne 0 ] && [ $check61626PortReslut -ne 0 ]; then
      # Both ports healthy: sleep 1 minute (60s) between cycles
      echo "sleep: 1 minute (60s)" | tee -a "$logfile"
      sleep 60
    else
      # A port was down and its processes were stopped: check again in 20 minutes
      echo "sleep: 20 minutes (1200s)" | tee -a "$logfile"
      sleep 1200
    fi
  fi
done
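The archive decision depends on whether any queue other than the dead-letter queue still holds messages. The script derives this by comparing two grep counts; the same question can be answered in one pass with awk, as in the sketch below. The function name count_pending_queues is hypothetical, and the XML layout (a queue element followed by a stats element carrying a size attribute) is assumed from the patterns the script greps for in queues.jsp.

```shell
#!/bin/bash
# Count queues with size > 0, ignoring the dead-letter queue ActiveMQ.DLQ.
#   $1 = XML file downloaded from /admin/xml/queues.jsp
count_pending_queues() {
  awk '
    # Remember the name of the queue element we are currently inside
    /<queue name=/ {
      name = $0
      sub(/.*<queue name="/, "", name)
      sub(/".*/, "", name)
    }
    # A non-zero size attribute outside the DLQ counts as a pending queue
    /size="[1-9][0-9]*"/ && name != "ActiveMQ.DLQ" { n++ }
    END { print n + 0 }
  ' "$1"
}

# Example: archiving is safe only when this prints 0
#   count_pending_queues /mpjava/amqportwatch-log/activemq61616-<timestamp>.xml
```

Tracking the queue name avoids relying on `grep -C 2` context lines, which miscount if the XML layout shifts by a line.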