DataX实战教程

发布于:2025-04-04 ⋅ 阅读:(64) ⋅ 点赞:(0)

需求:

用datax同步mysql: 192.168.236.134中test1库的user表到192.168.236.136中test1库的user表

步骤:

下载安装包

https://github.com/alibaba/DataX/blob/master/userGuid.md
在这里插入图片描述

进入引导页

https://github.com/alibaba/DataX/blob/master/userGuid.md
在这里插入图片描述

前置准备

在这里插入图片描述

安装jdk1.8,并配置环境变量
安装maven

yum install maven -y

安装python
我这边直接用宝塔安装了
在这里插入图片描述

准备两个mysql

在这里插入图片描述

准备测试数据

CREATE TABLE IF NOT EXISTS `user` (
  `id` INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  `username` VARCHAR(50) NOT NULL,
  `password` VARCHAR(100) NOT NULL,
  `email` VARCHAR(100) NOT NULL,
  `created_at` TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

批量生成测试数据的存储过程

DELIMITER $$

CREATE PROCEDURE GenerateUserTestData(IN num_records INT)
BEGIN
  DECLARE i INT DEFAULT 1;
  DECLARE v_username VARCHAR(50);
  DECLARE v_password VARCHAR(100);
  DECLARE v_email VARCHAR(100);

  WHILE i <= num_records DO
    -- 生成随机用户名(如 User_12345)
    SET v_username = CONCAT('User_', FLOOR(10000 + RAND() * 90000));
    
    -- 生成随机密码(简化示例,实际应加密)
    SET v_password = CONCAT('Pass_', FLOOR(100000 + RAND() * 900000));
    
    -- 生成随机邮箱(如 user12345@example.com)
    SET v_email = CONCAT('user', FLOOR(10000 + RAND() * 90000), '@example.com');
    
    INSERT INTO `user` (username, password, email)
    VALUES (v_username, v_password, v_email);
    
    -- 每1000条提交一次,避免事务过大
    IF i % 1000 = 0 THEN
      COMMIT;
    END IF;
    
    SET i = i + 1;
  END WHILE;
END$$

DELIMITER ;
-- 生成100条测试数据
CALL GenerateUserTestData(100);

准备datax同步脚本

可使用DeepSeek或者qwen.ai 生成
提示词如下:

datax已经安装好了。现在我需要使用datax将如下配置的mysql
ip: 192.168.236.134
用户名:test1
密码:E7WEC4ZJnHGhHBMH
中的user表数据同步到 如下mysql中。
ip: 192.168.236.136
用户名:test1
密码:G3edhwhCSAYa82EK

得到json脚本

{
  "job": {
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "test1",
            "password": "E7WEC4ZJnHGhHBMH",
            "connection": [
              {
                "jdbcUrl": ["jdbc:mysql://192.168.236.134:3306/数据库名"],
                "table": ["user"]
              }
            ],
            "column": ["*"],
            "splitPk": ""
          }
        }
      },
      {
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "username": "test1",
            "password": "G3edhwhCSAYa82EK",
            "connection": [
              {
                "jdbcUrl": "jdbc:mysql://192.168.236.136:3306/数据库名",
                "table": "user"
              }
            ],
            "column": ["*"],
            "writeMode": "insert"
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": "1"
      }
    }
  }
}

执行报错

在这里插入图片描述

喂给千问
在这里插入图片描述
检测安装是否正常
验证步骤

ls $DATAX_HOME/plugin/reader/mysqlreader

应输出类似:libs、mysqlreader-0.0.1-SNAPSHOT.jar等文件
生成配置文件模板 (测试用):

python datax.py -r mysqlreader -w mysqlwriter > test.json

编辑test.json后执行,验证是否为配置问题
若问题仍未解决,可参考
中的错误分析方法,检查日志或通过调试工具定位具体插件加载失败的原因。

根据生成模板重新配置

json格式不正确
可以如下网页验证
https://jsonlint.com/?spm=a2ty_o01.29997173.0.0.1f39c9213lfeok
在这里插入图片描述

修改脚本

{
  "job": {
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "test1",
            "password": "E7WEC4ZJnHGhHBMH",
            "column": ["*"],
            "connection": [
              {
                "jdbcUrl": ["jdbc:mysql://192.168.236.134:3306/test1?useSSL=false&serverTimezone=UTC"],
                "table": ["user1"]
              }
            ]
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "username": "test1",
            "password": "G3edhwhCSAYa82EK",
            "connection": [
              {
                "jdbcUrl": "jdbc:mysql://192.168.236.136:3306/test1?useSSL=false&serverTimezone=UTC",
                "table": ["user1"]
              }
            ],
            "obWriteMode": "insert",
            "column": ["*"],
            "preSql": [],
            "postSql": []
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": "5"
      }
    }
  }
}

执行,显示如下表示成功

[root@localhost bin]# python datax.py  test9.json

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


2025-04-03 19:57:24.786 [main] INFO  MessageSource - JVM TimeZone: GMT+08:00, Locale: zh_CN
2025-04-03 19:57:24.788 [main] INFO  MessageSource - use Locale: zh_CN timeZone: sun.util.calendar.ZoneInfo[id="GMT+08:00",offset=28800000,dstSavings=0,useDaylight=false,transitions=0,lastRule=null]
2025-04-03 19:57:24.812 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2025-04-03 19:57:24.816 [main] INFO  Engine - the machine info  =>

        osInfo: Linux amd64 3.10.0-1160.el7.x86_64
        jvmInfo:        Red Hat, Inc. 1.8 25.412-b08
        cpu num:        8

        totalPhysicalMemory:    -0.00G
        freePhysicalMemory:     -0.00G
        maxFileDescriptorCount: -1
        currentOpenFileDescriptorCount: -1

        GC Names        [PS MarkSweep, PS Scavenge]

        MEMORY_NAME                    | allocation_size                | init_size
        PS Eden Space                  | 256.00MB                       | 256.00MB
        Code Cache                     | 240.00MB                       | 2.44MB
        Compressed Class Space         | 1,024.00MB                     | 0.00MB
        PS Survivor Space              | 42.50MB                        | 42.50MB
        PS Old Gen                     | 683.00MB                       | 683.00MB
        Metaspace                      | -0.00MB                        | 0.00MB


2025-04-03 19:57:24.825 [main] INFO  Engine -
{
        "content":[
                {
                        "reader":{
                                "name":"mysqlreader",
                                "parameter":{
                                        "username":"test1",
                                        "password":"****************",
                                        "column":[
                                                "*"
                                        ],
                                        "connection":[
                                                {
                                                        "jdbcUrl":[
                                                                "jdbc:mysql://192.168.236.134:3306/test1?useSSL=false&serverTimezone=UTC"
                                                        ],
                                                        "table":[
                                                                "user1"
                                                        ]
                                                }
                                        ]
                                }
                        },
                        "writer":{
                                "name":"mysqlwriter",
                                "parameter":{
                                        "username":"test1",
                                        "password":"****************",
                                        "connection":[
                                                {
                                                        "jdbcUrl":"jdbc:mysql://192.168.236.136:3306/test1?useSSL=false&serverTimezone=UTC",
                                                        "table":[
                                                                "user1"
                                                        ]
                                                }
                                        ],
                                        "obWriteMode":"insert",
                                        "column":[
                                                "*"
                                        ],
                                        "preSql":[

                                        ],
                                        "postSql":[

                                        ]
                                }
                        }
                }
        ],
        "setting":{
                "speed":{
                        "channel":"5"
                }
        }
}

2025-04-03 19:57:24.839 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false
2025-04-03 19:57:24.839 [main] INFO  JobContainer - DataX jobContainer starts job.
2025-04-03 19:57:24.840 [main] INFO  JobContainer - Set jobId = 0
2025-04-03 19:57:25.079 [job-0] INFO  OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:mysql://192.168.236.134:3306/test1?useSSL=false&serverTimezone=UTC&yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true.
2025-04-03 19:57:25.080 [job-0] WARN  OriginalConfPretreatmentUtil - 您的配置文件中的列配置存在一定的风险. 因为您未配置读取数据库表的列,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
2025-04-03 19:57:25.275 [job-0] INFO  OriginalConfPretreatmentUtil - table:[user1] all columns:[
id,user_name,pass_word,email,created_at
].
2025-04-03 19:57:25.275 [job-0] WARN  OriginalConfPretreatmentUtil - 您的配置文件中的列配置信息存在风险. 因为您配置的写入数据库表的列为*,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
2025-04-03 19:57:25.276 [job-0] INFO  OriginalConfPretreatmentUtil - Write data [
INSERT INTO %s (id,user_name,pass_word,email,created_at) VALUES(?,?,?,?,?)
], which jdbcUrl like:[jdbc:mysql://192.168.236.136:3306/test1?useSSL=false&serverTimezone=UTC&yearIsDateType=false&zeroDateTimeBehavior=convertToNull&rewriteBatchedStatements=true&tinyInt1isBit=false]
2025-04-03 19:57:25.277 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2025-04-03 19:57:25.277 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] do prepare work .
2025-04-03 19:57:25.277 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2025-04-03 19:57:25.277 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2025-04-03 19:57:25.278 [job-0] INFO  JobContainer - Job set Channel-Number to 5 channels.
2025-04-03 19:57:25.280 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] splits to [1] tasks.
2025-04-03 19:57:25.281 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2025-04-03 19:57:25.301 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2025-04-03 19:57:25.303 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2025-04-03 19:57:25.304 [job-0] INFO  JobContainer - Running by standalone Mode.
2025-04-03 19:57:25.320 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2025-04-03 19:57:25.323 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2025-04-03 19:57:25.323 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2025-04-03 19:57:25.346 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2025-04-03 19:57:25.445 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Begin to read record by Sql: [select * from user1
] jdbcUrl:[jdbc:mysql://192.168.236.134:3306/test1?useSSL=false&serverTimezone=UTC&yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2025-04-03 19:57:25.456 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Finished read record by Sql: [select * from user1
] jdbcUrl:[jdbc:mysql://192.168.236.134:3306/test1?useSSL=false&serverTimezone=UTC&yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2025-04-03 19:57:25.749 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[421]ms
2025-04-03 19:57:25.750 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2025-04-03 19:57:35.445 [job-0] INFO  StandAloneJobContainerCommunicator - Total 0 records, 0 bytes | Speed 0B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2025-04-03 19:57:35.445 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2025-04-03 19:57:35.445 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2025-04-03 19:57:35.445 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] do post work.
2025-04-03 19:57:35.445 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2025-04-03 19:57:35.446 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /usr/local/datax/hook
2025-04-03 19:57:35.447 [job-0] INFO  JobContainer -
         [total cpu info] =>
                averageCpu                     | maxDeltaCpu                    | minDeltaCpu                   
                -1.00%                         | -1.00%                         | -1.00%


         [total gc info] =>
                 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime
                 PS MarkSweep         | 1                  | 1                  | 1                  | 0.042s             | 0.042s             | 0.042s
                 PS Scavenge          | 1                  | 1                  | 1                  | 0.040s             | 0.040s             | 0.040s

2025-04-03 19:57:35.447 [job-0] INFO  JobContainer - PerfTrace not enable!
2025-04-03 19:57:35.448 [job-0] INFO  StandAloneJobContainerCommunicator - Total 0 records, 0 bytes | Speed 0B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2025-04-03 19:57:35.449 [job-0] INFO  JobContainer -
任务启动时刻                    : 2025-04-03 20:04:41
任务结束时刻                    : 2025-04-03 20:04:51
任务总计耗时                    :                 10s
任务平均流量                    :                2B/s
记录写入速度                    :              0rec/s
读出记录总数                    :                 100
读写失败总数                    :                   0



成功!


网站公告

今日签到

点亮在社区的每一天
去签到