FATE 1.5.0: Vertical Federated Model Training + Model Prediction

Published: 2023-01-21

Part 0. Environment Information

Background:

  1. The main goal of this training + prediction run is to verify the meaning of the need_deploy field in FATE 1.5.0
  • Environment version: FATE 1.5.0

  • Deployment mode: Docker Compose

Reference: https://federatedai.github.io/FATE-Flow/1.8.0/zh/fate_flow_client/

Part 1. Model Training + Evaluation

Data upload

1. Upload the data needed by the guest party
a. Edit upload_guest.json
cat > /data/projects/fate/examples/upload/upload_guest.json <<EOF
{
  "file": "/data/projects/fate/examples/data/breast_hetero_guest.csv",
  "id_delimiter": ",",
  "head": 1,
  "partition": 10,
  "work_mode": 0,
  "namespace": "experiment",
  "table_name": "breast_hetero_guest"
}
EOF
b. Upload the data
  • Command

    flow data upload -c /data/projects/fate/examples/upload/upload_guest.json --drop
    
2. Upload the data needed by the host party
a. Edit upload_host.json
cat > /data/projects/fate/examples/upload/upload_host.json << EOF
{
  "file": "/data/projects/fate/examples/data/breast_hetero_host.csv",
  "id_delimiter": ",",
  "head": 1,
  "partition": 10,
  "work_mode": 0,
  "namespace": "experiment",
  "table_name": "breast_hetero_host"
}
EOF
b. Upload the data
  • Command

    flow data upload -c /data/projects/fate/examples/upload/upload_host.json --drop 
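The two upload configs differ only in the CSV path and the table name. When they are edited by hand, it is easy for a config to upload one CSV under a table name the job conf does not expect. Below is a minimal sketch, assuming the same upload parameters as above. The hypothetical helper write_upload_conf (not part of FATE Flow) derives table_name from the CSV file name, so the file and the table stay in sync. Only the commented `flow data upload` lines use the real CLI, exactly as in the commands above.

```shell
#!/usr/bin/env bash
# Sketch only: write_upload_conf is a hypothetical helper, not a FATE Flow command.
# It writes an upload config whose table_name is derived from the CSV file name,
# so the uploaded file and the table name referenced by the job conf stay in sync.

write_upload_conf() {
  local csv="$1" out="$2" namespace="${3:-experiment}" table
  # e.g. breast_hetero_guest.csv -> breast_hetero_guest
  table="$(basename "$csv" .csv)"
  mkdir -p "$(dirname "$out")"
  cat > "$out" <<EOF
{
  "file": "$csv",
  "id_delimiter": ",",
  "head": 1,
  "partition": 10,
  "work_mode": 0,
  "namespace": "$namespace",
  "table_name": "$table"
}
EOF
}

# Usage (the flow lines are the same upload commands as above):
# DATA=/data/projects/fate/examples/data
# CONF=/data/projects/fate/examples/upload
# write_upload_conf "$DATA/breast_hetero_guest.csv" "$CONF/upload_guest.json"
# write_upload_conf "$DATA/breast_hetero_host.csv" "$CONF/upload_host.json"
# flow data upload -c "$CONF/upload_guest.json" --drop
# flow data upload -c "$CONF/upload_host.json" --drop
```

Deriving the table name from the file name is just a convention chosen for this sketch. FATE does not require it.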
    

Model training + prediction

1. Prepare the DSL and conf files
  • Copy the DSL and runtime conf files from the examples:

    # dsl
    examples/dsl/v1/hetero_logistic_regression/test_hetero_lr_train_job_dsl.json
    # conf
    examples/dsl/v1/hetero_logistic_regression/test_hetero_lr_train_job_conf.json
    
  • DSL file

    cat > /data/projects/train/test_hetero_lr_train_job_dsl.json <<EOF
    {
        "components" : {
            "dataio_0": {
                "module": "DataIO",
                "input": {
                    "data": {
                        "data": [
                            "args.train_data"
                        ]
                    }
                },
                "output": {
                    "data": ["train"],
                    "model": ["dataio"]
                }
            },
            "intersection_0": {
                "module": "Intersection",
                "input": {
                    "data": {
                        "data": [
                            "dataio_0.train"
                        ]
                    }
                },
                "output": {
                    "data": ["train"]
                }
            },
            "hetero_lr_0": {
                "module": "HeteroLR",
                "input": {
                    "data": {
                        "train_data": ["intersection_0.train"]
                    }
                },
                "output": {
                    "data": ["train"],
                    "model": ["hetero_lr"]
                }
            },
            "evaluation_0": {
                "module": "Evaluation",
                "input": {
                    "data": {
                        "data": ["hetero_lr_0.train"]
                    }
                }
            }
        }
    }
    EOF
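In the DSL, each input is a reference of the form component.output. The exception is args.train_data, the job-level input taken from the conf. So the pipeline above wires DataIO to Intersection to HeteroLR to Evaluation. A typo in one of these names would typically only show up once the job is submitted. As a sketch, assuming python3 is installed, the hypothetical helper check_dsl_refs below walks the DSL and reports any reference that does not match a data output declared by another component.

```shell
#!/usr/bin/env bash
# Sketch only: check_dsl_refs is a hypothetical helper, not a FATE command.
# It reports "component.output" references in a v1 DSL that do not resolve to a
# data output declared by another component; "args.*" references are job inputs
# filled in from the conf. JSON parsing is delegated to python3 (assumed installed).

check_dsl_refs() {
  python3 - "$1" <<'PY'
import json
import sys

with open(sys.argv[1]) as f:
    components = json.load(f)["components"]

errors = []
for name, comp in components.items():
    # input.data maps a slot name (data, train_data, ...) to a list of references
    for refs in comp.get("input", {}).get("data", {}).values():
        for ref in refs:
            src, _, out = ref.partition(".")
            if src == "args":
                continue  # job-level input such as args.train_data
            declared = components.get(src, {}).get("output", {}).get("data", [])
            if out not in declared:
                errors.append(f"{name}: unresolved input {ref}")

for err in errors:
    print(err)
sys.exit(1 if errors else 0)
PY
}

# Usage:
# check_dsl_refs /data/projects/train/test_hetero_lr_train_job_dsl.json && echo "DSL references OK"
```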
    
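  • Conf file: change the database (namespace) and table names and the party IDs. (**Note:** use the namespace and table names from the upload step, and set the party ID to the PartyID used when the services were deployed.)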

    cat > /data/projects/train/test_hetero_lr_train_job_conf.json <<EOF
    {
        "initiator": {
            "role": "guest",
            "party_id": 10000
        },
        "job_parameters": {
            "work_mode": 0
        },
        "role": {
            "guest": [
                10000
            ],
            "host": [
                10000
            ],
            "arbiter": [
                10000
            ]
        },
        "role_parameters": {
            "guest": {
                "args": {
                    "data": {
                        "train_data": [
                            {
                                "name": "breast_hetero_guest",
                                "namespace": "experiment"
                            }
                        ],
                        "eval_data": [
                            {
                                "name": "breast_hetero_guest",
                                "namespace": "experiment"
                            }
                        ]
                    }
                },
                "dataio_0": {
                    "with_label": [
                        true
                    ],
                    "label_name": [
                        "y"
                    ],
                    "label_type": [
                        "int"
                    ],
                    "output_format": [
                        "dense"
                    ],
                    "missing_fill": [
                        true
                    ],
                    "outlier_replace": [
                        true
                    ]
                },
                "evaluation_0": {
                    "eval_type": [
                        "binary"
                    ],
                    "pos_label": [
                        1
                    ]
                }
            },
            "host": {
                "args": {
                    "data": {
                        "train_data": [
                            {
                                "name": "breast_hetero_host",
                                "namespace": "experiment"
                            }
                        ],
                        "eval_data": [
                            {
                                "name": "breast_hetero_host",
                                "namespace": "experiment"
                            }
                        ]
                    }
                },
                "dataio_0": {
                    "with_label": [
                        false
                    ],
                    "output_format": [
                        "dense"
                    ],
                    "outlier_replace": [
                        true
                    ]
                },
                "evaluation_0": {
                    "need_run": [
                        false
                    ]
                }
            }
        },
        "algorithm_parameters": {
            "hetero_lr_0": {
                "penalty": "L2",
                "optimizer": "rmsprop",
                "tol": 0.0001,
                "alpha": 0.01,
                "max_iter": 30,
                "early_stop": "diff",
                "batch_size": -1,
                "learning_rate": 0.15,
                "init_param": {
                    "init_method": "zeros"
                },
                "sqn_param": {
                    "update_interval_L": 3,
                    "memory_M": 5,
                    "sample_size": 5000,
                    "random_seed": null
                },
                "cv_param": {
                    "n_splits": 5,
                    "shuffle": false,
                    "random_seed": 103,
                    "need_cv": false
                }
            },
            "intersection_0": {
                "intersect_method": "rsa",
                "sync_intersect_ids": true,
                "only_output_key": false
            }
        }
    }
    EOF
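The conf attaches parameters to components by name. Keys under role_parameters.<role> (apart from args) and under algorithm_parameters are expected to match component names in the DSL. A key that does not match, such as intersect_0 when the DSL component is called intersection_0, does not refer to any component in the pipeline. Below is a sketch of a hypothetical check_conf_keys helper that lists such keys, assuming python3 is installed.

```shell
#!/usr/bin/env bash
# Sketch only: check_conf_keys is a hypothetical helper, not a FATE command.
# It lists keys under algorithm_parameters and role_parameters.<role> (except
# "args") that match no component name in the DSL. Assumes python3.

check_conf_keys() {
  python3 - "$1" "$2" <<'PY'
import json
import sys

with open(sys.argv[1]) as f:
    names = set(json.load(f)["components"])
with open(sys.argv[2]) as f:
    conf = json.load(f)

unknown = []
for key in conf.get("algorithm_parameters", {}):
    if key not in names:
        unknown.append(f"algorithm_parameters.{key}")
for role, params in conf.get("role_parameters", {}).items():
    for key in params:
        # "args" holds the job input tables, not component parameters
        if key != "args" and key not in names:
            unknown.append(f"role_parameters.{role}.{key}")

for item in unknown:
    print(f"no such component in DSL: {item}")
sys.exit(1 if unknown else 0)
PY
}

# Usage:
# check_conf_keys /data/projects/train/test_hetero_lr_train_job_dsl.json \
#     /data/projects/train/test_hetero_lr_train_job_conf.json
```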
    
2. Run the training job
  • Command

    flow job submit -c /data/projects/train/test_hetero_lr_train_job_conf.json -d /data/projects/train/test_hetero_lr_train_job_dsl.json
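The job ID is needed later for `flow job config -j`, and model_id/model_version are needed for the predict conf. Below is a sketch for capturing them in shell variables. It assumes the submit output is JSON with a top-level jobId, retcode and retmsg and a data.model_info object. That layout is an assumption about FATE Flow 1.x output, so check it against what your own submit prints. parse_submit_output is a hypothetical helper, and python3 is assumed.

```shell
#!/usr/bin/env bash
# Sketch only: parse_submit_output is a hypothetical helper.
# ASSUMPTION: `flow job submit` prints JSON with top-level "jobId", "retcode",
# "retmsg" and "data.model_info.{model_id,model_version}". Assumes python3.

parse_submit_output() {
  # stdin: submit JSON; stdout: "<jobId> <model_id> <model_version>"
  python3 -c '
import json, sys
out = json.load(sys.stdin)
if out.get("retcode", 0) != 0:
    sys.exit(out.get("retmsg") or "job submit failed")
info = out["data"]["model_info"]
print(out["jobId"], info["model_id"], info["model_version"])
'
}

# Usage (bash):
# read -r JOB_ID MODEL_ID MODEL_VERSION < <(
#   flow job submit -c /data/projects/train/test_hetero_lr_train_job_conf.json \
#     -d /data/projects/train/test_hetero_lr_train_job_dsl.json | parse_submit_output
# )
```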
    
3. Check the training results

(Figure: training results)
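`flow job submit` returns as soon as the job is submitted. The training results, and the model used for prediction, only exist once the job has finished. Below is a polling sketch. It assumes python3 is installed and that `flow job query -j` returns JSON whose data[0].f_status holds the job status, with waiting and running as the in-progress values. These field and status names are assumptions about FATE Flow 1.x, not confirmed by this post. job_status and wait_for_job are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch only: job_status and wait_for_job are hypothetical helpers.
# ASSUMPTIONS: `flow job query -j <id>` prints JSON with the job record in
# data[0] and its state in "f_status"; "waiting"/"running" mean in progress.

job_status() {
  # stdin: query JSON; stdout: f_status of the first job record
  python3 -c 'import json, sys; print(json.load(sys.stdin)["data"][0]["f_status"])'
}

wait_for_job() {
  local job_id="$1" interval="${2:-10}" status
  while true; do
    status="$(flow job query -j "$job_id" | job_status)"
    case "$status" in
      waiting|running) sleep "$interval" ;;
      *)
        # print the final status; succeed only if the job succeeded
        echo "$status"
        if [ "$status" = "success" ]; then return 0; else return 1; fi
        ;;
    esac
  done
}

# Usage:
# wait_for_job 202207050646218735964 && echo "training finished"
```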

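4. View the model information
  • Command (format: flow job config -j $JOB_ID -r host -p 10000 -o ./examples/). It downloads the job's configuration into the output directory, where the model_id and model_version needed for the next step can be found: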

    flow job config -j 202207050646218735964 -r guest -p 10000 -o /data/projects/predict
    
5. Edit the prediction config file

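The template is examples/dsl/v1/homo_logistic_regression/test_predict_conf.json. Change model_id and model_version to those of the training job (here the model_version is the training job ID). eval_data is the data to predict on. No new data was added here, so prediction simply reuses the original data. Other example datasets could also be used for validation or cross-validation: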

a. Edit the prediction (evaluation) config file
cat > /data/projects/predict/test_predict_conf.json <<EOF
{
    "initiator": {
        "role": "guest",
        "party_id": 10000
    },
    "job_parameters": {
        "work_mode": 0,
        "job_type": "predict",
        "model_id": "arbiter-10000#guest-10000#host-10000#model",
        "model_version": "202207050646218735964"
    },
    "role": {
        "guest": [
            10000
        ],
        "host": [
            10000
        ],
        "arbiter": [
            10000
        ]
    },
    "role_parameters": {
        "guest": {
            "args": {
                "data": {
                    "eval_data": [
                        {
                            "name": "breast_hetero_guest",
                            "namespace": "experiment"
                        }
                    ]
                }
            }
        },
        "host": {
            "args": {
                "data": {
                    "eval_data": [
                        {
                            "name": "breast_hetero_host",
                            "namespace": "experiment"
                        }
                    ]
                }
            }
        }
    }
}
EOF
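Between training runs, only model_id and model_version change in this predict conf, so it can also be generated instead of edited by hand. Below is a sketch that assumes the same party ID (10000) and the same experiment tables as the conf above. write_predict_conf is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch only: write_predict_conf is a hypothetical helper, not a FATE Flow command.
# It writes the same v1 predict conf as above with model_id/model_version passed
# in; party ID defaults to 10000 and the eval tables are this post's experiment tables.

write_predict_conf() {
  local model_id="$1" model_version="$2" out="$3" party_id="${4:-10000}"
  mkdir -p "$(dirname "$out")"
  cat > "$out" <<EOF
{
    "initiator": {"role": "guest", "party_id": $party_id},
    "job_parameters": {
        "work_mode": 0,
        "job_type": "predict",
        "model_id": "$model_id",
        "model_version": "$model_version"
    },
    "role": {"guest": [$party_id], "host": [$party_id], "arbiter": [$party_id]},
    "role_parameters": {
        "guest": {"args": {"data": {"eval_data": [{"name": "breast_hetero_guest", "namespace": "experiment"}]}}},
        "host": {"args": {"data": {"eval_data": [{"name": "breast_hetero_host", "namespace": "experiment"}]}}}
    }
}
EOF
}

# Usage (values of the training job in this post):
# write_predict_conf "arbiter-10000#guest-10000#host-10000#model" \
#     202207050646218735964 /data/projects/predict/test_predict_conf.json
```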
b. Run the prediction
  • Command

    flow job submit -c /data/projects/predict/test_predict_conf.json -d /data/projects/train/test_hetero_lr_train_job_dsl.json
    
6. View the evaluation result charts

(Figure: model evaluation)
