Ray Cluster Deployment and Maintenance

Published: 2025-07-17

1. Environment Setup

1.1 Install Dependencies

Run the command for your cloud platform to install the required dependencies:

AWS
pip install -U "ray[default]" boto3
GCP
pip install -U "ray[default]" google-api-python-client
Azure
pip install -U "ray[default]" azure-cli azure-core
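
After installation, it can help to confirm that Ray and the provider SDK are importable from the Python environment you will run `ray up` in. The following is a minimal sketch, not part of Ray itself: `PROVIDER_MODULES`, `is_importable`, and `check_dependencies` are hypothetical names, and the import names assume the standard mapping of each pip package to its top-level module.

```python
import importlib.util

# Import name expected for each platform's SDK (assumption: default package layout).
PROVIDER_MODULES = {
    "aws": "boto3",
    "gcp": "googleapiclient",  # provided by google-api-python-client
    "azure": "azure.core",     # provided by azure-core
}


def is_importable(module_name):
    """Return True if the module can be located without importing it fully."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (for example "azure") is missing.
        return False


def check_dependencies(platform):
    """Map each required module name to whether it is importable."""
    names = ("ray", PROVIDER_MODULES[platform])
    return {name: is_importable(name) for name in names}
```

For example, `check_dependencies("aws")` returns a dictionary with the keys `ray` and `boto3`; any `False` value points at a package that still needs to be installed.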

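1.2 Configure Cloud Credentials

AWS

Configure the ~/.aws/credentials file as described in the AWS documentation.

GCP

Set the environment variable to the path of your service account key file:

export GOOGLE_APPLICATION_CREDENTIALS="path/to/credentials.json"

Azure

Log in and select the subscription to use: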

az login
az account set -s <subscription_id>
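
Before launching a cluster, a quick check that the credential sources described above exist can save a failed launch. The sketch below only looks for the files and the environment variable mentioned in this section. `credential_hints` is a hypothetical helper, and the Azure check assumes the CLI keeps its login state in the default `~/.azure` directory. It does not verify that the credentials are valid, and it ignores other credential sources such as AWS environment variables.

```python
import os
from pathlib import Path


def credential_hints(env=None, home=None):
    """Return {platform: bool} for whether the expected credential source exists."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    gcp_key = env.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    return {
        # AWS: the shared credentials file described above.
        "aws": (home / ".aws" / "credentials").is_file(),
        # GCP: the variable must be set and point at an existing file.
        "gcp": bool(gcp_key) and Path(gcp_key).is_file(),
        # Azure: assumption that `az login` state lives in the default ~/.azure.
        "azure": (home / ".azure").is_dir(),
    }
```

Calling `credential_hints()` with no arguments inspects the current user's home directory and environment.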

2. Cluster Deployment

2.1 Create a Configuration File

Create a config.yaml file. Minimal example configurations for each platform follow:

AWS
cluster_name: minimal
provider:
    type: aws
    region: us-west-2
auth:
    ssh_user: ubuntu
GCP
cluster_name: minimal
provider:
    type: gcp
    region: us-west1
    availability_zone: us-west1-a
    project_id: <project_id>
auth:
    ssh_user: ubuntu
Azure
cluster_name: minimal
provider:
    type: azure
    location: westus2
    resource_group: ray-cluster
auth:
    ssh_user: ubuntu
    ssh_private_key: ~/.ssh/id_rsa
    ssh_public_key: ~/.ssh/id_rsa.pub
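
A small pre-flight check can catch a missing field before `ray up` is invoked. The sketch below works on an already parsed configuration (for example, a dictionary loaded with PyYAML's `yaml.safe_load`). `REQUIRED_PROVIDER_KEYS` and `missing_keys` are hypothetical names, and the key lists are chosen for illustration from the fields shown in the examples above; they are not Ray's full configuration schema.

```python
# Provider fields checked per platform (illustrative subset, not Ray's schema).
REQUIRED_PROVIDER_KEYS = {
    "aws": ["region"],
    "gcp": ["region"],
    "azure": ["location", "resource_group"],
}


def missing_keys(config):
    """Return dotted names of expected keys that are absent or empty."""
    missing = []
    if not config.get("cluster_name"):
        missing.append("cluster_name")

    provider = config.get("provider") or {}
    provider_type = provider.get("type")
    if provider_type not in REQUIRED_PROVIDER_KEYS:
        missing.append("provider.type")
    else:
        for key in REQUIRED_PROVIDER_KEYS[provider_type]:
            if not provider.get(key):
                missing.append(f"provider.{key}")

    if not (config.get("auth") or {}).get("ssh_user"):
        missing.append("auth.ssh_user")
    return missing
```

An empty list means every field this sketch knows about is present; it says nothing about whether the values are accepted by the cloud provider.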

2.2 Start the Cluster

ray up -y config.yaml

3. Using the Cluster

3.1 Submit a Job

ray exec config.yaml 'python -c "import ray; ray.init()"'

3.2 Connect to the Cluster

ray attach config.yaml
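
When these commands are driven from a script rather than typed by hand, building the argument list in one place keeps the calls consistent. The following is a minimal sketch: `ray_cli` and `run` are hypothetical helpers, and the `down` action assumes you also want to tear the cluster down with `ray down -y`, which this section does not cover.

```python
import shlex
import subprocess


def ray_cli(action, config="config.yaml", command=None):
    """Build the argv list for one cluster launcher command."""
    if action == "up":
        return ["ray", "up", "-y", config]
    if action == "exec":
        if not command:
            raise ValueError("exec needs a command to run on the head node")
        return ["ray", "exec", config, command]
    if action == "attach":
        return ["ray", "attach", config]
    if action == "down":
        # Assumption: teardown is wanted without an interactive prompt.
        return ["ray", "down", "-y", config]
    raise ValueError(f"unknown action: {action}")


def run(action, **kwargs):
    """Print the command, execute it, and raise if it exits with an error."""
    argv = ray_cli(action, **kwargs)
    print("+", shlex.join(argv))
    return subprocess.run(argv, check=True)
```

Passing the command as a list avoids shell quoting problems with the nested quotes in commands such as the `ray exec` example above.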

3.3 Run a Sample Application

Create a script.py file:

from collections import Counter
import socket
import time
import ray

ray.init()

print(f'''This cluster consists of
    {len(ray.nodes())} nodes in total
    {ray.cluster_resources()['CPU']} CPU resources in total
''')

@ray.remote
def 
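
A script of this shape typically goes on to launch many small remote tasks and count which nodes executed them. The following is a minimal, self-contained sketch under that assumption; `where_am_i`, `summarize`, and `run_on_cluster` are hypothetical names and are not taken from the original script.

```python
from collections import Counter
import socket
import time


def summarize(ip_addresses):
    """Return (ip, task_count) pairs, busiest node first."""
    return Counter(ip_addresses).most_common()


def run_on_cluster(num_tasks=1000):
    """Run many tiny tasks and print how they were spread across nodes."""
    import ray  # imported here so summarize() stays usable without Ray

    # Connect to the running cluster; tolerate an earlier ray.init() call.
    ray.init(ignore_reinit_error=True)

    @ray.remote
    def where_am_i():
        time.sleep(0.001)
        # Report the IP address of the node that ran this task.
        return socket.gethostbyname(socket.gethostname())

    ip_addresses = ray.get([where_am_i.remote() for _ in range(num_tasks)])

    print("Tasks executed")
    for ip_address, count in summarize(ip_addresses):
        print(f"    {count} tasks on {ip_address}")
```

Calling `run_on_cluster()` from a script on the head node (for example through `ray exec`) should print one line per node that received work. On a cluster with a single node, all tasks are reported under one address.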
