Deploy StarRocks with Docker

Published: 2025-07-01

Official documentation: Deploy StarRocks with Docker | StarRocks

If the image pull hangs at "Downloading", stop it and start it again.

# Start StarRocks
docker run -p 9030:9030 -p 8030:8030 -p 8040:8040 -itd --name quickstart starrocks/allin1-ubuntu

# Download the datasets
curl -O https://raw.githubusercontent.com/StarRocks/demo/master/documentation-samples/quickstart/datasets/72505394728.csv

curl -O https://raw.githubusercontent.com/StarRocks/demo/master/documentation-samples/quickstart/datasets/NYPD_Crash_Data.csv
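If curl is not available or misbehaves in your shell, the same two files can be fetched with Python's standard library. This is only a minimal sketch: the helper names local_name and download are made up for illustration, and the URLs are the ones from the curl commands above.

```python
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import urlsplit

# Same location as in the curl commands above.
BASE_URL = (
    "https://raw.githubusercontent.com/StarRocks/demo/master/"
    "documentation-samples/quickstart/datasets/"
)
DATASETS = ["72505394728.csv", "NYPD_Crash_Data.csv"]


def local_name(url: str) -> str:
    """Mimic curl -O: use the last path segment of the URL as the file name."""
    return PurePosixPath(urlsplit(url).path).name


def download(url: str, dest_dir: Path = Path(".")) -> Path:
    """Stream the URL into dest_dir in 1 MiB chunks and return the saved path."""
    target = dest_dir / local_name(url)
    with urllib.request.urlopen(url) as resp, open(target, "wb") as out:
        while chunk := resp.read(1 << 20):
            out.write(chunk)
    return target


if __name__ == "__main__":
    for name in DATASETS:
        print("saved", download(BASE_URL + name))
```

Reading in chunks keeps memory use flat even for the larger crash-data file.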

# Connect with the MySQL client
docker exec -it quickstart mysql -P 9030 -h 127.0.0.1 -u root --prompt="StarRocks > "
## --prompt is a MySQL client option that customizes the command-line prompt.

On a successful connection, the prompt changes to "StarRocks > ".

Alternatively, connect with any MySQL client tool as user root with an empty password.

Create the database and tables

CREATE DATABASE IF NOT EXISTS quickstart;

USE quickstart;
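The same statements can also be run non-interactively, which is handy for scripting the setup. The sketch below assumes the container is named quickstart as in the docker run command, and uses the mysql client's -e option to pass a statement on the command line. The helpers mysql_command and run_sql are hypothetical names, not part of any StarRocks tooling.

```python
import subprocess

CONTAINER = "quickstart"  # container name from the docker run command


def mysql_command(sql: str, container: str = CONTAINER) -> list[str]:
    """Build a docker exec call that runs one SQL string via the mysql client in the container."""
    return [
        "docker", "exec", container,
        "mysql", "-P", "9030", "-h", "127.0.0.1", "-u", "root",
        "-e", sql,
    ]


def run_sql(sql: str) -> str:
    """Execute the SQL and return the client's stdout; raises CalledProcessError on failure."""
    done = subprocess.run(mysql_command(sql), capture_output=True, text=True, check=True)
    return done.stdout


if __name__ == "__main__":
    run_sql("CREATE DATABASE IF NOT EXISTS quickstart;")
    print(run_sql("SHOW DATABASES;"))
```

Passing the arguments as a list avoids shell quoting entirely, so the command behaves the same in PowerShell, cmd, and bash.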

 

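Error: curl: (3) URL using bad/illegal format or missing URL

Cause: the command was typed into PowerShell interactively, one line at a time, which easily breaks curl's argument parsing.

In particular, PowerShell does not treat a trailing backslash as a line continuation (its continuation character is the backtick). When a multi-line command is entered line by line, curl.exe therefore runs immediately with whatever it has "seen" so far instead of waiting for the whole command to be assembled.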

Workaround: load the data with a Python script

import requests
from requests.auth import HTTPBasicAuth
import os

# Configuration
STARROCKS_URL = "http://localhost:8030/api/quickstart/crashdata/_stream_load"
CSV_FILE_PATH = "./NYPD_Crash_Data.csv"

HEADERS = {
    "label": "crashdata-0",
    "column_separator": ",",
    "skip_header": "1",
    "enclose": '"',
    "max_filter_ratio": "1",
    "columns": (
        "tmp_CRASH_DATE, tmp_CRASH_TIME, "
        "CRASH_DATE=str_to_date(concat_ws(' ', tmp_CRASH_DATE, tmp_CRASH_TIME), '%m/%d/%Y %H:%i'),"
        "BOROUGH,ZIP_CODE,LATITUDE,LONGITUDE,LOCATION,"
        "ON_STREET_NAME,CROSS_STREET_NAME,OFF_STREET_NAME,"
        "NUMBER_OF_PERSONS_INJURED,NUMBER_OF_PERSONS_KILLED,"
        "NUMBER_OF_PEDESTRIANS_INJURED,NUMBER_OF_PEDESTRIANS_KILLED,"
        "NUMBER_OF_CYCLIST_INJURED,NUMBER_OF_CYCLIST_KILLED,"
        "NUMBER_OF_MOTORIST_INJURED,NUMBER_OF_MOTORIST_KILLED,"
        "CONTRIBUTING_FACTOR_VEHICLE_1,CONTRIBUTING_FACTOR_VEHICLE_2,"
        "CONTRIBUTING_FACTOR_VEHICLE_3,CONTRIBUTING_FACTOR_VEHICLE_4,"
        "CONTRIBUTING_FACTOR_VEHICLE_5,COLLISION_ID,"
        "VEHICLE_TYPE_CODE_1,VEHICLE_TYPE_CODE_2,VEHICLE_TYPE_CODE_3,"
        "VEHICLE_TYPE_CODE_4,VEHICLE_TYPE_CODE_5"
    ),
    "Expect": "100-continue"
}

USER = "root"
PASSWORD = ""  # fill in if a password has been set (e.g. 'your_password')

def upload_to_starrocks():
    if not os.path.exists(CSV_FILE_PATH):
        print(f"❌ File {CSV_FILE_PATH} does not exist")
        return

    print("⏳ Uploading file...")
    with open(CSV_FILE_PATH, "rb") as f:
        try:
            response = requests.put(
                STARROCKS_URL,
                auth=HTTPBasicAuth(USER, PASSWORD),
                headers=HEADERS,
                data=f,
                timeout=6000  # maximum wait time in seconds
            )
        except requests.exceptions.Timeout:
            print("❌ Request timed out; check the network and whether StarRocks is running")
            return
        except Exception as e:
            print(f"❌ Unexpected error: {e}")
            return

    print("✅ Response status code:", response.status_code)
    try:
        print("📄 Response body:\n", response.json())
    except Exception:
        print("📄 Raw response body:\n", response.text)

if __name__ == "__main__":
    upload_to_starrocks()
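The HTTP status code printed by the script does not show how many rows actually landed. Because max_filter_ratio is set to 1, a load can apparently report success even when most rows were filtered out. The sketch below summarizes the parsed JSON body. It assumes the Stream Load response carries the fields Status, Message, NumberTotalRows, NumberLoadedRows, and NumberFilteredRows, which is how I understand the format; check them against the response you actually get. summarize_load is a hypothetical helper and the sample dictionaries are made up.

```python
def summarize_load(result: dict) -> tuple[bool, str]:
    """Return (ok, message) for a parsed Stream Load response body.

    The field names are assumptions about the response format.
    """
    status = result.get("Status", "unknown")
    if status != "Success":
        return False, f"{status}: {result.get('Message', '')}"
    total = result.get("NumberTotalRows", 0)
    loaded = result.get("NumberLoadedRows", 0)
    filtered = result.get("NumberFilteredRows", 0)
    return True, f"loaded {loaded}/{total} rows, {filtered} filtered"


if __name__ == "__main__":
    # Made-up sample bodies, only to show the two branches.
    print(summarize_load({"Status": "Success", "NumberTotalRows": 10,
                          "NumberLoadedRows": 9, "NumberFilteredRows": 1}))
    print(summarize_load({"Status": "Fail", "Message": "too many filtered rows"}))
```

In the script above, this could be called as summarize_load(response.json()) right after the status code is printed.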

In practice this script was very slow. Copying the file into the container and loading it from there worked instead:

docker cp ../weather/output/isd_lite_2021_china_with_station_info.csv quickstart:/data/tmp

root@46b4bd1c3a6a:/data/tmp# curl --location-trusted -u root \
>     -T ./isd_lite_2021_china_with_station_info.csv \
>     -H "label:gz-weather-0" \
>     -H "column_separator:," \
>     -H "skip_header:1" \
>     -H "enclose:\"" \
>     -H "max_filter_ratio:1" \
>     -H "columns:year,month,day,hour,temp,dew_point,slp,wind_dir,wind_speed,sky_cover,precip_1hr,precip_6hr,station_id,station_name,country,latitude,longitude,elevation,datetime" \

20 million rows were imported successfully, and the load was extremely fast.

