Developing RAGFlow on a Mac

Published: 2025-03-19

Overview

In the post "Compiling RAGFlow on a Mac" I only mentioned briefly that you can use Remote Development to get into the container, and that you can modify the Dockerfile to add a mapped volume for development. Why map a volume? It preserves RAGFlow's original directory structure and keeps Git version control working, and because the files live in the volume you don't have to worry about losing code changes when the container is destroyed or recreated.

Today I walked through the whole development workflow end to end, so I have reorganized the development part of the documentation. For completeness, the Remote Development section is kept. Note that this assumes you have already built RAGFlow's Dockerfile on a Mac and the container runs normally.

Development

Remote Development

Install the Remote Development extension pack in VS Code. It bundles several extensions; among them, Dev Containers can connect to a Docker container and use it as the development environment, which keeps the development environment consistent with the deployment environment.


Once Remote Development is installed, the Remote Explorer lists the existing containers; click the attach arrow next to the target container.


This opens VS Code inside the container; pick the corresponding directory and open /ragflow to see the actual code.
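
If you only need a quick check from the terminal rather than a full VS Code session, you can also open a shell in the running container. This is a minimal sketch that assumes the container name ragflow-server used in docker-compose-macos.yml below; adjust it if your container is named differently.

# open an interactive shell inside the running container
docker exec -it ragflow-server bash

# inside the container, verify that the mapped source tree is visible
ls /ragflow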


Modifying the Dockerfile

The main changes are to comment out the dependency/web build steps and the production stage, and to add a /ragflow volume.

# base stage
FROM docker.m.daocloud.io/ubuntu:22.04 AS base
USER root
SHELL ["/bin/bash", "-c"]

ARG NEED_MIRROR=0
ARG LIGHTEN=0
ENV LIGHTEN=${LIGHTEN}

WORKDIR /ragflow

# Copy models downloaded via download_deps.py
RUN mkdir -p /ragflow/rag/res/deepdoc /root/.ragflow
RUN --mount=type=bind,from=infiniflow/ragflow_deps:latest,source=/huggingface.co,target=/huggingface.co \
    cp /huggingface.co/InfiniFlow/huqie/huqie.txt.trie /ragflow/rag/res/ && \
    tar --exclude='.*' -cf - \
        /huggingface.co/InfiniFlow/text_concat_xgb_v1.0 \
        /huggingface.co/InfiniFlow/deepdoc \
        | tar -xf - --strip-components=3 -C /ragflow/rag/res/deepdoc 
RUN --mount=type=bind,from=infiniflow/ragflow_deps:latest,source=/huggingface.co,target=/huggingface.co \
    if [ "$LIGHTEN" != "1" ]; then \
        (tar -cf - \
            /huggingface.co/BAAI/bge-large-zh-v1.5 \
            /huggingface.co/BAAI/bge-reranker-v2-m3 \
            /huggingface.co/maidalun1020/bce-embedding-base_v1 \
            /huggingface.co/maidalun1020/bce-reranker-base_v1 \
            | tar -xf - --strip-components=2 -C /root/.ragflow); \
    fi

# https://github.com/chrismattmann/tika-python
# This is the only way to run python-tika without internet access. Without this set, the default is to check the tika version and pull latest every time from Apache.
RUN --mount=type=bind,from=infiniflow/ragflow_deps:latest,source=/,target=/deps \
    cp -r /deps/nltk_data /root/ && \
    cp /deps/tika-server-standard-3.0.0.jar /deps/tika-server-standard-3.0.0.jar.md5 /ragflow/ && \
    cp /deps/cl100k_base.tiktoken /ragflow/9b5ad71b2ce5302211f9c61530b329a4922fc6a4

ENV TIKA_SERVER_JAR="file:///ragflow/tika-server-standard-3.0.0.jar"
ENV DEBIAN_FRONTEND=noninteractive

# Setup apt
# Python package and implicit dependencies:
# opencv-python: libglib2.0-0 libglx-mesa0 libgl1
# aspose-slides: pkg-config libicu-dev libgdiplus         libssl1.1_1.1.1f-1ubuntu2_amd64.deb
# python-pptx:   default-jdk                              tika-server-standard-3.0.0.jar
# selenium:      libatk-bridge2.0-0                       chrome-linux64-121-0-6167-85
# Building C extensions: libpython3-dev libgtk-4-1 libnss3 xdg-utils libgbm-dev
RUN --mount=type=cache,id=ragflow_apt,target=/var/cache/apt,sharing=locked \
    if [ "$NEED_MIRROR" == "1" ]; then \
        sed -i 's|http://archive.ubuntu.com|https://mirrors.tuna.tsinghua.edu.cn|g' /etc/apt/sources.list; \
    fi; \
    rm -f /etc/apt/apt.conf.d/docker-clean && \
    echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache && \
    chmod 1777 /tmp && \
    apt update && \
    apt --no-install-recommends install -y ca-certificates && \
    apt update && \
    apt install -y libglib2.0-0 libglx-mesa0 libgl1 && \
    apt install -y pkg-config libicu-dev libgdiplus && \
    apt install -y default-jdk && \
    apt install -y libatk-bridge2.0-0 && \
    apt install -y libpython3-dev libgtk-4-1 libnss3 xdg-utils libgbm-dev && \
    apt install -y libjemalloc-dev && \
    apt install -y python3-pip pipx nginx unzip curl wget git vim less

RUN if [ "$NEED_MIRROR" == "1" ]; then \
        pip3 config set global.index-url https://mirrors.aliyun.com/pypi/simple && \
        pip3 config set global.trusted-host mirrors.aliyun.com; \
        mkdir -p /etc/uv && \
        echo "[[index]]" > /etc/uv/uv.toml && \
        echo 'url = "https://mirrors.aliyun.com/pypi/simple"' >> /etc/uv/uv.toml && \
        echo "default = true" >> /etc/uv/uv.toml; \
    fi; \
    pipx install uv

ENV PYTHONDONTWRITEBYTECODE=1 DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=1
ENV PATH=/root/.local/bin:$PATH

# nodejs 12.22 on Ubuntu 22.04 is too old
RUN --mount=type=cache,id=ragflow_apt,target=/var/cache/apt,sharing=locked \
    curl -fsSL https://deb.nodesource.com/setup_20.x | bash - && \
    apt purge -y nodejs npm cargo && \
    apt autoremove -y && \
    apt update && \
    apt install -y nodejs

# A modern version of cargo is needed for the latest version of the Rust compiler.
RUN apt update && apt install -y curl build-essential \
    && if [ "$NEED_MIRROR" == "1" ]; then \
         # Use TUNA mirrors for rustup/rust dist files
         export RUSTUP_DIST_SERVER="https://mirrors.tuna.tsinghua.edu.cn/rustup"; \
         export RUSTUP_UPDATE_ROOT="https://mirrors.tuna.tsinghua.edu.cn/rustup/rustup"; \
         echo "Using TUNA mirrors for Rustup."; \
       fi; \
    # Force curl to use HTTP/1.1
    curl --proto '=https' --tlsv1.2 --http1.1 -sSf https://sh.rustup.rs | bash -s -- -y --profile minimal \
    && echo 'export PATH="/root/.cargo/bin:${PATH}"' >> /root/.bashrc

ENV PATH="/root/.cargo/bin:${PATH}"

RUN cargo --version && rustc --version

# Add msssql ODBC driver
# macOS ARM64 environment, install msodbcsql18.
# general x86_64 environment, install msodbcsql17.
RUN --mount=type=cache,id=ragflow_apt,target=/var/cache/apt,sharing=locked \
    curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - && \
    curl https://packages.microsoft.com/config/ubuntu/22.04/prod.list > /etc/apt/sources.list.d/mssql-release.list && \
    apt update && \
    arch="$(uname -m)"; \
    if [ "$arch" = "arm64" ] || [ "$arch" = "aarch64" ]; then \
        # ARM64 (macOS/Apple Silicon or Linux aarch64)
        ACCEPT_EULA=Y apt install -y unixodbc-dev msodbcsql18; \
    else \
        # x86_64 or others
        ACCEPT_EULA=Y apt install -y unixodbc-dev msodbcsql17; \
    fi || \
    { echo "Failed to install ODBC driver"; exit 1; }

# Add dependencies of selenium
RUN --mount=type=bind,from=infiniflow/ragflow_deps:latest,source=/chrome-linux64-121-0-6167-85,target=/chrome-linux64.zip \
    unzip /chrome-linux64.zip && \
    mv chrome-linux64 /opt/chrome && \
    ln -s /opt/chrome/chrome /usr/local/bin/
RUN --mount=type=bind,from=infiniflow/ragflow_deps:latest,source=/chromedriver-linux64-121-0-6167-85,target=/chromedriver-linux64.zip \
    unzip -j /chromedriver-linux64.zip chromedriver-linux64/chromedriver && \
    mv chromedriver /usr/local/bin/ && \
    rm -f /usr/bin/google-chrome

# https://forum.aspose.com/t/aspose-slides-for-net-no-usable-version-of-libssl-found-with-linux-server/271344/13
# aspose-slides on linux/arm64 is unavailable
RUN --mount=type=bind,from=infiniflow/ragflow_deps:latest,source=/,target=/deps \
    if [ "$(uname -m)" = "x86_64" ]; then \
        dpkg -i /deps/libssl1.1_1.1.1f-1ubuntu2_amd64.deb; \
    elif [ "$(uname -m)" = "aarch64" ]; then \
        dpkg -i /deps/libssl1.1_1.1.1f-1ubuntu2_arm64.deb; \
    fi

# builder stage
FROM base AS builder
USER root

WORKDIR /ragflow

# install dependencies from uv.lock file
# COPY pyproject.toml uv.lock ./

# https://github.com/astral-sh/uv/issues/10462
# uv records index url into uv.lock but doesn't failover among multiple indexes
# RUN --mount=type=cache,id=ragflow_uv,target=/root/.cache/uv,sharing=locked \
#     if [ "$NEED_MIRROR" == "1" ]; then \
#         sed -i 's|pypi.org|mirrors.aliyun.com/pypi|g' uv.lock; \
#     else \
#         sed -i 's|mirrors.aliyun.com/pypi|pypi.org|g' uv.lock; \
#     fi; \
#     if [ "$LIGHTEN" == "1" ]; then \
#         uv sync --python 3.10 --frozen; \
#     else \
#         uv sync --python 3.10 --frozen --all-extras; \
#     fi

# COPY web web
# COPY docs docs
# RUN --mount=type=cache,id=ragflow_npm,target=/root/.npm,sharing=locked \
#     cd web && npm install && npm run build

COPY .git /ragflow/.git

RUN version_info=$(git describe --tags --match=v* --first-parent --always); \
    if [ "$LIGHTEN" == "1" ]; then \
        version_info="$version_info slim"; \
    else \
        version_info="$version_info full"; \
    fi; \
    echo "RAGFlow version: $version_info"; \
    echo $version_info > /ragflow/VERSION

# production stage
# FROM base AS production
# USER root

# WORKDIR /ragflow

# Copy Python environment and packages
ENV VIRTUAL_ENV=/ragflow/.venv
# COPY --from=builder ${VIRTUAL_ENV} ${VIRTUAL_ENV}
ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"

ENV PYTHONPATH=/ragflow/

# COPY web web
# COPY api api
# COPY conf conf
# COPY deepdoc deepdoc
# COPY rag rag
# COPY agent agent
# COPY graphrag graphrag
# COPY agentic_reasoning agentic_reasoning
# COPY pyproject.toml uv.lock ./

# COPY docker/service_conf.yaml.template ./conf/service_conf.yaml.template
# COPY docker/entrypoint.sh docker/entrypoint-parser.sh ./
# RUN chmod +x ./entrypoint*.sh

# Copy compiled web pages
# COPY --from=builder /ragflow/web/dist /ragflow/web/dist

# COPY --from=builder /ragflow/VERSION /ragflow/VERSION

VOLUME /ragflow

ENTRYPOINT ["./entrypoint.sh"]

Rebuilding the image

docker build --build-arg LIGHTEN=1 --build-arg NEED_MIRROR=1 -f Dockerfile -t infiniflow/ragflow:nightly-slim .
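
Once the build completes, you can confirm the image is present under the tag used above:

docker image ls infiniflow/ragflow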

Modifying docker-compose-macos.yml

Here the local ragflow directory is mapped onto the container's /ragflow volume. The benefit is that Git version control still works on the host side.

include:
  - ./docker-compose-base.yml

services:
  ragflow:
    depends_on:
      mysql:
        condition: service_healthy
    build:
      context: ../
      dockerfile: Dockerfile
    container_name: ragflow-server
    ports:
      - ${SVR_HTTP_PORT}:9380
      - 80:80
      - 443:443
    volumes:
      - ./ragflow-logs:/ragflow/logs
      - ./nginx/ragflow.conf:/etc/nginx/conf.d/ragflow.conf
      - ./nginx/proxy.conf:/etc/nginx/proxy.conf
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf
      - ../:/ragflow
    env_file: .env
    environment:
      - TZ=${TIMEZONE}
      - HF_ENDPOINT=${HF_ENDPOINT}
      - MACOS=${MACOS:-1}
      - LIGHTEN=${LIGHTEN:-1}
    networks:
      - ragflow
    restart: on-failure
    # https://docs.docker.com/engine/daemon/prometheus/#create-a-prometheus-configuration
    # If you're using Docker Desktop, the --add-host flag is optional. This flag makes sure that the host's internal IP gets exposed to the Prometheus container.
    extra_hosts:
      - "host.docker.internal:host-gateway"

Because the whole ragflow directory is now mounted as a volume, remember to copy docker/service_conf.yaml.template into ragflow/conf, and docker/entrypoint.sh and docker/entrypoint-parser.sh into the ragflow directory, otherwise the program will not start.
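
A minimal sketch of those copies, run from the repository root (the destinations follow the commented-out COPY lines in the Dockerfile above; adjust the paths if your checkout is laid out differently):

# run from the RAGFlow repository root
cp docker/service_conf.yaml.template conf/
cp docker/entrypoint.sh docker/entrypoint-parser.sh .
chmod +x ./entrypoint*.sh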

Then bring the services up:

cd docker
docker compose -f docker-compose-macos.yml up -d
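
To confirm the container started (at this stage the web service is not expected to respond yet, as noted in the Build section below):

docker compose -f docker-compose-macos.yml ps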

Modifying entrypoint.sh

Add --debug to enable debug output.

#!/bin/bash

# replace env variables in the service_conf.yaml file
rm -rf /ragflow/conf/service_conf.yaml
while IFS= read -r line || [[ -n "$line" ]]; do
    # Use eval to interpret the variable with default values
    eval "echo \"$line\"" >> /ragflow/conf/service_conf.yaml
done < /ragflow/conf/service_conf.yaml.template

/usr/sbin/nginx

export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/

PY=python3
if [[ -z "$WS" || $WS -lt 1 ]]; then
  WS=1
fi

function task_exe(){
    JEMALLOC_PATH=$(pkg-config --variable=libdir jemalloc)/libjemalloc.so
    while [ 1 -eq 1 ];do
      LD_PRELOAD=$JEMALLOC_PATH $PY rag/svr/task_executor.py $1;
    done
}

for ((i=0;i<WS;i++))
do
  task_exe  $i &
done

while [ 1 -eq 1 ];do
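    # --debug added for development so the server prints verbose debug logs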
    $PY api/ragflow_server.py --debug
done

wait;

Build

Normally the service cannot be reached yet at this point, but the container is already running, so you can attach to it with Dev Containers. Once inside the container, build the frontend and reinstall RAGFlow's Python dependencies.

Frontend

cd web
npm install
npm run build

Backend

Re-fetch the dependencies:

uv sync --python 3.10 --frozen
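
This matches the slim build (LIGHTEN=1) used earlier. If you built the full image instead, the commented-out builder stage in the Dockerfile uses the --all-extras variant:

uv sync --python 3.10 --frozen --all-extras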

Verification

The directory structure in the container matches the local repository.


Make a small change to the code, for example in api/apps/user_app.py.


You can see that the modified code is executed.
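
To watch the server output while testing, you can follow the container logs or look at the log directory mapped to the host. A quick sketch, assuming the container name ragflow-server from the compose file above (the exact log file names under docker/ragflow-logs may differ):

# server stdout/stderr from the entrypoint
docker logs -f ragflow-server

# or list the log files mapped out via ./ragflow-logs:/ragflow/logs
ls docker/ragflow-logs/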


Summary

A lot of the friction in programming comes from setting up the development environment; with Docker, development and deployment are much simpler now. But for well-known reasons, you may still need a way around network restrictions to reach some of the resources.

After setting up the RAGFlow environment this time, I found that limited disk space is a real problem: check which models and images are no longer needed and delete them promptly.