Terraform & Helm: Infrastructure as Code for Microservices

Published: 2025-07-02

1. Introduction 🚀

TL;DR

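  • 🛠️ Declarative and precise: Terraform manages the production resources (Resource Group, VNet with multiple subnets, AKS across availability zones with Azure Monitor, Key Vault, PostgreSQL) with uniform tags and a pinned provider version
  • 📦 Umbrella chart: one Helm umbrella chart covers multiple microservices (Gateway, Identity, …) with probes, PodDisruptionBudget, NetworkPolicy, HPA, Secret references, values.schema.json and Chart.lock
  • 🔄 End-to-end CI/CD: GitHub Actions pipelines with OIDC login to Azure, Terraform fmt/validate, Checkov, Infracost, ACR authentication, Helm lint/test/package/push, automatic rollback, concurrency control and environment approvals
  • 🌟 Enterprise essentials: performance (Azure CNI + HPA + PDB), high availability (multiple AZs + monitoring), security and reproducibility (Key Vault + remote Terraform backend + sensitive values + OIDC + cost scanning)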

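📚 Background and motivation
In a multi-microservice architecture, "works on my machine" problems are often hard to reproduce 🧩. Combining Terraform, Helm charts and GitOps/CI-CD makes infrastructure provisioning and application deployment automated, consistent, auditable and reversible, which speeds up delivery and improves reliability 💪.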


2. Environment and dependencies 🧰

terraform version      # >= 1.4
az version             # recent Azure CLI
kubectl version --client
helm version           # >= 3.8 (OCI registry support is GA since 3.8)

Repository structure

.
├─ infra/
│   ├─ terraform/
│   │    ├─ backend.tf
│   │    ├─ required_providers.tf
│   │    ├─ variables.tf
│   │    ├─ main.tf
│   │    ├─ outputs.tf
│   │    └─ modules/
│   │         ├─ resource_group/
│   │         ├─ vnet/
│   │         ├─ keyvault/
│   │         ├─ aks/
│   │         └─ rds/
│   └─ helm-charts/
│        └─ abp-vnext/
│             ├─ Chart.yaml
│             ├─ Chart.lock
│             ├─ values.schema.json
│             ├─ values.yaml
│             ├─ values-dev.yaml
│             ├─ values-prod.yaml
│             ├─ charts/           # subcharts: Gateway, Identity, …
│             └─ templates/        # shared umbrella resources (Ingress optional)
└─ src/
     └─ MyAbpSolution.sln

3. Architecture overview 🏗️

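Azure
└─ Resource Group
    ├─ VNet
    │   ├─ Subnet: aks ── AKS cluster ── multiple services deployed with Helm
    │   └─ Subnet: db ─── PostgreSQL
    └─ Key Vault

Developer → CI/CD pipeline → IaC: Terraform (Azure resources) + Helm deployment (services on AKS)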

4. Defining cloud resources with Terraform 🛠️

4.1 Provider and backend

# required_providers.tf
terraform {
  required_version = ">= 1.4"
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
  }
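  # Backend blocks cannot use variables or interpolation (no terraform.workspace here);
  # the azurerm backend keeps each workspace's state in its own blob automatically.
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"
    storage_account_name = "tfstateacct"
    container_name       = "tfstate"
    key                  = "abp.tfstate"
  }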
}
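Since a backend block cannot interpolate `terraform.workspace` (or any variable), the state key has to be a literal and workspace separation is left to the backend itself. The Python sketch below models that naming, assuming the azurerm backend's convention of storing the default workspace under the key as-is and every other workspace under the key plus an `env:<workspace>` suffix; `abp.tfstate` is only an example key.

```python
# Sketch of the azurerm backend's per-workspace blob naming (assumed convention:
# non-default workspaces get "env:<workspace>" appended to the configured key).
DEFAULT_WORKSPACE = "default"
WORKSPACE_SUFFIX = "env:"


def state_blob_name(key: str, workspace: str) -> str:
    """Return the blob name the state for `workspace` would be stored under."""
    if workspace == DEFAULT_WORKSPACE:
        return key
    return f"{key}{WORKSPACE_SUFFIX}{workspace}"


if __name__ == "__main__":
    for ws in ("default", "dev", "main"):
        print(f"{ws:>7} -> {state_blob_name('abp.tfstate', ws)}")
```

This is why one static key is enough for per-branch workspaces such as the ones the infra workflow creates.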

4.2 Common variables and tags

# variables.tf
variable "location"    { type = string }
variable "environment" { type = string }
variable "owner"       { type = string }
variable "cost_center" { type = string }
variable "db_admin"    { type = string }
variable "db_password" {
  type      = string
  sensitive = true # redacted in plan/apply output
}

locals {
  common_tags = {
    environment = var.environment
    owner       = var.owner
    cost_center = var.cost_center
  }
}

4.3 Resource Group module

# modules/resource_group/main.tf
resource "azurerm_resource_group" "this" {
  name     = var.name
  location = var.location
  tags     = var.tags
}
output "rg_name" { value = azurerm_resource_group.this.name }
# modules/resource_group/variables.tf
variable "name"     { type = string }
variable "location" { type = string }
variable "tags"     { type = map(string) }

4.4 VNet module

# modules/vnet/main.tf
resource "azurerm_virtual_network" "this" {
  name                = var.name
  address_space       = var.address_space
  location            = var.location
  resource_group_name = var.rg_name
  tags                = var.tags
}

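resource "azurerm_subnet" "this" {
  for_each             = var.subnets
  name                 = each.key
  resource_group_name  = var.rg_name
  virtual_network_name = azurerm_virtual_network.this.name
  address_prefixes     = [each.value]

  # Optional service delegation, e.g. db => "Microsoft.DBforPostgreSQL/flexibleServers"
  # (a VNet-integrated PostgreSQL Flexible Server requires a delegated subnet)
  dynamic "delegation" {
    for_each = { for k, v in var.delegations : k => v if k == each.key }
    content {
      name = "${delegation.key}-delegation"
      service_delegation {
        name    = delegation.value
        actions = ["Microsoft.Network/virtualNetworks/subnets/join/action"]
      }
    }
  }
}

output "vnet_id" { value = azurerm_virtual_network.this.id }

output "subnet_ids_map" {
  value = { for s in azurerm_subnet.this : s.name => s.id }
}
# modules/vnet/variables.tf
variable "name"          { type = string }
variable "location"      { type = string }
variable "rg_name"       { type = string }
variable "address_space" { type = list(string) }
variable "subnets"       { type = map(string) }
variable "tags"          { type = map(string) }
variable "delegations" {
  type    = map(string) # subnet name => delegated service
  default = {}
}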
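Overlapping or out-of-range CIDRs tend to surface only at `apply` time. A minimal, standard-library Python sketch can check the layout the root module passes in: every subnet inside the address space, no two subnets overlapping, and (optionally) an AKS service CIDR that stays outside the VNet, since AKS's default service CIDR of 10.0.0.0/16 would collide with a 10.0.0.0/16 VNet. `check_layout` is a hypothetical helper, not part of any tool.

```python
import ipaddress


def check_layout(address_space, subnets, service_cidr=None):
    """Return a list of problems in a VNet layout; an empty list means it looks consistent."""
    vnet_ranges = [ipaddress.ip_network(cidr) for cidr in address_space]
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    problems = []

    # every subnet must sit inside one of the VNet address ranges
    for name, net in nets.items():
        if not any(net.subnet_of(vnet) for vnet in vnet_ranges):
            problems.append(f"subnet {name} ({net}) is outside the address space")

    # subnets must not overlap each other
    names = sorted(nets)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if nets[a].overlaps(nets[b]):
                problems.append(f"subnets {a} and {b} overlap")

    # the AKS service CIDR is virtual and must stay outside the VNet
    if service_cidr is not None:
        svc = ipaddress.ip_network(service_cidr)
        if any(svc.overlaps(vnet) for vnet in vnet_ranges):
            problems.append(f"service CIDR {svc} overlaps the VNet address space")
    return problems


if __name__ == "__main__":
    layout = (["10.0.0.0/16"], {"aks": "10.0.1.0/24", "db": "10.0.2.0/24"})
    print(check_layout(*layout))
    print(check_layout(*layout, service_cidr="10.0.0.0/16"))
```

The first call reports nothing for the layout used in this article; the second flags the AKS default service CIDR.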

4.5 Key Vault module

# modules/keyvault/main.tf
data "azurerm_client_config" "current" {}

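resource "azurerm_key_vault" "this" {
  name                       = var.name # 3-24 characters, globally unique
  location                   = var.location
  resource_group_name        = var.rg_name
  tenant_id                  = data.azurerm_client_config.current.tenant_id
  sku_name                   = "standard"
  purge_protection_enabled   = true
  soft_delete_retention_days = 7 # soft delete is always on in azurerm 3.x (soft_delete_enabled was removed)
  tags                       = var.tags

  # Allow the identity running Terraform to manage secrets (needed to write db-password below)
  access_policy {
    tenant_id          = data.azurerm_client_config.current.tenant_id
    object_id          = data.azurerm_client_config.current.object_id
    secret_permissions = ["Get", "List", "Set", "Delete", "Recover", "Purge"]
  }
}

resource "azurerm_key_vault_secret" "db_password" {
  name         = "db-password"
  value        = var.admin_password
  key_vault_id = azurerm_key_vault.this.id
}

output "keyvault_id" { value = azurerm_key_vault.this.id }
# modules/keyvault/variables.tf
variable "name"     { type = string }
variable "location" { type = string }
variable "rg_name"  { type = string }
variable "tags"     { type = map(string) }
variable "admin_password" {
  type      = string
  sensitive = true
}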
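Key Vault names are globally unique DNS labels with strict rules, so `"${var.environment}-kv"` can fail for long or unusual environment names. A pre-flight check sketched in Python, assuming the documented rules (3-24 characters; letters, digits and hyphens; start with a letter; end with a letter or digit; no consecutive hyphens). Global uniqueness can only be verified against Azure.

```python
import re

# Assumed Key Vault naming rules: 3-24 characters; letters, digits and hyphens only;
# starts with a letter; ends with a letter or digit; no two hyphens in a row.
_KEY_VAULT_NAME = re.compile(r"[A-Za-z](?!.*--)[A-Za-z0-9-]{1,22}[A-Za-z0-9]")


def is_valid_key_vault_name(name: str) -> bool:
    """Check `name` against the naming rules above (global uniqueness is not checked)."""
    return _KEY_VAULT_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for env in ("production", "dev", "3rd-party-sandbox-environment"):
        name = f"{env}-kv"
        print(name, is_valid_key_vault_name(name))
```

Running this in CI before `terraform plan` turns a late Azure API error into an early, readable failure.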

4.6 AKS module

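# modules/aks/main.tf
resource "azurerm_kubernetes_cluster" "this" {
  name                = var.name
  location            = var.location
  resource_group_name = var.rg_name
  dns_prefix          = var.dns_prefix
  tags                = var.tags

  default_node_pool {
    name                = "agentpool"
    vm_size             = var.node_size
    zones               = var.availability_zones # azurerm 3.x renamed availability_zones to zones
    enable_auto_scaling = var.enable_auto_scaler
    min_count           = var.min_count
    max_count           = var.max_count
    os_disk_size_gb     = 50
    vnet_subnet_id      = var.aks_subnet_id # the node subnet is set on the node pool
  }

  network_profile {
    network_plugin    = "azure"
    network_policy    = "calico"
    load_balancer_sku = "standard"
    service_cidr      = var.service_cidr
    dns_service_ip    = var.dns_service_ip
  }

  identity {
    type = "SystemAssigned"
  }

  # Azure Monitor Container Insights (the addon_profile block was removed in azurerm 3.x)
  oms_agent {
    log_analytics_workspace_id = var.log_analytics_workspace_id
  }
}

output "kube_config" {
  # kube_admin_config_raw is only populated when Azure AD RBAC is enabled
  value     = azurerm_kubernetes_cluster.this.kube_config_raw
  sensitive = true
}
# modules/aks/variables.tf
variable "name"                       { type = string }
variable "location"                   { type = string }
variable "rg_name"                    { type = string }
variable "dns_prefix"                 { type = string }
variable "aks_subnet_id"              { type = string }
variable "node_size"                  { type = string }
variable "log_analytics_workspace_id" { type = string }
variable "tags"                       { type = map(string) }

# A single-line block may hold only one argument, so variables with defaults are multi-line
variable "availability_zones" {
  type    = list(string)
  default = ["1", "2", "3"]
}
variable "enable_auto_scaler" {
  type    = bool
  default = true
}
variable "min_count" {
  type    = number
  default = 2
}
variable "max_count" {
  type    = number
  default = 5
}
# Must not overlap the VNet (the AKS default 10.0.0.0/16 collides with a 10.0.0.0/16 VNet)
variable "service_cidr" {
  type    = string
  default = "10.1.0.0/16"
}
variable "dns_service_ip" {
  type    = string
  default = "10.1.0.10" # must lie inside service_cidr
}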
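With Azure CNI every pod takes an IP from the node subnet, so the /24 `aks` subnet limits how far the autoscaler (`max_count = 5`) can grow. A worked Python sketch of the sizing rule Azure documents for Azure CNI, (nodes + 1 upgrade node) × (max pods + 1); the inputs max_pods = 30 (a common Azure CNI default) and one surge node during upgrades are assumptions, not values set in the module.

```python
# Rough Azure CNI subnet sizing; max_pods=30 and one surge node are assumptions.
AZURE_RESERVED_IPS = 5  # Azure reserves 5 addresses in every subnet


def usable_ips(prefix_len: int) -> int:
    """Addresses Azure lets you assign in an IPv4 subnet of the given prefix length."""
    return 2 ** (32 - prefix_len) - AZURE_RESERVED_IPS


def required_ips(max_nodes: int, max_pods: int = 30, surge_nodes: int = 1) -> int:
    """Peak IP demand: each node uses 1 IP for itself plus 1 per pod slot."""
    return (max_nodes + surge_nodes) * (max_pods + 1)


if __name__ == "__main__":
    need, have = required_ips(max_nodes=5), usable_ips(24)
    verdict = "fits" if need <= have else "too small"
    print(f"need {need} IPs, /24 offers {have}: {verdict}")
```

With these assumptions 186 of 251 addresses are used at peak; raising `max_pods` to 110 would need 666 addresses and outgrow the /24.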

4.7 RDS module (Azure Database for PostgreSQL Flexible Server)

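# modules/rds/main.tf
data "azurerm_key_vault_secret" "db_pwd" {
  name         = "db-password"
  key_vault_id = var.keyvault_id
}

# A VNet-integrated Flexible Server needs a private DNS zone linked to the VNet
resource "azurerm_private_dns_zone" "pg" {
  name                = "${var.name}.private.postgres.database.azure.com"
  resource_group_name = var.rg_name
  tags                = var.tags
}

resource "azurerm_private_dns_zone_virtual_network_link" "pg" {
  name                  = "${var.name}-dns-link"
  resource_group_name   = var.rg_name
  private_dns_zone_name = azurerm_private_dns_zone.pg.name
  virtual_network_id    = var.vnet_id
}

resource "azurerm_postgresql_flexible_server" "this" {
  name                   = var.name
  location               = var.location
  resource_group_name    = var.rg_name
  version                = var.pg_version
  sku_name               = var.sku_name   # Flexible Server SKU, e.g. "GP_Standard_D2s_v3"
  storage_mb             = var.storage_mb # minimum 32768 for Flexible Server
  delegated_subnet_id    = var.subnet_id  # delegated to Microsoft.DBforPostgreSQL/flexibleServers
  private_dns_zone_id    = azurerm_private_dns_zone.pg.id
  administrator_login    = var.admin_user
  administrator_password = data.azurerm_key_vault_secret.db_pwd.value
  tags                   = var.tags

  depends_on = [azurerm_private_dns_zone_virtual_network_link.pg]
}

resource "azurerm_postgresql_flexible_server_database" "app" {
  name      = var.db_name
  server_id = azurerm_postgresql_flexible_server.this.id
}

output "connection_string" {
  value = format(
    "Host=%s;Port=5432;Username=%s;Password=%s;Database=%s",
    azurerm_postgresql_flexible_server.this.fqdn,
    var.admin_user,
    data.azurerm_key_vault_secret.db_pwd.value,
    azurerm_postgresql_flexible_server_database.app.name
  )
  sensitive = true
}
# modules/rds/variables.tf
variable "name"        { type = string }
variable "location"    { type = string }
variable "rg_name"     { type = string }
variable "pg_version"  { type = string }
variable "sku_name"    { type = string }
variable "storage_mb"  { type = number }
variable "admin_user"  { type = string }
variable "subnet_id"   { type = string }
variable "vnet_id"     { type = string }
variable "db_name"     { type = string }
variable "keyvault_id" { type = string }
variable "tags"        { type = map(string) }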
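`format()` inserts the password verbatim, so a password containing `;` or `"` would corrupt the connection string. A Python sketch of a safer builder, assuming ADO.NET-style quoting rules (which Npgsql's connection-string parser follows): values containing special characters are wrapped in double quotes and embedded double quotes are doubled. `build_conn_string` and `quote_value` are hypothetical helpers for illustration.

```python
def quote_value(value: str) -> str:
    """Quote a value if it contains characters that would break key=value; parsing."""
    needs_quotes = any(ch in value for ch in ';"\'') or value != value.strip()
    if not needs_quotes:
        return value
    # ADO.NET-style escaping: wrap in double quotes and double any embedded double quote
    return '"' + value.replace('"', '""') + '"'


def build_conn_string(host: str, user: str, password: str, database: str, port: int = 5432) -> str:
    """Npgsql-style connection string with the same keys as the Terraform output."""
    parts = {
        "Host": host,
        "Port": str(port),
        "Username": user,
        "Password": password,
        "Database": database,
    }
    return ";".join(f"{key}={quote_value(val)}" for key, val in parts.items())


if __name__ == "__main__":
    print(build_conn_string("abp-pg.example", "abpadmin", 'p;ss"word', "abpdb"))
```

In pure Terraform, the simpler option is to generate or choose passwords without `;` and quote characters.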

4.8 Calling the modules from the root module

# main.tf
provider "azurerm" {
  features {}
}

module "rg" {
  source   = "./modules/resource_group"
  name     = "${var.environment}-rg"
  location = var.location
  tags     = local.common_tags
}

module "vnet" {
  source        = "./modules/vnet"
  name          = "${var.environment}-vnet"
  location      = var.location
  rg_name       = module.rg.rg_name
  address_space = ["10.0.0.0/16"]
  subnets       = { aks = "10.0.1.0/24", db = "10.0.2.0/24" }
  delegations   = { db = "Microsoft.DBforPostgreSQL/flexibleServers" }
  tags          = local.common_tags
}

module "keyvault" {
  source         = "./modules/keyvault"
  name           = "${var.environment}-kv"
  location       = var.location
  rg_name        = module.rg.rg_name
  admin_password = var.db_password
  tags           = local.common_tags
}

# Log Analytics workspace that receives Container Insights data from AKS
resource "azurerm_log_analytics_workspace" "this" {
  name                = "${var.environment}-law"
  location            = var.location
  resource_group_name = module.rg.rg_name
  sku                 = "PerGB2018"
  retention_in_days   = 30
  tags                = local.common_tags
}

module "aks" {
  source                     = "./modules/aks"
  name                       = "${var.environment}-aks"
  location                   = var.location
  rg_name                    = module.rg.rg_name
  dns_prefix                 = var.environment
  aks_subnet_id              = module.vnet.subnet_ids_map["aks"]
  node_size                  = "Standard_DS2_v2"
  availability_zones         = ["1", "2", "3"]
  enable_auto_scaler         = true
  min_count                  = 2
  max_count                  = 5
  log_analytics_workspace_id = azurerm_log_analytics_workspace.this.id
  tags                       = local.common_tags
}

module "rds" {
  source      = "./modules/rds"
  name        = "${var.environment}-pg"
  location    = var.location
  rg_name     = module.rg.rg_name
  pg_version  = "13"
  sku_name    = "GP_Standard_D2s_v3" # Flexible Server SKU (GP_Gen5_2 is a Single Server SKU)
  storage_mb  = 32768                # Flexible Server minimum (32 GiB)
  admin_user  = var.db_admin
  subnet_id   = module.vnet.subnet_ids_map["db"]
  vnet_id     = module.vnet.vnet_id
  keyvault_id = module.keyvault.keyvault_id
  db_name     = "abpdb"
  tags        = local.common_tags

  # The module reads db-password through a data source, so wait until the secret exists
  depends_on = [module.keyvault]
}
# outputs.tf
output "kubeconfig" {
  value     = module.aks.kube_config
  sensitive = true
}
output "db_conn_string" {
  value     = module.rds.connection_string
  sensitive = true
}
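Terraform derives the creation order from references such as `module.rg.rg_name` and `module.vnet.subnet_ids_map["aks"]`, and creates independent nodes in parallel. A simplified Python model at module granularity (Terraform actually builds the graph from individual resources), using the standard library's `graphlib`; the edges are read off the module calls above.

```python
from graphlib import TopologicalSorter

# module -> modules it references in main.tf (simplified: module granularity only)
DEPENDS_ON = {
    "rg": set(),
    "vnet": {"rg"},
    "keyvault": {"rg"},
    "aks": {"rg", "vnet"},
    "rds": {"rg", "vnet", "keyvault"},
}


def creation_waves(graph):
    """Group nodes into waves; nodes within one wave do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(creation_waves(DEPENDS_ON), start=1):
        print(f"wave {i}: {', '.join(wave)}")
```

The model yields three waves (the resource group, then VNet and Key Vault, then AKS and PostgreSQL); `terraform destroy` walks the same graph in reverse.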

5. Packaging and validating the Helm chart 📦

5.1 Chart.yaml & Chart.lock

# Chart.yaml
apiVersion: v2
name: abp-vnext
version: 0.4.0
appVersion: "1.0.0"
description: "ABP VNext multi-service Kubernetes umbrella chart"
dependencies:
  - name: gateway
    version: "0.2.0"
    repository: file://charts/gateway
  - name: identity
    version: "0.2.0"
    repository: file://charts/identity
# update resolves dependencies and writes Chart.lock; build recreates charts/ from Chart.lock (use it in CI)
helm dependency update infra/helm-charts/abp-vnext
helm dependency build  infra/helm-charts/abp-vnext

5.2 values.schema.json

{
  "$schema": "https://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "replicaCount": { "type": "integer" },
    "image": {
      "type": "object",
      "properties": {
        "repository": { "type": "string" },
        "tag": { "type": "string" }
      },
      "required": ["repository","tag"]
    },
    "service": {
      "type": "object",
      "properties": {
        "type": { "type": "string" },
        "port": { "type": "integer" }
      },
      "required": ["type","port"]
    }
  },
  "required": ["replicaCount","image","service"]
}
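`helm lint`, `helm template` and `helm install`/`upgrade` validate values against `values.schema.json` before rendering. A tiny Python validator that understands only the keywords used here (`type`, `properties`, `required`) makes those checks concrete; it is an illustration, not Helm's implementation. A schema in the umbrella chart validates the umbrella's top-level values, while this shape matches per-service values such as the gateway subchart's.

```python
# Minimal validator for the keywords used in values.schema.json: type, properties, required.
SCHEMA = {
    "type": "object",
    "properties": {
        "replicaCount": {"type": "integer"},
        "image": {
            "type": "object",
            "properties": {"repository": {"type": "string"}, "tag": {"type": "string"}},
            "required": ["repository", "tag"],
        },
        "service": {
            "type": "object",
            "properties": {"type": {"type": "string"}, "port": {"type": "integer"}},
            "required": ["type", "port"],
        },
    },
    "required": ["replicaCount", "image", "service"],
}

TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
}


def validate(value, schema, path="$"):
    """Return a list of error messages; an empty list means the values are valid."""
    expected = schema.get("type")
    if expected and not TYPE_CHECKS[expected](value):
        return [f"{path}: expected {expected}, got {type(value).__name__}"]
    errors = []
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing required key '{key}'")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                errors.extend(validate(value[key], sub_schema, f"{path}.{key}"))
    return errors


if __name__ == "__main__":
    gateway_values = {
        "replicaCount": 2,
        "image": {"repository": "myacr.azurecr.io/abp-gateway", "tag": "1.0.0"},
        "service": {"type": "ClusterIP", "port": 80},
    }
    print(validate(gateway_values, SCHEMA))  # []
    # An unquoted YAML tag such as `tag: 1.0` is parsed as a float and rejected
    broken = {**gateway_values, "image": {"repository": "x", "tag": 1.0}}
    print(validate(broken, SCHEMA))
```

The second case is why `tag: "1.0.0"` is quoted in the gateway values: YAML would otherwise turn a version-like tag into a number.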

5.3 Subchart example (gateway)

# charts/gateway/values.yaml
replicaCount: 2
image:
  repository: myacr.azurecr.io/abp-gateway
  tag: "1.0.0"
service:
  type: ClusterIP
  port: 80
resources:
  limits:
    cpu: "500m"
    memory: "512Mi"
  requests:
    cpu: "250m"
    memory: "256Mi"
# charts/gateway/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "gateway.fullname" . }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ include "gateway.name" . }}
  template:
    metadata:
      labels:
        app: {{ include "gateway.name" . }}
    spec:
      containers:
        - name: gateway
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          ports:
            - containerPort: 80
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 80
            initialDelaySeconds: 20
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health/live
              port: 80
            initialDelaySeconds: 30
            periodSeconds: 15
          resources:
            {{- toYaml .Values.resources | nindent 12 }}

The remaining templates (Service, Ingress, PDB, NetworkPolicy, HPA) are the same as in the previous article.
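The probe settings decide how long a slow-starting service may take before Kubernetes steps in. A back-of-the-envelope Python sketch, assuming Kubernetes' defaults of `failureThreshold: 3` and `successThreshold: 1` and ignoring `timeoutSeconds` and kubelet jitter: with `initialDelaySeconds: 30` and `periodSeconds: 15`, a gateway that never becomes live is restarted after roughly 60 seconds. If startup can take longer, a `startupProbe` is the usual remedy.

```python
def earliest_liveness_restart(initial_delay: int, period: int, failure_threshold: int = 3) -> int:
    """Approximate seconds after container start at which kubelet restarts a never-live container.

    Probes run at initial_delay, initial_delay + period, ...; the restart follows the
    failure_threshold-th consecutive failure. timeoutSeconds and jitter are ignored.
    """
    return initial_delay + (failure_threshold - 1) * period


def earliest_ready(initial_delay: int) -> int:
    """With successThreshold=1, the first successful readiness probe marks the pod Ready."""
    return initial_delay


if __name__ == "__main__":
    print("gateway Ready at the earliest after", earliest_ready(20), "s")
    print("gateway restarted if not live after about", earliest_liveness_restart(30, 15), "s")
```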

5.4 Lint & Test

helm lint infra/helm-charts/abp-vnext --strict
helm template abp-vnext infra/helm-charts/abp-vnext   # render locally to inspect the generated manifests
helm package  infra/helm-charts/abp-vnext -d charts-packages

6. CI/CD pipelines 🔄

6.1 Infra Workflow

# .github/workflows/infra.yml
name: Infra – Terraform

on:
  push:
    paths: ["infra/terraform/**"]

concurrency:
  group: infra-${{ github.ref }}
  cancel-in-progress: false   # cancelling a running apply can leave the state locked

permissions:
  id-token: write
  contents: read

jobs:
  terraform:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: infra/terraform
    env:
      # OIDC for both provider and backend (hosted runners have no managed identity, so ARM_USE_MSI fails)
      ARM_USE_OIDC:        true
      ARM_CLIENT_ID:       ${{ secrets.AZURE_CLIENT_ID }}
      ARM_TENANT_ID:       ${{ secrets.AZURE_TENANT_ID }}
      ARM_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      # Secret inputs (assumed secret names); location/environment/owner/cost_center
      # are assumed to come from an auto-loaded terraform.tfvars
      TF_VAR_db_admin:     ${{ secrets.DB_ADMIN }}
      TF_VAR_db_password:  ${{ secrets.DB_PASSWORD }}

    steps:
      - uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2

      - name: Terraform Fmt Check
        run: terraform fmt -check -recursive

      - name: Terraform Init
        run: terraform init -input=false

      - name: Terraform Validate   # needs the providers downloaded by init
        run: terraform validate

      - name: Checkov Scan
        uses: bridgecrewio/checkov-action@master
        with:
          directory: infra/terraform

      - name: Setup Infracost
        uses: infracost/actions/setup@v2
        with:
          api-key: ${{ secrets.INFRACOST_TOKEN }}

      - name: Infracost Estimate
        run: infracost breakdown --path .

      - name: Select Workspace     # before plan, so the plan is made against the right state
        run: terraform workspace select ${{ github.ref_name }} || terraform workspace new ${{ github.ref_name }}

      - name: Terraform Plan
        run: terraform plan -input=false -out=tfplan

      - name: Terraform Apply
        run: terraform apply -input=false -auto-approve tfplan

6.2 Deploy Workflow

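# .github/workflows/deploy.yml
name: Deploy – Helm

on:
  push:
    paths:
      - "src/**"
      - "infra/helm-charts/**"

concurrency:
  group: deploy-${{ github.ref }}
  cancel-in-progress: false   # do not interrupt a running helm upgrade

permissions:
  id-token: write
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    # `environment` is a job-level key; required reviewers (approvals) are configured under
    # Settings → Environments → production, not in the workflow file.
    # `needs` cannot reference a job in another workflow, so Terraform outputs are read below.
    environment:
      name: production
      url: https://abp.example.com

    steps:
      - uses: actions/checkout@v3

      - name: Azure Login via OIDC
        uses: azure/login@v1
        with:
          client-id:       ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id:       ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - name: AKS Set Context
        uses: azure/aks-set-context@v3
        with:
          resource-group: production-rg
          cluster-name:   production-aks

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_wrapper: false   # the wrapper would pollute captured `terraform output`

      - name: Read DB Connection String
        working-directory: infra/terraform
        env:
          ARM_USE_OIDC:        true
          ARM_CLIENT_ID:       ${{ secrets.AZURE_CLIENT_ID }}
          ARM_TENANT_ID:       ${{ secrets.AZURE_TENANT_ID }}
          ARM_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
        run: |
          terraform init -input=false
          terraform workspace select "${{ github.ref_name }}"
          DB_CONN="$(terraform output -raw db_conn_string)"
          echo "::add-mask::${DB_CONN}"   # keep the password out of the logs
          echo "DB_CONN=${DB_CONN}" >> "$GITHUB_ENV"

      - name: Log in to ACR (Docker and Helm)
        run: |
          # uses the OIDC identity from azure/login instead of static ACR credentials
          az acr login --name myacr
          az acr login --name myacr --expose-token --output tsv --query accessToken \
            | helm registry login myacr.azurecr.io \
                --username 00000000-0000-0000-0000-000000000000 --password-stdin

      - name: Build & Push Image
        run: |
          # repository name matches charts/gateway/values.yaml
          docker build -t myacr.azurecr.io/abp-gateway:${{ github.sha }} src/
          docker push myacr.azurecr.io/abp-gateway:${{ github.sha }}

      - name: Set DEPLOY_ENV
        run: |
          if [ "${GITHUB_REF}" = "refs/heads/main" ]; then
            echo "DEPLOY_ENV=prod" >> "$GITHUB_ENV"
          else
            echo "DEPLOY_ENV=dev" >> "$GITHUB_ENV"
          fi

      - name: Helm Lint
        run: helm lint infra/helm-charts/abp-vnext --strict

      - name: Helm Package & Push
        run: |
          helm package infra/helm-charts/abp-vnext -d charts-packages
          # OCI path is an assumption: <registry>/helm
          helm push charts-packages/abp-vnext-*.tgz oci://myacr.azurecr.io/helm

      - name: Helm Upgrade with Rollback
        run: |
          # --atomic implies --wait and rolls the release back automatically if the upgrade fails.
          # Manual equivalent: `helm rollback abp-vnext` (no revision = previous revision, not revision 1).
          # Subchart values are addressed with the subchart name as prefix (gateway.*).
          helm upgrade --install abp-vnext oci://myacr.azurecr.io/helm/abp-vnext \
            --version 0.4.0 \
            -f infra/helm-charts/abp-vnext/values-${DEPLOY_ENV}.yaml \
            --set-string gateway.image.tag="${GITHUB_SHA}" \
            --set-string env.DB_CONN="${DB_CONN}" \
            --atomic --timeout 5m

      - name: Helm Test
        run: helm test abp-vnext --logs   # runs the chart's test hooks against the installed release

      - name: Verify Rollout
        # assumes the standard "<release>-<chart>" fullname helper
        run: kubectl rollout status deployment/abp-vnext-gateway

      - name: Notify Slack on Success
        if: success()
        uses: 8398a7/action-slack@v3
        with:
          status: ${{ job.status }}
          text: "✅ Deployment succeeded: ABP VNext microservices updated to ${{ github.sha }}"
          channel: production-alerts
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}

      - name: Notify Slack on Failure
        if: failure()
        uses: 8398a7/action-slack@v3
        with:
          status: ${{ job.status }}
          text: "❌ Deployment failed: please check the pipeline logs!"
          channel: production-alerts
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}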
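Helm keeps a numbered revision history per release, and `helm rollback <release> [revision]` restores the given revision, or the previous one when the revision is omitted or 0. Hard-coding revision 1 would therefore jump back to the very first install and discard every good release since. A small Python model of that choice; the `Revision` records and image tags are made-up sample data.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    number: int
    image_tag: str


def rollback_target(history, revision=0):
    """Revision restored by `helm rollback RELEASE [REVISION]`; 0 means the previous one."""
    latest = max(r.number for r in history)
    wanted = revision if revision > 0 else latest - 1
    for r in history:
        if r.number == wanted:
            return r
    raise ValueError(f"revision {wanted} not found")


if __name__ == "__main__":
    history = [
        Revision(1, "sha-a"),
        Revision(2, "sha-b"),
        Revision(3, "sha-c"),
        Revision(4, "sha-d"),  # the upgrade that just failed
    ]
    print(rollback_target(history).image_tag)     # sha-c
    print(rollback_target(history, 1).image_tag)  # sha-a
```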

7. Observability and alerting 🔔

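  • Azure Monitor: Container Insights is enabled on the AKS cluster
  • Prometheus/Grafana: optional; collects cluster and business metrics
  • EFK/ELK: collects logs through a DaemonSet or sidecars
  • Alertmanager: fires threshold-based alerts and pushes them to Slack/Teams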
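Threshold alerts are usually made robust against brief spikes by requiring the condition to hold for some time before firing (Prometheus alerting rules express this with the `for` field, and Alertmanager then routes firing alerts to Slack/Teams). A simplified Python model of that idea; the CPU samples, the 0.9 threshold and the three-evaluation window are illustrative assumptions.

```python
def first_firing_index(samples, threshold, consecutive):
    """Index of the evaluation at which the alert fires, or None.

    The alert fires only after the samples exceed `threshold` on `consecutive`
    evaluations in a row; any sample at or below the threshold resets the streak.
    """
    streak = 0
    for i, value in enumerate(samples):
        streak = streak + 1 if value > threshold else 0
        if streak >= consecutive:
            return i
    return None


if __name__ == "__main__":
    sustained = [0.50, 0.95, 0.92, 0.97, 0.40]  # CPU utilisation samples
    spiky = [0.95, 0.30, 0.96, 0.20, 0.97]
    print(first_firing_index(sustained, 0.9, 3))  # 3
    print(first_firing_index(spiky, 0.9, 3))      # None
```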
