🚀 Terraform & Helm: Infrastructure as Code for Microservices
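1. Introduction 🚀
✨ TL;DR
- 🛠️ Precise declarations: Terraform manages production resources such as the Resource Group, VNet (multiple subnets), AKS (across availability zones, with Azure Monitor), Key Vault, and PostgreSQL, applies a common set of tags, and pins the provider version
- 📦 Umbrella Chart: a Helm Umbrella Chart covers multiple microservices (Gateway, Identity…), with Probes, PodDisruptionBudget, NetworkPolicy, HPA, Secret references, values.schema.json, and Chart.lock
- 🔄 End-to-end CI/CD: GitHub Actions pipelines combine OIDC login, Terraform fmt/validate/lint, Checkov, Infracost, Azure login, ACR authentication, Helm lint/test/package/push, automatic rollback, concurrency control, and environment approvals
- 🌟 Enterprise-grade concerns: performance (Azure CNI + HPA + PDB), high availability (multi-AZ + monitoring), security and reproducibility (Key Vault + Terraform backend + sensitive values + OIDC + cost scanning)
📚 Background and Motivation
In a multi-microservice architecture, "works on my machine" problems are hard to reproduce 🧩. Combining Terraform, Helm charts, and GitOps/CI-CD turns infrastructure provisioning and application deployment into one automated, consistent, auditable, and reversible process, which speeds up delivery and improves reliability 💪.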
2. Environment and Dependencies 🧰
terraform version          # >= 1.4
az version                 # latest Azure CLI
kubectl version --client
helm version               # >= 3.8
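The minimum versions noted in the comments above can be checked in a script before anything else runs. The following is a minimal sketch in POSIX shell. `version_ge` is a hypothetical helper (it is not part of any of these CLIs) that compares dot-separated numeric versions, and it assumes that any leading `v` or build suffix has already been stripped from the version a tool reports.

```sh
# version_ge ACTUAL MINIMUM
# Exit status 0 if ACTUAL >= MINIMUM, comparing dot-separated numeric components.
# Hypothetical helper for illustration; inputs must look like 1.4 or 3.12.0.
version_ge() {
  actual=$1
  minimum=$2
  while [ -n "$actual" ] || [ -n "$minimum" ]; do
    a=${actual%%.*}
    m=${minimum%%.*}
    # A missing component counts as 0, so 3.8 equals 3.8.0
    a=${a:-0}
    m=${m:-0}
    if [ "$a" -gt "$m" ]; then return 0; fi
    if [ "$a" -lt "$m" ]; then return 1; fi
    # Drop the component that was just compared
    case $actual in *.*) actual=${actual#*.} ;; *) actual= ;; esac
    case $minimum in *.*) minimum=${minimum#*.} ;; *) minimum= ;; esac
  done
  return 0
}

# Example with literal version strings
if version_ge "1.5.7" "1.4"; then
  echo "terraform version is new enough"
fi
```

Comparing component by component avoids the trap of string comparison, where `3.10` would sort before `3.8`.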
Repository structure
.
├─ infra/
│  ├─ terraform/
│  │  ├─ backend.tf
│  │  ├─ required_providers.tf
│  │  ├─ variables.tf
│  │  ├─ main.tf
│  │  ├─ outputs.tf
│  │  └─ modules/
│  │     ├─ resource_group/
│  │     ├─ vnet/
│  │     ├─ keyvault/
│  │     ├─ aks/
│  │     └─ rds/
│  └─ helm-charts/
│     └─ abp-vnext/
│        ├─ Chart.yaml
│        ├─ Chart.lock
│        ├─ values.schema.json
│        ├─ values.yaml
│        ├─ values-dev.yaml
│        ├─ values-prod.yaml
│        ├─ charts/      # Subcharts such as Gateway and Identity
│        └─ templates/   # Shared umbrella resources (Ingress optional)
└─ src/
   └─ MyAbpSolution.sln
3. Architecture Overview 🏗️
4. Defining Cloud Resources with Terraform 🛠️
4.1 Provider and Backend
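# required_providers.tf
terraform {
  required_version = ">= 1.4"
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
  }
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"
    storage_account_name = "tfstateacct"
    container_name       = "tfstate"
    # Backend blocks cannot use interpolation such as terraform.workspace.
    # The azurerm backend already keeps a separate state blob per workspace
    # by appending the workspace name to this key.
    key                  = "terraform.tfstate"
  }
}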
4.2 Common Variables and Tags
# variables.tf
variable "location" { type = string }
variable "environment" { type = string }
variable "owner" { type = string }
variable "cost_center" { type = string }
variable "db_admin" { type = string }
variable "db_password" {
  type      = string
  sensitive = true # keep the password out of plan and apply output
}
locals {
  common_tags = {
    environment = var.environment
    owner       = var.owner
    cost_center = var.cost_center
  }
}
4.3 Resource Group Module
# modules/resource_group/main.tf
resource "azurerm_resource_group" "this" {
  name     = var.name
  location = var.location
  tags     = var.tags
}
output "rg_name" { value = azurerm_resource_group.this.name }
# modules/resource_group/variables.tf
variable "name" { type = string }
variable "location" { type = string }
variable "tags" { type = map(string) }
4.4 VNet Module
# modules/vnet/main.tf
resource "azurerm_virtual_network" "this" {
  name                = var.name
  address_space       = var.address_space
  location            = var.location
  resource_group_name = var.rg_name
  tags                = var.tags
}
resource "azurerm_subnet" "this" {
  for_each             = var.subnets # map of subnet name => CIDR prefix
  name                 = each.key
  resource_group_name  = var.rg_name
  virtual_network_name = azurerm_virtual_network.this.name
  address_prefixes     = [each.value]
}
output "subnet_ids_map" {
  value = { for s in azurerm_subnet.this : s.name => s.id }
}
# modules/vnet/variables.tf
variable "name" { type = string }
variable "location" { type = string }
variable "rg_name" { type = string }
variable "address_space" { type = list(string) }
variable "subnets" { type = map(string) }
variable "tags" { type = map(string) }
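Every prefix passed in `subnets` has to fall inside one of the ranges in `address_space`, otherwise the subnet cannot be created. A rough pre-flight check could look like the sketch below. `ip_to_int` and `cidr_contains` are hypothetical helpers written for this example; they handle IPv4 only and do no input validation.

```sh
# ip_to_int A.B.C.D
# Prints the IPv4 address as a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# cidr_contains PARENT CHILD
# Exit status 0 if the CHILD prefix lies entirely inside the PARENT prefix.
cidr_contains() {
  parent_ip=${1%/*}
  parent_len=${1#*/}
  child_ip=${2%/*}
  child_len=${2#*/}
  # The child must be at least as specific as the parent
  [ "$child_len" -ge "$parent_len" ] || return 1
  # Network mask for the parent prefix length (4294967295 is 0xFFFFFFFF)
  mask=$(( (4294967295 << (32 - $parent_len)) & 4294967295 ))
  p=$(ip_to_int "$parent_ip")
  c=$(ip_to_int "$child_ip")
  # Same network bits under the parent mask means the child is contained
  [ $(( p & mask )) -eq $(( c & mask )) ]
}

# Example: a /24 subnet checked against a /16 address space
if cidr_contains "10.0.0.0/16" "10.0.1.0/24"; then
  echo "subnet fits inside the address space"
fi
```

The check only tests containment. Overlap between sibling subnets is a separate condition and is not covered here.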
4.5 Key Vault Module
# modules/keyvault/main.tf
data "azurerm_client_config" "current" {}
resource "azurerm_key_vault" "this" {
  name                     = var.name
  location                 = var.location
  resource_group_name      = var.rg_name
  tenant_id                = data.azurerm_client_config.current.tenant_id
  sku_name                 = "standard"
  purge_protection_enabled = true
  # Soft delete is always on in azurerm 3.x; soft_delete_enabled is no longer an argument.
  tags                     = var.tags
}
# The identity running Terraform needs permission to set secrets
# (an access policy or an RBAC role on the vault).
resource "azurerm_key_vault_secret" "db_password" {
  name         = "db-password"
  value        = var.admin_password
  key_vault_id = azurerm_key_vault.this.id
}
# Exposed so that other modules can look up secrets in this vault
output "id" { value = azurerm_key_vault.this.id }
# modules/keyvault/variables.tf
variable "name" { type = string }
variable "location" { type = string }
variable "rg_name" { type = string }
variable "admin_password" {
  type      = string
  sensitive = true
}
variable "tags" { type = map(string) }
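Key Vault names are globally unique and constrained: 3 to 24 characters, letters, digits, and hyphens only, starting with a letter, ending with a letter or digit, and with no consecutive hyphens. Since the vault name here is derived from the environment name, a long environment name can push it over the limit. The sketch below checks a candidate name before `terraform plan`; `valid_keyvault_name` is a hypothetical helper, not an Azure CLI command.

```sh
# valid_keyvault_name NAME
# Exit status 0 if NAME satisfies the Key Vault naming rules described above.
valid_keyvault_name() {
  name=$1
  # Length must be between 3 and 24 characters
  [ "${#name}" -ge 3 ] && [ "${#name}" -le 24 ] || return 1
  case $name in
    *[!A-Za-z0-9-]*) return 1 ;;  # only letters, digits, and hyphens
    [!A-Za-z]*)      return 1 ;;  # must start with a letter
    *-)              return 1 ;;  # must not end with a hyphen
    *--*)            return 1 ;;  # no consecutive hyphens
  esac
  return 0
}

# Example: the name an environment called "prod" would produce
if valid_keyvault_name "prod-kv"; then
  echo "prod-kv is a usable vault name"
fi
```

The function says nothing about global uniqueness, which can only be confirmed against Azure itself.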
4.6 AKS Module
# modules/aks/main.tf
resource "azurerm_kubernetes_cluster" "this" {
  name                = var.name
  location            = var.location
  resource_group_name = var.rg_name
  dns_prefix          = var.dns_prefix
  tags                = var.tags
  default_node_pool {
    name                = "agentpool"
    vm_size             = var.node_size
    vnet_subnet_id      = var.aks_subnet_id      # the subnet is set on the node pool, not in network_profile
    zones               = var.availability_zones # this argument is named `zones` in azurerm 3.x
    enable_auto_scaling = var.enable_auto_scaler
    min_count           = var.min_count
    max_count           = var.max_count
    os_disk_size_gb     = 50
  }
  network_profile {
    network_plugin    = "azure"
    network_policy    = "calico"
    load_balancer_sku = "standard"
  }
  identity {
    type = "SystemAssigned"
  }
  # Azure Monitor (Container Insights). azurerm 3.x replaced addon_profile with a
  # top-level oms_agent block that takes a Log Analytics workspace ID.
  dynamic "oms_agent" {
    for_each = var.log_analytics_workspace_id == null ? [] : [var.log_analytics_workspace_id]
    content {
      log_analytics_workspace_id = oms_agent.value
    }
  }
}
output "kube_config" {
  value     = azurerm_kubernetes_cluster.this.kube_admin_config_raw
  sensitive = true
}
# modules/aks/variables.tf
variable "name" { type = string }
variable "location" { type = string }
variable "rg_name" { type = string }
variable "dns_prefix" { type = string }
variable "aks_subnet_id" { type = string }
variable "node_size" { type = string }
variable "availability_zones" {
  type    = list(string)
  default = ["1", "2", "3"]
}
variable "enable_auto_scaler" {
  type    = bool
  default = true
}
variable "min_count" {
  type    = number
  default = 2
}
variable "max_count" {
  type    = number
  default = 5
}
# Assumption: optional input added for this example; monitoring is enabled only when an ID is passed
variable "log_analytics_workspace_id" {
  type    = string
  default = null
}
variable "tags" { type = map(string) }
4.7 RDS Module (Azure Database for PostgreSQL)
# modules/rds/main.tf
data "azurerm_key_vault_secret" "db_pwd" {
  name         = "db-password"
  key_vault_id = var.keyvault_id
}
# Note: with delegated_subnet_id set, the provider also expects private_dns_zone_id,
# and the subnet must be delegated to Microsoft.DBforPostgreSQL/flexibleServers.
resource "azurerm_postgresql_flexible_server" "this" {
  name                         = var.name
  location                     = var.location
  resource_group_name          = var.rg_name
  version                      = var.pg_version
  sku_name                     = var.sku_name
  storage_mb                   = var.storage_mb
  delegated_subnet_id          = var.subnet_id
  administrator_login          = var.admin_user
  administrator_login_password = data.azurerm_key_vault_secret.db_pwd.value
  tags                         = var.tags
}
output "connection_string" {
  value = format(
    "Host=%s;Port=5432;Username=%s;Password=%s;Database=%s",
    azurerm_postgresql_flexible_server.this.fqdn,
    var.admin_user,
    data.azurerm_key_vault_secret.db_pwd.value,
    var.db_name
  )
  sensitive = true
}
# modules/rds/variables.tf
variable "name" { type = string }
variable "location" { type = string }
variable "rg_name" { type = string }
variable "pg_version" { type = string }
variable "sku_name" { type = string }
variable "storage_mb" { type = number }
variable "admin_user" { type = string }
variable "subnet_id" { type = string }
variable "db_name" { type = string }
variable "keyvault_id" { type = string }
variable "tags" { type = map(string) }
4.8 Calling the Modules from the Root Module
# main.tf
provider "azurerm" {
  features {}
}
module "rg" {
  source   = "./modules/resource_group"
  name     = "${var.environment}-rg"
  location = var.location
  tags     = local.common_tags
}
module "vnet" {
  source        = "./modules/vnet"
  name          = "${var.environment}-vnet"
  location      = var.location
  rg_name       = module.rg.rg_name
  address_space = ["10.0.0.0/16"]
  subnets       = { aks = "10.0.1.0/24", db = "10.0.2.0/24" }
  tags          = local.common_tags
}
module "keyvault" {
  source         = "./modules/keyvault"
  name           = "${var.environment}-kv"
  location       = var.location
  rg_name        = module.rg.rg_name
  admin_password = var.db_password
  tags           = local.common_tags
}
module "aks" {
  source             = "./modules/aks"
  name               = "${var.environment}-aks"
  location           = var.location
  rg_name            = module.rg.rg_name
  dns_prefix         = var.environment
  aks_subnet_id      = module.vnet.subnet_ids_map["aks"]
  node_size          = "Standard_DS2_v2"
  availability_zones = ["1", "2", "3"]
  enable_auto_scaler = true
  min_count          = 2
  max_count          = 5
  tags               = local.common_tags
}
module "rds" {
  source      = "./modules/rds"
  name        = "${var.environment}-pg"
  location    = var.location
  rg_name     = module.rg.rg_name
  pg_version  = "13"
  sku_name    = "GP_Standard_D2s_v3" # Flexible Server SKUs use the <tier>_<VM size> form
  storage_mb  = 32768                # smallest storage size Flexible Server accepts
  admin_user  = var.db_admin
  subnet_id   = module.vnet.subnet_ids_map["db"]
  keyvault_id = module.keyvault.id   # module outputs only; requires an "id" output in modules/keyvault
  db_name     = "abpdb"
  tags        = local.common_tags
}
# outputs.tf
output "kubeconfig" {
  value     = module.aks.kube_config
  sensitive = true
}
output "db_conn_string" {
  value     = module.rds.connection_string
  sensitive = true
}
5. Packaging and Validating the Helm Chart 📦
5.1 Chart.yaml & Chart.lock
# Chart.yaml
apiVersion: v2
name: abp-vnext
version: 0.4.0
appVersion: "1.0.0"
description: "ABP VNext multi-service Kubernetes Umbrella Chart"
dependencies:
  - name: gateway
    version: "0.2.0"
    repository: file://charts/gateway
  - name: identity
    version: "0.2.0"
    repository: file://charts/identity
helm dependency update infra/terraform/helm-charts/abp-vnext   # resolve dependencies and rewrite Chart.lock
helm dependency build infra/terraform/helm-charts/abp-vnext    # rebuild charts/ from the existing Chart.lock
5.2 values.schema.json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "replicaCount": { "type": "integer" },
    "image": {
      "type": "object",
      "properties": {
        "repository": { "type": "string" },
        "tag": { "type": "string" }
      },
      "required": ["repository", "tag"]
    },
    "service": {
      "type": "object",
      "properties": {
        "type": { "type": "string" },
        "port": { "type": "integer" }
      },
      "required": ["type", "port"]
    }
  },
  "required": ["replicaCount", "image", "service"]
}
5.3 Subchart Example (gateway)
# charts/gateway/values.yaml
replicaCount: 2
image:
  repository: myacr.azurecr.io/abp-gateway
  tag: "1.0.0"
service:
  type: ClusterIP
  port: 80
resources:
  limits:
    cpu: "500m"
    memory: "512Mi"
  requests:
    cpu: "250m"
    memory: "256Mi"
# charts/gateway/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "gateway.fullname" . }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ include "gateway.name" . }}
  template:
    metadata:
      labels:
        app: {{ include "gateway.name" . }}
    spec:
      containers:
        - name: gateway
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          ports:
            - containerPort: 80
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 80
            initialDelaySeconds: 20
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health/live
              port: 80
            initialDelaySeconds: 30
            periodSeconds: 15
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
The remaining Service, Ingress, PDB, NetworkPolicy, and HPA templates are the same as in the earlier text.
5.4 Lint & Test
helm lint infra/terraform/helm-charts/abp-vnext --strict
helm template abp-vnext infra/terraform/helm-charts/abp-vnext
helm package infra/terraform/helm-charts/abp-vnext -d charts-packages
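`helm package` names the archive `<name>-<version>.tgz`, taking both values from the top level of Chart.yaml. When a later pipeline step needs the exact file name rather than a glob, it can be derived from Chart.yaml directly. The sketch below uses a hypothetical helper, `chart_package_name`, and assumes `name` and `version` appear as unindented top-level keys, as they do in the Chart.yaml shown in 5.1.

```sh
# chart_package_name PATH_TO_CHART_YAML
# Prints the archive name that `helm package` is expected to produce.
# Only unindented keys are matched, so entries under `dependencies:` are ignored.
chart_package_name() {
  awk '
    /^name:/    { name = $2 }
    /^version:/ { version = $2 }
    END {
      gsub(/"/, "", name)      # tolerate quoted values
      gsub(/"/, "", version)
      print name "-" version ".tgz"
    }
  ' "$1"
}

# Usage (path as used elsewhere in this article):
#   chart_package_name infra/terraform/helm-charts/abp-vnext/Chart.yaml
```

This is a line-based reading of the file, not a YAML parser, so it will not cope with unusual layouts such as inline comments after the value.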
6. CI/CD Pipelines 🔄
6.1 Infra Workflow
# .github/workflows/infra.yml
name: Infra – Terraform
on:
  push:
    paths: ["infra/terraform/**"]
concurrency:
  group: infra-${{ github.ref }}
  cancel-in-progress: false # do not interrupt a running apply; it could leave the state locked
permissions:
  id-token: write
  contents: read
# OIDC credentials for the azurerm provider and backend, shared by every step.
# Input variables (location, environment, ...) are assumed to come from TF_VAR_* or a tfvars file.
env:
  ARM_USE_OIDC: true
  ARM_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
  ARM_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
  ARM_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}
jobs:
  terraform:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: infra/terraform
    # Job outputs are only visible to other jobs in this same workflow
    outputs:
      kubeconfig: ${{ steps.apply.outputs.kubeconfig }}
      db_conn_string: ${{ steps.apply.outputs.db_conn_string }}
    steps:
      - uses: actions/checkout@v3
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_wrapper: false # keep `terraform output -raw` free of wrapper output
      - name: Terraform Fmt Check
        run: terraform fmt -check
      - name: Terraform Init
        run: terraform init -input=false
      - name: Select Workspace
        run: terraform workspace select ${{ github.ref_name }} || terraform workspace new ${{ github.ref_name }}
      - name: Terraform Validate # validate needs an initialized working directory
        run: terraform validate
      - name: Checkov Scan
        uses: bridgecrewio/checkov-action@master
        with:
          directory: infra/terraform
      - name: Setup Infracost
        uses: infracost/actions/setup@v2
        with:
          api-key: ${{ secrets.INFRACOST_TOKEN }}
      - name: Infracost Estimate
        run: infracost breakdown --path .
      - name: Terraform Plan # planned in the same workspace that is applied below
        id: plan
        run: terraform plan -input=false -out=tfplan
      - name: Terraform Apply
        id: apply
        run: |
          terraform apply -input=false -auto-approve tfplan
          echo "kubeconfig<<EOF" >> $GITHUB_OUTPUT
          terraform output -raw kubeconfig >> $GITHUB_OUTPUT
          echo "EOF" >> $GITHUB_OUTPUT
          echo "db_conn_string<<EOF" >> $GITHUB_OUTPUT
          terraform output -raw db_conn_string >> $GITHUB_OUTPUT
          echo "EOF" >> $GITHUB_OUTPUT
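The workflow uses the branch name as the Terraform workspace name. Branch names often contain characters such as `/` (for example `feature/login-page`) that are not safe in a workspace name, so it may be worth normalizing the name first. The sketch below shows one way to do that; `workspace_for_ref` is a hypothetical helper, and replacing unsafe characters with hyphens is an assumption rather than a Terraform convention.

```sh
# workspace_for_ref BRANCH_NAME
# Prints a workspace name in which every character outside
# letters, digits, underscore, and hyphen is replaced by a hyphen.
workspace_for_ref() {
  # printf avoids a trailing newline, which tr would otherwise turn into a hyphen
  printf '%s' "$1" | tr -c 'A-Za-z0-9_-' '-'
}

# Example
ws=$(workspace_for_ref "feature/login-page")
echo "workspace: $ws"
```

In the workflow this could wrap `${{ github.ref_name }}` before the `terraform workspace select` call. Note that two different branches can map to the same workspace name after normalization (`a/b` and `a-b`), so a team would need a naming rule that rules this out.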
6.2 Deploy Workflow
# .github/workflows/deploy.yml
name: Deploy – Helm
on:
  push:
    paths:
      - "src/**"
      - "infra/terraform/helm-charts/**"
concurrency:
  group: deploy-${{ github.ref }}
  cancel-in-progress: true
permissions:
  id-token: write
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    # `environment` is a job-level key. Required reviewers (for example alice and bob)
    # are configured on the environment in the repository settings, not in YAML.
    environment:
      name: production
      url: https://abp.example.com
    # `needs` can only reference jobs in the same workflow, so this job cannot depend on
    # the `terraform` job in infra.yml. The connection string is read from a secret instead
    # (assumed name: DB_CONN_STRING).
    steps:
      - uses: actions/checkout@v3
      - name: Azure Login via OIDC
        uses: azure/login@v1
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - name: AKS Set Context
        uses: azure/aks-set-context@v3 # v3 reuses the azure/login session; v1 expects a `creds` input
        with:
          resource-group: production-rg
          cluster-name: production-aks
      - name: Docker Login to ACR
        uses: docker/login-action@v2
        with:
          registry: myacr.azurecr.io
          username: ${{ secrets.ACR_USERNAME }}
          password: ${{ secrets.ACR_PASSWORD }}
      - name: Build & Push Image
        run: |
          docker build -t myacr.azurecr.io/abp-vnext:${{ github.sha }} src/
          docker push myacr.azurecr.io/abp-vnext:${{ github.sha }}
      - name: Set DEPLOY_ENV
        run: |
          if [ "${GITHUB_REF}" = "refs/heads/main" ]; then
            echo "DEPLOY_ENV=prod" >> $GITHUB_ENV
          else
            echo "DEPLOY_ENV=dev" >> $GITHUB_ENV
          fi
      - name: Helm Lint
        run: helm lint infra/terraform/helm-charts/abp-vnext --strict
      - name: Helm Package & Push
        run: |
          helm package infra/terraform/helm-charts/abp-vnext -d charts-packages
          helm push charts-packages/abp-vnext-*.tgz oci://myhelmrepo
      - name: Helm Upgrade with Rollback
        run: |
          # OCI charts are referenced directly; `helm repo add` does not accept oci:// URLs
          if ! helm upgrade --install abp-vnext oci://myhelmrepo/abp-vnext \
            --version 0.4.0 \
            -f infra/terraform/helm-charts/abp-vnext/values-${DEPLOY_ENV}.yaml \
            --set image.tag=${GITHUB_SHA} \
            --set-string env.DB_CONN="${{ secrets.DB_CONN_STRING }}" \
            --wait --timeout 5m; then
            # Without a revision argument, Helm rolls back to the previous release
            helm rollback abp-vnext
            exit 1
          fi
      - name: Verify Rollout & Helm Test
        run: |
          kubectl rollout status deployment/gateway
          # `helm test` runs against an installed release; Helm 3 has no --cleanup flag
          helm test abp-vnext
      - name: Notify Slack on Success
        if: success()
        uses: 8398a7/action-slack@v3
        with:
          status: ${{ job.status }}
          text: "✅ Deployment succeeded: ABP VNext microservices updated to ${{ github.sha }}"
          channel: production-alerts
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
      - name: Notify Slack on Failure
        if: failure()
        uses: 8398a7/action-slack@v3
        with:
          status: ${{ job.status }}
          text: "❌ Deployment failed: please check the pipeline logs!"
          channel: production-alerts
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
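The `Set DEPLOY_ENV` step maps the Git ref to an environment, and the environment in turn selects a values file. Pulling that mapping into small functions makes it easy to test outside the pipeline. The sketch below mirrors the rule used in the workflow (`refs/heads/main` is `prod`, everything else is `dev`); the function names `deploy_env_for_ref` and `values_file_for_ref` are hypothetical.

```sh
# deploy_env_for_ref GIT_REF
# Prints "prod" for the main branch and "dev" for any other ref.
deploy_env_for_ref() {
  case $1 in
    refs/heads/main) echo prod ;;
    *)               echo dev ;;
  esac
}

# values_file_for_ref GIT_REF
# Prints the values file the Helm upgrade step would pass with -f.
values_file_for_ref() {
  env_name=$(deploy_env_for_ref "$1")
  echo "infra/terraform/helm-charts/abp-vnext/values-${env_name}.yaml"
}

# Example
echo "main deploys with: $(values_file_for_ref refs/heads/main)"
```

A `case` statement also leaves room for more environments later, for example a `refs/heads/release/*` pattern mapped to a staging values file.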
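7. Observability and Alerting 🔔
- Azure Monitor: Container Insights is enabled
- Prometheus/Grafana: optional deployment for collecting cluster and application metrics
- EFK/ELK: collects logs through a DaemonSet or sidecar
- Alertmanager: fires threshold-based alerts and pushes them to Slack/Teams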
8. Appendix 📂
References: