Prompt Engineering | About 33 min read | 12,930 characters

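DevOps Prompts: CI/CD, Docker, K8s

Prompts for using AI to help write Dockerfiles, GitHub Actions workflows, and K8s configuration

AI Assistance in DevOps Work

DevOps work involves writing a lot of configuration files: Dockerfiles, CI/CD pipelines, K8s manifests, Terraform configuration, and more. These files have intricate formats and many details, and they are easy to get wrong.

AI is particularly useful here:

  1. It is familiar with the syntax and best practices of a wide range of configuration formats
  2. It can quickly generate configuration that follows conventions
  3. It can help you track down configuration errors
  4. It can explain complex configuration options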

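Prompt 1: Dockerfile Generation

Please generate a production-grade Dockerfile for the following application.

Application details:
- Language/framework: [language and framework]
- Build command: [e.g. npm run build]
- Start command: [e.g. node dist/main.js]
- Port: [port number]
- Environment variables: [list]

Requirements:
1. Use a multi-stage build
2. Minimize image size
3. Run as a non-root user
4. Make good use of layer caching
5. Include a health check
6. Add comments explaining each step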
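
The bracketed placeholders in this template are meant to be filled in before the prompt is sent. As a minimal sketch, the template can be filled programmatically so that a forgotten field fails loudly instead of reaching the model as a literal placeholder. The `render_prompt` helper, the field names, and the sample values are illustrative assumptions, not part of any tool.

```python
import string

# Hypothetical template mirroring the "application details" part of the prompt above.
DOCKERFILE_PROMPT = string.Template(
    "Please generate a production-grade Dockerfile for the following application.\n"
    "\n"
    "Application details:\n"
    "- Language/framework: $framework\n"
    "- Build command: $build_cmd\n"
    "- Start command: $start_cmd\n"
    "- Port: $port\n"
    "- Environment variables: $env_vars\n"
)


def render_prompt(template: string.Template, **fields: str) -> str:
    """Fill every placeholder; substitute() raises KeyError if one is missing."""
    return template.substitute(**fields)


# Sample values are made up for illustration.
prompt = render_prompt(
    DOCKERFILE_PROMPT,
    framework="Node.js 20",
    build_cmd="npm run build",
    start_cmd="node dist/main.js",
    port="3000",
    env_vars="DATABASE_URL, REDIS_URL",
)
print(prompt)
```

Using `substitute` rather than `safe_substitute` is deliberate here: an incomplete prompt tends to produce a generic Dockerfile, so it is better to stop early.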

Example output: Node.js application

# ============================================
# Stage 1: install dependencies
# ============================================
FROM node:20-alpine AS deps

WORKDIR /app

# Copy only the dependency manifests so Docker can cache this layer
COPY package.json package-lock.json ./

# Install production dependencies only, then clear the npm cache to shrink the layer
# (--omit=dev replaces the deprecated --only=production flag)
RUN npm ci --omit=dev && \
    npm cache clean --force

# ============================================
# Stage 2: build
# ============================================
FROM node:20-alpine AS builder

WORKDIR /app

COPY package.json package-lock.json ./
RUN npm ci

# Copy the source code and build
COPY . .
RUN npm run build

# ============================================
# Stage 3: production image
# ============================================
FROM node:20-alpine AS runner

WORKDIR /app

# Security: create a non-root user and put it in its own group
RUN addgroup --system --gid 1001 appgroup && \
    adduser --system --uid 1001 --ingroup appgroup appuser

# Copy production dependencies from the deps stage
COPY --from=deps /app/node_modules ./node_modules

# Copy build output from the builder stage
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package.json ./

# Switch to the non-root user
USER appuser

# Expose the port
EXPOSE 3000

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

# Start the application
CMD ["node", "dist/main.js"]

Dockerfile for a Python application

# ============================================
# Stage 1: build
# ============================================
FROM python:3.12-slim AS builder

WORKDIR /app

# Install build dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends gcc && \
    rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# ============================================
# Stage 2: production image
# ============================================
FROM python:3.12-slim AS runner

WORKDIR /app

# Copy the installed dependencies from the builder stage
COPY --from=builder /install /usr/local

# Copy the application code
COPY . .

# Security: non-root user
RUN useradd --create-home appuser
USER appuser

EXPOSE 8000

HEALTHCHECK --interval=30s --timeout=3s \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

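Prompt 2: Docker Compose Generation

Please generate a docker-compose.yml for the following microservice architecture.

Services:
[service names, ports, dependencies]

Infrastructure:
[databases, caches, message queues, etc.]

Requirements:
1. Dependencies between services and startup order
2. Network isolation
3. Data persistence (volumes)
4. Environment variable management
5. Health checks
6. Handling of differences between development and production environments

Example output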

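# The top-level "version" key is obsolete in the Compose Specification and is omitted here;
# the depends_on "condition" form used below relies on the Compose Specification (docker compose v2).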

services:
  # ============ Infrastructure ============
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: myapp
      POSTGRES_USER: ${DB_USER:-postgres}
      POSTGRES_PASSWORD: ${DB_PASSWORD:-postgres}
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    ports:
      - "5432:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${DB_USER:-postgres} -d myapp"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - backend

  redis:
    image: redis:7-alpine
    command: redis-server --appendonly yes
    volumes:
      - redis_data:/data
    ports:
      - "6379:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - backend

  # ============ Application services ============
  api:
    build:
      context: ./api
      dockerfile: Dockerfile
    environment:
      DATABASE_URL: postgresql://${DB_USER:-postgres}:${DB_PASSWORD:-postgres}@postgres:5432/myapp
      REDIS_URL: redis://redis:6379
      NODE_ENV: production
    ports:
      - "3000:3000"
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "wget", "--spider", "http://localhost:3000/health"]
      interval: 30s
      timeout: 5s
      retries: 3
    networks:
      - backend
      - frontend
    restart: unless-stopped

  web:
    build:
      context: ./web
      dockerfile: Dockerfile
    ports:
      - "80:80"
    depends_on:
      api:
        condition: service_healthy
    networks:
      - frontend
    restart: unless-stopped

volumes:
  postgres_data:
  redis_data:

networks:
  frontend:
  backend:
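
The `depends_on` entries above form a dependency graph, and the start order follows from it. A minimal sketch of that ordering, using Python's standard `graphlib`; the `start_order` helper is hypothetical, and the dictionary is copied by hand from the example rather than parsed from the file. Compose additionally waits for each dependency's health check when `condition: service_healthy` is set, which this sketch does not model.

```python
from graphlib import TopologicalSorter

# Service -> the services it depends on, copied from the depends_on entries above.
depends_on = {
    "postgres": set(),
    "redis": set(),
    "api": {"postgres", "redis"},
    "web": {"api"},
}


def start_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid start order in which every service follows its dependencies."""
    return list(TopologicalSorter(graph).static_order())


order = start_order(depends_on)
print(order)
```

`TopologicalSorter` raises `graphlib.CycleError` if two services depend on each other, which is a quick way to catch a circular `depends_on` before running the stack.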

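Prompt 3: GitHub Actions Workflow

Please generate a GitHub Actions CI/CD workflow for the following project.

Project details:
- Language/framework: [description]
- Test framework: [description]
- Deployment target: [description]
- Branching strategy: [description]

Workflow requirements:
1. On pull request: lint + test + build
2. On merge to main: deploy to staging automatically
3. On tag: deploy to production automatically
4. Include cache optimization
5. Include parallel execution
6. Include Slack/DingTalk notifications

Example output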

name: CI/CD Pipeline

on:
  pull_request:
    branches: [main, develop]
  push:
    branches: [main]
    tags: ['v*']

env:
  NODE_VERSION: '20'
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  # ============ Code quality checks ============
  lint:
    name: Lint & Type Check
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'

      - run: npm ci
      - run: npm run lint
      - run: npm run type-check

  # ============ Tests ============
  test:
    name: Test
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16-alpine
        env:
          POSTGRES_DB: test
          POSTGRES_PASSWORD: test
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'

      - run: npm ci
      - run: npm test -- --coverage
        env:
          DATABASE_URL: postgresql://postgres:test@localhost:5432/test

      - uses: actions/upload-artifact@v4
        with:
          name: coverage
          path: coverage/

  # ============ Build Docker image ============
  build:
    name: Build Docker Image
    needs: [lint, test]
    runs-on: ubuntu-latest
    if: github.event_name == 'push'
    permissions:
      contents: read
      packages: write

    steps:
      - uses: actions/checkout@v4

      - uses: docker/setup-buildx-action@v3

      - uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - uses: docker/metadata-action@v5
        id: meta
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=ref,event=branch
            type=semver,pattern={{version}}
            type=sha

      - uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  # ============ Deploy to staging ============
  deploy-staging:
    name: Deploy to Staging
    needs: [build]
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    environment: staging

    steps:
      - uses: actions/checkout@v4

      - name: Deploy to Staging
        run: |
          echo "Deploying to staging..."
          # kubectl set image deployment/app app=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:sha-${GITHUB_SHA::7}

  # ============ Deploy to production ============
  deploy-production:
    name: Deploy to Production
    needs: [build]
    runs-on: ubuntu-latest
    if: startsWith(github.ref, 'refs/tags/v')
    environment: production

    steps:
      - uses: actions/checkout@v4

      - name: Deploy to Production
        run: |
          echo "Deploying to production..."
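          # The semver {{version}} pattern in metadata-action drops the leading "v", so strip it from the Git tag here
          # kubectl set image deployment/app app=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${GITHUB_REF_NAME#v}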
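
The deploy steps have to reference a tag that the build job actually pushed. As a sketch of that mapping, assuming the tag formats of `docker/metadata-action` as I understand them (`type=sha` yields `sha-` plus the 7-character short SHA, and `type=semver,pattern={{version}}` yields the version without the leading `v`): under that assumption a `v1.2.3` Git tag is published as `1.2.3`. The `deploy_tag` helper is hypothetical; check the action's documentation before relying on these formats.

```python
def deploy_tag(ref: str, sha: str) -> str:
    """Map a Git ref to the image tag a deploy step would reference (assumed formats)."""
    tag_prefix = "refs/tags/v"
    if ref.startswith(tag_prefix):
        # Assumed semver behaviour: Git tag "v1.2.3" -> image tag "1.2.3"
        return ref[len(tag_prefix):]
    # Assumed sha behaviour: "sha-" plus the first 7 characters of the commit SHA
    return f"sha-{sha[:7]}"


print(deploy_tag("refs/tags/v1.2.3", "0123456789abcdef"))  # 1.2.3
print(deploy_tag("refs/heads/main", "0123456789abcdef"))   # sha-0123456
```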

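Prompt 4: Kubernetes Manifest

Please generate a Kubernetes deployment configuration for the following service.

Service details:
- Service name: [name]
- Image: [image address]
- Port: [port]
- Resource requirements: [CPU/memory]
- Replicas: [count]
- Environment variables: [list]

Requirements:
1. Deployment + Service + Ingress
2. Resource limits (requests/limits)
3. Health checks (liveness/readiness)
4. Rolling update strategy
5. ConfigMap and Secret management
6. HPA (horizontal autoscaling)

Example output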

# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: api-config
  namespace: production
data:
  NODE_ENV: "production"
  LOG_LEVEL: "info"
  REDIS_HOST: "redis-master"

---
# secret.yaml (in real use, manage this through sealed-secrets or an external secret manager)
apiVersion: v1
kind: Secret
metadata:
  name: api-secrets
  namespace: production
type: Opaque
stringData:
  DATABASE_URL: "postgresql://user:pass@postgres:5432/myapp"
  JWT_SECRET: "your-jwt-secret-here"

---
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: production
  labels:
    app: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: ghcr.io/myorg/api:v1.0.0
          ports:
            - containerPort: 3000
          envFrom:
            - configMapRef:
                name: api-config
            - secretRef:
                name: api-secrets
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 30
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10

---
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: production
spec:
  selector:
    app: api
  ports:
    - port: 80
      targetPort: 3000
  type: ClusterIP

---
# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
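
To see what these HPA targets mean in practice, here is a sketch of the scaling rule from the Kubernetes documentation, `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, with the largest result across metrics taken and then clamped to `minReplicas`/`maxReplicas`. The function names are hypothetical, and the sketch leaves out the controller's tolerance band and stabilization behaviour.

```python
import math


def desired_replicas(current: int, current_util: float, target_util: float) -> int:
    """ceil(currentReplicas * currentMetricValue / desiredMetricValue)."""
    return math.ceil(current * current_util / target_util)


def hpa_decision(current: int, cpu_util: float, mem_util: float,
                 min_replicas: int = 3, max_replicas: int = 10) -> int:
    """Apply the example's targets (70% CPU, 80% memory) and clamp to the replica bounds."""
    wanted = max(
        desired_replicas(current, cpu_util, 70),
        desired_replicas(current, mem_util, 80),
    )
    return max(min_replicas, min(max_replicas, wanted))


print(hpa_decision(3, cpu_util=90, mem_util=40))   # 4: CPU above target drives a scale-up
print(hpa_decision(3, cpu_util=35, mem_util=40))   # 3: never below minReplicas
print(hpa_decision(8, cpu_util=140, mem_util=50))  # 10: capped at maxReplicas
```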

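Prompt 5: Terraform/IaC

Please write the following infrastructure configuration in Terraform.

Infrastructure requirements:
[describe the cloud resources needed]

Cloud platform: [AWS/GCP/Azure]

Requirements:
1. Use a modular structure
2. Use variables and outputs
3. Configure state management
4. Follow security best practices (least privilege, encryption)
5. Add explanatory comments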

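Prompt 6: Troubleshooting

Please help me troubleshoot the following DevOps problem.

Problem description:
[describe the symptoms]

Environment:
- Platform: [K8s/Docker/VM]
- Relevant logs:

[log content]


Steps already tried:
[what has been done so far]

Please:
1. Analyze the likely causes (ordered by likelihood)
2. Provide diagnostic commands
3. Provide a fix

Common troubleshooting template

Problem: Pod stuck in CrashLoopBackOff

Troubleshooting steps:
1. kubectl describe pod [pod-name] -n [namespace]
2. kubectl logs [pod-name] -n [namespace] --previous
3. Check whether the resource limits are too low
4. Check whether the health check configuration is reasonable
5. Check whether the environment variables and Secrets are correct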
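
The first two steps are plain `kubectl` calls whose output can be pasted into the "relevant logs" slot of the prompt. A small sketch that builds those commands for a given pod; the `crashloop_commands` helper and the pod name are hypothetical, and the commands are only printed, not executed.

```python
import shlex


def crashloop_commands(pod: str, namespace: str) -> list[list[str]]:
    """Build the describe/logs commands from steps 1 and 2 as argument lists."""
    return [
        ["kubectl", "describe", "pod", pod, "-n", namespace],
        ["kubectl", "logs", pod, "-n", namespace, "--previous"],
    ]


# "api-example" is a made-up pod name for illustration.
for argv in crashloop_commands("api-example", "production"):
    print(shlex.join(argv))
```

Keeping the commands as argument lists means they can later be passed to `subprocess.run` without shell quoting problems.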

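Prompt 7: Monitoring and Alerting

Please design a monitoring plan for the following service.

Service architecture:
[describe the service architecture]

Requirements:
1. Prometheus metric design
2. Grafana dashboard JSON
3. Alert rules (PrometheusRule)
4. Alert severity levels and notification channels
5. SLO/SLI definitions

SLO design example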

# SLO definitions
slos:
  - name: API availability
    target: 99.9%  # allows about 43 minutes of downtime per 30-day month
    indicator:
      type: availability
      metric: |
        sum(rate(http_requests_total{status!~"5.."}[5m]))
        /
        sum(rate(http_requests_total[5m]))

  - name: API latency
    target: 95%  # 95% of requests complete within 200ms
    indicator:
      type: latency
      metric: |
        histogram_quantile(0.95,
          sum(rate(http_request_duration_seconds_bucket[5m])) by (le)
        ) < 0.2

# Alert rules
alerts:
  - name: HighErrorRate
    severity: critical
    condition: error_rate > 1% for 5m
    action: Notify the on-call engineer immediately

  - name: HighLatency
    severity: warning
    condition: p95_latency > 500ms for 10m
    action: Notify the team Slack channel
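
The downtime figure attached to the 99.9% target is simple arithmetic on the error budget. A minimal sketch, assuming a 30-day window; the function names are illustrative:

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability target over the window."""
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


def availability(total_requests: int, server_errors: int) -> float:
    """Share of requests that did not fail with a 5xx, matching the SLI query above."""
    return (total_requests - server_errors) / total_requests


print(round(error_budget_minutes(0.999), 1))   # 43.2
print(availability(1_000_000, 500))            # 0.9995
```

Tightening the target to 99.99% shrinks the same budget to roughly 4.3 minutes, which is why each extra nine is expensive.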

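DevOps Configuration Review Prompt

Please review the following DevOps configuration file and point out problems and suggested improvements.

Configuration file:
```yaml
[configuration content]
```

Please check:

  1. Security issues (hard-coded secrets, excessive permissions)
  2. Best practices (image tags, resource limits)
  3. Reliability (health checks, restart policy)
  4. Maintainability (comments, naming conventions)
  5. Cost optimization (over-provisioned resources)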
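
A few of these checks are mechanical enough to run before asking an AI for a deeper review. A rough sketch for Dockerfiles, covering hard-coded secrets, unpinned base images, and missing `USER`/`HEALTHCHECK` instructions. The `review_dockerfile` helper is hypothetical and purely line-based: it ignores line continuations and will miss many cases.

```python
import re

# Heuristic: an ENV/ARG assignment whose name mentions a password, secret, or token.
SECRET_PATTERN = re.compile(r"(PASSWORD|SECRET|TOKEN)\w*\s*=\s*\S", re.IGNORECASE)


def review_dockerfile(text: str) -> list[str]:
    """Return human-readable findings for a few checklist items."""
    findings: list[str] = []
    instructions = [
        line.strip() for line in text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    stage_names: set[str] = set()

    for line in instructions:
        words = line.split()
        keyword = words[0].upper()
        if keyword == "FROM":
            args = [w for w in words[1:] if not w.startswith("--")]
            if not args:
                continue
            image = args[0]
            # An earlier build stage used as a base needs no tag.
            if image not in stage_names and (":" not in image or image.endswith(":latest")):
                findings.append(f"unpinned base image: {image}")
            if len(args) >= 3 and args[1].upper() == "AS":
                stage_names.add(args[2])
        elif keyword in ("ENV", "ARG") and SECRET_PATTERN.search(line):
            findings.append(f"possible hard-coded secret: {line}")

    keywords = {line.split()[0].upper() for line in instructions}
    if "USER" not in keywords:
        findings.append("no USER instruction: container runs as root")
    if "HEALTHCHECK" not in keywords:
        findings.append("no HEALTHCHECK instruction")
    return findings


sample = 'FROM node\nENV JWT_SECRET=abc123\nCMD ["node", "dist/main.js"]\n'
for finding in review_dockerfile(sample):
    print(finding)
```

The findings can also be pasted into the review prompt so the model starts from concrete observations.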

---

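## Summary

| Scenario | Prompt | Key points |
|------|--------|---------|
| Dockerfile | Prompt 1 | Multi-stage builds, non-root user, cache optimization |
| Docker Compose | Prompt 2 | Dependency order, health checks, network isolation |
| GitHub Actions | Prompt 3 | Caching, parallelism, environment separation |
| K8s | Prompt 4 | Resource limits, HPA, rolling updates |
| Terraform | Prompt 5 | Modularity, state management, least privilege |
| Troubleshooting | Prompt 6 | Log analysis, systematic diagnosis |
| Monitoring | Prompt 7 | SLO/SLI, alert severity levels |

DevOps configuration files share a few traits: strict formats, many details, and small mistakes that can cause big problems. AI can help you quickly generate configuration that follows best practices, but always validate it in a test environment before taking it to production.

> The essence of infrastructure as code is not "managing infrastructure with code" but "treating infrastructure the way you treat code": version control, code review, and automated testing, with none of them skipped.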
