Fault Injection Problems

🚫 Fails to Start

Error

Fault injection requests fail to initialize, preventing chaos experiments from running and leaving target services unaffected by intended faults.

Symptoms

  • Injection requests return errors
  • Target pods not affected by faults
  • Chaos resources not created

Diagnosis

# Check chaos resources
kubectl get podchaos -A
kubectl get networkchaos -A
kubectl get stresschaos -A

# Check the Chaos Mesh operator pods (use the namespace Chaos Mesh was installed into)
kubectl get pods -n chaos-engineering

# Check injection logs
kubectl logs -f -l app=rcabench -n exp | grep -i injection

Solutions

Step 1: Install Chaos Mesh

# Install chaos-mesh operator
curl -sSL https://mirrors.chaos-mesh.org/v2.6.2/install.sh | bash

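# Verify installation
kubectl get pods -n chaos-engineering

# If that namespace is empty, locate the operator pods in whichever namespace they were installed
kubectl get pods -A | grep chaos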
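
Running operator pods do not guarantee that the chaos resource types are registered, which is one possible cause of the "Chaos resources not created" symptom. The following is a minimal sketch of a CRD check, assuming the three resource kinds listed under Diagnosis are the ones in use; the function name `check_chaos_crds` is hypothetical and not part of Chaos Mesh or rcabench.

```shell
# Sketch (hypothetical helper): report whether the Chaos Mesh CRDs used on this page exist.
# Returns 0 when all three are present, 1 when any is missing.
check_chaos_crds() {
  missing=0
  for crd in podchaos networkchaos stresschaos; do
    if kubectl get crd "$crd.chaos-mesh.org" >/dev/null 2>&1; then
      echo "found   $crd.chaos-mesh.org"
    else
      echo "missing $crd.chaos-mesh.org"
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `check_chaos_crds` after Step 1 prints one line per CRD. A "missing" line suggests the installation did not complete and Step 1 should be repeated.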

Step 2: RBAC Permissions

# Grant the rcabench service account (namespace exp) full access to all Chaos Mesh resources
kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: rcabench-chaos
rules:
- apiGroups: ["chaos-mesh.org"]
  resources: ["*"]
  verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: rcabench-chaos
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: rcabench-chaos
subjects:
- kind: ServiceAccount
  name: rcabench
  namespace: exp
EOF
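
To confirm the binding took effect, the API server can be asked directly whether the service account may act on chaos resources. The sketch below uses `kubectl auth can-i` with impersonation; it assumes the service account `rcabench` in namespace `exp` from the manifest above, and that the current user is allowed to impersonate service accounts. The function name `check_chaos_rbac` is hypothetical.

```shell
# Sketch (hypothetical helper): check create/get/delete on the chaos kinds used on this page.
# Usage: check_chaos_rbac <namespace> <service-account>
check_chaos_rbac() {
  ns="$1"
  sa="$2"
  for kind in podchaos networkchaos stresschaos; do
    for verb in create get delete; do
      # kubectl auth can-i prints "yes" when the action is allowed
      answer="$(kubectl auth can-i "$verb" "$kind.chaos-mesh.org" \
        --as="system:serviceaccount:$ns:$sa" -n "$ns" 2>/dev/null)"
      if [ "$answer" = "yes" ]; then
        echo "ok   $verb $kind"
      else
        echo "FAIL $verb $kind"
      fi
    done
  done
}
```

Calling `check_chaos_rbac exp rcabench` prints one line per verb and kind. Any FAIL line suggests the ClusterRole or ClusterRoleBinding was not applied as intended.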

Step 3: Target Pod Selection

# Check target pods exist
kubectl get pods -l app=target-service -n production

# Verify label selectors
kubectl get pods --show-labels -n production

# Test chaos resource manually (this kills one matching pod in the production namespace)
kubectl apply -f - <<EOF
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: test-chaos
  namespace: exp
spec:
  action: pod-kill
  mode: one
  selector:
    namespaces:
      - production
    labelSelectors:
      app: target-service
  duration: "30s"
EOF
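
After applying the test resource, it helps to check whether a pod was actually selected and then remove the resource so it does not linger. The following is a minimal sketch, assuming the names used in the manifest above (`test-chaos` in namespace `exp`, targets labelled `app=target-service` in `production`); the function name `verify_test_chaos` is hypothetical.

```shell
# Sketch (hypothetical helper): inspect the manual test experiment, then clean it up.
# Usage: verify_test_chaos [name] [namespace]   (defaults: test-chaos, exp)
verify_test_chaos() {
  name="${1:-test-chaos}"
  ns="${2:-exp}"
  # Status and events show whether a target pod was selected and the fault was applied
  kubectl describe podchaos "$name" -n "$ns"
  # After pod-kill, one target pod should show a recent restart or a young age
  kubectl get pods -l app=target-service -n production
  # Remove the test experiment
  kubectl delete podchaos "$name" -n "$ns"
}
```

If the describe output shows no selected pods, the label selector or target namespace in the manifest is the first thing to recheck against the output of `kubectl get pods --show-labels -n production`.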