Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

mongodb StatefulSet on kubernetes is not working anymore after kubernetes update

I have updated my AKS Azure Kubernetes cluster to version 1.11.5, in this cluster a MongoDB Statefulset is running:

The statefulset is created with this file:

---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: default-view
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
  - kind: ServiceAccount
    name: default
    namespace: default
---
apiVersion: v1
kind: Service
metadata:
  name: mongo
  labels:
    name: mongo
spec:
  ports:
  - port: 27017
    targetPort: 27017
  clusterIP: None
  selector:
    role: mongo
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: "mongo"
  replicas: 2
  template:
    metadata:
      labels:
        role: mongo
        environment: test
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: mongo
          image: mongo
          command:
            - mongod
            - "--replSet"
            - rs0
            - "--bind_ip"
            - 0.0.0.0            
            - "--smallfiles"
            - "--noprealloc"
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: mongo-persistent-storage
              mountPath: /data/db
        - name: mongo-sidecar
          image: cvallance/mongo-k8s-sidecar
          env:
            - name: MONGO_SIDECAR_POD_LABELS
              value: "role=mongo,environment=test"
  volumeClaimTemplates:
  - metadata:
      name: mongo-persistent-storage
      annotations:
        volume.beta.kubernetes.io/storage-class: "managed-premium"
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 32Gi

after the mentioned update of the cluster to the new k8s version I get this error:

mongo-0                        1/2     CrashLoopBackOff   6          9m
mongo-1                        2/2     Running            0          1h

the detailed log from the pod is the following:

2018-12-18T14:28:44.281+0000 W STORAGE  [initandlisten] Detected configuration for non-active storage engine mmapv1 when current storage engine is wiredTiger
2018-12-18T14:28:44.281+0000 I CONTROL  [initandlisten]
2018-12-18T14:28:44.281+0000 I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2018-12-18T14:28:44.281+0000 I CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2018-12-18T14:28:44.281+0000 I CONTROL  [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
2018-12-18T14:28:44.281+0000 I CONTROL  [initandlisten]
2018-12-18T14:28:44.281+0000 I CONTROL  [initandlisten]
2018-12-18T14:28:44.281+0000 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2018-12-18T14:28:44.281+0000 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2018-12-18T14:28:44.281+0000 I CONTROL  [initandlisten]
2018-12-18T14:28:44.477+0000 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory '/data/db/diagnostic.data'
2018-12-18T14:28:44.478+0000 I REPL     [initandlisten] Rollback ID is 7
2018-12-18T14:28:44.479+0000 I REPL     [initandlisten] Recovering from stable timestamp: Timestamp(1545077719, 1) (top of oplog: { ts: Timestamp(1545077349, 1), t: 5 }, appliedThrough: { ts: Timestamp(1545077719, 1), t: 6 }, TruncateAfter: Timestamp(0, 0))
2018-12-18T14:28:44.480+0000 I REPL     [initandlisten] Starting recovery oplog application at the stable timestamp: Timestamp(1545077719, 1)
2018-12-18T14:28:44.480+0000 F REPL     [initandlisten] Applied op { : Timestamp(1545077719, 1) } not found. Top of oplog is { : Timestamp(1545077349, 1) }.
2018-12-18T14:28:44.480+0000 F -        [initandlisten] Fatal Assertion 40313 at src/mongo/db/repl/replication_recovery.cpp 361
2018-12-18T14:28:44.480+0000 F -        [initandlisten]

***aborting after fassert() failure

it seems the two instances went out of sync and are not able to recover. Can someone help?

like image 976
aumanjoa Avatar asked Dec 18 '18 14:12

aumanjoa


1 Answers

I have workaround this issue:

  1. adding a MongoDB container to the cluster to dump and restore the MongoDB data
  2. dumping the current database
  3. deleting the MongoDB instance
  4. recreating a new MongoDB instance
  5. restoring the data to the new instance

yes unfortunately this comes with a downtime

like image 82
aumanjoa Avatar answered Sep 21 '22 17:09

aumanjoa