Backing Up the Platform
Overview
The backup tool is named Bakpak (invoked as bkp).
It collects V3IO data, platform configuration (e.g. users, projects, services), MLRun DB, Pipelines DB, and Kubernetes entities.
Prerequisites
- An available storage device. Cloud storage services (such as EFS on AWS or Filestore on GCP) are preferred.
- A running MLOps Platform.
Standard Setup Example
- Provision storage and allow your K8S application nodes to access it (network, security groups, access points, etc.).
  Storage must support the K8S ReadWriteMany access mode. For example, AWS EBS is not recommended, because it can bind to only one host at a time.
  Preferred services:
  - On AWS: EFS, FSx
  - On GCP: Filestore
  - On Azure: Azure Files
- In the platform dashboard, activate the backup user (sys by default) as described here.
- In your K8S application cluster:
  - Create the backup namespace (iguazio-backup by default).
  - In that namespace, create a secret (api-credentials-system by default):
    USERNAME: "sys"
    PASSWORD: "PASSWORD"
  - In that namespace, create a configmap named user-settings with the minimum required settings:
    SYSTEM_USER_SECRET: "api-credentials-system" # The name of the secret you created above
    SCHEDULE: "0 0 * * *" # Backup schedule (cron syntax). Default is daily at midnight
    RETENTION_PERIOD: "3 days" # How long to keep each backup instance. Default is 3 days
    STORAGE_PROVIDER: "some-provider" # aws / gcp / azure ...
    CSI_DRIVER: "some-driver" # efs.csi.aws.com / filestore.csi.storage.gke.io ...
                              # Must be listed in kubernetes-csi.github.io/docs/drivers.html
                              # Must be supported by your K8S setup
    REGION: "some-region" # us-east-1 / eu-north-1 ... Region of your storage device, if applicable
    VOLUME_HANDLE: "fs-1234567890" # Your filesystem ID
  - [Optional] Browse the Bakpak manifest and modify the configmap to customize as needed.
    For example, if you created your own PVC (in the backup namespace) backed by NFS storage and want to use it instead, add:
    SKIP_PERSISTENT_VOLUME_CREATION: "True"
    PERSISTENT_VOLUME_CLAIM_NAME: "YOUR_PVC_NAME"
- Contact Iguazio support so they can configure an initial (one-time) setup. After that, ClusterBackup events are visible on the Events page of the platform UI.
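The namespace, secret, and configmap from the setup steps can also be written as ordinary Kubernetes manifests and applied in one go with kubectl apply -f. The sketch below assumes the default names from the steps above and AWS EFS as the storage device; the file name bakpak-setup.yaml, the password, the region, and the filesystem ID are placeholders to replace with your own values.

```yaml
# bakpak-setup.yaml (hypothetical file name); apply with: kubectl apply -f bakpak-setup.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: iguazio-backup              # default backup namespace
---
apiVersion: v1
kind: Secret
metadata:
  name: api-credentials-system      # default secret name
  namespace: iguazio-backup
type: Opaque
stringData:                         # plain-text input; Kubernetes stores it base64-encoded under data
  USERNAME: "sys"
  PASSWORD: "PASSWORD"              # placeholder: the backup user's password
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-settings
  namespace: iguazio-backup
data:                               # ConfigMap values must be strings, hence the quotes
  SYSTEM_USER_SECRET: "api-credentials-system"
  SCHEDULE: "0 0 * * *"             # daily at midnight
  RETENTION_PERIOD: "3 days"
  STORAGE_PROVIDER: "aws"           # assumption: AWS with EFS
  CSI_DRIVER: "efs.csi.aws.com"
  REGION: "us-east-1"               # placeholder region
  VOLUME_HANDLE: "fs-1234567890"    # placeholder EFS filesystem ID
```

Because the Secret holds the backup user's password, you may prefer not to keep it in a file and instead create it directly, for example with kubectl create secret generic api-credentials-system --namespace iguazio-backup --from-literal=USERNAME=sys --from-literal=PASSWORD='<password>'.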
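For the optional variant in which you supply your own PVC, a claim could look roughly like the sketch below. It assumes an NFS-backed storage class already exists in your cluster; the claim name my-backup-pvc, the storage class name my-nfs, and the 500Gi size are hypothetical. The claim requests ReadWriteMany, in line with the storage requirement of the setup steps.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-backup-pvc          # hypothetical claim name
  namespace: iguazio-backup    # the PVC must be in the backup namespace
spec:
  accessModes:
    - ReadWriteMany            # backup storage must support ReadWriteMany
  storageClassName: my-nfs     # hypothetical NFS-backed storage class
  resources:
    requests:
      storage: 500Gi           # hypothetical size; size it for your backup data
# Then add to the data section of the existing user-settings configmap:
#   SKIP_PERSISTENT_VOLUME_CREATION: "True"
#   PERSISTENT_VOLUME_CLAIM_NAME: "my-backup-pvc"
```

The two configmap keys are left as comments so that they are merged into the existing user-settings configmap rather than applied as a separate ConfigMap that would overwrite its other settings.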
Bakpak Manifest
apiVersion: 0.1.0
kind: bakpakManifest
metadata:
  name: "template-bakpak-manifest"
spec:
  description: "CLI for High Level Operations on Iguazio tools like Manof, Gibby etc"
  apiCredentials: # On Restore, credentials may differ between source and target
    system:
      username: "USERNAME"
      password: "PASSWORD"
  #########################################################################################################
  # backupRootFolder: "/full/path/to/backup/root" # MANDATORY! Please validate r+w permission
  #########################################################################################################
  # kubeconfig: "~/.kube/config"
  # nuctlPath: "/home/iguazio/IGZ_VERSION/platform/"
  # staticServeHelmChartsPath: "/home/iguazio/IGZ_VERSION/platform/static_serve/helm/v3io-stable"
  # igzPlatformPathsRethinkdbDataMount: '/mnt/platform/rethinkdb'
  # gibctlPath: "/full/path/to/gibby/executable" # override bundle with absolute path to run gibby as standalone option
  # gibbyImage: "gcr.io/iguazio/gibby:0.8.31" # image to run gibby as a kubernetes job
  # gibbyJobMountPath: "/full/path/to/mount" # auto-creation permissions vary. For example, on EFS it's enabled by
  #                                          # default, but on dedicated bare-metal nodes it's not. Please validate r+w
  # gibbyRestartPolicy: "OnFailure" # kubernetes job restart policy
  # namespace: "gibby-backup"
  # gibbyTimeout: "23 hours" # Kill the data backup process on timeout
  # skipPersistentVolumeCreation: None # Set to True when using a dynamic volume provisioner
  # persistentVolumeClaimDesiredState: "Bound" # adjust according to your dynamic volume provisioner
  # persistentVolumeName: started-TIMESTAMP-gibby # manual override for persistent volume
  # persistentVolumeClaimName: ^^^^^^^^^^^^^^^^^^ # manual override for persistent volume claim
  # accessModes:
  #   - "ReadWriteMany" # For storage devices able to bind to many hosts, such as EFS
  #   - "ReadWriteOnce" # For storage devices able to bind to only one host at a time, such as EBS
  # reclaimPolicy: "Delete" # persistent volume reclaim policy for kubernetes Gibby job
  # storageClass: "gibby-backup" # doesn't exist by default, please roll your own
  # nodeNames: # Specify app nodes for Gibby job, preferably worker nodes
  #   - "k8s-node1"
  # dataNodeIp: "127.0.0.1" # specify if running from outside the data node
  # igzVersion: "/home/iguazio/igz/version.txt" # iguazio system version. Set to literal value to amend
  # nuctlDefaultServiceType: "NodePort" # default service type for nuctl commands
  # linkLatestPrefix: "latest-" # latest backup instance symlink for a given preset
  # preset: "default" # which components to back up. Run `bkp backup presets` for more info
  backupSpec:
    # mode: "dry-run" # Run mode. "normal" will execute
    # sendEvents: True # Send events to the platform
    # components: # Or specify the components and execution order yourself
    #   - "rethinkdb"
    #   - "nuclio"
    #   - "mlrundb"
    #   - ...
    # cacheUpperBound: "100GiB" # Estimated size of system cache
    # standalone: False # Run Gibby as a standalone executable, rather than as a k8s job
    # retentionPeriod: "3 days" # Retention policy for backup instances
    # rotateOnBackup: False # Rotate according to retentionPeriod above
    # archive: False # Archive backup to a tar.gz file
    # compressLevel: 5 # 1 is fastest, 9 is best compression
    # deleteSource: True # Delete backup instance after archiving
    # instanceFolderPrefix: "started-" # Prefix for backup instance folder, archive and rotation
    # diskRequiredMultiplier: 2 # Multiplier for backup disk requirements (for compression overhead)
    # linkLatest: True # Create a symlink to the latest backup instance
    # gibbyCommands:
    #   - "create"
    #   - "snapshot"
    # gibbyOptions:
    #   --logger-no-color: "" # boolean flags are passed as dict keys with empty values
    #   --backup-name: "gibby-backup"
    #   --data-plane-url: "https://webapi.default-tenant.app.your-system.iguazeng.com"
    #   --control-plane-url: "https://dashboard.default-tenant.app.your-system.iguazeng.com"
    #   --data-access-key: "DATA_SESSION_ID"
    #   --control-access-key: "CONTROL_SESSION_ID"
    #   --path: "FROM_BACKUP_FOLDER"
    #   --logger-file-path: "FROM_BACKUP_FOLDER/gibby.log"
    #   --log-level: "FROM_CLI_FLAG"
    #   --backup-config-spec: "INLINE-JSON-CONFIG"
    # checkAppServices: True # If True, will check for overall app services state == ready
    # kubeconfig: "~/.kube/config" # Hardcoded in Python k8s lib. Set the KUBECONFIG os environment variable to override
    # nuctlPath: "/home/iguazio/IGZ_VERSION/platform/"
    # staticServeHelmChartsPath: "/home/iguazio/IGZ_VERSION/platform/static_serve/helm/v3io-stable"
    # gibctlPath: "/full/path/to/gibby/executable" # override bundle with absolute path to run gibby as standalone option
    # gibbyImage: "gcr.io/iguazio/gibby:0.8.31" # image to run gibby as a kubernetes job
  preamble:
    banner: |-
      echo "Welcome to Bakpak"
      date
      pwd
    # customPreamble1: write your own preamble
    # customPreamble2: beware of YAML compatibility with your shell script syntax
  # postamble:
  #   customPostamble1: echo "Thank you for using Bakpak"
  #   customPostamble2: will run after the main components
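As an illustration of customizing the template, the sketch below uncomments a few of the options listed in the manifest to run a real (non-dry-run) backup of selected components, with rotation and archiving. It assumes that backupRootFolder and backupSpec sit directly under spec, as in the template; the manifest name and the backupRootFolder path are hypothetical, and the password is a placeholder.

```yaml
apiVersion: 0.1.0
kind: bakpakManifest
metadata:
  name: "my-bakpak-manifest"                # hypothetical name
spec:
  apiCredentials:
    system:
      username: "sys"                       # default backup user
      password: "PASSWORD"                  # placeholder
  backupRootFolder: "/mnt/bakpak-backups"   # hypothetical path; must be readable and writable
  backupSpec:
    mode: "normal"                          # "normal" executes; the template's commented value is "dry-run"
    components:                             # explicit components, run in this order
      - "rethinkdb"
      - "nuclio"
      - "mlrundb"
    retentionPeriod: "3 days"
    rotateOnBackup: True                    # rotate backup instances according to retentionPeriod
    archive: True                           # archive each backup instance to a tar.gz file
    compressLevel: 1                        # 1 is fastest, 9 is best compression
    deleteSource: True                      # delete the backup instance folder after archiving
```

Listing components explicitly replaces the preset choice; run bkp backup presets to see what the built-in presets contain before deciding which approach to use.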
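The template warns that preamble and postamble entries must stay YAML-compatible with your shell syntax. The sketch below assumes, as the template's comments suggest, that each custom entry runs as a shell snippet; the customPreamble1 script is hypothetical. It uses a block scalar (|-) so that a colon followed by a space inside the script is not parsed as a YAML mapping. Written as a plain one-line value, customPreamble1: echo "Backup started at: $(date)" would fail to parse.

```yaml
spec:
  preamble:
    banner: |-
      echo "Welcome to Bakpak"
      date
      pwd
    customPreamble1: |-   # block scalar: ": " inside the script is not treated as YAML
      echo "Backup started at: $(date)"
      df -h
  postamble:
    customPostamble1: echo "Thank you for using Bakpak"
```

Plain one-line values such as the customPostamble1 entry are fine as long as they contain no YAML-significant sequences like ": " or a leading "#".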