The Ultimate GitLab Backup Script

Nour Sofanati
Developer, SimpleBackups

February 16, 2024

Why GitLab Backups Matter

GitLab is often the heart of your development workflow. Having reliable backups ensures you can recover your code repositories, issues, and project data in case of unexpected problems or data loss.
This guide focuses on a simple, customizable Bash script for automating your GitLab backups.

Prerequisites

  • GitLab API Access: You'll need a private access token from your GitLab instance to interact with the API. See GitLab documentation for how to generate this token.
  • git Command: Ensure the git command-line tool is installed on your system. It's used to clone repositories from the GitLab instance.
  • jq Command: If not already installed, get the jq command-line JSON processor. It helps easily parse data from the GitLab API responses.
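To see what jq will be doing in the backup script, here is a minimal sketch run against a hand-written sample of a GitLab projects API response. The JSON values below are illustrative, not real API output:

```shell
# Hand-written sample of one element of the GitLab projects API response
sample='[{"id": 42, "name": "demo", "http_url_to_repo": "https://gitlab.example.com/group/demo.git", "path_with_namespace": "group/demo"}]'

# Extract the clone URL the same way the backup script does
clone_url=$(echo "$sample" | jq -r '.[0].http_url_to_repo')
echo "$clone_url"
# prints: https://gitlab.example.com/group/demo.git
```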

The Backup Script

Let's dive into the core Bash script for backing up your GitLab repositories.


#!/bin/bash

# Configuration (replace the placeholder values with your own)
GITLAB_URL="https://gitlab.example.com"   # Base URL of your GitLab instance
PRIVATE_TOKEN="your_private_token"        # Personal access token with API access
GITLAB_USERNAME="your_username"           # Username used for HTTPS cloning
BACKUP_DIR="/path/to/backup"              # Local directory that receives the backups
TIMESTAMP=$(date +%Y%m%d_%H%M%S)          # Timestamp used in backup file names

# Ensure backup directory exists and is empty
rm -rf "$BACKUP_DIR"
mkdir -p "$BACKUP_DIR"

# Improved Error Handling (print error message and exit)
handle_error() {
    echo "ERROR: $1"
    exit 1
}

# Pagination variables
page=1       # Initial page number
per_page=100 # Number of projects per page (maximum is 100)

# Main loop for fetching and backing up projects
while true; do
    echo "Fetching page $page"

    # Fetch project data using the GitLab API; response headers are
    # dumped to a file so the pagination Link header can be inspected
    HEADER_FILE=$(mktemp)
    API_RESPONSE=$(curl --silent --fail --dump-header "$HEADER_FILE" \
        --header "PRIVATE-TOKEN: $PRIVATE_TOKEN" \
        "$GITLAB_URL/api/v4/projects?membership=true&per_page=$per_page&page=$page")

    # Error handling
    if [ $? -ne 0 ]; then
        handle_error "Error fetching projects from GitLab API (page $page): curl command failed."
    fi
    API_HEADERS=$(cat "$HEADER_FILE")

    # Store paginated response directly
    echo "$API_RESPONSE" > "$BACKUP_DIR/projects_page_${page}_$TIMESTAMP.json"

    # Iterate through projects within the page and clone repositories
    jq -c '.[]' "$BACKUP_DIR/projects_page_${page}_$TIMESTAMP.json" | while read -r project; do

        PROJECT_ID=$(echo "$project" | jq -r '.id')
        PROJECT_NAME=$(echo "$project" | jq -r '.name')
        CLONE_URL=$(echo "$project" | jq -r '.http_url_to_repo')
        PATH_WITH_NAMESPACE=$(echo "$project" | jq -r '.path_with_namespace')

        # Check if values are extracted correctly
        if [ -z "$CLONE_URL" ]; then
            echo "Error: Empty clone URL for project $PROJECT_NAME"
            continue # Skip to the next project
        fi

        # Construct authenticated URL
        AUTH_CLONE_URL=$(echo "$CLONE_URL" | sed "s|https://|https://$GITLAB_USERNAME:$PRIVATE_TOKEN@|")

        # Derive repository directory name (flatten "group/project" to "group_project.git")
        REPO_DIR="${PATH_WITH_NAMESPACE//\//_}.git"

        echo "Backing up project (mirror): $PROJECT_NAME ($PROJECT_ID)"
        git clone --mirror "$AUTH_CLONE_URL" "$BACKUP_DIR/$REPO_DIR"
    done

    # The API sends a 'rel="next"' link header while more pages remain;
    # fetch the next page, or stop once the last page has been processed
    if [[ $API_HEADERS == *'rel="next"'* ]]; then
        page=$((page + 1))
    else
        echo "All projects backed up."
        break # Exit the loop
    fi
done

exit 0
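One detail worth spelling out from the script is how the authenticated clone URL is built: sed injects the username and token into the HTTPS URL. Here is that step in isolation, with placeholder credentials:

```shell
# Placeholder values, standing in for the script's configuration
CLONE_URL="https://gitlab.example.com/group/demo.git"
GITLAB_USERNAME="alice"
PRIVATE_TOKEN="glpat-EXAMPLE"

# sed rewrites the https:// prefix to include user:token credentials
AUTH_CLONE_URL=$(echo "$CLONE_URL" | sed "s|https://|https://$GITLAB_USERNAME:$PRIVATE_TOKEN@|")
echo "$AUTH_CLONE_URL"
# prints: https://alice:glpat-EXAMPLE@gitlab.example.com/group/demo.git
```

Note that the token ends up embedded in the remote URL of each mirrored repository, so keep the backup directory itself access-controlled.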

Want to trust that your GitLab backups are running smoothly, without the hassle?

Try SimpleBackups Now →

How to restore your GitLab Backup

To restore a repository from a backup, you can use the git clone command to create a new repository from the backup. For example:

git clone /path/to/backup/repo.git /path/to/new/repo.git

The restored clone works as usual: you can inspect the repository and push it back to your GitLab instance.
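If you want to rehearse the restore path without touching a real GitLab instance, the whole round trip can be sketched with local repositories only (all paths here are temporary and the commit is a stand-in for real project history):

```shell
set -e
work=$(mktemp -d)

# 1. Create a source repository with one commit (stands in for GitLab)
git init -q "$work/source"
git -C "$work/source" -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "initial commit"

# 2. Back it up as a mirror, exactly what the backup script stores
git clone -q --mirror "$work/source" "$work/backup.git"

# 3. Restore by cloning from the backup into a fresh working repository
git clone -q "$work/backup.git" "$work/restored"
restored_msg=$(git -C "$work/restored" log -1 --pretty=%s)
echo "$restored_msg"
# prints: initial commit
```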

Customizing the Script

The script is designed to be easily customizable. You can modify the GITLAB_URL, PRIVATE_TOKEN, BACKUP_DIR, and GITLAB_USERNAME variables to suit your environment. You can also adjust the per_page variable to control the number of projects fetched per API request.
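One way to make those variables configurable without editing the script is to fall back to a default only when the corresponding environment variable is unset. The variable names and defaults below are illustrative:

```shell
# Clean slate for this demo, so the defaults below are what take effect
unset GITLAB_URL PER_PAGE

# ${VAR:-default} keeps an existing environment value, else uses the default
GITLAB_URL="${GITLAB_URL:-https://gitlab.example.com}"
per_page="${PER_PAGE:-100}"

echo "Fetching from $GITLAB_URL, $per_page projects per page"
```

With this pattern, a one-off run against another instance is just `GITLAB_URL=https://gitlab.other.com ./backup.sh`.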

Automating the Backup

To automate the backup process, you can use a cron job to run the script at regular intervals. For example, to run the script every day at 3 AM, you can add the following line to your crontab:

0 3 * * * /path/to/

Store your GitLab backup on S3

You can also sync the backup directory to cloud storage services like Amazon S3, Google Cloud Storage, or Azure Blob Storage. This ensures your backups are stored offsite and protected from local data loss.

To set up a sync with Amazon S3, use the aws s3 sync command. Here's an example script that syncs the backup directory to an S3 bucket:


# Configuration (adjust to your environment)
BACKUP_DIR="/path/to/backup"        # Local backup directory
S3_BUCKET="s3://your-backup-bucket" # Destination bucket (placeholder name)

# Sync the local backup directory to the S3 bucket
aws s3 sync "$BACKUP_DIR" "$S3_BUCKET"

# (optional) print the sync status
echo "Backup sync to S3 completed with status: $?"
exit 0


With this script, you can automate your GitLab backups and keep your repositories protected. Customize it to fit your environment, schedule it to run at regular intervals, and sync the results offsite, and you can have peace of mind knowing your GitLab data is always recoverable.
