The Ultimate Developer's Guide to GitHub Backups

Laurent Lemaire

Co-founder, SimpleBackups

August 6, 2024

As developers, our code is our most valuable asset. While GitHub provides a robust and reliable platform for version control and collaboration, it's crucial to have a backup strategy in place. This guide will walk you through the process of backing up not just your repositories, but also the valuable metadata associated with your projects on GitHub.

Why You Need GitHub Backups

Despite GitHub's reliability, there are several reasons why maintaining your own backups is essential:

  1. Accidental Deletion: Human error can lead to accidental deletions of repositories or branches.
  2. Repository Corruption: Though rare, data corruption can occur.
  3. Service Downtime: GitHub could experience outages that temporarily limit access to your code.
  4. Compliance and Auditing: Certain industries and projects require regular backups for compliance purposes.

Understanding GitHub Data

Before diving into backup strategies, let's break down the types of data stored on GitHub:

Repositories

Repositories contain your source code, commit history, branches, and tags. This is the core of your project and the most critical data to back up.

Metadata

GitHub stores various types of metadata associated with your repositories:

  • Issues and Pull Requests
  • Wiki pages
  • Project boards
  • Releases
  • Actions workflows
  • Packages
  • Discussions

Backing Up GitHub Repositories


Using Git Clone

The simplest way to back up a repository is by using the git clone command. This creates a local copy of your repository, including all branches and commit history.

# Clone a repository
git clone --mirror https://github.com/username/repository.git

# Navigate into the repository
cd repository.git

# Add a new remote for backup
git remote add backup https://backupserver.com/username/repository.git

# Push all branches and tags to the backup remote
git push --mirror backup

The --mirror flag ensures that all references are copied, including branches and tags.
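For a backup that runs repeatedly, the mirror clone only needs to be created once and can then be refreshed. The following is a minimal Python sketch of that idea, not a definitive implementation. The function names and the destination path are hypothetical, and it assumes git is installed and on the PATH.

```python
import os
import subprocess

def mirror_backup_commands(repo_url, dest_dir):
    """Return the git commands needed to create or refresh a mirror clone."""
    if os.path.isdir(dest_dir):
        # An existing mirror only needs its refs refreshed from the remote.
        return [["git", "-C", dest_dir, "remote", "update"]]
    # First run: create the bare mirror clone.
    return [["git", "clone", "--mirror", repo_url, dest_dir]]

def run_mirror_backup(repo_url, dest_dir):
    """Run the commands; subprocess raises CalledProcessError if git fails."""
    for cmd in mirror_backup_commands(repo_url, dest_dir):
        subprocess.run(cmd, check=True)

# Example call (hypothetical URL and path):
# run_mirror_backup("https://github.com/username/repository.git",
#                   "backups/repository.git")
```

The refresh step deliberately omits --prune, so branches deleted on GitHub stay in the backup. Add it if you prefer the mirror to match the remote exactly.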

Using Multiple Git Remotes

Setting Up Multiple Push URLs for a Single Remote

Instead of creating multiple remotes, you can configure a single remote (typically origin) to push to multiple URLs. This method is particularly useful when you want to maintain a primary remote while ensuring backups are pushed simultaneously.

To add multiple push URLs to your origin remote:

git remote set-url --add --push origin https://primary-repo.com/user/repo.git
git remote set-url --add --push origin https://backup-repo.com/user/repo.git

These commands configure your origin remote to push to both the primary repository and the backup repository simultaneously.

To view your remote configuration:

git remote -v

You might see output like this:

origin  https://primary-repo.com/user/repo.git (fetch)
origin  https://primary-repo.com/user/repo.git (push)
origin  https://backup-repo.com/user/repo.git (push)

Now, when you run git push origin, Git will push to both URLs automatically.
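If you want to confirm from a script that both push URLs are in place, you can parse the output of git remote -v. This is a small sketch under that assumption. The helper name push_urls is hypothetical, and the sample text simply repeats the output shown above.

```python
def push_urls(remote_v_output, remote="origin"):
    """Collect the push URLs listed for one remote in `git remote -v` output."""
    urls = []
    for line in remote_v_output.splitlines():
        parts = line.split()
        # Each line has the shape: <name> <url> (fetch) or (push)
        if len(parts) == 3 and parts[0] == remote and parts[2] == "(push)":
            urls.append(parts[1])
    return urls

sample = """\
origin  https://primary-repo.com/user/repo.git (fetch)
origin  https://primary-repo.com/user/repo.git (push)
origin  https://backup-repo.com/user/repo.git (push)
"""

# Print every URL that `git push origin` would push to
print(push_urls(sample))
```

In a real check, the sample string would be replaced by the captured output of the git command, for example via subprocess.run with capture_output=True and text=True.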

Find the original response here: https://stackoverflow.com/questions/14290113/git-pushing-code-to-two-remotes/14290145#14290145

GitHub API for Repository Backup

For more control and automation, you can use the GitHub API to back up repository data programmatically. Here's a Python script that saves a repository's metadata along with its issues and pull requests as JSON files.

First, install the required libraries:

pip install requests

Then, create a script:

import requests

# GitHub token and repository details
GITHUB_TOKEN = 'your_github_token'
REPO_OWNER = 'username'
REPO_NAME = 'repository'

# Headers for GitHub API
headers = {
    'Authorization': f'token {GITHUB_TOKEN}',
    'Accept': 'application/vnd.github.v3+json',
}

# Include open and closed items; this simple version reads only the first page (up to 100 items)
LIST_PARAMS = {'state': 'all', 'per_page': 100}

# Function to back up repository metadata
def backup_repo():
    repo_url = f'https://api.github.com/repos/{REPO_OWNER}/{REPO_NAME}'
    response = requests.get(repo_url, headers=headers)
    response.raise_for_status()  # fail loudly on a bad token or missing repository
    with open(f'{REPO_NAME}_repo.json', 'w') as f:
        f.write(response.text)
    print(f'Repository metadata backed up to {REPO_NAME}_repo.json')

# Function to back up issues
def backup_issues():
    issues_url = f'https://api.github.com/repos/{REPO_OWNER}/{REPO_NAME}/issues'
    response = requests.get(issues_url, headers=headers, params=LIST_PARAMS)
    response.raise_for_status()
    with open(f'{REPO_NAME}_issues.json', 'w') as f:
        f.write(response.text)
    print(f'Issues backed up to {REPO_NAME}_issues.json')

# Function to back up pull requests
def backup_pull_requests():
    pulls_url = f'https://api.github.com/repos/{REPO_OWNER}/{REPO_NAME}/pulls'
    response = requests.get(pulls_url, headers=headers, params=LIST_PARAMS)
    response.raise_for_status()
    with open(f'{REPO_NAME}_pulls.json', 'w') as f:
        f.write(response.text)
    print(f'Pull requests backed up to {REPO_NAME}_pulls.json')

# Run backup functions
backup_repo()
backup_issues()
backup_pull_requests()

This script saves the repository's metadata, issues, and pull requests as JSON files in the current directory. It does not clone the repository itself, so pair it with git clone --mirror to capture the code and commit history.
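The script above keeps the token in the source file. For a backup job that runs on a schedule, a common alternative is to read the token from the environment. The sketch below assumes an environment variable named GITHUB_TOKEN (a naming choice for this example, not something the script above requires), and build_headers is a hypothetical helper.

```python
import os

def build_headers(env_var="GITHUB_TOKEN"):
    """Build GitHub API headers from a token stored in the environment."""
    token = os.environ.get(env_var)
    if not token:
        # Stop early instead of sending unauthenticated requests
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "Authorization": f"token {token}",
        "Accept": "application/vnd.github.v3+json",
    }

# Example: headers = build_headers()
```

The returned dictionary can be passed as the headers argument of requests.get, in place of the module-level headers variable.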

Backing Up GitHub Metadata


Issues and Pull Requests

To back up issues and pull requests, you can use the GitHub API. Here's a Python script to download all issues and pull requests for a repository:

import requests
import json
import os

API_URL = "https://api.github.com"
TOKEN = "your_personal_access_token"
REPO_OWNER = "owner"
REPO_NAME = "repo"
BACKUP_DIR = "github_backups"

def get_issues_and_prs():
    headers = {
        "Authorization": f"token {TOKEN}",
        "Accept": "application/vnd.github.v3+json"
    }
    issues_and_prs = []
    page = 1
    while True:
        response = requests.get(
            f"{API_URL}/repos/{REPO_OWNER}/{REPO_NAME}/issues?state=all&page={page}&per_page=100",
            headers=headers
        )
        if response.status_code == 200:
            page_data = response.json()
            if not page_data:
                break
            issues_and_prs.extend(page_data)
            page += 1
        else:
            print(f"Error fetching issues and PRs: {response.status_code}")
            break
    return issues_and_prs

def save_issues_and_prs(data):
    backup_path = os.path.join(BACKUP_DIR, f"{REPO_OWNER}_{REPO_NAME}_issues_and_prs.json")
    with open(backup_path, 'w') as f:
        json.dump(data, f, indent=2)

def main():
    os.makedirs(BACKUP_DIR, exist_ok=True)
    issues_and_prs = get_issues_and_prs()
    save_issues_and_prs(issues_and_prs)
    print(f"Backed up {len(issues_and_prs)} issues and pull requests")

if __name__ == "__main__":
    main()
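The issues endpoint used above returns pull requests alongside regular issues, and marks each pull request with a pull_request key. If you prefer to store the two separately, a sketch like the following could split the downloaded list. The function name and the sample records are made up for illustration.

```python
def split_issues_and_prs(items):
    """Separate plain issues from pull requests in an /issues API listing."""
    issues, pulls = [], []
    for item in items:
        # Entries that represent pull requests carry a "pull_request" key
        if "pull_request" in item:
            pulls.append(item)
        else:
            issues.append(item)
    return issues, pulls

# Made-up sample shaped like two entries from the API response
sample = [
    {"number": 1, "title": "Bug report"},
    {"number": 2, "title": "Fix bug",
     "pull_request": {"url": "https://api.github.com/repos/owner/repo/pulls/2"}},
]

issues, pulls = split_issues_and_prs(sample)
print(len(issues), "issues,", len(pulls), "pull requests")
```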

Wiki Pages

To back up wiki pages, you can clone the wiki repository:

git clone https://github.com/username/repository.wiki.git

Project Boards

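Project boards can be backed up using the GitHub API. Here's a Python script to download project board data. It relies on the REST endpoints for classic project boards, so it may not cover boards created with the newer Projects experience: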

import requests
import json
import os

API_URL = "https://api.github.com"
TOKEN = "your_personal_access_token"
REPO_OWNER = "owner"
REPO_NAME = "repo"
BACKUP_DIR = "github_backups"

def get_project_boards():
    headers = {
        "Authorization": f"token {TOKEN}",
        "Accept": "application/vnd.github.inertia-preview+json"
    }
    response = requests.get(
        f"{API_URL}/repos/{REPO_OWNER}/{REPO_NAME}/projects",
        headers=headers
    )
    if response.status_code == 200:
        return response.json()
    else:
        print(f"Error fetching project boards: {response.status_code}")
        return []

def get_project_columns(project_id):
    headers = {
        "Authorization": f"token {TOKEN}",
        "Accept": "application/vnd.github.inertia-preview+json"
    }
    response = requests.get(
        f"{API_URL}/projects/{project_id}/columns",
        headers=headers
    )
    if response.status_code == 200:
        return response.json()
    else:
        print(f"Error fetching project columns: {response.status_code}")
        return []

def save_project_boards(data):
    backup_path = os.path.join(BACKUP_DIR, f"{REPO_OWNER}_{REPO_NAME}_project_boards.json")
    with open(backup_path, 'w') as f:
        json.dump(data, f, indent=2)

def main():
    os.makedirs(BACKUP_DIR, exist_ok=True)
    project_boards = get_project_boards()
    for board in project_boards:
        board['columns'] = get_project_columns(board['id'])
    save_project_boards(project_boards)
    print(f"Backed up {len(project_boards)} project boards")

if __name__ == "__main__":
    main()

Releases

To back up releases, you can use the GitHub API. Here's a Python script to download release data:

import requests
import json
import os

API_URL = "https://api.github.com"
TOKEN = "your_personal_access_token"
REPO_OWNER = "owner"
REPO_NAME = "repo"
BACKUP_DIR = "github_backups"

def get_releases():
    headers = {
        "Authorization": f"token {TOKEN}",
        "Accept": "application/vnd.github.v3+json"
    }
    releases = []
    page = 1
    while True:
        response = requests.get(
            f"{API_URL}/repos/{REPO_OWNER}/{REPO_NAME}/releases?page={page}&per_page=100",
            headers=headers
        )
        if response.status_code == 200:
            page_releases = response.json()
            if not page_releases:
                break
            releases.extend(page_releases)
            page += 1
        else:
            print(f"Error fetching releases: {response.status_code}")
            break
    return releases

def save_releases(data):
    backup_path = os.path.join(BACKUP_DIR, f"{REPO_OWNER}_{REPO_NAME}_releases.json")
    with open(backup_path, 'w') as f:
        json.dump(data, f, indent=2)

def main():
    os.makedirs(BACKUP_DIR, exist_ok=True)
    releases = get_releases()
    save_releases(releases)
    print(f"Backed up {len(releases)} releases")

if __name__ == "__main__":
    main()
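The release data saved above describes each release but does not include the attached files themselves. Each release entry lists its assets with a name and a browser_download_url, so a follow-up step could collect those URLs for downloading. This is a sketch only: release_asset_urls is a hypothetical helper and the sample data is invented.

```python
def release_asset_urls(releases):
    """Map each release tag to the (name, download URL) pairs of its assets."""
    result = {}
    for release in releases:
        assets = release.get("assets", [])
        result[release["tag_name"]] = [
            (asset["name"], asset["browser_download_url"]) for asset in assets
        ]
    return result

# Invented sample shaped like the saved releases JSON
sample_releases = [
    {
        "tag_name": "v1.0.0",
        "assets": [
            {
                "name": "app.zip",
                "browser_download_url": "https://github.com/owner/repo/releases/download/v1.0.0/app.zip",
            }
        ],
    },
    {"tag_name": "v0.9.0", "assets": []},
]

for tag, assets in release_asset_urls(sample_releases).items():
    print(tag, assets)
```

Downloading assets from private repositories typically needs an authenticated request, so treat the URL list as a starting point rather than a complete download routine.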

Restoring from a GitHub Backup


To restore a repository from a backup:

  1. Create a new repository on GitHub (if needed).
  2. Push the backed-up repository to the new GitHub repository:
cd backup_repository.git
git push --mirror https://github.com/username/new_repository.git

For metadata, you'll need to use the GitHub API or manual processes to restore the data, depending on the type of metadata and how it was backed up.
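As one illustration of restoring metadata through the API, a backed-up issue can be turned into a request body for the endpoint that creates issues (POST /repos/{owner}/{repo}/issues). The sketch below only builds that body. The function name is hypothetical, and it assumes the backup has the shape produced by the issues script earlier in this guide.

```python
def issue_to_create_payload(issue):
    """Turn one backed-up issue into a request body for creating a new issue."""
    labels = []
    for label in issue.get("labels", []):
        # Backups usually store labels as objects; the create call takes names
        labels.append(label["name"] if isinstance(label, dict) else label)
    return {
        "title": issue["title"],
        "body": issue.get("body") or "",
        "labels": labels,
    }

# Made-up sample shaped like one entry from the issues backup
backed_up = {
    "number": 12,
    "title": "Crash on startup",
    "body": None,
    "labels": [{"name": "bug"}],
}

print(issue_to_create_payload(backed_up))
```

Issues recreated this way receive new numbers and timestamps and are authored by the account that owns the token, so the restore is an approximation of the original history rather than an exact copy.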

GitHub Backup Solutions


GitHub Archive Program

GitHub has its own archive program that creates long-term archives of public repositories. While this isn't a solution for private repositories or for maintaining your own backups, it's worth mentioning as part of GitHub's commitment to preserving open-source code.

Third-party Backup Tools

Several third-party tools and services offer comprehensive GitHub backup solutions:

SimpleBackups

SimpleBackups offers an automated service specifically designed for backing up GitHub repositories and metadata to any storage solution. This service stands out for its flexibility and ease of use.

Key advantages of SimpleBackups:

  1. Full Automation: Set up once and let SimpleBackups handle regular backups without further intervention.
  2. Flexible Storage Options: Back up your GitHub data to a wide range of storage solutions, allowing you to choose the most suitable option for your needs.
  3. Comprehensive Coverage: Backs up not just repositories, but also issues, pull requests, wikis, and other GitHub metadata.
  4. Customizable Schedules: Set backup frequencies that match your project's needs and activity levels.
  5. Easy Recovery: Simplifies the process of restoring your data when needed.
  6. Secure Transfer and Storage: Ensures your data is protected during transfer and in storage.
  7. Compliance: ISO 27001 certified solution. It provides what you need to meet your ISO, GDPR, and SOC 2 requirements.

SimpleBackups provides a hassle-free solution for maintaining up-to-date backups of your entire GitHub presence, offering peace of mind and data security for developers and teams of all sizes.

Other Notable Tools

When choosing a backup solution, consider factors such as ease of use, storage flexibility, comprehensiveness of the backup, restoration process, and cost.

Rapid-Fire FAQs


How to back up a GitHub repository?

To back up a GitHub repository, you can use the git clone command with the --mirror flag to create a local copy of the repository, including all branches, tags, and commit history.

git clone --mirror https://github.com/username/repository.git

This repository can then be pushed to another remote (GitLab, Bitbucket, Gitea...).

git remote add backup-remote https://backup-server.com/username/repository.git
git push --mirror backup-remote

Or simply compress the Git repository and copy the archive to other storage, for example an S3 bucket or a remote server via scp.

tar -czvf repository-backup.tar.gz repository.git
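The same archive step can be sketched in Python with the standard tarfile module, which makes it easy to add a timestamp to each archive name. The function name and the naming scheme are assumptions for this example.

```python
import os
import tarfile
from datetime import datetime, timezone

def archive_repository(repo_dir, output_dir):
    """Compress repo_dir into a timestamped .tar.gz inside output_dir."""
    os.makedirs(output_dir, exist_ok=True)
    name = os.path.basename(os.path.normpath(repo_dir))
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive_path = os.path.join(output_dir, f"{name}-{stamp}.tar.gz")
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the repo folder
        tar.add(repo_dir, arcname=name)
    return archive_path

# Example call (hypothetical paths):
# archive_repository("repository.git", "archives")
```

Timestamped names let several generations of the backup sit side by side, at the cost of needing some cleanup policy for old archives.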

How to back up all branches in a GitHub repository?

You have two options when it comes to backing up all branches of your repository.

  1. Using the --mirror flag when cloning the repository
git clone --mirror https://github.com/username/repository.git
  2. Using fetch --all after a regular clone
git clone https://github.com/username/repository.git
cd repository
git fetch --all

How to back up a GitHub organization?

Backing up an organization involves backing up all the repositories in that organization.
You can use the GitHub API to list all repositories in an organization and then back up each repository individually.

Using the GitHub CLI (gh), you can list all repositories in an organization. Note that you'll have to install the GitHub CLI first (https://cli.github.com/) and authenticate it with an access token from GitHub:

gh repo list organization_name --limit 1000 --json name,sshUrl > repos.json

Then, you can loop over the repositories and clone them:

cat repos.json | jq -r '.[].sshUrl' | xargs -n 1 git clone
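The loop above can also be sketched in Python, which leaves room for per-repository error handling later. This example reads the repos.json content produced by the gh command and builds one mirror clone command per repository. The function name is hypothetical, the sample JSON is invented, and using --mirror here is a choice made for this sketch rather than part of the one-liner above.

```python
import json

def mirror_clone_commands(repos_json_text):
    """Build one `git clone --mirror` command per repository in repos.json."""
    repos = json.loads(repos_json_text)
    return [
        ["git", "clone", "--mirror", repo["sshUrl"], f"{repo['name']}.git"]
        for repo in repos
    ]

# Invented sample shaped like the output of `gh repo list --json name,sshUrl`
sample_json = """
[
  {"name": "api", "sshUrl": "git@github.com:organization_name/api.git"},
  {"name": "web", "sshUrl": "git@github.com:organization_name/web.git"}
]
"""

for cmd in mirror_clone_commands(sample_json):
    print(" ".join(cmd))
```

Each command list can be passed to subprocess.run with check=True to perform the actual clone.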

Conclusion

Backing up your GitHub repositories and metadata is a crucial part of protecting your code and project history. By implementing a comprehensive backup strategy using the methods outlined in this guide, you can ensure that your valuable work is safe and recoverable in case of any unforeseen events.

Remember to regularly review and update your backup processes as your projects evolve and as GitHub introduces new features. With these practices in place, you can code with confidence, knowing that your GitHub data is securely backed up.

When you implement your own backup solutions using the methods outlined in this guide, you gain a deep understanding of the backup process and have full control over your data. However, this approach requires ongoing maintenance, monitoring, and troubleshooting to ensure your backups remain effective and up-to-date.

On the other hand, a service like SimpleBackups offers several key benefits that complement and enhance your backup strategy.

Remember, the ultimate goal is to ensure your valuable GitHub data is safely and consistently backed up. Whether you choose to implement your own solutions, use a service like SimpleBackups, or employ a combination of both, regular backups are an essential practice for any developer or team relying on GitHub for their projects.


