Mastering S3 Sync: A Developer's Guide to s3cmd, AWS CLI, and Rclone

Islam Essam

Co-founder, SimpleBackups

December 12, 2023

Amazon Simple Storage Service (Amazon S3) is a versatile, scalable object storage service that developers often use to store and retrieve data. Syncing S3 buckets is a common task, and although it can feel like the right tooling is missing, several tools handle it well. In this guide, we will explore how to perform S3 sync using three popular tools: s3cmd, the AWS CLI (aws), and rclone.

Prerequisites

Before we dive into the specifics of each tool, make sure you have the following prerequisites:

  • AWS Account: You need an AWS account with the necessary IAM permissions to interact with S3.
  • Installed Tools: Ensure that you have s3cmd, the AWS CLI (aws), and rclone installed on your machine.
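
If you want a quick way to confirm the tools are on your PATH, a small helper along these lines may do. This is a minimal sketch: missing_tools is a hypothetical function name, not part of any of the three tools.

```shell
# List which of the given commands are not on PATH.
# Prints one missing command per line; returns non-zero if any are missing.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}

# Example: check the three tools used in this guide.
# missing_tools s3cmd aws rclone || echo "Install the tools listed above first."
```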

Reasons to Sync S3 Buckets

Syncing cloud storage buckets offers several benefits. The most common ones we have seen are:

  • Data Redundancy & Recovery: By syncing, you duplicate your data across multiple locations or providers. This redundancy minimizes the risk of data loss due to a provider outage, a hardware issue, or other unforeseen circumstances.
  • Data Availability: Synced buckets can improve data availability. If one storage location becomes unavailable, you can access your data from another synchronized location, ensuring uninterrupted access for your applications and users.
  • Cost Optimization: By syncing data strategically, you can take advantage of lower storage costs in certain regions or providers, ensuring you optimize your cloud storage expenses.

Overall, syncing cloud storage buckets empowers you with improved data resilience, availability, and flexibility.

1. Sync S3 Using s3cmd

s3cmd is a command-line utility for managing Amazon S3 and other cloud storage services. Here's how you can use s3cmd to sync data with an S3 bucket:

# Install s3cmd (if not already installed)
# On Linux:
# sudo apt-get install s3cmd

# Configure s3cmd with your AWS credentials
s3cmd --configure

# Sync local directory with an S3 bucket
s3cmd sync /path/to/local/directory s3://your-bucket-name

Advanced S3 sync options for s3cmd

When using s3cmd for S3 sync, you can take advantage of several advanced options to customize the synchronization process to your specific needs. Here are some of the advanced options available:

Delete Removed Files

--delete-removed: When files are deleted from the source directory, this option ensures that the corresponding objects in the destination bucket are also deleted. It helps keep the destination bucket in sync with the source.

s3cmd sync --delete-removed /path/to/local/directory s3://your-bucket-name

Exclude Specific Files

--exclude: You can use this option to exclude specific files or patterns from being synchronized. For example, if you want to exclude all .log files, you can do:

s3cmd sync --exclude "*.log" /path/to/local/directory s3://your-bucket-name

Include Specific Files

--include: This option re-includes files that match a pattern even if an earlier --exclude pattern would have skipped them. On its own it does not restrict the sync, so to sync only .txt files, exclude everything first and then include the pattern:

s3cmd sync --exclude "*" --include "*.txt" /path/to/local/directory s3://your-bucket-name

Skip Existing Files

--skip-existing: When this option is used, s3cmd will skip copying files that already exist in the destination. It's helpful to avoid overwriting existing files.

s3cmd sync --skip-existing /path/to/local/directory s3://your-bucket-name

Dry Run

--dry-run: A dry run is a simulation of the sync operation. It shows you what actions s3cmd would take without actually performing the sync. Useful for testing and verifying your sync command before executing it.

s3cmd sync --dry-run /path/to/local/directory s3://your-bucket-name

Delete After

--delete-after: Used together with --delete-removed, this option performs the deletions in the destination only after the new uploads have finished. It's useful when you want the new files in place before obsolete objects are removed from the bucket.

s3cmd sync --delete-removed --delete-after /path/to/local/directory s3://your-bucket-name

Quiet Mode

--quiet: If you want to suppress most of the output and only display errors, you can use the --quiet option. It makes the sync operation less verbose.

s3cmd sync --quiet /path/to/local/directory s3://your-bucket-name
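
Quiet mode pairs naturally with scheduled jobs, where you mostly care whether the sync succeeded. The wrapper below is a minimal sketch under that assumption: logged_sync and the log line format are hypothetical, and it relies only on s3cmd's exit status.

```shell
# Run a quiet s3cmd sync and append a timestamped result line to a log file.
# Arguments: source path, destination (s3://...), log file path.
logged_sync() {
  src=$1
  dest=$2
  log=$3
  if s3cmd sync --quiet "$src" "$dest"; then
    result=ok
  else
    result=failed
  fi
  echo "$(date '+%Y-%m-%dT%H:%M:%S') sync $src -> $dest: $result" >> "$log"
  # Propagate success or failure to the caller (e.g. cron or a CI job).
  [ "$result" = ok ]
}

# Usage (paths are placeholders):
# logged_sync /path/to/local/directory s3://your-bucket-name /var/log/s3-sync.log
```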

Some s3cmd key details

  • Uploads any new or updated files from a local directory to the S3 path
  • Deletes files no longer present in the local directory from the S3 path, but only when --delete-removed is set
  • Uses multipart uploads for large files to improve transfer performance
  • Uses MD5 hashes to detect whether a local file has changed compared to S3
  • Can configure ACLs, encryption, and MIME types in the .s3cfg file
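
To make the MD5 idea concrete, here is a simplified illustration of hash-based change detection. It is not s3cmd's actual code: local_md5 and needs_upload are hypothetical helpers, the remote digest is passed in as a plain argument, and md5sum from GNU coreutils is assumed (macOS ships md5 instead). Keep in mind that for multipart uploads the S3 ETag is not a plain MD5 of the file, so a comparison like this only makes sense when you have a real MD5 of the remote object.

```shell
# Print the MD5 hex digest of a local file (assumes GNU coreutils md5sum).
local_md5() {
  md5sum "$1" | cut -d ' ' -f 1
}

# Succeed (exit 0) when the local file's digest differs from the remote
# digest, i.e. when an upload would be needed.
needs_upload() {
  [ "$(local_md5 "$1")" != "$2" ]
}

# Usage (the digest is a placeholder you would obtain from your own metadata):
# needs_upload ./report.txt "<remote-md5>" && echo "report.txt changed"
```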

2. Sync S3 Using AWS CLI

AWS Command Line Interface (AWS CLI) is the official command-line tool provided by AWS. It offers extensive functionality for managing AWS services, including S3. To sync data with S3 using AWS CLI:

# Configure AWS CLI with your AWS credentials
aws configure

# Sync local directory with an S3 bucket
aws s3 sync /path/to/local/directory s3://your-bucket-name

Advanced S3 sync options for AWS CLI

The AWS Command Line Interface (CLI) offers several advanced options for synchronizing files to and from Amazon S3. Here are some of the common advanced sync options along with code examples:

Delete Missing Files

--delete: Removes files from the destination that are not present in the source.

aws s3 sync s3://mybucket/src /localpath --delete

Pattern-based File Filtering

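--exclude / --include: Specifies patterns to filter files or objects to exclude or include in the sync. Filters are applied in the order given and later filters take precedence. Everything is included by default, so to sync only .txt files, exclude everything first:

aws s3 sync /localpath s3://mybucket/dest --exclude "*" --include "*.txt"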

Access Control List Settings

--acl: Sets the ACL for the synced files (e.g., public-read).

aws s3 sync /localpath s3://mybucket/dest --acl public-read

Specify Storage Class

--storage-class: Specifies the storage class for the synced files (e.g., STANDARD_IA for infrequent access).

aws s3 sync /localpath s3://mybucket/dest --storage-class STANDARD_IA

Dry Run (Simulation)

--dryrun: Displays the operations that would be performed using the specified command without actually running them.

aws s3 sync /localpath s3://mybucket/dest --dryrun
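
One way to make the dry run a habit is to wrap it in a function that always previews first and only applies changes when asked. This is a sketch, not an AWS-provided feature: preview_then_sync and the APPLY variable are hypothetical names, and every argument is passed through to aws s3 sync unchanged.

```shell
# Preview an `aws s3 sync`, then run it for real only when APPLY=1 is set.
# All arguments are passed through to `aws s3 sync` unchanged.
preview_then_sync() {
  echo "Planned operations:"
  aws s3 sync "$@" --dryrun || return 1
  if [ "${APPLY:-0}" = "1" ]; then
    aws s3 sync "$@"
  else
    echo "Dry run only. Re-run with APPLY=1 to apply."
  fi
}

# Usage (paths are placeholders):
# preview_then_sync /localpath s3://mybucket/dest --delete
# APPLY=1 preview_then_sync /localpath s3://mybucket/dest --delete
```

This is especially worth doing with --delete, where a mistyped source path could remove objects from the destination.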

Sync Based on Size Only

--size-only: Makes the sync command ignore changes in file timestamps and only compare sizes.

aws s3 sync /localpath s3://mybucket/dest --size-only

Use Exact Timestamps

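--exact-timestamps: When syncing from S3 to local, same-sized items are skipped only when their timestamps match exactly. By default, same-sized items are skipped unless the local version is newer than the S3 version.

aws s3 sync s3://mybucket/src /localpath --exact-timestamps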

Follow or Ignore Symlinks

--follow-symlinks: Syncs the contents of symlinked files and folders when uploading from the local filesystem. This is the default behavior.

aws s3 sync /localpath s3://mybucket/dest --follow-symlinks

--no-follow-symlinks: Ignores symlinks and does not sync their contents.

aws s3 sync /localpath s3://mybucket/dest --no-follow-symlinks

Specify Source and Destination Regions

--source-region / --region: Specifies the region of the source/destination bucket.

aws s3 sync s3://sourcebucket/src s3://destbucket/dest --source-region us-west-1 --region us-east-1

These options provide greater control over the synchronization process, allowing you to tailor it to your specific needs. Remember to replace /localpath, s3://mybucket/src, and s3://mybucket/dest with your actual local file paths and S3 bucket names/paths.

Some AWS CLI key details

  • Provides sync functionality comparable to s3cmd
  • Can set ACLs, encryption, and other options on the command line
  • Built on the AWS SDK for Python, so it can work with other AWS services if needed
  • Credentials are managed via AWS config files
  • Sync works in both directions (local to S3 and S3 to local) as well as between buckets

3. Sync S3 Using rclone

rclone is a versatile command-line program for syncing files and directories to and from various cloud storage providers, including S3. To use rclone for S3 sync:

# Install rclone (if not already installed)
# On Linux:
# curl https://rclone.org/install.sh | sudo bash

# Configure rclone with your AWS credentials
rclone config

# Sync local directory with an S3 bucket
rclone sync /path/to/local/directory remote:your-bucket-name

Advanced S3 sync options for rclone

Rclone provides various options and flags for customizing your S3 sync. Here are some commonly used options:

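  • --delete-before / --delete-during / --delete-after: rclone sync already deletes destination files that don't exist in the source, so there is no separate --delete flag. These flags only control when the deletion happens (--delete-after is the default). Use rclone copy instead if you never want destination files removed.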
  • --exclude: Exclude files or patterns from being synced.
  • --include: Include files or patterns for syncing.
  • --dry-run: Simulate the sync operation without making any changes.
  • --quiet: Suppress output and only display errors.
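
The flags above can be combined on a single command line. The function below is a minimal sketch of one such combination: rclone_sync_no_logs and the DRY_RUN variable are hypothetical names, and "remote" stands for whatever remote name you chose during rclone config.

```shell
# Sketch: sync a local directory to an rclone remote, skipping *.log files.
# Set DRY_RUN=1 to add --dry-run and only report what would change;
# otherwise the sync runs with --quiet.
rclone_sync_no_logs() {
  src=$1
  dest=$2
  if [ "${DRY_RUN:-0}" = "1" ]; then
    rclone sync "$src" "$dest" --exclude "*.log" --dry-run
  else
    rclone sync "$src" "$dest" --exclude "*.log" --quiet
  fi
}

# Usage:
# DRY_RUN=1 rclone_sync_no_logs /path/to/local/directory remote:your-bucket-name
# rclone_sync_no_logs /path/to/local/directory remote:your-bucket-name
```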

Some rclone key details

  • Supports other backends like Google Drive, Dropbox, etc.
  • Can tune settings such as chunk size (--s3-chunk-size) and the number of parallel transfers (--transfers)
  • Optional encryption and ACL configuration
  • Needs minimal config: you define remotes once with rclone config, then sync to or from them

Experience Effortless S3 Sync with SimpleBackups

At SimpleBackups, we understand the importance of data synchronization and redundancy. That's why we've made it easy to sync S3 buckets with other cloud storage providers. With SimpleBackups, you can set up a storage replication in minutes and ensure that your data is always available and accessible.

Ready to safeguard your data with ease? Sign up for SimpleBackups today and experience the simplicity of secure backups!


