Using Rclone to Migrate Data from Oracle into Snowflake

Recently I had a customer that needed to copy data from Oracle Cloud Infrastructure (OCI) into Snowflake. Snowflake can natively ingest data from object storage buckets on Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP), but Oracle Cloud is not currently supported. This led me to learn about the Rclone utility, and I have to agree with a quote on its site (originally from Arthur C. Clarke) that it is “Technology indistinguishable from magic.” Rclone can replicate data to and from a long list of cloud providers and local folders from your command line with a minimal amount of configuration. Although every cloud provider has its own command-line utility for object storage, the power of Rclone is that this one utility works with all of them and can copy entire buckets of files across clouds while preserving file timestamps and verifying checksums to ensure the data arrives intact.

I will provide a walk-through below of the steps I used to set up replication from OCI to AWS and into Snowflake, but the process is very similar for Azure and GCP buckets. You can use whichever cloud provider you prefer, since Snowflake can ingest data from a different cloud than the one it runs on.

1. Create an object storage bucket on Oracle Cloud Infrastructure (OCI) and on AWS, Azure, or GCP

Each provider's documentation gives detailed instructions on creating a bucket and associating it with authentication credentials.

You will need the following information for these buckets (a couple of these values can also be looked up from the command line, as sketched after this list):

  • Oracle — Object Storage namespace, region, Customer Secret Key (access key ID and secret key), and bucket name
  • AWS — S3 region, access key ID, secret key, and bucket name
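
If you have the OCI and AWS command-line clients installed and configured, two of these values are easy to look up from a terminal; everything else comes from each provider's console:

oci os ns get                # prints the Object Storage namespace for your tenancy
aws configure get region     # prints the default region configured for the AWS CLI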

2. Install rclone

For Windows and other operating systems, you can download the installer from the Rclone website. On Linux/macOS/BSD systems, you can use your package manager to install rclone, or run the bootstrap script to simplify installation:

sudo yum install -y rclone

or

curl https://rclone.org/install.sh | sudo bash
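
Either way, a quick check that the install worked is to print the version:

rclone version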

3. Create an rclone configuration file with the information for your cloud storage providers

On Linux/macOS/BSD, we can use nano or another editor to create the file.

mkdir -p ~/.config/rclone
nano ~/.config/rclone/rclone.conf

In this file, we add configurations for the Oracle Cloud bucket and the AWS S3 bucket.

[oci-cloud]
type = s3
env_auth = false
access_key_id = YOUR_OCI_ACCESS_KEY
secret_access_key = YOUR_OCI_ACCESS_SECRET_KEY
region = YOUR_OCI_REGION_IDENTIFIER
endpoint = https://YOUR_NAMESPACE.compat.objectstorage.YOUR_REGION_IDENTIFIER.oraclecloud.com

[s3-cloud]
type = s3
provider = AWS
env_auth = false
access_key_id = YOUR_AWS_ACCESS_KEY
secret_access_key = YOUR_AWS_ACCESS_SECRET_KEY
region = YOUR_AWS_REGION_IDENTIFIER
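
Since this file holds your access keys in plain text, it is worth restricting its permissions to your own user:

chmod 600 ~/.config/rclone/rclone.conf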

It is also possible to create these entries interactively using the following command. An example of the process is available in the Rclone documentation for S3, Azure, and GCP.

rclone config
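
Whichever way you create the entries, you can confirm that both remotes are registered before moving on:

rclone listremotes
rclone config show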

4. Test if you can list the contents of both buckets

rclone ls oci-cloud:OCI_BUCKET_NAME --fast-list
rclone ls s3-cloud:AWS_BUCKET_NAME --fast-list
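
It can also be useful to check the object count and total size of the source bucket before starting a large transfer:

rclone size oci-cloud:OCI_BUCKET_NAME --fast-list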

5. Now, you should be able to run a number of commands to move, copy, and sync buckets and individual files

A few examples are provided below, and the full list is available at https://rclone.org/commands/.

Copy all the files from one bucket to another with checksum verification:

rclone copy oci-cloud:OCI_BUCKET_NAME s3-cloud:AWS_BUCKET_NAME --verbose --fast-list --checksum
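
If you want an independent verification after the copy completes, rclone check compares the files in both buckets by size and checksum and reports any differences:

rclone check oci-cloud:OCI_BUCKET_NAME s3-cloud:AWS_BUCKET_NAME --fast-list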

Copy with more parallel transfers if you need additional throughput:

rclone copy oci-cloud:OCI_BUCKET_NAME s3-cloud:AWS_BUCKET_NAME --verbose --fast-list --checksum --cache-workers 64 --transfers 64 --retries 32
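
While tuning parallelism, the --progress flag shows live transfer statistics, and --bwlimit caps bandwidth if you need to avoid saturating a shared link (the values below are only examples):

rclone copy oci-cloud:OCI_BUCKET_NAME s3-cloud:AWS_BUCKET_NAME --fast-list --checksum --transfers 64 --progress --bwlimit 50M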

Sync buckets so the destination matches the source:

rclone sync oci-cloud:OCI_BUCKET_NAME s3-cloud:AWS_BUCKET_NAME --verbose --fast-list --checksum
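
Because sync makes the destination match the source, deleting destination files that no longer exist on the source side, it is worth previewing the first run with --dry-run (the flag works with copy and move as well):

rclone sync oci-cloud:OCI_BUCKET_NAME s3-cloud:AWS_BUCKET_NAME --verbose --fast-list --checksum --dry-run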

Move files from one bucket to another:

rclone move oci-cloud:OCI_BUCKET_NAME s3-cloud:AWS_BUCKET_NAME --verbose --fast-list --checksum --delete-empty-src-dirs

Delete files and empty folders in a bucket (don’t run this now because we need the files for the next step):

rclone delete s3-cloud:AWS_BUCKET_NAME --verbose --fast-list --rmdirs
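
Rclone's filter flags work with any of the commands above if you only want to act on certain files, for example the CSV extracts loaded in the next step (the pattern is just an example):

rclone copy oci-cloud:OCI_BUCKET_NAME s3-cloud:AWS_BUCKET_NAME --include "*.csv" --verbose --fast-list --checksum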

6. Finally, in Snowflake, create an external stage, create a table to store the data, and load the data into your table from S3

-- Set your context
USE ROLE YOUR_SNOWFLAKE_ROLE;
USE WAREHOUSE YOUR_SNOWFLAKE_WAREHOUSE;
USE SCHEMA YOUR_SNOWFLAKE_DATABASE.YOUR_SNOWFLAKE_SCHEMA;
-- Create an external stage
CREATE OR REPLACE STAGE MY_S3_EXTERNAL_STAGE
  URL = 's3://AWS_BUCKET_NAME/'
  CREDENTIALS = (
    AWS_KEY_ID = 'YOUR_AWS_ACCESS_KEY'
    AWS_SECRET_KEY = 'YOUR_AWS_ACCESS_SECRET_KEY'
  );
-- Test that you can list files in the stage
LIST @MY_S3_EXTERNAL_STAGE;
-- Create a table to load
CREATE OR REPLACE TABLE OCI_TEST_SAMP_REVENUE_F (
  SHIPTO_ADDR_KEY INT,
  OFFICE_KEY INT,
  EMPL_KEY INT,
  PROD_KEY INT,
  ORDER_KEY INT,
  UNITS NUMBER(38,10),
  DISCNT_VALUE NUMBER(38,10)
);
-- Bulk load files from a stage to the table
COPY INTO OCI_TEST_SAMP_REVENUE_F
  FROM @MY_S3_EXTERNAL_STAGE
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
-- Verify that we now have the data in the table
SELECT * FROM OCI_TEST_SAMP_REVENUE_F LIMIT 10;
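
If you want to run the whole pipeline as one script, the same SQL can be saved to a file and executed with the SnowSQL client right after the rclone copy finishes. This is only a sketch: the file name, account, and user below are placeholders, and SnowSQL reads the password from the SNOWSQL_PWD environment variable.

# Copy the latest files from OCI to S3, then load them into Snowflake.
# load_revenue.sql is a hypothetical file containing the SQL from this step.
rclone copy oci-cloud:OCI_BUCKET_NAME s3-cloud:AWS_BUCKET_NAME --fast-list --checksum
export SNOWSQL_PWD='YOUR_SNOWFLAKE_PASSWORD'
snowsql -a YOUR_ACCOUNT_IDENTIFIER -u YOUR_SNOWFLAKE_USER -f load_revenue.sql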


Dan Flippo
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Sr Solutions Architect @Snowflake; SME: dbt, Kafka, Oracle BI & DB, StreamSets, Vertica. Views & opinions are my own and do not represent those of my employer.