Snowflake

Destination Connector

Overview

The Tarsal Snowflake destination allows you to sync data to Snowflake.

Output Schema

Each stream will be output into its own table in Snowflake. Each table will contain the following three additional columns:

| Field | Column Type | Description |
| --- | --- | --- |
| _tarsal_id | VARCHAR | A UUID assigned by Tarsal to each event that is processed |
| _tarsal_emitted_at | TIMESTAMP WITH TIME ZONE | A timestamp representing when the event was pulled from the data source |
| _tarsal_data | VARIANT | A JSON blob containing the event data |
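
Because _tarsal_data is a VARIANT column, individual JSON fields can be queried directly with Snowflake's path syntax. A minimal sketch, assuming a hypothetical USERS table whose events carry a name key:

-- hypothetical table and JSON key, for illustration only
select
  _tarsal_id,
  _tarsal_emitted_at,
  _tarsal_data:name::varchar as name
from TARSAL_DATABASE.TARSAL_SCHEMA.USERS
order by _tarsal_emitted_at desc
limit 10;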

Snowflake Configuration

We recommend creating a Tarsal-specific warehouse, database, schema, user, and role for writing data into Snowflake, so that you can track costs specifically related to Tarsal (including the cost of running this warehouse) and control permissions at a granular level. Since the Tarsal user creates, drops, and alters tables, OWNERSHIP permissions are required in Snowflake. If you are not following the recommended script below, please limit the Tarsal user's OWNERSHIP permissions to only the necessary database and schema.

Snowflake Naming Conventions

The Tarsal Snowflake destination creates tables and schemas using unquoted identifiers when possible, falling back to quoted identifiers if the names contain special characters. See Snowflake's identifier requirements for more information.
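
For example, a plain lowercase name maps to an unquoted identifier, while a name containing special characters must be quoted. An illustrative sketch (these names are examples, not connector output):

-- unquoted identifier: resolved case-insensitively and stored as USERS
create table users (id varchar);
-- quoted identifier: required because of the hyphen, and case-sensitive
create table "user-events" (id varchar);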

We provide the following script to create these resources. Before running, you must change the password to something secure. You may change the names of the other resources if you desire.

-- set variables (these need to be uppercase)
set tarsal_role = 'TARSAL_ROLE';
set tarsal_username = 'TARSAL_USER';
set tarsal_warehouse = 'TARSAL_WAREHOUSE';
set tarsal_database = 'TARSAL_DATABASE';
set tarsal_schema = 'TARSAL_SCHEMA';

-- set user password
set tarsal_password = 'password';

begin;

-- create Tarsal role
use role securityadmin;
create role if not exists identifier($tarsal_role);
grant role identifier($tarsal_role) to role SYSADMIN;

-- create Tarsal user
create user if not exists identifier($tarsal_username)
password = $tarsal_password
default_role = $tarsal_role
default_warehouse = $tarsal_warehouse;

grant role identifier($tarsal_role) to user identifier($tarsal_username);

-- change role to sysadmin for warehouse / database steps
use role sysadmin;

-- create Tarsal warehouse
create warehouse if not exists identifier($tarsal_warehouse)
warehouse_size = xsmall
warehouse_type = standard
auto_suspend = 60
auto_resume = true
initially_suspended = true;

-- create Tarsal database
create database if not exists identifier($tarsal_database);

-- grant Tarsal warehouse access
grant USAGE
on warehouse identifier($tarsal_warehouse)
to role identifier($tarsal_role);

-- grant Tarsal database access
grant OWNERSHIP
on database identifier($tarsal_database)
to role identifier($tarsal_role);

commit;

begin;

USE DATABASE identifier($tarsal_database);

-- create schema for Tarsal data
CREATE SCHEMA IF NOT EXISTS identifier($tarsal_schema);

commit;

begin;

-- grant Tarsal schema access
grant OWNERSHIP
on schema identifier($tarsal_schema)
to role identifier($tarsal_role);

commit;
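
To confirm the grants took effect, you can switch to the new role and inspect what it can see. A quick sanity check, assuming the default names from the script above:

use role TARSAL_ROLE;
use warehouse TARSAL_WAREHOUSE;
use database TARSAL_DATABASE;
show schemas;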

Tarsal Configuration

The following fields are used to configure the destination connector.

| Field | Required | Description | Example |
| --- | --- | --- | --- |
| Host | yes | Host domain of the Snowflake instance. Must include the account, region, and cloud environment, and end with snowflakecomputing.com. | accountname.us-east-2.aws.snowflakecomputing.com |
| Role | yes | The role you created for Tarsal to access Snowflake. Must be UPPERCASE. | TARSAL_ROLE |
| Warehouse | yes | The warehouse you created for Tarsal to sync data into. Must be UPPERCASE. | TARSAL_WAREHOUSE |
| Database | yes | The database you created for Tarsal to sync data into. Must be UPPERCASE. | TARSAL_DATABASE |
| Schema | yes | The default Snowflake schema tables are written to if the source does not specify a namespace. Must be UPPERCASE. The schema name is transformed to follow Snowflake naming conventions. | TARSAL_SCHEMA |
| Username | yes | The username you created to allow Tarsal to access the database. Must be UPPERCASE. | TARSAL_USER |
| Password | yes | Password associated with the username. | |
| Loading Method | no | See below. | |

Loading Method

| Field | Description |
| --- | --- |
| Standard | Uses INSERT statements to send batches of records to Snowflake. Easiest to set up (no extra configuration), but not recommended for large production workloads due to slow speed. |
| Internal Staging | Writes large batches of records to a file, uploads the file to Snowflake, then uses COPY INTO to load the file into the table. Recommended for large production workloads for better speed and scalability. |
| S3 Staging | See below. |
| GCS Staging | See below. |
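
Every staging method follows the same pattern: write a batch of records to a file, stage the file, then load it with COPY INTO. A minimal sketch of the internal-staging flow, with hypothetical stage and table names:

-- hypothetical names, for illustration only
create stage if not exists tarsal_internal_stage;
-- a PUT command (run via SnowSQL or a driver) uploads each batch file:
-- PUT file:///tmp/events.csv @tarsal_internal_stage;
copy into TARSAL_SCHEMA.EVENTS
  from @tarsal_internal_stage
  file_format = (type = csv);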

S3 Staging

Writes large batches of records to a file, uploads the file to S3, then uses COPY INTO to load the file into the table. Recommended for large production workloads for better speed and scalability.

| Field | Required | Description |
| --- | --- | --- |
| S3 Key Id | yes | The access key ID used to access the staging S3 bucket. Tarsal requires read and write permissions on the given bucket. |
| S3 Bucket Name | yes | The name of the staging S3 bucket. Tarsal will write files to this bucket and read them via COPY statements on Snowflake. |
| S3 Bucket Region | no | The region of the staging S3 bucket to use if utilizing a copy strategy. |
| S3 Access Key | yes | The secret corresponding to the access key ID above. |
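
The load step follows Snowflake's standard S3 COPY pattern, using the bucket and credentials from the fields above. A hedged sketch (placeholder bucket, table, and credential values):

-- placeholder values, for illustration only
copy into TARSAL_SCHEMA.EVENTS
  from 's3://your-staging-bucket/tarsal/'
  credentials = (aws_key_id = '<S3 Key Id>' aws_secret_key = '<S3 Access Key>')
  file_format = (type = csv);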

GCS Staging

Writes large batches of records to a file, uploads the file to GCS, then uses COPY INTO to load the file into the table. Recommended for large production workloads for better speed and scalability.

| Field | Required | Description |
| --- | --- | --- |
| GCS Bucket Name | yes | The name of the staging GCS bucket. Tarsal will write files to this bucket and read them via COPY statements on Snowflake. |
| Google Application Credentials | yes | The contents of the JSON key file that has read/write permissions to the staging GCS bucket. You will separately need to grant bucket access to your Snowflake GCP service account. See the [GCP docs](https://cloud.google.com/iam/docs/creating-managing-service-account-keys#creating_service_account_keys) for more information on how to generate a JSON key for your service account. |
| GCP Project ID | yes | The GCP project ID for your credentials. |

Cloud Storage Staging

By default, Tarsal uses batches of INSERT statements to add data to a temporary table before copying it over to the final table in Snowflake. This is too slow for larger (multi-GB) replications, so for those we recommend configuring cloud storage staging to allow batch writes and loads.

AWS S3

For AWS S3, you will need to create a bucket and provide credentials to access the bucket. We recommend creating a bucket that is only used for Tarsal to stage data to Snowflake. Tarsal needs read/write access to interact with this bucket.

Google Cloud Storage (GCS)

  1. Create a GCS bucket, then run the script below.
    Notes:
    • The script must be run by the Snowflake account administrator (e.g. the ACCOUNTADMIN role).
    • Replace TARSAL_ROLE with the role used for Tarsal's Snowflake configuration.
    • Replace YOURBUCKETNAME with your bucket name.
    • The stage name can be modified to any valid name.
    • The integration name must be exactly gcs_tarsal_integration.
CREATE STORAGE INTEGRATION gcs_tarsal_integration
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = GCS
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ('gcs://YOURBUCKETNAME');

CREATE STAGE gcs_tarsal_stage
  URL = 'gcs://YOURBUCKETNAME'
  STORAGE_INTEGRATION = gcs_tarsal_integration;

GRANT USAGE ON integration gcs_tarsal_integration TO ROLE TARSAL_ROLE;
GRANT USAGE ON stage gcs_tarsal_stage TO ROLE TARSAL_ROLE;

DESC STORAGE INTEGRATION gcs_tarsal_integration;
  2. Verify the query output. It should show a STORAGE_GCP_SERVICE_ACCOUNT property with an email address as the property value.
  3. Finally, grant read/write permissions on your bucket to that email address.
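
Once the integration and stage exist, loads reference the stage directly. An illustrative sketch, with a hypothetical target table:

-- hypothetical table name, for illustration only
copy into TARSAL_SCHEMA.EVENTS
  from @gcs_tarsal_stage
  file_format = (type = csv);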