Snowflake
Destination Connector
Overview
This Destination Connector allows you to sync data to Snowflake.
Output Schema
Each stream will be output into its own table in Snowflake. Each table will contain the following additional 3 columns:
Field | Column Type | Description |
---|---|---|
_tarsal_id | VARCHAR | A uuid assigned by Tarsal to each event that is processed |
_tarsal_emitted_at | TIMESTAMP WITH TIME ZONE | A timestamp representing when the event was pulled from the data source |
_tarsal_data | VARIANT | A JSON blob representing with the event data |
Snowflake Configuration
We recommend creating an Tarsal-specific warehouse, database, schema, user, and role for writing data into Snowflake so it is possible to track costs specifically related to Tarsal (including the cost of running this warehouse) and control permissions at a granular level. Since the Tarsal user creates, drops, and alters tables, OWNERSHIP
permissions are required in Snowflake. If you are not following the recommended script below, please limit the OWNERSHIP
permissions to only the necessary database and schema for the Tarsal user.
Snowflake Naming Conventions
The Tarsal Snowflake destination will create tables and schemas using the Unquoted identifiers when possible or fallback to Quoted Identifiers if the names are containing special characters. See Snowflake's Identifier requirements for more information.
We provide the following script to create these resources. Before running, you must change the password to something secure. You may change the names of the other resources if you desire.
-- set variables (these need to be uppercase)
set tarsal_role = 'TARSAL_ROLE';
set tarsal_username = 'TARSAL_USER';
set tarsal_warehouse = 'TARSAL_WAREHOUSE';
set tarsal_database = 'TARSAL_DATABASE';
set tarsal_schema = 'TARSAL_SCHEMA';
-- set user password
set tarsal_password = 'password';
begin;
-- create Tarsal role
use role securityadmin;
create role if not exists identifier($tarsal_role);
grant role identifier($tarsal_role) to role SYSADMIN;
-- create Tarsal user
create user if not exists identifier($tarsal_username)
password = $tarsal_password
default_role = $tarsal_role
default_warehouse = $tarsal_warehouse;
grant role identifier($tarsal_role) to user identifier($tarsal_username);
-- change role to sysadmin for warehouse / database steps
use role sysadmin;
-- create Tarsal warehouse
create warehouse if not exists identifier($tarsal_warehouse)
warehouse_size = xsmall
warehouse_type = standard
auto_suspend = 60
auto_resume = true
initially_suspended = true;
-- create Tarsal database
create database if not exists identifier($tarsal_database);
-- grant Tarsal warehouse access
grant USAGE
on warehouse identifier($tarsal_warehouse)
to role identifier($tarsal_role);
-- grant Tarsal database access
grant OWNERSHIP
on database identifier($tarsal_database)
to role identifier($tarsal_role);
commit;
begin;
USE DATABASE identifier($tarsal_database);
-- create schema for Tarsal data
CREATE SCHEMA IF NOT EXISTS identifier($tarsal_schema);
commit;
begin;
-- grant Tarsal schema access
grant OWNERSHIP
on schema identifier($tarsal_schema)
to role identifier($tarsal_role);
commit;
Tarsal Configuration
The following fields are used to configure the destination connector.
Field | Required | Description | Example |
---|---|---|---|
Host | yes | Host domain of the snowflake instance. (must include the account, region, cloud environment, and end with snowflakecomputing.com) | accountname.us-east-2.aws.snowflakecomputing.com |
Role | yes | The role you created for Tarsal to access Snowflake. Must be UPPERCASE. | TARSAL_ROLE |
Warehouse | yes | The warehouse you created for Tarsal to sync data into. Must be UPPERCASE. | TARSAL_WAREHOUSE |
Database | yes | The database you created for Tarsal to sync data into. Must be UPPERCASE. | TARSAL_DATABASE |
Schema | yes | The default Snowflake schema tables are written to if the source does not specify a namespace. Must be UPPERCASE. Schema name would be transformed to follow Snowflake Naming Conventions | TARSAL_SCHEMA |
Username | yes | The username you created to allow Tarsal to access the database. Must be UPPERCASE. | TARSAL_USER |
Password | yes | Password associated with the username. | |
Loading Method | no | See below |
Loading Method
Field | Description |
---|---|
Standard | Uses INSERT statements to send batches of records to Snowflake. Easiest (no setup) but not recommended for large production workloads due to slow speed. |
Internal Staging | Writes large batches of records to a file, uploads the file to Snowflake, then uses COPY INTO table to upload the file. Recommended for large production workloads for better speed and scalability. |
S3 Staging | See below |
GCS Staging | See below |
S3 Staging
Writes large batches of records to a file, uploads the file to S3, then uses COPY INTO table
to upload the file. Recommended for large production workloads for better speed and scalability.
Field | Required | Description |
---|---|---|
S3 Key Id | yes | The Access Key Id granting allow one to access the above S3 staging bucket. Tarsal requires Read and Write permissions to the given bucket. |
S3 Bucket Name | yes | The name of the staging S3 bucket. Tarsal will write files to this bucket and read them via COPY statements on Snowflake. |
S3 Bucket Region | no | The region of the S3 staging bucket to use if utilizing a copy strategy. |
S3 Access Key | yes | The corresponding secret to the above access key id. |
GCS Staging
Writes large batches of records to a file, uploads the file to GCS, then uses COPY INTO table
to upload the file. Recommended for large production workloads for better speed and scalability.
Field | Required | Description |
---|---|---|
GCS Bucket Name | yes | The name of the staging GCS bucket. Tarsal will write files to this bucket and read them via COPY statements on Snowflake. |
Google Application Credentials | yes | The contents of the JSON key file that has read/write permissions to the staging GCS bucket. You will separately need to grant bucket access to your Snowflake GCP service account. See the [GCP docs](https://cloud.google.com/iam/docs/creating-managing-service-account-keys#creating_service_account_keys\) for more information on how to generate a JSON key for your service account. |
GCP Project ID | yes | The name of the GCP project ID for your credentials. |
Cloud Storage Staging
By default, Tarsal uses batches of INSERT commands to add data to a temporary table before copying it over to the final table in Snowflake. This is too slow for larger/multi-GB replications. For those larger replications we recommend configuring using cloud storage to allow batch writes and loading.
AWS S3
For AWS S3, you will need to create a bucket and provide credentials to access the bucket. We recommend creating a bucket that is only used for Tarsal to stage data to Snowflake. Tarsal needs read/write access to interact with this bucket.
Google Cloud Storage (GCS)
- Create a GCS bucket, then run the script below.
Notes:- The script must be run as the account admin for Snowflake.
- Replace
TARSAL_ROLE
with the role used for Tarsal's Snowflake configuration. - Replace
YOURBUCKETNAME
with the bucket name - The stage name can be modified to any valid name.
gcs_tarsal_integration
must be used
create storage INTEGRATION gcs_tarsal_integration
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = GCS
ENABLED = TRUE
STORAGE_ALLOWED_LOCATIONS = ('gcs://YOURBUCKETNAME');
create stage gcs_tarsal_stage
url = 'gcs://io_tarsal_test_staging'
storage_integration = gcs_tarsal_integration;
GRANT USAGE ON integration gcs_tarsal_integration TO ROLE TARSAL_ROLE;
GRANT USAGE ON stage gcs_tarsal_stage TO ROLE TARSAL_ROLE;
DESC STORAGE INTEGRATION gcs_tarsal_integration;
- Verify the query. It should should show a
STORAGE_GCP_SERVICE_ACCOUNT
property with an email as the property value. - Finally, add read/write permissions to your bucket with that email.
Updated about 2 months ago