S3 Source Connector 🚧 (Docs In Progress)

Overview

This page contains the setup guide and reference information for the Amazon S3 source connector.

Streams

Prerequisites

Authentication

The following authentication options are supported by this connector:

Authentication Method | Supported | Documentation
Access Key ID and Access Secret | yes | Managing access keys for IAM users
IAM role authentication | yes | IAM roles

Source Ingest Overview

The source bucket sends s3:ObjectCreated:* event notifications to an SQS queue, and Tarsal then ingests these events.

When an SQS message is processed successfully, it is deleted and will not be retried. This also means that if a file does not match the prefix in the configuration below, its message is still processed but ignored, which in turn means the message is deleted. The same principle applies to invalid JSON events.

If different types of events arrive in the same S3 bucket and you only want specific ones to be processed by this source, you should either use the path prefix or the more efficient approach described in the next section.

If the error is internal to Tarsal, we will keep retrying on our side until it is resolved.
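
For reference, the s3:ObjectCreated:* notifications delivered to the queue follow the standard S3 event notification format: a JSON body containing a Records array, where each record carries the bucket name and object key. The sketch below is purely illustrative (it is not Tarsal's implementation) and simply shows how those fields can be pulled out of a queue message.

```python
import json

# Illustrative only: unpack the standard s3:ObjectCreated:* notification
# shape that S3 delivers to SQS. Not Tarsal's actual ingestion code.
def created_objects(sqs_message_body: str):
    notification = json.loads(sqs_message_body)
    for record in notification.get("Records", []):
        if record.get("eventName", "").startswith("ObjectCreated"):
            yield record["s3"]["bucket"]["name"], record["s3"]["object"]["key"]
```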

How To Create Source Infrastructure

To set up the S3 source, you as the customer will need the following (a sketch of the setup follows this list):

  • AWS S3 Bucket
  • AWS S3 Event Notifications
  • AWS SQS Queue
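
Below is a minimal sketch of this setup using boto3. The region, bucket, and queue names are placeholders; your naming, region handling, and IAM permissions will differ, so treat this as a starting point rather than a definitive recipe.

```python
import json
import boto3

# Placeholder names; replace with your own.
REGION = "us-east-1"
BUCKET = "my-log-bucket"
QUEUE = "my-log-queue"

s3 = boto3.client("s3", region_name=REGION)
sqs = boto3.client("sqs", region_name=REGION)

# 1. S3 bucket that will receive the files.
s3.create_bucket(Bucket=BUCKET)

# 2. SQS queue that will receive the event notifications.
queue_url = sqs.create_queue(QueueName=QUEUE)["QueueUrl"]
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Allow the bucket to publish messages to the queue.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "s3.amazonaws.com"},
        "Action": "sqs:SendMessage",
        "Resource": queue_arn,
        "Condition": {"ArnEquals": {"aws:SourceArn": f"arn:aws:s3:::{BUCKET}"}},
    }],
}
sqs.set_queue_attributes(QueueUrl=queue_url, Attributes={"Policy": json.dumps(policy)})

# 3. S3 event notifications: send ObjectCreated events to the queue.
s3.put_bucket_notification_configuration(
    Bucket=BUCKET,
    NotificationConfiguration={
        "QueueConfigurations": [{
            "QueueArn": queue_arn,
            "Events": ["s3:ObjectCreated:*"],
        }]
    },
)
```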

Configuration

The following fields are used to configure the source connector.

Field | Required | Description | Example
Output Stream Name | yes | The name of the stream you would like this source to output. Can contain letters, numbers, or underscores. | 
Bucket | yes | Name of the S3 bucket where the file(s) exist. | my-log-bucket
SQS Queue URL | yes | URL of the SQS queue that ingests S3 event notifications from the bucket above. | https://sqs.{AWS_REGION}.amazonaws.com/{AWS_ACCOUNT_ID}/{QUEUE_NAME}
SQS Queue ARN | yes | ARN of the SQS queue that ingests S3 event notifications from the bucket above. | arn:aws:sqs:{AWS_REGION}:{AWS_ACCOUNT_ID}:{QUEUE_NAME}
Path Prefix | no | See below. | 
Endpoint | no | See below. | 

Path Prefix

An optional string that limits the files returned by AWS when listing objects to only those whose keys start with the specified prefix. This is different from the Path Pattern: the prefix is applied directly in the API call made to S3, rather than being filtered within Tarsal. It is not a regular expression and does not accept pattern-style symbols such as wildcards (*). We recommend using this filter to improve performance if your bucket has many folders and files that are unrelated to the data you want to replicate, and all the relevant files will always reside under the specified prefix. A short sketch of how the prefix and pattern are applied follows the list below.

  • Together with the Path Pattern, there are multiple ways to specify the files to sync. For example, all of the following configurations are equivalent:
    • Prefix = <empty>, Pattern = path1/path2/myFolder/**/*.jsonl
    • Prefix = path1/, Pattern = path2/myFolder/**/*.jsonl
    • Prefix = path1/path2/, Pattern = myFolder/**/*.jsonl
    • Prefix = path1/path2/myFolder/, Pattern = **/*.jsonl
  • The ability to individually configure the prefix and pattern has been included to accommodate situations where you do not want to replicate the majority of the files in the bucket. If you are unsure of the best approach, you can safely leave the Path Prefix field empty and just set the Path Pattern to meet your requirements.
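
The sketch below illustrates the distinction with boto3 and hypothetical prefix/pattern values: the prefix is sent to S3 with the list request, so S3 only returns matching keys, while the pattern is matched client-side after listing. Python's fnmatch is used here only as a rough stand-in for glob-style matching; it does not treat ** and path separators exactly like a glob engine.

```python
import fnmatch
import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

prefix = "path1/path2/"            # applied by S3 itself when listing
pattern = "myFolder/**/*.jsonl"    # applied afterwards, client-side

# Bucket name and paths are placeholders.
for page in paginator.paginate(Bucket="my-log-bucket", Prefix=prefix):
    for obj in page.get("Contents", []):
        relative_key = obj["Key"][len(prefix):]
        if fnmatch.fnmatch(relative_key, pattern):
            print(relative_key)
```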

Endpoint

An optional parameter that enables the use of S3-compatible services other than Amazon S3. If you are using the standard Amazon S3 service, leave this field blank.
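
As an illustration of what this field refers to, the boto3 snippet below points an S3 client at a hypothetical S3-compatible endpoint; the URL is a placeholder. Leaving the endpoint unset means the default Amazon S3 endpoint is used.

```python
import boto3

# Placeholder endpoint for an S3-compatible service (e.g. a self-hosted MinIO).
s3 = boto3.client("s3", endpoint_url="https://minio.example.internal:9000")
```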

Authentication

You can authenticate with either an Access Key ID and Access Secret, or with an IAM Role. IAM Role authentication is usually the preferred option for customers. A sketch of the assume-role flow follows the field tables below.

Access Key ID and Access Secret

Field | Description
Access Key ID | The access key ID for authenticating to Amazon Web Services
Access Secret | The access secret for authenticating to Amazon Web Services

IAM Role

Field | Description
IAM Role ARN | ARN of the IAM role associated with the S3 bucket, if using assume-role authentication
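
For context, this is a minimal sketch of the assume-role flow using boto3 and a placeholder role ARN; the role's trust policy must allow it to be assumed by the caller, and the role itself needs read access to the bucket and queue.

```python
import boto3

# Placeholder role ARN and session name.
sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/tarsal-s3-source",
    RoleSessionName="s3-source-read",
)["Credentials"]

# Temporary credentials returned by STS are then used for S3/SQS access.
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
```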

File Format Settings

JSONL is the only format currently supported. As such, there are no extra settings for other format types.

File Compressions

Compression | Supported?
Gzip | yes
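
For reference, the sketch below shows what consuming such a file looks like in Python: newline-delimited JSON records, optionally gzip-compressed. The file name is a placeholder.

```python
import gzip
import json

# Read a gzip-compressed JSONL file: one JSON object per line.
with gzip.open("events.jsonl.gz", "rt", encoding="utf-8") as handle:
    for line in handle:
        if line.strip():
            event = json.loads(line)
            print(event)
```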
