AWS S3

Destination Connector

Introduction

The following information is a configuration reference for your data's Amazon S3 (Simple Storage Service) destination connector.

The AWS Identity and Access Management (IAM) web service secures access control to AWS resources. IAM manages granular permissions for authentication and authorization to control which users may sign in and what resources they can access.

AWS uses Amazon Resource Names (ARNs) to identify unique resources and services. Applications must use ARNs in policies to access multiple resources within AWS. ARNs have general and resource-specific formats. More information is available at https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html.
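
As a rough illustration of the general ARN format (`arn:partition:service:region:account-id:resource`), the following Python sketch splits an ARN into its parts. The `Arn` type and `parse_arn` helper are hypothetical names used only for this example; the role and bucket names are the ones suggested later in this guide, and the account ID is a placeholder.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str      # empty for global services such as IAM
    account_id: str  # empty for some resources, such as S3 buckets
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN of the form arn:partition:service:region:account-id:resource."""
    # Limit the split to six fields because the resource part may contain colons.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    return Arn(*parts[1:])


# Placeholder account ID; the role and bucket names follow this guide's suggestions.
role = parse_arn("arn:aws:iam::012345678901:role/tarsal-s3-destination-connector-role")
bucket = parse_arn("arn:aws:s3:::tarsal-s3-destination")
print(role.service, role.account_id, role.resource)
print(bucket.service, bucket.resource)
```

Note that the IAM role ARN has an empty region field and the S3 bucket ARN has empty region and account fields, which is why consecutive colons appear in both.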

Overview

An S3 bucket may be the destination of data consumed and transformed by Tarsal. After Tarsal ingests and normalizes data from the source connector, the destination connector writes file data to the bucket in a separate directory for each stream.

See Considerations for details on destination connector behavior and output.

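Setting up an S3 destination connector requires these tasks in the AWS Management Console and the Tarsal portal:

  1. Create the IAM policy
  2. Create the IAM role
  3. Create the IAM user
  4. Optionally create access keys
  5. Create the S3 bucket
  6. Configure the S3 destination connector
  7. Test the connector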

Prerequisites

The connector requires specific permissions for S3 to connect.

❗️

Before You Begin

Confirm you have AWS console administrative privileges for IAM and S3 before configuration.

Authentication

The S3 destination connector supports these authentication methods:

| Authentication Method | Description | Documentation |
| --- | --- | --- |
| Access Key ID and Access Secret | The access key ID and secret access key for AWS authentication | Managing access keys for IAM users |
| IAM Role | The permissions role associated with an S3 bucket | IAM roles |

Permissions

For either authentication method above, the connector requires the actions and access levels below:

| Amazon Service | Action | Access Level | Resource(s) |
| --- | --- | --- | --- |
| S3 | PutObject | Write | All objects or specified object(s) |
| S3 | ListBucket | List | All buckets or specified bucket(s) |
| S3 | GetBucketLocation | Read | All buckets or specified bucket(s) |

Considerations

Access Keys vs. IAM Roles

S3 connectors authenticate with either access keys or IAM roles in the Tarsal administrative portal.

Access keys have two parts: an access key ID and a secret access key. Think of the access key ID as the username and the secret access key as the password. The connector creates a token with them for AWS authentication.

Access keys should be saved securely. The secret access key is only retrievable upon creation. If the secret access key is lost, a new one must be generated.

AWS supports several types of IAM roles, and service-linked roles are recommended for connector authentication. A service-linked role is tied to a service (e.g., EC2, S3, RDS) with a specific purpose: to assume a role for performing actions. The service owns its service-linked roles.

Access keys require more maintenance: they are easily distributable, live longer than IAM role credentials, and should be rotated regularly for security purposes. Deleting an access key immediately invalidates it, preventing applications from functioning until they are updated with the new credentials. Rotation typically requires code updates and deployments, which may lengthen timelines compared with other authentication methods.

IAM roles provide more security than access keys and have a limited scope of permissions. Their credentials are temporary, centrally managed in the AWS console, and not distributable.

📘

Choosing a Connector Authentication Method

IAM roles are the preferred authentication method for the majority of Tarsal customers. AWS and Tarsal encourage roles instead of keys.

AWS Infrastructure

Using Existing AWS Infrastructure

All configuration instructions assume creating new AWS resources across all services for the S3 destination connector. Reuse of IAM and S3 objects is possible, though not recommended.

If you choose to use existing infrastructure, replace the suggested resource names in this guide with your existing ones where applicable.

⚠️

Selecting AWS Regions

The connector's resources and services should reside in the same AWS region.

IAM and Resource Security

The policies presented here provide the connector access to all resources (*) under the necessary services to simplify the configuration process and limit policy maintenance.

AWS and Tarsal recommend defining access levels as restrictively as possible. Limit the connector to only the necessary S3 buckets and actions.
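
As a hedged sketch of what a more restrictive policy could look like, the Python snippet below builds a policy document limited to a single bucket. It reuses the three actions from the policy JSON in Create the IAM Policy and the bucket name suggested in this guide; the `scoped_policy` helper is a hypothetical name. Whether the connector needs additional actions for your setup is an assumption to confirm by testing the connector after any change.

```python
import json


def scoped_policy(bucket_name: str) -> dict:
    """Build an S3 policy limited to one bucket instead of all resources (*)."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level action: the resource is the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Object-level actions: the resource is the objects in the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


# Print JSON that can be pasted into the IAM policy editor.
print(json.dumps(scoped_policy("tarsal-s3-destination"), indent=2))
```

The split into two statements reflects how S3 permissions are scoped: bucket-level actions match the bucket ARN, while object-level actions match the `bucket/*` object ARN.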

❗️

Before You Begin

Throughout the configuration, you’ll be copying values for later use. Save them to an easily accessible location for reference.

Create the IAM Policy

IAM policies are allow policies that control permissions for AWS objects. They define and enforce resource access for users, roles, and services. Policies are checked on every request and are agnostic of the operational method used (the AWS Console, CLI, or API).

The policy grants the S3 permissions that the connector's IAM role and user need.

Create the IAM policy with S3 object permissions:

  1. Sign in to the AWS Management Console at https://console.aws.amazon.com/iam.
  2. Go to Access Management > Policies from the left navigation.
  3. Click the Create Policy button in the upper right.
  4. Click the JSON tab.
  5. In the Policy Editor
    1. Delete the existing JSON.
    2. Copy and paste the following policy:
      {
      	"Version": "2012-10-17",
      	"Statement": [
      		{
      			"Effect": "Allow",
      			"Action": [
      				"s3:GetObject",
      				"s3:PutObject",
      				"s3:ListBucket"
      			],
      			"Resource": [
      				"*"
      			]
      		}
      	]
      }
      
    3. Click the Next button.
  6. Under Policy Details, enter tarsal-s3-destination-connector-policy for Policy Name.
  7. Click the Create Policy button.

Create the IAM Role

IAM roles are identities with specific permissions and short-lived credentials. A role combines a permissions policy, which defines what the role can do, with a trust policy, which defines who can assume the role.

Create a service-linked assumable role, add a custom trust policy for the connector, and attach the permissions policy:

  1. Sign in to the AWS Management Console at https://console.aws.amazon.com/iam.
  2. Go to Access Management > Roles from the left navigation.
  3. Click the Create Role button in the upper right.
  4. For Trusted Entity Type, select the Custom Trust Policy radio button.
  5. Sign in to the Tarsal portal at https://app.tarsal.cloud.
  6. Go to Account > Settings from the left navigation.
  7. Click the Cloud Tools tab.
  8. Next to Sample Trust Policy with Minimum Privileges, click the clipboard icon on the right to copy the trust policy. The policy includes Tarsal’s AWS account ID to assume the role.
  9. Copy and paste the policy into the in-line text editor, then remove the Condition from the policy:
    ,
    "Condition": {
      "StringEquals": {
        "sts:ExternalId": "f668651f-6c7a-4c54-9dad-78674c93f00d"
      }
    }
    
    Verify the JSON syntax. Make sure to remove the comma (,) before “Condition”.
  10. Click the Next button.
  11. Under Permissions Policies, click the drop-down list under Filter by Type and select Customer Managed.
  12. Locate tarsal-s3-destination-connector-policy and select the checkbox to the left of the name.
  13. Click the Next button.
  14. Under Role Details, enter tarsal-s3-destination-connector-role for Role Name.
  15. Click the Create Role button.
  16. In the Roles list, click tarsal-s3-destination-connector-role.
  17. Under Summary, select the Copy ARN button and save the value for retrieval later.
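
Step 9 above (removing the Condition block and verifying the JSON syntax) can also be checked with a short script rather than by eye. The sketch below is illustrative only: the sample trust policy is an assumed shape for a typical cross-account trust policy, the account ID `111122223333` is a placeholder rather than Tarsal's real account ID, and `strip_conditions` is a hypothetical helper. Use the policy copied from the Tarsal portal as the real input.

```python
import json

# Assumed shape of a cross-account trust policy; the account ID is a placeholder.
SAMPLE_TRUST_POLICY = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {"sts:ExternalId": "f668651f-6c7a-4c54-9dad-78674c93f00d"}
      }
    }
  ]
}
"""


def strip_conditions(policy_json: str) -> str:
    """Remove every Condition block and return re-serialized, valid JSON."""
    policy = json.loads(policy_json)  # raises json.JSONDecodeError on bad syntax
    statements = policy["Statement"]
    if isinstance(statements, dict):  # IAM also accepts a single statement object
        statements = [statements]
    for statement in statements:
        statement.pop("Condition", None)
    return json.dumps(policy, indent=2)


print(strip_conditions(SAMPLE_TRUST_POLICY))
```

Because the output is re-serialized by the JSON library, a stray trailing comma cannot survive, which addresses the syntax check in step 9.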

📘

Choosing a Connector Authentication Method

The connector configuration supports two authentication methods: access keys and IAM roles. Complete the steps in the next section only if you’re authenticating with access keys. Please refer to Access Keys vs. IAM Roles when making a decision.

Create the IAM User

Create an IAM user and assign permissions. This user assumes the role previously created.

  1. Sign in to the AWS Management Console at https://console.aws.amazon.com/iam.
  2. Go to Access Management > Users from the left navigation.
  3. Click the Create User button in the upper right.
  4. Under User Details, enter tarsal-s3-destination-connector-user for User Name.
  5. Click the Next button.
  6. Under Permissions Options, click the radio button next to Attach Policies Directly.
  7. Under Permissions Policies, click the drop-down list under Filter by Type and select Customer Managed.
  8. Locate the tarsal-s3-destination-connector-policy and select the checkbox to the left of the name.
  9. Click the Next button.
  10. Click the Create User button.

Optionally Create Access Keys

Complete the following steps only if you plan to authenticate the connector with access keys.

⚠️

User Security

AWS best practices recommend assigning permissions to user groups rather than users. The following steps assume a single Tarsal S3 destination connector user; therefore, the instructions attach permissions to the user rather than a user group. However, you may assign the user to a group and apply the IAM policy there.

Alternatively, you may use this same connector for multiple S3 destinations.

Generate access keys for the previously created user:

  1. Sign in to the AWS Management Console at https://console.aws.amazon.com/iam.
  2. Go to Access Management > Users from the left navigation.
  3. In the Users list, click tarsal-s3-destination-connector-user.
  4. Click the Security Credentials tab.
  5. Locate Access Keys and click the Create Access Key button.
  6. For Use Case, select the Third-party Service radio button.
  7. Check the checkbox under Confirmation.
  8. Click the Next button.
  9. For Description Tag Value, enter Tarsal S3 Destination Connector.
  10. Click the Create Access Key button.
  11. Click the Download .csv File button.
  12. Click the Done button after the .csv download.

The keys are added later when configuring connector authentication.

⚠️

Storing Keys

Record and store the secret access key in a safe place! Lost secret access keys are not recoverable.

If you no longer have your key, deactivate the old key, generate a new one, and update the connector credentials in the Tarsal portal.

📘

AWS Access Key Best Practices

  • Never store your access key in plain text, a code repository, or code.
  • Disable or delete the access key when no longer needed.
  • Enable least-privilege permissions.
  • Rotate access keys regularly.

Visit https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#securing_access-keys for more information.

Create the S3 Bucket

❗️

Data Purging and Syncing

After every sync, the data in your connected S3 bucket is deleted. To prevent data loss from misconfiguration or for retention purposes, create a dedicated S3 bucket for the connector.

⚠️

Naming S3 Buckets

You can’t change bucket names after creation! If you want to change the bucket name, create a new bucket with a new name and copy any necessary data over.

Create an S3 bucket for the connector data destination:

  1. Sign in to the AWS Management Console at https://console.aws.amazon.com/s3.
  2. Click the AWS region drop-down list next to your name in the upper right and select the desired region for the bucket.
  3. Under General Purpose Buckets, click the Create Bucket button.
  4. Under General Configuration, enter tarsal-s3-destination for Bucket Name.
  5. Under Object Ownership, ensure ACLs Disabled is selected.
  6. Leave the remaining default selections unchanged and click the Create Bucket button.

Configure the S3 Destination Connector

This reference table describes the portal fields required for key and role authentication types. Replace the values in braces ({}) with the applicable information for your AWS account and resources.

| Parameter | Description | Authentication Method | Format | Example |
| --- | --- | --- | --- | --- |
| Endpoint | The S3 URL when the bucket is configured as a static website | IAM role; Access key ID and secret | http://{S3_BUCKET_NAME}.s3-website.{AWS_REGION}.amazonaws.com | http://example-bucket.s3-website.us-west-2.amazonaws.com |
| S3 Bucket Name | The name of the S3 bucket storing the destination file(s) | IAM role; Access key ID and secret | {S3_BUCKET_NAME} | tarsal-s3-destination |
| S3 Bucket Path (optional) | The folder within the bucket storing the destination file(s) | IAM role; Access key ID and secret | | folder/customers/ |
| S3 Bucket Region | The AWS location of the bucket | IAM role; Access key ID and secret | country-region-number | us-west-2 |
| IAM Role ARN | The AWS resource name of the role assumed by the connector | IAM role | arn:aws:iam::{AWS_ACCOUNT_ID}:role/{ROLE_NAME} | arn:aws:iam::012345678901:role/tarsal-s3-destination-connector-role |
| S3 Key ID | The AWS ID for the S3 key that provides permissions | Access key ID and secret | 16-128 characters | AKIAVKPEQFPUU7XNIKXW |
| S3 Access Key | The S3 secret for the AWS S3 key ID | Access key ID and secret | 16-128 characters | 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 |
| Output Format | The file format of the destination data file(s) | IAM role; Access key ID and secret | CSV or JSONL | |
| Format Type | Normalization method for CSV only | IAM role; Access key ID and secret | No flattening or Root-level flattening | |
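
Before entering these values in the portal, it may help to sanity-check them locally. The following Python sketch applies loose format checks to a configuration dictionary. The dictionary keys (`bucket_name`, `bucket_region`, `auth_method`, `role_arn`, `key_id`, `secret_key`) and the `validate_config` helper are hypothetical names for this example, not Tarsal API fields, and the patterns assume the standard `aws` partition. The checks catch typos only; they do not confirm that a resource exists.

```python
import re

# Loose format checks; they catch typos, not every invalid value.
BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")
REGION_RE = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")
ROLE_ARN_RE = re.compile(r"^arn:aws:iam::\d{12}:role/[\w+=,.@/-]+$")


def validate_config(config):
    """Return a list of problems found in a connector configuration dict."""
    errors = []
    if not BUCKET_RE.match(config.get("bucket_name", "")):
        errors.append("bucket_name: expected 3-63 lowercase letters, digits, dots, or hyphens")
    if not REGION_RE.match(config.get("bucket_region", "")):
        errors.append("bucket_region: expected a value such as us-west-2")
    auth = config.get("auth_method")
    if auth == "iam_role":
        # IAM is a global service, so the region field of the ARN stays empty.
        if not ROLE_ARN_RE.match(config.get("role_arn", "")):
            errors.append("role_arn: expected arn:aws:iam::{AWS_ACCOUNT_ID}:role/{ROLE_NAME}")
    elif auth == "access_key":
        for field in ("key_id", "secret_key"):
            if not config.get(field):
                errors.append(f"{field}: required for access key authentication")
    else:
        errors.append("auth_method: expected 'iam_role' or 'access_key'")
    return errors


config = {
    "bucket_name": "tarsal-s3-destination",
    "bucket_region": "us-west-2",
    "auth_method": "iam_role",
    "role_arn": "arn:aws:iam::012345678901:role/tarsal-s3-destination-connector-role",
}
print(validate_config(config))  # an empty list means no problems were found
```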

Add and configure the S3 destination connector based on the chosen authentication method:

  1. Sign in to the Tarsal portal at https://app.tarsal.cloud.
  2. Go to Configuration > Destinations from the left navigation.
  3. Click the Add Destination button in the upper right.
  4. Click AWS S3.
  5. Under Metadata, enter AWS S3 for Name.
  6. Under Configuration, enter tarsal-s3-destination for S3 Bucket Name.
  7. Click the S3 Bucket Region drop-down list and select the bucket’s AWS region.
  8. Click the Authentication drop-down list and select your chosen method.
    1. For IAM Role Authentication
      1. Under Auth Method, paste the IAM role ARN you previously copied into AWS ARN Role. You can also enter arn:aws:iam::{AWS_ACCOUNT_ID}:role/tarsal-s3-destination-connector-role. Replace {AWS_ACCOUNT_ID} with your 12-digit AWS account number. Do not enclose the value in braces.
    2. For Access Key ID and Secret
      1. Open the downloaded file tarsal-s3-destination-connector-user_accessKeys.csv.
      2. Copy Access Key ID and paste the value for S3 Key ID.
      3. Copy Secret Access Key and paste the value for S3 Access Key.
  9. Under Output Format
    1. Select CSV: Comma-separated values or JSONL: Newline-delimited JSON.
    2. If using CSV, select No flattening or Root-level flattening under Format Type.
      See Output Schema below for more information.
  10. Click the Save button.

The portal immediately notifies you whether the connector configuration is successful with a status banner in the lower right.

If the connector configuration fails, verify all preceding steps or contact Tarsal customer support. See the next section for testing.

Test the Connector

In the portal, hover over the icon in the Health column for the connector in the list on the Destinations page, or the icon next to the Status label on the connector’s detail page. A broken heart icon indicates failure. The Summary widget on the Destinations page also lists destination errors.

📘

Updating AWS Configurations

Be sure to test the connector configuration in the portal after any AWS changes to associated users, roles, policies, or regions.

  1. Sign in to the Tarsal portal at https://app.tarsal.cloud.
  2. Go to Configuration > Destinations from the left navigation.
  3. In the Destinations list, click . . . (three dots) under the Actions column for the connector.
  4. Select Test from the drop-down list.

A banner in the lower right indicates the success or failure of the connector configuration test.

Alternatively, the connector can be tested directly from the connector destination detail page using the Test button in the upper right.

Considerations

Output Path

The naming patterns for output paths follow these conventions:

  • The upload timestamp is a concatenation of the upload date (YYYY-MM-DD), upload time (ms), and partition ID
  • The path sections and upload date use underscores (_) as separators
  • The partition ID is always 0

The structure of the full output path for data files is:

bucket-name/optional-source-namespace/stream-name/upload-date_upload-milliseconds_partition-id.format-extension

For example:

[Figure: Example Output Path]
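
The path structure above can be sketched in Python as follows. This is an illustration under stated assumptions, not the connector's implementation: it takes the YYYY-MM-DD date format at face value, assumes the upload time is Unix epoch milliseconds in UTC, and uses a hypothetical `output_path` helper with made-up namespace and stream names.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def output_path(bucket, stream, uploaded_at, extension, namespace=None, partition_id=0):
    """Build bucket/namespace/stream/date_millis_partition.extension."""
    uploaded_at = uploaded_at.astimezone(timezone.utc)
    upload_date = uploaded_at.strftime("%Y-%m-%d")
    # Assumption: upload time is whole milliseconds since the Unix epoch.
    upload_millis = (uploaded_at - EPOCH) // timedelta(milliseconds=1)
    file_name = f"{upload_date}_{upload_millis}_{partition_id}.{extension}"
    # The source namespace is optional, so skip it when it is not set.
    sections = [bucket, namespace, stream, file_name]
    return "/".join(section for section in sections if section)


when = datetime(2021, 1, 1, tzinfo=timezone.utc)
print(output_path("tarsal-s3-destination", "users", when, "csv", namespace="public"))
# tarsal-s3-destination/public/users/2021-01-01_1609459200000_0.csv
```

Because the date and millisecond parts lead the file name, files within a stream directory sort by upload time.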

📘

Stream Prefixes

Stream names have prefixes if defined in the connector configuration.

Output Schema

Each stream writes its complete set of output files to the path set in the connector configuration, with one directory per stream.

| Column | Condition | Description |
| --- | --- | --- |
| data | Always exists | The log or event data |
| _tarsal_metadata | Always exists | Tarsal-defined column for each record; an event processing timestamp and UUID |
| Root-level fields | Root-level normalization (flattening) | Expanded root-level fields |

There are two data output formats available: CSV and JSONL. CSV has formatting options unavailable with JSONL, including no flattening and root-level flattening.

CSV

If the connector configuration’s format type is no flattening, the file output has two columns: _tarsal_metadata (the timestamp and UUID) and data (the data blob). If the format type is root-level flattening, the file output has the _tarsal_metadata column followed by one column for each root-level field of the data.

Root-level flattening moves the root-level fields out of the data blob into dedicated columns through a normalization process. Nested values remain JSON within their column.

To understand how data normalization affects CSV output, consider this example data source JSON object:

{
  "user_id": 123,
  "name": {
    "first": "John",
    "last": "Doe"
  }
}

The following table shows CSV output based on the format type:

| Format Type | Columns | Records |
| --- | --- | --- |
| No flattening (no normalization) | _tarsal_metadata,data | {"_emitted_at":1622135805000,"_ab_id":"26d7...a206"},{"user_id":123,"name":{"first":"John","last":"Doe"}} |
| Root-level flattening (normalization) | _tarsal_metadata,user_id,name | {"_emitted_at":1622135805000,"_ab_id":"26d7...a206"},123,{"first":"John","last":"Doe"} |
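
To make the two format types concrete, the Python sketch below produces both CSV layouts for the example object using only the standard library. It is an approximation of the behavior described above, not the connector's code: `to_csv` and `as_cell` are hypothetical helpers, the metadata values are copied from the table (including the elided UUID), and the exact quoting in real output files may differ.

```python
import csv
import io
import json

record = {"user_id": 123, "name": {"first": "John", "last": "Doe"}}
# Metadata copied from the table above; the UUID is elided in the original.
metadata = {"_emitted_at": 1622135805000, "_ab_id": "26d7...a206"}


def as_cell(value):
    """Keep scalars as-is; serialize nested objects and arrays as JSON text."""
    return json.dumps(value) if isinstance(value, (dict, list)) else value


def to_csv(records, metadata, flatten_root):
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    # Root-level flattening: one column per root-level key of the first record.
    fields = list(records[0]) if flatten_root else ["data"]
    writer.writerow(["_tarsal_metadata", *fields])
    for rec in records:
        if flatten_root:
            cells = [as_cell(rec.get(name)) for name in fields]
        else:
            cells = [json.dumps(rec)]  # the whole record stays in one data blob
        writer.writerow([json.dumps(metadata), *cells])
    return buffer.getvalue()


print(to_csv([record], metadata, flatten_root=False))
print(to_csv([record], metadata, flatten_root=True))
```

The `csv` module escapes the double quotes inside each JSON cell, so the printed rows look more heavily quoted than the simplified records in the table.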

JSON Lines (JSONL)

JSON Lines, or newline-delimited JSON, is a text format for structured data. JSONL has one JSON object per line and handles tabular and nested data better than CSV. The output file has the following structure:

{
  "_tarsal_metadata": "<json-metadata>",
  "data": "<json-data-from-source>"
}

An example of a JSON source (the array) and the resulting JSONL output (the two lines after it) follows:

[
  {
    "user_id": 123,
    "name": {
      "first": "John",
      "last": "Doe"
    }
  },
  {
    "user_id": 456,
    "name": {
      "first": "Jane",
      "last": "Roe"
    }
  }
]

{ "_tarsal_metadata": {"_tarsal_ab_id": "0a61de1b-9cdd-4455-a739-93572c9a5f20", "_tarsal_emitted_at": "1631948170000"}, "data": { "user_id": 123, "name": { "first": "John", "last": "Doe" } } }
{ "_tarsal_metadata": {"_tarsal_ab_id": "0a61de1b-9cdd-4455-a739-93572c9a5f20", "_tarsal_emitted_at": "1631948170000"}, "data": { "user_id": 456, "name": { "first": "Jane", "last": "Roe" } } }
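
For readers consuming these files, the following Python sketch shows one way to wrap source records in the structure above and to read a JSONL file back line by line. It is a minimal illustration: `to_jsonl` and `read_jsonl` are hypothetical helpers, and generating a fresh UUID and a string timestamp per record is an assumption based on the example output, not a statement of how Tarsal produces metadata.

```python
import json
import time
import uuid

source = [
    {"user_id": 123, "name": {"first": "John", "last": "Doe"}},
    {"user_id": 456, "name": {"first": "Jane", "last": "Roe"}},
]


def to_jsonl(records):
    """Wrap each record in a metadata envelope and emit one JSON object per line."""
    lines = []
    for rec in records:
        envelope = {
            "_tarsal_metadata": {
                # Assumption: one UUID and one millisecond timestamp per record.
                "_tarsal_ab_id": str(uuid.uuid4()),
                "_tarsal_emitted_at": str(int(time.time() * 1000)),
            },
            "data": rec,
        }
        lines.append(json.dumps(envelope))
    return "\n".join(lines) + "\n"


def read_jsonl(text):
    """Parse JSONL text, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


for row in read_jsonl(to_jsonl(source)):
    print(row["data"]["user_id"], row["data"]["name"]["last"])
```

Each line parses independently, so large files can be streamed one record at a time instead of being loaded as a single JSON document.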