S3 with SQS

Source Connector

Introduction

The following information is a configuration reference for the Amazon S3 with Simple Queue Service (SQS) source connector.

The AWS Identity and Access Management (IAM) web service secures access control to AWS resources. IAM manages granular permissions for authentication and authorization to control which users may sign in and what resources they can access.

AWS uses Amazon Resource Names (ARNs) to identify unique resources and services. Applications must use ARNs in policies to access multiple resources within AWS. ARNs have general and resource-specific formats. More information is available at https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html.
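As an illustration of the general ARN format, the sketch below splits an ARN into its colon-separated fields. The `Arn` type and `parse_arn` helper are hypothetical names introduced for this example only; they are not part of any AWS or Tarsal library.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account_id: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN of the general form arn:partition:service:region:account-id:resource."""
    # The resource part may itself contain colons, so split at most 5 times.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    return Arn(*parts[1:])


queue = parse_arn("arn:aws:sqs:us-east-2:012345678901:tarsal-s3-queue")
print(queue.service, queue.region, queue.resource)

# S3 bucket ARNs leave the region and account ID fields empty.
bucket = parse_arn("arn:aws:s3:::tarsal-s3-source")
print(repr(bucket.region), repr(bucket.account_id), bucket.resource)
```

The empty fields in the bucket ARN are why some ARNs in this guide contain consecutive colons.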

Overview

An S3 bucket containing objects may be the origin of data consumed and transformed by Tarsal. The S3 with SQS connector uses Amazon's Simple Queue Service for new data notifications.

The S3 source bucket sends events to an SQS queue upon S3 object creation and update. Tarsal polls the message queue, periodically checking for new information (published messages) based on the connector’s sync frequency defined in the portal. Data ingestion is initiated when it detects these events.
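To make the flow above concrete, here is a minimal sketch of how a consumer might read one S3 event notification taken from a queue message body. It is not Tarsal's implementation. The `created_objects` helper is a hypothetical name, and the sample message is a trimmed-down notification that keeps only the fields the sketch reads.

```python
import json
from typing import List, Tuple
from urllib.parse import unquote_plus


def created_objects(message_body: str) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs for object-creation records in one message body."""
    event = json.loads(message_body)
    found = []
    # S3 test events carry no "Records" list, so they yield nothing.
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        found.append((bucket, key))
    return found


sample = json.dumps({
    "Records": [{
        "eventSource": "aws:s3",
        "eventName": "ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "tarsal-s3-source"},
            "object": {"key": "customers/2024+report.json"},
        },
    }]
})
print(created_objects(sample))
```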

Setting up an S3 source connector requires tasks in both the AWS Management Console and the Tarsal portal. The sections below cover each task in order.

Prerequisites

The connector requires specific permissions for S3 and SQS to connect to S3 data sources.

❗️

Before You Begin

Confirm you have AWS console administrative privileges for IAM, S3, and SQS before configuration.

Authentication

The S3 source connector supports these authentication methods:

| Authentication Method | Description | Documentation |
| --- | --- | --- |
| Access Key ID and Access Secret | The access key ID and secret access key for AWS authentication | Managing access keys for IAM users |
| IAM Role | The permissions role associated with an S3 bucket and SQS queue | IAM roles |

Permissions

For either authentication method above, the connector requires the actions and access levels below:

| Amazon Service | Action | Access Level | Resource(s) |
| --- | --- | --- | --- |
| S3 | GetObject | Read | All objects or specified object(s) |
| S3 | ListBucket | List | All buckets or specified bucket(s) |
| SQS | ChangeMessageVisibility | Write | All messages in all queues or specified queue(s) |
| SQS | DeleteMessage | Write | All messages in all queues or specified queue(s) |
| SQS | GetQueueAttributes | Read | All queues or specified queue(s) |
| SQS | GetQueueUrl | Read | All queues or specified queue(s) |
| SQS | ReceiveMessage | Read | All messages in all queues or specified queue(s) |
| SQS | SendMessage | Write | All messages in all queues or specified queue(s) |

Considerations

Access Keys vs. IAM Roles

S3 connectors authenticate with either access keys or IAM roles in the Tarsal administrative portal.

Access keys have two parts: an access key ID and a secret access key. Think of the access key ID as the username and the secret access key as the password. The connector creates a token with them for AWS authentication.

Access keys should be saved securely. The secret access key is only retrievable upon creation. If the secret access key is lost, a new one must be generated.

AWS supports several types of IAM roles, and service-linked roles are recommended for connector authentication. A service-linked role is tied to a service (e.g., EC2, S3, RDS) with a specific purpose: to assume a role for performing actions. The service owns service-linked roles.

Access keys require more maintenance than IAM roles. They are long-lived and easily distributed, so they should be rotated regularly for security purposes. Deleting an access key invalidates it immediately, and applications that use it stop functioning until they are updated with the new credentials. Rotation typically requires code updates and deployments, which can take longer than maintaining other authentication methods.

IAM roles provide more security than access keys and have a limited scope of permissions. Role credentials are temporary, and roles are centrally managed in the AWS console and not distributable.

📘

Choosing a Connector Authentication Method

IAM roles are the preferred authentication method for the majority of Tarsal customers. AWS and Tarsal encourage roles instead of keys.

AWS Infrastructure

Using Existing AWS Infrastructure

All configuration instructions assume you are creating new AWS resources across all services for the S3 source connector. Reusing existing IAM, S3, and SQS objects is possible, though not recommended.

If you choose to use existing infrastructure, replace the suggested resource names in this guide with your existing ones where applicable.

⚠️

Selecting AWS Regions

The connector's resources and services should reside in the same AWS region.

IAM and Resource Security

The policies presented here provide the connector access to all resources (*) under the necessary services to simplify the configuration process and limit policy maintenance.

AWS and Tarsal recommend defining access levels as restrictively as possible. Limit the connector to only the necessary S3 buckets, SQS queues, and actions.

❗️

Before You Begin

Throughout the configuration, you’ll be copying values for later use. Save them to an easily accessible location for reference.

Create the IAM Policy

IAM policies are allow policies that control permissions for AWS objects. They define and enforce resource access for users, roles, and services. Policies are checked on every request and are agnostic of the operational method used (the AWS Console, CLI, or API).

The policy grants the connector's IAM user and role access to the required AWS services.

Create the IAM policy with S3 and SQS object permissions:

  1. Sign in to the AWS Management Console at https://console.aws.amazon.com/iam.
  2. Go to Access Management > Policies from the left navigation.
  3. Click the Create Policy button in the upper right.
  4. Click the JSON tab.
  5. In the Policy Editor
    1. Delete the existing JSON.
    2. Copy and paste the following policy:
      {  
      	"Version": "2012-10-17",  
      	"Statement": [
      		{  
      			"Effect": "Allow",  
      			"Action": [  
      				"s3:GetObject",  
      				"s3:ListBucket",  
      				"sqs:ChangeMessageVisibility",  
      				"sqs:DeleteMessage",  
      				"sqs:GetQueueAttributes",  
      				"sqs:GetQueueUrl",  
      				"sqs:ReceiveMessage",  
      				"sqs:SendMessage"  
      			],
      			"Resource": [  
      				"*",  
      				"*"  
      			]  
      		}  
      	]  
      }
      
    3. Click the Next button.
  6. Under Policy Details, enter tarsal-s3-source-connector-policy for Policy Name.
  7. Click the Create Policy button.
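The policy above allows the listed actions on all resources. As a sketch of the more restrictive approach recommended under IAM and Resource Security, the hypothetical `scoped_policy` helper below builds a policy limited to one bucket and one queue. The bucket and queue names are the ones suggested in this guide, and the account ID and region are placeholders.

```python
import json


def scoped_policy(bucket_name: str, queue_arn: str) -> dict:
    """Build a permissions policy limited to one bucket and one queue."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level action: applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
            {
                # Bucket-level action: applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Queue actions, limited to the connector's queue.
                "Effect": "Allow",
                "Action": [
                    "sqs:ChangeMessageVisibility",
                    "sqs:DeleteMessage",
                    "sqs:GetQueueAttributes",
                    "sqs:GetQueueUrl",
                    "sqs:ReceiveMessage",
                    "sqs:SendMessage",
                ],
                "Resource": [queue_arn],
            },
        ],
    }


policy = scoped_policy(
    "tarsal-s3-source",
    # Placeholder region and account ID; substitute your own values.
    "arn:aws:sqs:us-east-2:012345678901:tarsal-s3-source-connector-queue",
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the Policy Editor in place of the wildcard policy. The S3 actions are split into two statements because `s3:GetObject` applies to objects while `s3:ListBucket` applies to the bucket.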

Create the IAM Role

IAM roles are identities with specific permissions and short-lived credentials. A role combines a permissions policy, which defines what the role can do, with a trust policy, which defines who can assume it.

Create a service-linked assumable role, add a custom trust policy for the connector, and attach the permissions policy:

  1. Sign in to the AWS Management Console at https://console.aws.amazon.com/iam.
  2. Go to Access Management > Roles from the left navigation.
  3. Click the Create Role button in the upper right.
  4. For Trusted Entity Type, select the Custom Trust Policy radio button.
  5. Sign in to the Tarsal portal at https://app.tarsal.cloud.
  6. Go to Account > Settings from the left navigation.
  7. Click the Cloud Tools tab.
  8. Next to Sample Trust Policy with Minimum Privileges, click the clipboard icon on the right to copy the trust policy. The policy includes Tarsal’s AWS account ID and an external ID to assume the role.
  9. Return to the AWS Management Console and paste the policy into the in-line text editor.
  10. Click the Next button.
  11. Under Permissions Policies, click the drop-down list under Filter by Type and select Customer Managed.
  12. Locate tarsal-s3-source-connector-policy and select the checkbox to the left of the name.
  13. Click the Next button.
  14. Under Role Details, enter tarsal-s3-source-connector-role for Role Name.
  15. Click the Create Role button.
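For orientation only, the sketch below shows the general shape of a cross-account trust policy that requires an external ID, which is a common AWS pattern. It is an assumption that the policy copied from the Tarsal portal resembles this shape. The `build_trust_policy` helper, the account ID, and the external ID are hypothetical placeholders, so always use the policy copied from the portal.

```python
import json


def build_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Build a cross-account trust policy that requires an external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The account allowed to assume the role (placeholder value below).
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The caller must present this external ID when assuming the role.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Both values are placeholders; the real ones come from the Tarsal portal.
print(json.dumps(build_trust_policy("111122223333", "example-external-id"), indent=2))
```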

📘

Choosing a Connector Authentication Method

The connector configuration supports two authentication methods: access keys and IAM roles. Complete the steps in the next section only if you’re authenticating with access keys. Please refer to Access Keys vs. IAM Roles when making a decision.

Create the IAM User

Create an IAM user and assign permissions. This user assumes the role previously created.

  1. Sign in to the AWS Management Console at https://console.aws.amazon.com/iam.
  2. Go to Access Management > Users from the left navigation.
  3. Click the Create User button in the upper right.
  4. Under User Details, enter tarsal-s3-source-connector-user for User Name.
  5. Click the Next button.
  6. Under Permissions Options, click the radio button next to Attach Policies Directly.
  7. Under Permissions Policies, click the drop-down list under Filter by Type and select Customer Managed.
  8. Locate the tarsal-s3-source-connector-policy and select the checkbox to the left of the name.
  9. Click the Next button.
  10. Click the Create User button.

Optionally Create Access Keys

Complete the following steps only if you plan to authenticate the connector with access keys.

⚠️

User Security

AWS best practices recommend assigning permissions to user groups rather than users. The following steps assume a single Tarsal S3 source connector user; therefore, the instructions attach permissions to the user rather than a user group. However, you may assign the user to a group and apply the IAM policy there.

Alternatively, you may use this same connector for multiple S3 sources.

Generate access keys for the previously created user:

  1. Sign in to the AWS Management Console at https://console.aws.amazon.com/iam.
  2. Go to Access Management > Users from the left navigation.
  3. In the Users list, click tarsal-s3-source-connector-user.
  4. Click the Security Credentials tab.
  5. Locate Access Keys and click the Create Access Key button.
  6. For Use Case, select the Third-party Service radio button.
  7. Check the checkbox under Confirmation.
  8. Click the Next button.
  9. For Description Tag Value, enter Tarsal S3 Source Connector.
  10. Click the Create Access Key button.
  11. Click the Download .csv File button.
  12. Click the Done button after the .csv download.

The keys are added later when configuring connector authentication.

⚠️

Storing Keys

Record and store the secret access key in a safe place! Lost secret access keys are not recoverable.

If you no longer have your key, deactivate the old key, generate a new one, and update the connector credentials in the Tarsal portal.

📘

AWS Access Key Best Practices

  • Never store your access key in plain text, a code repository, or code.
  • Disable or delete the access key when no longer needed.
  • Enable least-privilege permissions.
  • Rotate access keys regularly.

Visit https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#securing_access-keys for more information.

Create the S3 Bucket

⚠️

Naming S3 Buckets

You can’t change bucket names after creation! If you want to change the bucket name, create a new bucket with a new name and copy any necessary data over.

Create an S3 bucket in the AWS Management Console for the connector data source:

  1. Sign in to the AWS Management Console at https://console.aws.amazon.com/s3.
  2. Click the AWS region drop-down list next to your name in the upper right and select the desired region for the bucket.
  3. Under General Purpose Buckets, click the Create Bucket button.
  4. Under General Configuration, enter tarsal-s3-source for Bucket Name.
  5. Under Object Ownership, ensure ACLs Disabled is selected.
  6. Leave the remaining default selections unchanged and click the Create Bucket button.
  7. Under General Purpose Buckets
    1. Locate the tarsal-s3-source bucket visually or by searching.
    2. Click the radio button to the left of the bucket name.
    3. Click the Copy ARN button and save the value for retrieval later.

❗️

S3 Data Sources

Don’t forget to populate the bucket with source data for the connector.

Create the SQS Queue

An access policy allows S3 to post messages to a queue. The SQS region and owner AWS account ID are required.
See the information at the end of this section regarding queue configuration options.

⚠️

Choosing Queues

While you can use an existing SQS queue that already receives messages, Tarsal strongly recommends a dedicated queue per source connector over shared queues for separation of concerns and troubleshooting.

⚠️

Naming SQS Queues

You can’t change queue names after creation! If you want to change the queue name, create a new queue with a new name and reconfigure it.

📘

Locating your AWS Account ID and Region

The account ID location in AWS varies depending on whether you’re logged in as root or an IAM user. See https://docs.aws.amazon.com/accounts/latest/reference/manage-acct-identifiers.html#FindAccountId for more information.

The AWS region is available in the upper right of the global navigation. Click the drop-down list of region names to the left of your name. The region code is highlighted and located on the right in the menu (e.g., us-east-1, us-west-2).

Set up an SQS queue to receive S3 bucket messages for the connector:

  1. Sign in to the AWS Management Console at https://console.aws.amazon.com/sqs.
  2. Under Get Started, click the Create Queue button.
  3. Under Details, enter tarsal-s3-source-connector-queue for Name.
  4. Under Configuration
    1. For Visibility Timeout
      1. Enter 600 in the text field.
      2. Select Seconds from the drop-down list.
    2. For Message Retention Period
      1. Enter 7 in the text field.
      2. Select Days from the drop-down list.
  5. Under Access Policy > Choose Method, select the Advanced radio button to modify the JSON.
  6. Delete the entire policy JSON from the in-line text editor.
  7. Copy and paste the following policy into the in-line text editor:
    {  
      "Version": "2012-10-17",  
      "Statement": [  
        {  
          "Effect": "Allow",  
          "Principal": {  
            "AWS": "*"  
          },  
          "Action": "SQS:SendMessage",  
          "Resource": "arn:aws:sqs:{AWS_REGION}:{AWS_ACCOUNT_ID}:tarsal-s3-source-connector-queue",  
          "Condition": {  
            "ArnLike": {  
              "aws:SourceArn": "arn:aws:s3:*:*:tarsal-s3-source"  
            }  
          }  
        }  
      ]  
    }
    
  8. Add the appropriate AWS Region and Account ID to the JSON policy:
    1. Replace {AWS_ACCOUNT_ID} in the policy JSON with your AWS Account ID. Do not enclose the account ID value in brackets. The account ID location in AWS varies depending on whether you’re logged in as root or an IAM user. See https://docs.aws.amazon.com/accounts/latest/reference/manage-acct-identifiers.html#FindAccountId for more information.
    2. Replace {AWS_REGION} in the policy JSON with the AWS region where the SQS queue resides. Do not enclose the region value in brackets.
  9. Click the Create Queue button.
  10. Under Details > ARN, copy the queue’s ARN and URL for later use.
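The placeholder substitution in the steps above can also be scripted. The sketch below fills in the same access policy and parses the result to confirm it is still valid JSON. The `render_queue_policy` helper is a hypothetical name, and the region and account ID in the usage line are placeholders.

```python
import json
import re

# Same access policy as in the steps above, with the two placeholders intact.
POLICY_TEMPLATE = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"AWS": "*"},
      "Action": "SQS:SendMessage",
      "Resource": "arn:aws:sqs:{AWS_REGION}:{AWS_ACCOUNT_ID}:tarsal-s3-source-connector-queue",
      "Condition": {
        "ArnLike": {"aws:SourceArn": "arn:aws:s3:*:*:tarsal-s3-source"}
      }
    }
  ]
}
"""


def render_queue_policy(region: str, account_id: str) -> dict:
    """Substitute the placeholders and return the parsed policy."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("the AWS account ID must be exactly 12 digits")
    if not region or "{" in region or "}" in region:
        raise ValueError("the region must be a plain value without brackets")
    rendered = (
        POLICY_TEMPLATE
        .replace("{AWS_REGION}", region)
        .replace("{AWS_ACCOUNT_ID}", account_id)
    )
    # json.loads raises an error if the substitution produced invalid JSON.
    return json.loads(rendered)


policy = render_queue_policy("us-east-2", "012345678901")
print(policy["Statement"][0]["Resource"])
```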

Successfully processed SQS messages are deleted; Tarsal attempts to resolve internal errors through periodic retries.

Considerations

AWS provides several helpful settings for SQS queue configuration. The following settings impact the processing of data sources. The instructions above include Tarsal’s recommended values, which differ from SQS defaults to safeguard the ingestion process.

Visibility Timeout

The visibility timeout is the length of time a queue message is hidden from other message consumers after being received by one consumer. If a message is not deleted before the visibility timeout expires, it becomes visible again and can be processed a second time, producing duplicates. The steps above include Tarsal’s recommended value to guard against those situations.
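The toy model below illustrates why an undeleted message can be delivered twice. It is an in-memory sketch for explanation only, not the SQS API: the `ToyQueue` class is hypothetical, and times are plain seconds passed in by the caller.

```python
from typing import Dict, Optional


class ToyQueue:
    """In-memory model of one queue with a visibility timeout."""

    def __init__(self, visibility_timeout: int) -> None:
        self.visibility_timeout = visibility_timeout
        # Maps each message ID to the time at which it becomes visible.
        self._visible_at: Dict[str, int] = {}

    def send(self, message_id: str, now: int) -> None:
        self._visible_at[message_id] = now

    def receive(self, now: int) -> Optional[str]:
        for message_id, visible_at in self._visible_at.items():
            if visible_at <= now:
                # Hide the message from other consumers until the timeout elapses.
                self._visible_at[message_id] = now + self.visibility_timeout
                return message_id
        return None

    def delete(self, message_id: str) -> None:
        self._visible_at.pop(message_id, None)


queue = ToyQueue(visibility_timeout=600)
queue.send("m1", now=0)
print(queue.receive(now=10))   # first delivery
print(queue.receive(now=300))  # still hidden, nothing is returned
print(queue.receive(now=610))  # timeout elapsed without a delete: delivered again
queue.delete("m1")
print(queue.receive(now=2000))  # deleted messages are never redelivered
```

In this model, a consumer that needs longer than the timeout to finish processing sees the same message twice, which is the duplicate scenario a longer timeout is meant to reduce.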

Message Retention Period

The message retention period is the length of time a message persists if not deleted. SQS automatically deletes queue messages after the maximum retention period.

Information attached to deleted messages is not recoverable. Longer retention periods increase persistent storage costs, so activate the source connector promptly to avoid unnecessary message retention expenses.

Add S3 Event Notifications

S3 event notifications let users receive alerts for specific bucket events and act on them. Examples of events include object creation, deletion, and restoration, and notifications can be routed to other services, like SQS and Simple Notification Service (SNS).

Configure the S3 source bucket to publish object creation events to the connector's queue:

  1. Sign in to the AWS Management Console at https://console.aws.amazon.com/s3.
  2. Under General Purpose Buckets, click tarsal-s3-source.
  3. Click the Properties tab at the top.
  4. Locate the Event Notifications section and click the Create Event Notification button.
  5. Under General Configuration, enter tarsal-s3-source-object-creation for Event Name.
  6. Under Event Types, select the checkbox for All object create events.
  7. Under Destination, select the radio button for SQS queue.
  8. Under Specify SQS Queue, select one of the two methods:
    1. If Choose from your SQS queues is selected, click on the SQS Queue drop-down list and select the queue created earlier that receives the S3 source bucket messages.
    2. If Enter SQS queue ARN is selected, paste the value copied after creating the queue. Alternatively, click this queue from the list of queues and copy the ARN under Details and paste it into the SQS Queue field.
  9. Click the Save Changes button.

Verify Event Notifications

AWS sends a test notification to the queue after the event notification is created. To verify receipt:

  1. Click on tarsal-s3-source-connector-queue from the SQS queue list.
  2. Under Details, click the More link.
  3. Locate Messages available on the right, which should have a value of 1.

Considerations

Filter Event Notifications

If you plan to add event notifications for the S3 source bucket unrelated to Tarsal connectors, consider using prefixes. Prefixes, defined on event notification creation, are optional S3 bucket paths that filter notifications. They limit notifications to objects starting with the prefix, which is useful when multiple event types on the same bucket are processed differently.

For example, objects for Tarsal ingestion could be stored in an S3 bucket folder, /tarsal/. With /tarsal/ as the event notification prefix, the associated SQS queue only receives relevant notifications rather than those for unrelated objects and processes.
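Prefix filtering amounts to a starts-with check on the object key, as the sketch below shows. The `keys_matching_prefix` helper and the sample keys are hypothetical. The sketch writes the prefix as `tarsal/` without a leading slash, on the assumption that the object keys in the bucket do not begin with one.

```python
from typing import Iterable, List


def keys_matching_prefix(object_keys: Iterable[str], prefix: str) -> List[str]:
    """Keep only the object keys that start with the notification prefix."""
    return [key for key in object_keys if key.startswith(prefix)]


keys = ["tarsal/customers/2024.json", "reports/summary.csv", "tarsal/orders.json.gz"]
print(keys_matching_prefix(keys, "tarsal/"))
```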

❗️

Invalid Events

If you specify an S3 object in your IAM policy (see below) that doesn’t match the configuration’s event notification bucket prefix, messages are processed, ignored, and deleted. Invalid JSON events are treated similarly.

Verify the IAM Role

AWS provides the IAM Policy Simulator, available at https://policysim.aws.amazon.com/home/index.jsp, to confirm that the assigned permissions give the expected results. More information about the simulator is available at https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_testing-policies.html.

Test the policy, role, and user:

  1. Go to https://policysim.aws.amazon.com/home/index.jsp.
  2. Under Users, Groups, and Roles, click the drop-down list on the left and select Roles.
  3. Click tarsal-s3-source-connector-role.
  4. Under Policy Simulator, click the Select Service drop-down list.
  5. In the Filter field, type S3 and click on the S3 result.
  6. Click the Select Actions drop-down list, then
    1. Select the checkboxes to the left of the following labels:
      1. GetObject
      2. ListBucket
    2. Click anywhere on the page to dismiss the Select Actions list.
  7. Under Policy Simulator, click the Select Service drop-down list (which now displays Amazon S3).
  8. In the Filter field, delete the text S3.
  9. Type SQS and click on the SQS result.
  10. Click the Select Actions drop-down list, then
    1. Click the checkboxes to the left of the following labels:
      1. ChangeMessageVisibility
      2. DeleteMessage
      3. ReceiveMessage
      4. SendMessage
      5. GetQueueAttributes
      6. GetQueueUrl
    2. Click anywhere on the page to dismiss the Select Actions list.
  11. Click the Run Simulation button on the upper right.
  12. Under the Permission column, confirm each action is allowed and has one matching statement.
  13. Under Users, Groups, and Roles, click the drop-down list on the left and select Users.
  14. Click tarsal-s3-source-connector-user.
  15. Repeat steps 4-12.

If any statements don’t match, review all JSON policies for errors.

Configure the S3 Source Connector

This reference table describes the portal fields required for key and role authentication types. Replace the values in curly braces ({}) with the applicable information for your AWS account and resources.

| Parameter | Description | Authentication Method | Format | Example |
| --- | --- | --- | --- | --- |
| S3 Bucket Name | The name of the S3 bucket storing the source file(s) | IAM role; Access key ID and secret | folder/table/ | tarsal-s3-source/customers/ |
| S3 Bucket Region | The AWS location of the bucket | IAM role; Access key ID and secret | country-region-number | us-west-2 |
| IAM Role ARN | The AWS resource name of the role assumed by the connector | IAM role | arn:aws:iam::{AWS_ACCOUNT_ID}:role/{ROLE_NAME} | arn:aws:iam::012345678901:role/tarsal-s3-source-connector-role |
| S3 Key ID | The AWS access key ID that provides permissions | Access key ID and secret | 16-128 characters | AKIAVKPEQFPUU7XNIKXW |
| S3 Access Key | The secret access key paired with the S3 Key ID | Access key ID and secret | 16-128 characters | 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 |
| SQS Queue URL | The URL of the queue that receives S3 event notifications from the bucket | IAM role; Access key ID and secret | https://sqs.{AWS_REGION}.amazonaws.com/{AWS_ACCOUNT_ID}/{QUEUE_NAME} | https://sqs.us-east-2.amazonaws.com/012345678901/tarsal-s3-queue |
| SQS Queue ARN | The AWS resource name of the queue that receives S3 event notifications from the bucket | IAM role; Access key ID and secret | arn:aws:sqs:{AWS_REGION}:{AWS_ACCOUNT_ID}:{QUEUE_NAME} | arn:aws:sqs:us-east-2:012345678901:tarsal-s3-queue |

Add and configure the S3 source connector based on the chosen authentication method:

  1. Sign in to the Tarsal portal at https://app.tarsal.cloud.
  2. Go to Configuration > Sources from the left navigation.
  3. Click the Add Source button in the upper right.
  4. Click AWS S3.
  5. Under Metadata, enter AWS S3 for Name.
  6. Under Configuration, enter tarsal-s3-source for S3 Bucket Name.
  7. Click the S3 Bucket Region drop-down list and select the bucket’s AWS region.
  8. Click the Authentication drop-down list and select your chosen method.
    1. For IAM Role Authentication
      1. Enter arn:aws:iam::{AWS_ACCOUNT_ID}:role/tarsal-s3-source-connector-role for AWS ARN Role. Replace {AWS_ACCOUNT_ID} with your 12-digit AWS account number. Do not enclose the value in brackets.
    2. For Access Key ID and Secret
      1. Open the downloaded file tarsal-s3-source-connector-user_accessKeys.csv.
      2. Copy Access Key ID and paste the value for S3 Key ID.
      3. Copy Secret Access Key and paste the value for S3 Access Key.
  9. Enter https://sqs.{AWS_REGION}.amazonaws.com/{AWS_ACCOUNT_ID}/tarsal-s3-source-connector-queue for SQS Queue URL. The queue URL is also available in the queue's AWS Console Details section.
    1. Replace {AWS_REGION} with the queue location. Do not enclose the value in brackets.
    2. Replace {AWS_ACCOUNT_ID} with your 12-digit AWS account number. Do not enclose the value in brackets.
  10. Enter arn:aws:sqs:{AWS_REGION}:{AWS_ACCOUNT_ID}:tarsal-s3-source-connector-queue for SQS Queue ARN. The queue ARN is also available in the queue's AWS Console Details section.
    1. Replace {AWS_REGION} with the queue location. Do not enclose the value in brackets.
    2. Replace {AWS_ACCOUNT_ID} with your 12-digit AWS account number. Do not enclose the value in brackets.
  11. Click the Save button.

The portal immediately notifies you whether the connector configuration is successful with a status banner in the lower right.

If the connector configuration fails, verify all preceding steps or contact Tarsal customer support. See the next section for testing.
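The role ARN, queue URL, and queue ARN entered in the steps above follow fixed patterns, so one way to avoid typing mistakes is to assemble them from their parts. The helper names below are hypothetical, and the region and account ID are placeholders; the resource names are the ones suggested in this guide.

```python
def role_arn(account_id: str, role_name: str) -> str:
    # IAM is a global service, so the region field of the ARN stays empty.
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def queue_url(region: str, account_id: str, queue_name: str) -> str:
    return f"https://sqs.{region}.amazonaws.com/{account_id}/{queue_name}"


def queue_arn(region: str, account_id: str, queue_name: str) -> str:
    return f"arn:aws:sqs:{region}:{account_id}:{queue_name}"


# Placeholder account ID and region; substitute your own values.
account, region = "012345678901", "us-east-2"
print(role_arn(account, "tarsal-s3-source-connector-role"))
print(queue_url(region, account, "tarsal-s3-source-connector-queue"))
print(queue_arn(region, account, "tarsal-s3-source-connector-queue"))
```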

Test the Connector

In the portal, hover over the icon in the Health column for the connector in the list on the Sources page or next to the Status label on the connector’s detail page. A broken heart icon indicates failure, and the Summary widget on the Sources page also lists source errors.

📘

Updating AWS Configurations

Be sure to test the connector configuration in the portal after any AWS changes to associated users, roles, policies, or regions.

  1. Sign in to the Tarsal portal at https://app.tarsal.cloud.
  2. Go to Configuration > Sources from the left navigation.
  3. In the Sources list, click . . . (three dots) under the Actions column for the connector.
  4. Select Test from the drop-down list.

A banner in the lower right indicates the success or failure of the connector configuration test.

Alternatively, the connector can be tested directly from the connector source detail page using the Test button in the upper right.

Considerations

File Formats

Tarsal supports three JSON formats: newline-delimited, non-delimited, and JSON array.

| Format | Description | Example |
| --- | --- | --- |
| Newline-Delimited JSON | JSON events separated by a new line | { "ip": "10.0.0.1", "name": "Atreides", "location": "Arrakis" }\n{ "ip": "10.0.0.2", "name": "Gru", "location": "The Moon" }\n{ "ip": "10.0.0.3", "name": "Aslan", "location": "Narnia" } |
| Non-Delimited JSON | Back-to-back JSON events | { "ip": "10.0.0.1", "name": "Pele", "location": "Brazil" }{ "ip": "10.0.0.2", "name": "Gru", "location": "The Moon" }{ "ip": "10.0.0.3", "name": "Aslan", "location": "Narnia" } |
| JSON Array | Events in JSON array format | [{ "ip": "10.0.0.1", "name": "Pele", "location": "Brazil" },{ "ip": "10.0.0.2", "name": "Gru", "location": "The Moon" },{ "ip": "10.0.0.3", "name": "Aslan", "location": "Narnia" }] |

📘

File Streaming

Streaming with JSON chunks, rather than loading files into memory, speeds up processing and improves memory usage.
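As a rough illustration of how one routine could read all three formats, the sketch below walks through a string and decodes one JSON value at a time with the standard library. It is not Tarsal's parser, and the `iter_events` name is hypothetical. For simplicity it holds the whole text in memory rather than streaming it in chunks.

```python
import json
from typing import Any, Iterator


def iter_events(text: str) -> Iterator[Any]:
    """Yield events from newline-delimited, back-to-back, or array-wrapped JSON."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while pos < end:
        # Skip whitespace, including the newlines between delimited events.
        if text[pos].isspace():
            pos += 1
            continue
        # raw_decode parses one value and returns the index where it stopped.
        value, pos = decoder.raw_decode(text, pos)
        if isinstance(value, list):
            # A top-level array is treated as a batch of events.
            yield from value
        else:
            yield value


newline_delimited = '{"ip": "10.0.0.1"}\n{"ip": "10.0.0.2"}\n{"ip": "10.0.0.3"}'
non_delimited = '{"ip": "10.0.0.1"}{"ip": "10.0.0.2"}{"ip": "10.0.0.3"}'
json_array = '[{"ip": "10.0.0.1"},{"ip": "10.0.0.2"},{"ip": "10.0.0.3"}]'

for sample in (newline_delimited, non_delimited, json_array):
    print([event["ip"] for event in iter_events(sample)])
```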

File Compression

The S3 source connector supports the following file compression:

| Compression | File Suffix |
| --- | --- |
| gzip | .gz |
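A minimal sketch of suffix-based decompression, assuming the file suffix is what identifies compressed objects. The `read_object_bytes` helper and the object keys are hypothetical, and the sample data is compressed in memory so the example runs on its own.

```python
import gzip


def read_object_bytes(key: str, data: bytes) -> bytes:
    """Decompress the object body when the key carries the .gz suffix."""
    if key.endswith(".gz"):
        return gzip.decompress(data)
    return data


raw = b'{"ip": "10.0.0.1", "name": "Aslan", "location": "Narnia" }'
compressed = gzip.compress(raw)

print(read_object_bytes("customers/events.json.gz", compressed) == raw)
print(read_object_bytes("customers/events.json", raw) == raw)
```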