AWS S3 with SQS
Source Connector
Introduction
The following information is a configuration reference for your Amazon S3 with Simple Queue Service (SQS) source connector.
The AWS Identity and Access Management (IAM) web service secures access control to AWS resources. IAM manages granular permissions for authentication and authorization to control which users may sign in and what resources they can access.
AWS uses Amazon Resource Names (ARNs) to identify unique resources and services. Applications must use ARNs in policies to access multiple resources within AWS. ARNs have general and resource-specific formats. More information is available at https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html.
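As a minimal sketch of the general ARN format (`arn:partition:service:region:account-id:resource`), the following Python helper splits an ARN into its parts. The `Arn` type and `parse_arn` function are illustrative names, not part of any AWS or Tarsal library.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    """Components of an ARN in the general format."""
    partition: str
    service: str
    region: str      # empty for global services such as IAM and S3 buckets
    account_id: str  # empty for S3 bucket ARNs
    resource: str


def parse_arn(arn: str) -> Arn:
    # Split on the first five colons only; the resource part may contain colons.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    return Arn(*parts[1:])


queue = parse_arn("arn:aws:sqs:us-east-2:012345678901:tarsal-s3-queue")
print(queue.service, queue.region, queue.resource)
```

Note that IAM role ARNs and S3 bucket ARNs leave the region field empty, which is why they contain consecutive colons.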
All streams sent to this destination end up in a single bucket, with a folder created for each stream within the bucket.
Overview
An S3 bucket containing objects may be the origin of data consumed and transformed by Tarsal. The S3 with SQS connector uses Amazon's Simple Queue Service for new data notifications.
The S3 source bucket sends events to an SQS queue upon S3 object creation and update. Tarsal polls the message queue, periodically checking for new information (published messages) based on the connector’s sync frequency defined in the portal. Data ingestion is initiated when it detects these events.
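To illustrate what a queue message carries, here is a hedged sketch that extracts the bucket and object key from an S3 event notification body. It follows the S3 event message structure (`Records`, `eventName`, `s3.bucket.name`, `s3.object.key`); the function name `extract_s3_objects` is hypothetical, and this is not Tarsal's internal implementation.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_objects(message_body: str) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for object-creation records in an SQS message body."""
    try:
        event = json.loads(message_body)
    except json.JSONDecodeError:
        return []  # invalid JSON events carry no usable records
    if not isinstance(event, dict):
        return []
    objects = []
    for record in event.get("Records", []):
        # Only object-creation events are relevant to ingestion.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = s3.get("object", {}).get("key")
        if bucket and key:
            # Object keys in S3 notifications are URL-encoded.
            objects.append((bucket, unquote_plus(key)))
    return objects
```

A consumer would typically call this for each received message, read the listed objects, and then delete the message from the queue.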
Setting up an S3 source connector requires these tasks in the AWS Management Console and the Tarsal portal:
- Creating an IAM policy
- Creating an IAM role
- Creating an IAM user
- Creating an S3 bucket
- Creating an SQS queue
- Adding S3 event notifications
- Configuring an S3 source connector in Tarsal
- Testing the connector
Prerequisites
The connector requires specific permissions for S3 and SQS to connect to S3 data sources.
Before You Begin
Confirm you have AWS console administrative privileges for IAM, S3, and SQS before configuration.
Authentication
The S3 source connector supports these authentication methods:
Authentication Method | Description | Documentation |
---|---|---|
Access Key ID and Access Secret | The access key ID and secret access key for AWS authentication | Managing access keys for IAM users |
IAM Role | The permissions role associated with an S3 bucket and SQS queue | IAM roles |
Permissions
For either authentication method above, the connector requires the actions and access levels below:
Amazon Service | Action | Access Level | Resource(s) |
---|---|---|---|
S3 | GetObject | Read | All objects or specified object(s) |
S3 | ListBucket | List | All buckets or specified bucket(s) |
SQS | ChangeMessageVisibility | Write | All messages in all queues or specified queue(s) |
SQS | DeleteMessage | Write | All messages in all queues or specified queue(s) |
SQS | GetQueueAttributes | Read | All queues or specified queue(s) |
SQS | GetQueueUrl | Read | All queues or specified queue(s) |
SQS | ReceiveMessage | Read | All messages in all queues or specified queue(s) |
SQS | SendMessage | Write | All messages in all queues or specified queue(s) |
Considerations
Access Keys vs. IAM Roles
S3 connectors authenticate with either access keys or IAM roles in the Tarsal administrative portal.
Access keys have two parts: an access key ID and a secret access key. Think of the access key ID as the username and the secret access key as the password. The connector creates a token with them for AWS authentication.
Access keys should be saved securely. The secret access key is only retrievable upon creation. If the secret access key is lost, a new one must be generated.
AWS supports several types of IAM roles, and service-linked roles for connector authentication are recommended. A service-linked role is tied to a service (e.g., EC2, S3, RDS) with a specific purpose: to assume a role for performing actions. The service owns service-linked roles.
Access keys require more maintenance: they are easily distributed and should be rotated regularly for security. Deleting an access key immediately invalidates it, and applications stop functioning until they are updated with the new credentials. Rotation typically requires code updates and deployments, which may extend timelines more than other authentication methods. Access keys are also longer-lived than IAM role credentials.
IAM roles provide more security than access keys and have a limited scope of permissions. Role credentials are temporary, centrally managed in the AWS console, and not distributable.
Choosing a Connector Authentication Method
IAM roles are the preferred authentication method for the majority of Tarsal customers. AWS and Tarsal encourage roles instead of keys.
AWS Infrastructure
Using Existing AWS Infrastructure
All configuration instructions assume creating new AWS resources across all services for the S3 source connector. Reusing existing IAM, S3, and SQS objects is possible, though not recommended.
If you choose to use existing infrastructure, replace the suggested resource names in this guide with your existing ones where applicable.
Selecting AWS Regions
The connector's resources and services should reside in the same AWS region.
IAM and Resource Security
The policies presented here provide the connector access to all resources (*) under the necessary services to simplify the configuration process and limit policy maintenance.
AWS and Tarsal recommend defining access levels as restrictively as possible. Limit the connector to only the necessary S3 buckets, SQS queues, and actions.
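As one possible way to apply that recommendation, the sketch below builds a policy document scoped to a single bucket and queue instead of `*`. The function name `build_scoped_policy` is hypothetical; the bucket and queue names are the ones suggested in this guide, and the account ID and region are placeholders.

```python
import json

SQS_ACTIONS = [
    "sqs:ChangeMessageVisibility",
    "sqs:DeleteMessage",
    "sqs:GetQueueAttributes",
    "sqs:GetQueueUrl",
    "sqs:ReceiveMessage",
    "sqs:SendMessage",
]


def build_scoped_policy(bucket_name: str, queue_arn: str) -> dict:
    """Return an IAM policy limited to one S3 bucket and one SQS queue."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            # ListBucket applies to the bucket itself.
            {"Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": [bucket_arn]},
            # GetObject applies to the objects inside the bucket.
            {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": [f"{bucket_arn}/*"]},
            {"Effect": "Allow", "Action": SQS_ACTIONS, "Resource": [queue_arn]},
        ],
    }


policy = build_scoped_policy(
    "tarsal-s3-source",
    "arn:aws:sqs:us-east-2:012345678901:tarsal-s3-source-connector-queue",  # placeholder account and region
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM policy editor in place of a wildcard policy.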
Before You Begin
Throughout the configuration, you’ll be copying values for later use. Save them to an easily accessible location for reference.
Create the IAM Policy
IAM policies are allow policies that control permissions for AWS objects. They define and enforce resource access for users, roles, and services. Policies are checked on every request and are agnostic of the operational method used (the AWS Console, CLI, or API).
The policy grants the connector’s IAM user and role access to the required AWS services.
Create the IAM policy with S3 and SQS object permissions:
- Sign in to the AWS Management Console at https://console.aws.amazon.com/iam.
- Go to Access Management > Policies from the left navigation.
- Click the Create Policy button in the upper right.
- Click the JSON tab.
- In the Policy Editor:
  - Delete the existing JSON.
  - Copy and paste the following policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "sqs:ChangeMessageVisibility",
        "sqs:DeleteMessage",
        "sqs:GetQueueAttributes",
        "sqs:GetQueueUrl",
        "sqs:ReceiveMessage",
        "sqs:SendMessage"
      ],
      "Resource": "*"
    }
  ]
}
- Click the Next button.
- Under Policy Details, enter tarsal-s3-source-connector-policy for Policy Name.
- Click the Create Policy button.
Create the IAM Role
IAM roles are identities with specific permissions and short-lived credentials. The permissions policies and trust policies attached to a role define what the role can do and who can assume it.
Create a service-linked assumable role, add a custom trust policy for the connector, and attach the permissions policy:
- Sign in to the AWS Management Console at https://console.aws.amazon.com/iam.
- Go to Access Management > Roles from the left navigation.
- Click the Create Role button in the upper right.
- For Trusted Entity Type, select the Custom Trust Policy radio button.
- Sign in to the Tarsal portal at https://app.tarsal.cloud.
- Go to Account > Settings from the left navigation.
- Click the Cloud Tools tab.
- Next to Sample Trust Policy with Minimum Privileges, click the clipboard icon on the right to copy the trust policy. The policy includes Tarsal’s AWS account ID and an external ID to assume the role.
- Copy and paste the policy into the in-line text editor.
- Click the Next button.
- Under Permissions Policies, click the drop-down list under Filter by Type and select Customer Managed.
- Locate tarsal-s3-source-connector-policy and select the checkbox to the left of the name.
- Click the Next button.
- Under Role Details, enter tarsal-s3-source-connector-role for Role Name.
- Click the Create Role button.
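For orientation only, the sketch below shows the general shape of a cross-account trust policy that uses an external ID. The account ID and external ID here are made-up placeholders, and `build_trust_policy` is a hypothetical helper; always copy the actual policy from the Tarsal portal as described above rather than constructing it by hand.

```python
import json


def build_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Return a trust policy letting another AWS account assume the role with an external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The principal is the account allowed to assume the role.
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The external ID must match on every AssumeRole call.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder values, not Tarsal's real account ID or your external ID.
print(json.dumps(build_trust_policy("111122223333", "example-external-id"), indent=2))
```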
Choosing a Connector Authentication Method
The connector configuration supports two authentication methods: access keys and IAM roles. Complete the steps in the next section only if you’re authenticating with access keys. Please refer to Access Keys vs. IAM Roles when making a decision.
Create the IAM User
Create an IAM user and assign permissions. This user assumes the role previously created.
- Sign in to the AWS Management Console at https://console.aws.amazon.com/iam.
- Go to Access Management > Users from the left navigation.
- Click the Create User button in the upper right.
- Under User Details, enter tarsal-s3-source-connector-user for User Name.
- Click the Next button.
- Under Permissions Options, click the radio button next to Attach Policies Directly.
- Under Permissions Policies, click the drop-down list under Filter by Type and select Customer Managed.
- Locate the tarsal-s3-source-connector-policy and select the checkbox to the left of the name.
- Click the Next button.
- Click the Create User button.
Optionally Create Access Keys
Complete the following steps only if you plan to authenticate the connector with access keys.
User Security
AWS best practices recommend assigning permissions to user groups rather than users. The following steps assume a single Tarsal S3 source connector user; therefore, the instructions attach permissions to the user rather than a user group. However, you may assign the user to a group and apply the IAM policy there.
Alternatively, you may use this same user for multiple S3 source connectors.
Generate access keys for the previously created user:
- Sign in to the AWS Management Console at https://console.aws.amazon.com/iam.
- Go to Access Management > Users from the left navigation.
- In the Users list, click tarsal-s3-source-connector-user.
- Click the Security Credentials tab.
- Locate Access Keys and click the Create Access Key button.
- For Use Case, select the Third-party Service radio button.
- Check the checkbox under Confirmation.
- Click the Next button.
- For Description Tag Value, enter Tarsal S3 Source Connector.
- Click the Create Access Key button.
- Click the Download .csv File button.
- Click the Done button after the .csv download.
The keys are added later when configuring connector authentication.
Storing Keys
Record and store the secret access key in a safe place! Lost secret access keys are not recoverable.
If you no longer have your key, deactivate the old key, generate a new one, and update the connector credentials in the Tarsal portal.
AWS Access Key Best Practices
- Never store your access key in plain text, a code repository, or code.
- Disable or delete the access key when no longer needed.
- Enable least-privilege permissions.
- Rotate access keys regularly.
Visit https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#securing_access-keys for more information.
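One common way to keep keys out of code and repositories is to read them from environment variables at run time. The sketch below assumes the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` variable names used by AWS tooling; the helper names `load_access_key` and `mask` are hypothetical.

```python
import os
from typing import Mapping, Optional


def load_access_key(env: Optional[Mapping[str, str]] = None) -> tuple[str, str]:
    """Read the access key pair from the environment instead of hard-coding it."""
    env = os.environ if env is None else env
    key_id = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if not key_id or not secret:
        raise RuntimeError("AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must both be set")
    return key_id, secret


def mask(value: str, visible: int = 4) -> str:
    """Hide all but the first few characters so a key can be logged safely."""
    return value[:visible] + "*" * max(len(value) - visible, 0)
```

Logging `mask(key_id)` rather than the raw value helps confirm which key is in use without exposing it.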
Create the S3 Bucket
Naming S3 Buckets
You can’t change bucket names after creation! If you want to change the bucket name, create a new bucket with a new name and copy any necessary data over.
Create an S3 bucket in the AWS Management Console for the connector data source:
- Sign in to the AWS Management Console at https://console.aws.amazon.com/s3.
- Click the AWS region drop-down list next to your name in the upper right and select the desired region for the bucket.
- Under General Purpose Buckets, click the Create Bucket button.
- Under General Configuration, enter tarsal-s3-source for Bucket Name.
- Under Object Ownership, ensure ACLs Disabled is selected.
- Leave the remaining default selections unchanged and click the Create Bucket button.
- Under General Purpose Buckets:
  - Locate the tarsal-s3-source bucket visually or by searching.
  - Click the radio button to the left of the bucket name.
  - Click the Copy ARN button and save the value for retrieval later.
Don’t forget to populate the bucket with source data for the connector.
Create the SQS Queue
An access policy allows S3 to post messages to a queue. The SQS region and owner AWS account ID are required.
See the information at the end of this section regarding queue configuration options.
Choosing Queues
While you can use an existing SQS queue that already receives messages, Tarsal strongly recommends a dedicated queue per source connector over shared queues for separation of concerns and troubleshooting.
Naming SQS Queues
You can’t change queue names after creation! If you want to change the queue name, create a new queue with a new name and reconfigure it.
Locating your AWS Account ID and Region
The account ID location in AWS varies depending on whether you’re logged in as root or an IAM user. See https://docs.aws.amazon.com/accounts/latest/reference/manage-acct-identifiers.html#FindAccountId for more information.
The AWS region is available in the upper right of the global navigation. Click the drop-down list of region names to the left of your name. The region code is highlighted and located on the right in the menu (e.g., us-east-1, us-west-2).
Set up an SQS queue to receive S3 bucket messages for the connector:
- Sign in to the AWS Management Console at https://console.aws.amazon.com/sqs.
- Under Get Started, click the Create Queue button.
- Under Details, enter tarsal-s3-source-connector-queue for Name.
- Under Configuration:
  - For Visibility Timeout:
    - Enter 600 in the text field.
    - Select Seconds from the drop-down list.
  - For Message Retention Period:
    - Enter 7 in the text field.
    - Select Days from the drop-down list.
- Under Access Policy > Choose Method, select the Advanced radio button to modify the JSON.
- Delete the entire policy JSON from the in-line text editor.
- Copy and paste the following policy into the in-line text editor:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "*" },
      "Action": "SQS:SendMessage",
      "Resource": "arn:aws:sqs:{AWS_REGION}:{AWS_ACCOUNT_ID}:tarsal-s3-source-connector-queue",
      "Condition": {
        "ArnLike": { "aws:SourceArn": "arn:aws:s3:*:*:tarsal-s3-source" }
      }
    }
  ]
}
- Add the appropriate AWS Region and Account ID to the JSON policy:
  - Replace {AWS_ACCOUNT_ID} in the policy JSON with your AWS Account ID. Do not enclose the account ID value in braces. The account ID location in AWS varies depending on whether you’re logged in as root or an IAM user. See https://docs.aws.amazon.com/accounts/latest/reference/manage-acct-identifiers.html#FindAccountId for more information.
  - Replace {AWS_REGION} in the policy JSON with the AWS region where the SQS queue resides. Do not enclose the region value in braces.
- Click the Create Queue button.
- Under Details, copy the queue’s ARN and URL for later use.
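The placeholder substitution in the steps above can also be scripted. The following is a sketch, assuming the queue and bucket names suggested in this guide; `build_queue_policy` is a hypothetical helper that reproduces the access policy shown above with the region and account ID filled in.

```python
import json
import re


def build_queue_policy(region: str, account_id: str,
                       queue_name: str = "tarsal-s3-source-connector-queue",
                       bucket_name: str = "tarsal-s3-source") -> dict:
    """Return the SQS access policy with {AWS_REGION} and {AWS_ACCOUNT_ID} filled in."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("AWS account IDs are 12 digits")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "SQS:SendMessage",
                "Resource": f"arn:aws:sqs:{region}:{account_id}:{queue_name}",
                # Only the source bucket may send messages to the queue.
                "Condition": {
                    "ArnLike": {"aws:SourceArn": f"arn:aws:s3:*:*:{bucket_name}"}
                },
            }
        ],
    }


# Placeholder region and account ID for illustration.
print(json.dumps(build_queue_policy("us-east-2", "012345678901"), indent=2))
```

Generating the JSON this way avoids leaving a stray brace around the substituted values.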
Successfully processed SQS messages are deleted; Tarsal attempts to resolve internal errors through periodic retries.
Considerations
AWS provides several helpful settings for SQS queue configuration. The following settings impact the processing of data sources. The instructions above include Tarsal’s recommended values, which differ from SQS defaults to safeguard the ingestion process.
Visibility Timeout
The visibility timeout is the length of time a queue message is hidden from other message consumers after being received by one consumer. If a message is not deleted before the visibility timeout expires, it becomes visible again and can be delivered a second time, producing duplicates. The steps above include Tarsal’s recommended value to safeguard against those situations.
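The behavior can be illustrated with a toy in-memory model. This is a simplified sketch for intuition only, not how SQS is implemented; `ToyQueue` and its methods are hypothetical, and time is passed in explicitly as seconds.

```python
from typing import Optional


class ToyQueue:
    """A minimal model of visibility timeout: received messages are hidden, not removed."""

    def __init__(self, visibility_timeout: float) -> None:
        self.visibility_timeout = visibility_timeout
        self._messages: dict[str, tuple[str, float]] = {}  # id -> (body, visible_at)

    def send(self, message_id: str, body: str, now: float = 0.0) -> None:
        self._messages[message_id] = (body, now)

    def receive(self, now: float) -> Optional[tuple[str, str]]:
        for message_id, (body, visible_at) in self._messages.items():
            if visible_at <= now:
                # Hide the message from other consumers until the timeout expires.
                self._messages[message_id] = (body, now + self.visibility_timeout)
                return message_id, body
        return None

    def delete(self, message_id: str) -> None:
        self._messages.pop(message_id, None)


queue = ToyQueue(visibility_timeout=600)
queue.send("m1", "object created")
first = queue.receive(now=0)     # delivered and hidden for 600 seconds
hidden = queue.receive(now=300)  # None: still within the visibility timeout
again = queue.receive(now=600)   # delivered again because it was never deleted
```

In this model, calling `queue.delete("m1")` before the 600 seconds elapse is what prevents the second delivery.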
Message Retention Period
The message retention period is the length of time a message persists if not deleted. SQS automatically deletes queue messages after the maximum retention period.
Information attached to deleted messages is not recoverable. Longer retention periods increase persistent storage costs, so activating the source connector prevents unnecessary message retention expenses.
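When queues are created through the SQS API rather than the console, both settings are expressed in seconds as string values. As a small sketch, the recommended values from the steps above convert as follows; `recommended_queue_attributes` is a hypothetical helper, while `VisibilityTimeout` and `MessageRetentionPeriod` are the SQS queue attribute names.

```python
from datetime import timedelta


def recommended_queue_attributes() -> dict[str, str]:
    """Express the recommended settings as SQS queue attributes (seconds, as strings)."""
    visibility = timedelta(seconds=600)
    retention = timedelta(days=7)
    return {
        "VisibilityTimeout": str(int(visibility.total_seconds())),
        "MessageRetentionPeriod": str(int(retention.total_seconds())),
    }


print(recommended_queue_attributes())
# {'VisibilityTimeout': '600', 'MessageRetentionPeriod': '604800'}
```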
Add S3 Event Notifications
S3 event notifications let users receive alerts for specific bucket events and act on them. Examples of events include object creation, deletion, and restoration, and notifications can be routed to other services, like SQS and Simple Notification Service (SNS).
Configure S3 source bucket notifications to publish object creation events to the queue for the connector.
- Sign in to the AWS Management Console at https://console.aws.amazon.com/s3.
- Under General Purpose Buckets, click tarsal-s3-source.
- Click the Properties tab at the top.
- Locate the Event Notifications section and click the Create Event Notification button.
- Under General Configuration, enter tarsal-s3-source-object-creation for Event Name.
- Under Event Types, select the checkbox for All object create events.
- Under Destination, select the radio button for SQS queue.
- Under Specify SQS Queue, select one of the two methods:
  - If Choose from your SQS queues is selected, click on the SQS Queue drop-down list and select the queue created earlier that receives the S3 source bucket messages.
  - If Enter SQS queue ARN is selected, paste the value copied after creating the queue. Alternatively, click this queue from the list of queues and copy the ARN under Details and paste it into the SQS Queue field.
- Click the Save Changes button.
Verify Event Notifications
AWS sends a test notification to the queue post-creation. To verify receipt:
- Click on tarsal-s3-source-connector-queue from the SQS queue list.
- Under Details, click the More link.
- Locate Messages available on the right, which should have a value of 1.
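The test notification differs from a regular event: its body has an `Event` field set to `s3:TestEvent` and no `Records` list. A consumer inspecting the queue could distinguish it with a check like the sketch below; `is_s3_test_event` is a hypothetical helper name.

```python
import json


def is_s3_test_event(message_body: str) -> bool:
    """Return True if the message is the test notification S3 sends after setup."""
    try:
        event = json.loads(message_body)
    except json.JSONDecodeError:
        return False
    return isinstance(event, dict) and event.get("Event") == "s3:TestEvent"
```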
Considerations
Filter Event Notifications
If you plan to add event notifications for the S3 source bucket unrelated to Tarsal connectors, consider using prefixes. Prefixes, defined on event notification creation, are optional S3 bucket paths that filter notifications. They limit notifications to objects starting with the prefix, which is useful when multiple event types on the same bucket are processed differently.
For example, objects for Tarsal ingestion could be stored in an S3 bucket folder, /tarsal/. With /tarsal/ as the event notification prefix, the associated SQS queue only receives relevant notifications rather than those for unrelated objects and processes.
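For readers configuring notifications through the S3 API instead of the console, the sketch below builds a notification configuration with an optional prefix filter, following the structure S3 uses for queue configurations. The helper names are hypothetical, and the example uses `tarsal/` without a leading slash on the assumption that your object keys do not begin with one; adjust the prefix to match your actual keys.

```python
from typing import Optional


def build_notification_config(queue_arn: str, prefix: Optional[str] = None) -> dict:
    """Return a bucket notification configuration for object-creation events."""
    queue_config: dict = {
        "Id": "tarsal-s3-source-object-creation",
        "QueueArn": queue_arn,
        "Events": ["s3:ObjectCreated:*"],
    }
    if prefix:
        # Limit notifications to object keys that start with the prefix.
        queue_config["Filter"] = {
            "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
        }
    return {"QueueConfigurations": [queue_config]}


def key_matches_prefix(key: str, prefix: str) -> bool:
    """Mirror the prefix rule locally: a key matches when it starts with the prefix."""
    return key.startswith(prefix)
```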
Invalid Events
If you specify an S3 object in your IAM policy (see Create the IAM Policy) that doesn’t match the configuration’s event notification bucket prefix, messages are processed, ignored, and deleted. Invalid JSON events are treated similarly.
Verify the IAM Role
AWS provides the IAM Policy Simulator, available at https://policysim.aws.amazon.com/home/index.jsp, to confirm that the assigned permissions give the expected results. More information about the simulator is available at https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_testing-policies.html.
Test the policy, role, and user:
- Go to https://policysim.aws.amazon.com/home/index.jsp.
- Under Users, Groups, and Roles, click the drop-down list on the left and select Roles.
- Click tarsal-s3-source-connector-role.
- Under Policy Simulator, click the Select Service drop-down list.
- In the Filter field, type S3 and click on the S3 result.
- Click the Select Actions drop-down list, then:
  - Select the checkboxes to the left of the following labels:
    - GetObject
    - ListBucket
  - Click anywhere on the page to dismiss the Select Actions list.
- Under Policy Simulator, click the Select Service drop-down list (which now displays Amazon S3).
- In the Filter field, delete the text S3.
- Type SQS and click on the SQS result.
- Click the Select Actions drop-down list, then:
  - Click the checkboxes to the left of the following labels:
    - ChangeMessageVisibility
    - DeleteMessage
    - ReceiveMessage
    - SendMessage
    - GetQueueAttributes
    - GetQueueUrl
  - Click anywhere on the page to dismiss the Select Actions list.
- Click the Run Simulation button on the upper right.
- Under the Permission column, confirm each action is allowed and has one matching statement.
- Under Users, Groups, and Roles, click the drop-down list on the left and select Users.
- Click tarsal-s3-source-connector-user.
- Repeat steps 4-12 (from selecting the S3 service through confirming the Permission column).
If any statements don’t match, review all JSON policies for errors.
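Before running the simulator, a quick local check can catch actions missing from a policy document. The sketch below is a rough aid under stated simplifications: it only looks at `Allow` statements and action names (with `*` and `?` wildcards) and ignores `Deny` statements, resources, and conditions, so it is not a substitute for the IAM Policy Simulator. `missing_actions` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

REQUIRED_ACTIONS = (
    "s3:GetObject",
    "s3:ListBucket",
    "sqs:ChangeMessageVisibility",
    "sqs:DeleteMessage",
    "sqs:GetQueueAttributes",
    "sqs:GetQueueUrl",
    "sqs:ReceiveMessage",
    "sqs:SendMessage",
)


def missing_actions(policy: dict, required=REQUIRED_ACTIONS) -> list[str]:
    """List required actions that no Allow statement in the policy covers."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear without a list
        statements = [statements]
    allowed: list[str] = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        # Action names are case-insensitive, so compare in lowercase.
        allowed.extend(action.lower() for action in actions)
    return [
        action for action in required
        if not any(fnmatchcase(action.lower(), pattern) for pattern in allowed)
    ]
```

An empty list means every required action name appears in some `Allow` statement.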
Configure the S3 Source Connector
This reference table describes the portal fields required for key and role authentication types. Replace the values in braces ({}) with the applicable information for your AWS account and resources.
Parameter | Description | Authentication Method | Format | Example |
---|---|---|---|---|
S3 Bucket Name | The name of the S3 bucket storing the source file(s) | IAM role; Access key ID and secret | folder/table/ | tarsal-s3-source/customers/ |
S3 Bucket Region | The AWS location of the bucket | IAM role; Access key ID and secret | country-region-number | us-west-2 |
IAM Role ARN | The AWS resource name of the role assumed by the connector | IAM role | arn:aws:iam::{AWS_ACCOUNT_ID}:role/{ROLE_NAME} | arn:aws:iam::012345678901:role/tarsal-s3-source-connector-role |
S3 Key ID | The AWS ID for the S3 key that provides permissions | Access key ID and secret | 16-128 characters | AKIAVKPEQFPUU7XNIKXW |
S3 Access Key | The S3 secret for the AWS S3 key ID | Access key ID and secret | 16-128 characters | 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 |
SQS Queue URL | The URL of the queue that receives S3 event notifications from the bucket | IAM role; Access key ID and secret | https://sqs.{AWS_REGION}.amazonaws.com/{AWS_ACCOUNT_ID}/{QUEUE_NAME} | https://sqs.us-east-2.amazonaws.com/012345678901/tarsal-s3-queue |
SQS Queue ARN | The AWS resource name of the queue that receives S3 event notifications from the bucket | IAM role; Access key ID and secret | arn:aws:sqs:{AWS_REGION}:{AWS_ACCOUNT_ID}:{QUEUE_NAME} | arn:aws:sqs:us-east-2:012345678901:tarsal-s3-queue |
Add and configure the S3 source connector based on the chosen authentication method:
- Sign in to the Tarsal portal at https://app.tarsal.cloud.
- Go to Configuration > Sources from the left navigation.
- Click the Add Source button in the upper right.
- Click AWS S3.
- Under Metadata, enter AWS S3 for Name.
- Under Configuration, enter tarsal-s3-source for S3 Bucket Name.
- Click the S3 Bucket Region drop-down list and select the bucket’s AWS region.
- Click the Authentication drop-down list and select your chosen method.
  - For IAM Role Authentication:
    - For Auth Method, enter arn:aws:iam::{AWS_ACCOUNT_ID}:role/tarsal-s3-source-connector-role for AWS ARN Role. Replace {AWS_ACCOUNT_ID} with your 12-digit AWS account number. Do not enclose the value in braces.
  - For Access Key ID and Secret:
    - Open the downloaded file tarsal-s3-source-connector-user_accessKeys.csv.
    - Copy Access Key ID and paste the value for S3 Key ID.
    - Copy Secret Access Key and paste the value for S3 Access Key.
- Enter https://sqs.{AWS_REGION}.amazonaws.com/{AWS_ACCOUNT_ID}/tarsal-s3-source-connector-queue for SQS Queue URL. The queue URL is also available in the queue's AWS Console Details section.
  - Replace {AWS_REGION} with the queue location. Do not enclose the value in braces.
  - Replace {AWS_ACCOUNT_ID} with your 12-digit AWS account number. Do not enclose the value in braces.
- Enter arn:aws:sqs:{AWS_REGION}:{AWS_ACCOUNT_ID}:tarsal-s3-source-connector-queue for SQS Queue ARN. The queue ARN is also available in the queue's AWS Console Details section.
  - Replace {AWS_REGION} with the queue location. Do not enclose the value in braces.
  - Replace {AWS_ACCOUNT_ID} with your 12-digit AWS account number. Do not enclose the value in braces.
- Click the Save button.
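Because the queue URL and queue ARN contain the same three values, one can be derived from the other to avoid transcription mistakes. The sketch below assumes the standard regional URL format shown in the table above; `queue_url_to_arn` and `role_arn` are hypothetical helpers.

```python
import re

_QUEUE_URL = re.compile(
    r"^https://sqs\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<account>\d{12})/(?P<name>[A-Za-z0-9_.-]+)$"
)


def queue_url_to_arn(queue_url: str) -> str:
    """Derive the SQS queue ARN from a queue URL in the standard regional format."""
    match = _QUEUE_URL.match(queue_url)
    if match is None:
        raise ValueError(f"unrecognized SQS queue URL: {queue_url!r}")
    return f"arn:aws:sqs:{match['region']}:{match['account']}:{match['name']}"


def role_arn(account_id: str, role_name: str) -> str:
    """Build an IAM role ARN; IAM is global, so the region field stays empty."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


print(queue_url_to_arn("https://sqs.us-east-2.amazonaws.com/012345678901/tarsal-s3-queue"))
# arn:aws:sqs:us-east-2:012345678901:tarsal-s3-queue
```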
The portal immediately notifies you whether the connector configuration is successful with a status banner in the lower right.
If the connector configuration fails, verify all preceding steps or contact Tarsal customer support. See the next section for testing.
Test the Connector
In the portal, hover over the icon in the Health column for the connector in the list on the Sources page or next to the Status label on the connector’s detail page. A broken heart icon indicates failure, and the Summary widget on the Sources page also lists source errors.
Updating AWS Configurations
Be sure to test the connector configuration in the portal after any AWS changes to associated users, roles, policies, or regions.
- Sign in to the Tarsal portal at https://app.tarsal.cloud.
- Go to Configuration > Sources from the left navigation.
- In the Sources list, click
. . . (three dots) under the Actions column for the connector.
- Select Test from the drop-down list.
A banner in the lower right indicates the success or failure of the connector configuration test.
Alternatively, the connector can be tested directly from the connector source detail page using the Test button in the upper right.
Considerations
File Formats
Tarsal supports three JSON formats: new line-delimited, non-delimited, and JSON array.
Format | Description | Example |
---|---|---|
New Line Delimited JSON | JSON events separated by a new line | { "ip": "10.0.0.1", "name": "Atreides", "location": "Arrakis" }\n{ "ip": "10.0.0.2", "name": "Gru" , "location": "The Moon" }\n{ "ip": "10.0.0.3", "name": "Aslan", "location": "Narnia" } |
Non-Delimited JSON | Back to back JSON events | { "ip": "10.0.0.1", "name": "Pele", "location": "Brazil" }{ "ip": "10.0.0.2", "name": "Gru" , "location": "The Moon" }{ "ip": "10.0.0.3", "name": "Aslan", "location": "Narnia" } |
JSON Array | Events in JSON array format | [{ "ip": "10.0.0.1", "name": "Pele", "location": "Brazil" },{ "ip": "10.0.0.2", "name": "Gru" , "location": "The Moon" },{ "ip": "10.0.0.3", "name": "Aslan", "location": "Narnia" }] |
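To show how the three layouts relate, here is a sketch of one parser that accepts all of them by decoding one JSON value at a time. It illustrates the formats only and is not Tarsal's parser; `parse_events` is a hypothetical name.

```python
import json


def parse_events(text: str) -> list:
    """Parse new line-delimited, non-delimited, or array-formatted JSON events."""
    decoder = json.JSONDecoder()
    events: list = []
    pos = 0
    length = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip it here
        # (this also consumes the newlines between delimited events).
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        value, pos = decoder.raw_decode(text, pos)
        if isinstance(value, list):
            events.extend(value)  # JSON array: each element is an event
        else:
            events.append(value)
    return events
```

Each of the three example layouts in the table yields the same kind of result: a flat list of event objects.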
File Streaming
Streaming with JSON chunks, rather than loading files into memory, speeds up processing and improves memory usage.
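The idea can be sketched as a generator that reads fixed-size chunks and yields each event as soon as it is complete, so the whole file never has to sit in memory. This is an illustration under simplifying assumptions (a stream of top-level JSON objects, delimited or back to back), not Tarsal's implementation; `stream_events` and its `chunk_size` parameter are hypothetical.

```python
import io
import json
from typing import Iterator, TextIO


def stream_events(stream: TextIO, chunk_size: int = 65536) -> Iterator[dict]:
    """Yield JSON objects from a text stream without loading the whole file."""
    decoder = json.JSONDecoder()
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        buffer += chunk
        while True:
            buffer = buffer.lstrip()
            if not buffer:
                break
            try:
                event, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # the object is incomplete; wait for the next chunk
            yield event
            buffer = buffer[end:]
        if not chunk:  # end of stream
            if buffer.strip():
                raise ValueError("stream ended with incomplete JSON")
            return


for event in stream_events(io.StringIO('{"ip": "10.0.0.1"}\n{"ip": "10.0.0.2"}'), chunk_size=8):
    print(event)
```

A top-level JSON array would only decode once fully buffered, so array files would need an incremental parser to gain the same benefit.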
File Compression
The S3 source connector supports the following file compression:
Compression | File Suffix |
---|---|
gzip | .gz |
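As a minimal sketch of suffix-based handling, the helper below decompresses object bytes only when the key ends in `.gz`. The function name `read_object_text` is hypothetical, and UTF-8 encoding is an assumption.

```python
import gzip


def read_object_text(key: str, data: bytes) -> str:
    """Return object contents as text, decompressing when the key ends in .gz."""
    if key.endswith(".gz"):
        data = gzip.decompress(data)
    return data.decode("utf-8")  # assumes UTF-8 encoded JSON
```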