Normalization

Data Normalization

Tarsal normalizes logs from every data source to make analysis easy. It also applies a set of standard fields across all log sources to make cross-log correlation simple.

For example, every event from a data source has a time at which it occurred, but not every source names its timestamp attribute the same way, and there is no guarantee that its timezone is consistent with other data sources. Tarsal appends a UTC-normalized field called t_event_time to each log, which maps to the log's corresponding event time. This lets you query logs from multiple data sources by t_event_time and align and correlate them correctly despite their disparate schemas.

Tarsal appends the following fields to every log record:

  • t_event_time: The event time for the log, normalized to UTC
  • t_parse_time: The time when the event was parsed, normalized to UTC. If an event does not have a timestamp, t_event_time is set to t_parse_time.
  • t_ip_address: The IP address for the log source. Even if one source names its IP address field ipAddr and another names it srcIpAddress, you can query across both by searching for t_ip_address.
  • t_email_address: The email address of the user
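
The field rules above can be illustrated with a minimal Python sketch. This is not Tarsal's implementation: the alias lists (EVENT_TIME_KEYS, IP_KEYS, EMAIL_KEYS), the helper names, and the assumption that source timestamps are ISO 8601 strings with an explicit UTC offset are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timezone

# Hypothetical alias lists; the real per-source mappings are not described here.
EVENT_TIME_KEYS = ("timestamp", "eventTime", "time")
IP_KEYS = ("ipAddr", "srcIpAddress")
EMAIL_KEYS = ("email", "userEmail")


def first_present(record, keys):
    """Return the value of the first key found in the record, or None."""
    for key in keys:
        if key in record:
            return record[key]
    return None


def to_utc_iso(raw):
    """Parse an ISO 8601 string and return it as a UTC ISO 8601 string.

    Timestamps without an offset are treated as UTC (an assumption of this sketch).
    """
    parsed = datetime.fromisoformat(raw)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).isoformat()


def add_standard_fields(record, parse_time=None):
    """Return a copy of the record with the t_* fields appended."""
    parse_time = parse_time or datetime.now(timezone.utc)
    out = dict(record)
    out["t_parse_time"] = parse_time.astimezone(timezone.utc).isoformat()

    # Fall back to the parse time when the event carries no timestamp.
    raw_time = first_present(record, EVENT_TIME_KEYS)
    if raw_time is not None:
        out["t_event_time"] = to_utc_iso(raw_time)
    else:
        out["t_event_time"] = out["t_parse_time"]

    ip = first_present(record, IP_KEYS)
    if ip is not None:
        out["t_ip_address"] = ip
    email = first_present(record, EMAIL_KEYS)
    if email is not None:
        out["t_email_address"] = email
    return out
```

With this sketch, a record from one source that uses ipAddr and a record from another that uses srcIpAddress both end up with a t_ip_address field, and a record that has no timestamp of its own gets a t_event_time equal to its t_parse_time.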

Table/Bucket/Container Normalization

When configuring a flow with one of the following destinations, these normalizations are available:

Snowflake

| Normalization Type | Default | Description |
|---|---|---|
| NONE | yes | All streams end up in their own raw table. Raw tables are a single column with a JSON blob inserted into that column. Turns off both SINGLE_TABLE and BASIC. |
| SINGLE_TABLE | no | All streams end up in 1 raw table. Raw tables are a single column with a JSON blob inserted into that column. Turns off both NONE and BASIC. |
| BASIC | no | All streams end up in a table where each column is a top-level key. JSON sub-objects are inserted as JSON blobs. Turns on NONE; turns off SINGLE_TABLE. |
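
To make the difference between the three types concrete, the following Python sketch shapes one record the way each type is described above. It is only an illustration, not how Tarsal writes to Snowflake: the table names (raw_<stream>, raw_all_streams), the column name data, and the function names are hypothetical.

```python
import json


def shape_none(stream, record):
    """NONE: one raw table per stream, with a single column holding the JSON blob."""
    return f"raw_{stream}", {"data": json.dumps(record)}


def shape_single_table(stream, record):
    """SINGLE_TABLE: every stream goes into the same raw table as a JSON blob."""
    return "raw_all_streams", {"data": json.dumps(record)}


def shape_basic(stream, record):
    """BASIC: one column per top-level key; nested objects stay as JSON blobs."""
    row = {}
    for key, value in record.items():
        # Only nested objects are serialized; scalar values are kept as-is.
        row[key] = json.dumps(value) if isinstance(value, dict) else value
    return stream, row


record = {"action": "login", "user": {"id": "u1"}}
table, row = shape_basic("logins", record)
# table is "logins"; row has the columns "action" and "user",
# where "user" holds the string '{"id": "u1"}'.
```

The practical trade-off is that NONE and SINGLE_TABLE keep the record intact and leave parsing to query time, while BASIC lets you select top-level keys as ordinary columns.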

S3

| Normalization Type | Default | Description |
|---|---|---|
| SINGLE_BUCKET | yes | All streams end up in a single bucket, and a folder is created for each stream within the bucket. |
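
As a rough illustration of the SINGLE_BUCKET layout, the Python sketch below builds an object key in which the stream name acts as the folder. The function name and the file name are hypothetical; the actual naming of objects inside each stream folder is not specified here.

```python
def object_key(stream, filename):
    """Build a key of the form <stream>/<filename> inside the single bucket."""
    return f"{stream.strip('/')}/{filename.lstrip('/')}"


# Two streams share one bucket but land in separate folders.
keys = [object_key("logins", "part-0001.json"), object_key("alerts", "part-0001.json")]
```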