Normalization

Data Normalization

Tarsal normalizes logs from every data source so they are easier to analyze. It also appends a set of standard fields to logs from all sources, which makes correlating events across logs simple.

For example, every event from a data source records the time it occurred, but not every source names its timestamp attribute the same way, and there is no guarantee that a source's timestamps use the same time zone as other sources. Tarsal appends a UTC-normalized field called t_event_time to each log, which holds the log's corresponding event time. Because every log carries t_event_time, you can query logs from multiple data sources on that field to align and correlate them despite their disparate schemas.
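The idea behind t_event_time can be illustrated with a minimal sketch, assuming ISO 8601 timestamps. The source names, the field names (published, eventTime), and the EVENT_TIME_FIELDS mapping are hypothetical and do not describe Tarsal's internal implementation; the sketch only shows how mapping each source's own time attribute to one UTC field lets events be ordered together.

```python
from datetime import datetime, timezone

# Hypothetical raw events from two sources that name and format their
# timestamps differently. Source and field names are illustrative only.
source_a_event = {"published": "2023-05-01T09:30:00-07:00", "actor": "alice"}
source_b_event = {"eventTime": "2023-05-01T16:45:00Z", "actor": "bob"}

# Assumed mapping from each source to the attribute holding its event time.
EVENT_TIME_FIELDS = {"source_a": "published", "source_b": "eventTime"}


def to_utc(raw: str) -> datetime:
    """Parse an ISO 8601 timestamp and convert it to UTC."""
    # Python < 3.11 fromisoformat rejects a trailing "Z", so spell out the offset.
    ts = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if ts.tzinfo is None:
        # Assumption: timestamps without an offset are already UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)


def add_event_time(source: str, event: dict) -> dict:
    """Return a copy of the event with a UTC t_event_time appended."""
    out = dict(event)
    out["t_event_time"] = to_utc(event[EVENT_TIME_FIELDS[source]]).isoformat()
    return out


events = [
    add_event_time("source_b", source_b_event),
    add_event_time("source_a", source_a_event),
]
# Once both events share t_event_time in UTC, ordering them is a simple sort.
timeline = sorted(events, key=lambda e: e["t_event_time"])
```

Here the source_a event (09:30 at UTC-7, which is 16:30 UTC) sorts ahead of the source_b event at 16:45 UTC, even though its local clock reading looks earlier only by coincidence of offsets. Sorting the ISO strings works in this sketch because every value has the same UTC offset and precision; comparing datetime objects is the more robust choice in general.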

We append the following fields to every log record:

  • t_event_time: The event time for the log, normalized to UTC
  • t_parse_time: The time when the event was parsed, normalized to UTC. If an event does not have a timestamp, t_event_time is set to t_parse_time.
  • t_ip_address: The IP address for the log source. Even if one source names its IP address field ipAddr and another names it srcIpAddress, you can query across both by searching on t_ip_address.
  • t_email_address: The email address of the user
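The behavior of these fields can be sketched in a few lines, assuming each source's attribute names are known in advance. FIELD_MAP, normalize, and the attribute names ts, time, ipAddr, srcIpAddress, email, and userEmail are hypothetical placeholders, not Tarsal's configuration or API. The sketch appends t_parse_time, falls back to it when an event has no timestamp, and copies source-specific IP and email attributes into the shared names so a single filter matches both sources.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical mappings from each source's own attribute names to the
# standard fields. These attribute names are illustrative only.
FIELD_MAP = {
    "source_a": {"t_event_time": "ts", "t_ip_address": "ipAddr",
                 "t_email_address": "email"},
    "source_b": {"t_event_time": "time", "t_ip_address": "srcIpAddress",
                 "t_email_address": "userEmail"},
}


def to_utc_iso(raw: str) -> str:
    """Parse an ISO 8601 timestamp and return it as a UTC ISO string."""
    ts = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return ts.astimezone(timezone.utc).isoformat()


def normalize(source: str, record: dict,
              parsed_at: Optional[datetime] = None) -> dict:
    """Return a copy of record with the standard t_ fields appended.

    parsed_at should be timezone-aware; it defaults to the current UTC time.
    """
    mapping = FIELD_MAP[source]
    parse_time = (parsed_at or datetime.now(timezone.utc)).astimezone(timezone.utc)
    out = dict(record)
    out["t_parse_time"] = parse_time.isoformat()

    raw_time = record.get(mapping["t_event_time"])
    # With no timestamp in the event, t_event_time falls back to t_parse_time.
    out["t_event_time"] = to_utc_iso(raw_time) if raw_time else out["t_parse_time"]

    # Copy source-specific attributes into the shared field names when present.
    for std_field in ("t_ip_address", "t_email_address"):
        value = record.get(mapping[std_field])
        if value is not None:
            out[std_field] = value
    return out


parsed = datetime(2023, 5, 1, 17, 0, tzinfo=timezone.utc)
logs = [
    normalize("source_a", {"ts": "2023-05-01T09:30:00-07:00",
                           "ipAddr": "203.0.113.7"}, parsed),
    normalize("source_b", {"srcIpAddress": "203.0.113.7",
                           "userEmail": "bob@example.com"}, parsed),
]
# One filter on t_ip_address matches records from both sources.
hits = [r for r in logs if r.get("t_ip_address") == "203.0.113.7"]
```

The source_b record carries no timestamp, so its t_event_time equals its t_parse_time, while the filter on t_ip_address returns records from both sources despite their different attribute names.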

Table/Bucket/Container Normalization

When you configure a flow with one of the following destinations, these normalization options are available:

Snowflake

Normalization Type | Default | Description
--- | --- | ---
NONE | Yes | Each stream lands in its own raw table. A raw table has a single column, and each record is inserted into that column as a JSON blob. Turns off both SINGLE_TABLE and BASIC.
SINGLE_TABLE | No | All streams land in one shared raw table. A raw table has a single column, and each record is inserted into that column as a JSON blob. Turns off both NONE and BASIC.
BASIC | No | Each stream lands in a table where every top-level key is a column. Nested JSON objects are inserted as JSON blobs. Turns on NONE; turns off SINGLE_TABLE.
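To make the three layouts concrete, here is a minimal sketch that computes where each record would land under each option. The table names (_raw_<stream>, _raw_all_streams) and the column name data are hypothetical, the sketch treats only dict values as nested objects, and it does not model how the options toggle one another; it is an illustration of the table shapes, not Tarsal's loader.

```python
import json
from collections import defaultdict


def load(records, mode: str) -> dict:
    """Return {table_name: [row, ...]} showing where each record would land.

    records: iterable of (stream_name, record_dict) pairs.
    Table and column names are hypothetical placeholders.
    """
    tables = defaultdict(list)
    for stream, record in records:
        if mode == "NONE":
            # One raw table per stream; a single column holds the JSON blob.
            tables[f"_raw_{stream}"].append({"data": json.dumps(record)})
        elif mode == "SINGLE_TABLE":
            # Every stream shares one raw table with the same single column.
            tables["_raw_all_streams"].append({"data": json.dumps(record)})
        elif mode == "BASIC":
            # One column per top-level key; nested objects stay JSON blobs.
            row = {key: json.dumps(value) if isinstance(value, dict) else value
                   for key, value in record.items()}
            tables[stream].append(row)
        else:
            raise ValueError(f"unknown normalization mode: {mode}")
    return dict(tables)


records = [
    ("users", {"id": 1, "profile": {"name": "Ann"}}),
    ("logins", {"id": 7, "ip": "203.0.113.7"}),
]
by_stream = load(records, "NONE")         # two raw tables, one per stream
shared = load(records, "SINGLE_TABLE")    # one raw table holding both records
columns = load(records, "BASIC")          # users row: id column, profile as JSON text
```

The trade-off the table describes shows up directly: the raw layouts keep every record intact in one column, while BASIC exposes top-level keys as queryable columns at the cost of leaving nested objects as JSON text.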

Note: Basic normalization is not yet supported for the Standard Inserts loading method. To use normalized tables, choose one of the other two Snowflake loading methods, such as S3 Staging. The loading method is set in the destination configuration.