Normalization

Data Normalization

Tarsal normalizes logs from every data source to make analysis easy. It also applies a set of standard fields across all log sources to make cross-log correlation simple.

For example, every event from a data source has a time at which it occurred, but sources do not all name their timestamp attribute the same way, and a source's timestamps are not guaranteed to use the same timezone as those of other sources. Tarsal appends a UTC-normalized field called t_event_time to each log, which maps to the log's corresponding event time. This lets you query logs from multiple data sources and use t_event_time to align and correlate them despite their disparate schemas.

Another example is IP addresses. One source may name its IP address field ipAddr, while another names it srcIpAddress. With Tarsal's normalized field, you can query across both sources by searching for t_ip_address.
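As a rough illustration of the idea (not Tarsal's actual implementation), the sketch below appends a normalized t_ip_address field to records from two sources that use different native field names. The source names, the FIELD_MAP table, and the append_normalized_fields helper are all hypothetical.

```python
# Hypothetical per-source mapping from native field names to normalized names.
FIELD_MAP = {
    "source_a": {"ipAddr": "t_ip_address"},
    "source_b": {"srcIpAddress": "t_ip_address"},
}


def append_normalized_fields(source, record):
    """Return a copy of the record with t_* fields appended.

    The original fields are left untouched; normalized fields are added alongside.
    """
    out = dict(record)
    for native, normalized in FIELD_MAP[source].items():
        if native in record:
            out[normalized] = record[native]
    return out


a = append_normalized_fields("source_a", {"ipAddr": "203.0.113.7", "user": "alice"})
b = append_normalized_fields("source_b", {"srcIpAddress": "203.0.113.7", "action": "login"})

# Both records can now be matched on the same key, whatever the source schema.
matches = [r for r in (a, b) if r.get("t_ip_address") == "203.0.113.7"]
```

Keeping the native fields and adding the normalized ones next to them means source-specific queries still work while cross-source queries can rely on the shared name.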

Tarsal appends the following fields to every log record:

| Field | Description |
| --- | --- |
| t_parse_time | The time when the log was parsed by Tarsal. |
| t_event_time | The event time for the log. Sourced from the log data. |
| t_ip_address | The IP address for the log source. Sourced from the log data. |
| t_email_address | The email address of the user or actor for the log source. Sourced from the log data. |

Note: Date fields (t_event_time and t_parse_time) are normalized to UTC and formatted as date-time strings in a simplified version of ISO 8601 (YYYY-MM-DDTHH:mm:ss.sssZ). Depending on the resolution of the timestamps in the log source, the timestamp may or may not include microseconds.
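To make the target format concrete, here is a minimal sketch that converts a timestamp carrying a UTC offset into a UTC string with the three-digit fraction shown in the format above. The to_utc_string helper is hypothetical, and treating naive timestamps as UTC is an assumption of this sketch, not documented Tarsal behavior.

```python
from datetime import datetime, timezone


def to_utc_string(ts: str) -> str:
    """Parse an ISO 8601 timestamp and render it as YYYY-MM-DDTHH:mm:ss.sssZ in UTC."""
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:
        # Assumption: a timestamp without an offset is treated as already UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    dt = dt.astimezone(timezone.utc)
    # strftime has no millisecond directive, so build the fraction by hand.
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


print(to_utc_string("2023-05-01T09:30:15.250-04:00"))  # 2023-05-01T13:30:15.250Z
```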

Table/Bucket/Container Normalization

When you configure a flow with one of the destinations below, the following normalization types are available:

Snowflake

| Normalization Type | Default | Description |
| --- | --- | --- |
| NONE | yes | All streams end up in their own raw table. Raw tables are a single column with a JSON blob inserted into that column. Turns off both SINGLE_TABLE and BASIC. |
| SINGLE_TABLE | no | All streams end up in 1 raw table. Raw tables are a single column with a JSON blob inserted into that column. Turns off both NONE and BASIC. |
| BASIC | no | All streams end up in a table where each column is a top level key. JSON sub objects are inserted as JSON blobs. Turns on NONE; turns off SINGLE_TABLE. |
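The difference between a raw table row and a BASIC row can be sketched as follows. This only illustrates the row shapes described above and does not show how Tarsal writes to Snowflake; the column name "data" and both helper functions are hypothetical.

```python
import json

record = {"id": 42, "action": "login", "user": {"name": "alice", "role": "admin"}}


def raw_row(rec):
    """NONE / SINGLE_TABLE shape: one column holding the whole record as a JSON blob."""
    return {"data": json.dumps(rec)}  # "data" is a placeholder column name


def basic_row(rec):
    """BASIC shape: one column per top-level key; sub-objects stay as JSON blobs."""
    return {
        key: json.dumps(value) if isinstance(value, dict) else value
        for key, value in rec.items()
    }


print(raw_row(record))
print(basic_row(record))
```

In the BASIC shape, id and action become ordinary columns, while user remains a JSON string that still has to be parsed at query time.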