Normalization
Data Normalization
Tarsal normalizes logs from every data source to make analysis easy. It also applies a set of standard fields across all log sources to make cross-log correlation simple.
For example, every event from a data source has a time at which it occurred, but sources do not all name their timestamp attribute the same way, and there is no guarantee that one source's timestamps use a timezone consistent with other data sources. Tarsal appends a UTC-normalized field called t_event_time to each log, which maps to the log's corresponding event time. That lets you query logs from multiple data sources using t_event_time to align and correlate them despite their disparate schemas.
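To make the idea concrete, here is a minimal Python sketch of how a UTC-normalized t_event_time could be derived from two sources that name and encode their timestamps differently. It is an illustration only, not Tarsal's implementation: the source names, the attribute names eventTime and ts, and the rule that naive timestamps are treated as UTC are all assumptions.

```python
from datetime import datetime, timezone

# Hypothetical per-source timestamp attribute names; real sources will differ.
TIMESTAMP_KEYS = {"source_a": "eventTime", "source_b": "ts"}


def add_event_time(source: str, record: dict) -> dict:
    """Return a copy of record with a UTC-normalized t_event_time added."""
    raw = record[TIMESTAMP_KEYS[source]]
    if isinstance(raw, (int, float)):
        # Numeric values are interpreted as Unix epoch seconds.
        parsed = datetime.fromtimestamp(raw, tz=timezone.utc)
    else:
        parsed = datetime.fromisoformat(raw)
        if parsed.tzinfo is None:
            # Assumption for this sketch: timestamps without an offset are UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
    return {**record, "t_event_time": parsed.astimezone(timezone.utc)}


# The same instant, written with a +02:00 offset and as epoch seconds.
a = add_event_time("source_a", {"eventTime": "2024-03-01T12:00:00+02:00"})
b = add_event_time("source_b", {"ts": 1709287200})
```

After normalization, both records carry the same t_event_time value, so they can be ordered and joined on that single field regardless of the original attribute name or offset.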
Another example is IP addresses. One source may define an IP address field as ipAddr, and another may define it as srcIpAddress. With Tarsal's normalized field, you can query across both sources by simply searching for t_ip_address.
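The following Python sketch shows one way such a field mapping could work. It assumes a simple alias list containing only the two field names mentioned above. The sample records and the add_ip_address helper are hypothetical and are not part of Tarsal's API.

```python
# Alias list built from the two example field names; a real mapping would be larger.
IP_FIELD_ALIASES = ("ipAddr", "srcIpAddress")


def add_ip_address(record: dict) -> dict:
    """Return a copy of record with t_ip_address set from the first alias found."""
    for key in IP_FIELD_ALIASES:
        if key in record:
            return {**record, "t_ip_address": record[key]}
    # No known alias present: leave the record without t_ip_address.
    return dict(record)


logs = [
    add_ip_address({"ipAddr": "203.0.113.7", "action": "login"}),
    add_ip_address({"srcIpAddress": "203.0.113.7", "bytes": 512}),
    add_ip_address({"srcIpAddress": "198.51.100.4", "bytes": 64}),
]

# One filter on the normalized field matches records from both schemas.
matches = [log for log in logs if log.get("t_ip_address") == "203.0.113.7"]
```

The original fields are kept alongside the normalized one, so source-specific queries continue to work.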
We append the following fields to every log record:
Field | Description |
---|---|
t_parse_time | The time when the log was parsed by Tarsal. |
t_event_time | The event time for the log. Sourced from the log data. |
t_ip_address | The IP address for the log source. Sourced from the log data. |
t_email_address | The email address of the user or actor for the log source. Sourced from the log data. |
Note: Date fields (t_event_time and t_parse_time) are normalized to UTC and formatted as date-time strings in a simplified version of ISO 8601 (YYYY-MM-DDTHH:mm:ss.sssZ). Depending on the resolution of the timestamps in the log source, the timestamp may or may not include microseconds.
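As an illustration of the string format described in the note, the Python sketch below converts a timezone-aware datetime to UTC and renders it in the YYYY-MM-DDTHH:mm:ss.sssZ shape. The to_utc_string helper is hypothetical, and this sketch always emits exactly three fractional digits, which may differ from what Tarsal produces for a given source.

```python
from datetime import datetime, timedelta, timezone


def to_utc_string(ts: datetime) -> str:
    """Render a timezone-aware datetime as a UTC string with millisecond precision."""
    utc = ts.astimezone(timezone.utc)
    # isoformat() ends with "+00:00" for UTC; swap that for the "Z" suffix.
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


# A timestamp recorded two hours ahead of UTC.
local = datetime(2024, 3, 1, 12, 0, 0, 123456, tzinfo=timezone(timedelta(hours=2)))
formatted = to_utc_string(local)
```

Because every value is in UTC and uses a fixed-width layout, these strings sort lexicographically in chronological order.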
Table/Bucket/Container Normalization
When you configure a flow with one of the destinations below, the following normalization types are available:
Snowflake
Normalization Type | Default | Description |
---|---|---|
NONE | yes | All streams end up in their own raw table. Raw tables are a single column with a JSON blob inserted into that column. Turns off both SINGLE_TABLE and BASIC. |
SINGLE_TABLE | no | All streams end up in one raw table. Raw tables are a single column with a JSON blob inserted into that column. Turns off both NONE and BASIC. |
BASIC | no | All streams end up in a table where each column is a top-level key. JSON sub-objects are inserted as JSON blobs. Turns on NONE; turns off SINGLE_TABLE. |
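
The difference between the three types can be sketched in Python as a function that decides which table a record lands in and what the row looks like. This is a rough model of the descriptions in the table above, not the actual Snowflake loader: the table names, the data column name, and the shape_row helper are assumptions made for illustration.

```python
import json


def shape_row(stream: str, record: dict, mode: str = "NONE"):
    """Return a (table_name, row) pair for one record under a normalization type."""
    if mode == "NONE":
        # One raw table per stream, with the whole record as a single JSON blob.
        return f"{stream}_raw", {"data": json.dumps(record)}
    if mode == "SINGLE_TABLE":
        # Every stream shares one raw table (hypothetical name "raw").
        return "raw", {"data": json.dumps(record)}
    if mode == "BASIC":
        # One column per top-level key; nested objects stay as JSON blobs.
        row = {
            key: json.dumps(value) if isinstance(value, dict) else value
            for key, value in record.items()
        }
        return stream, row
    raise ValueError(f"unknown normalization type: {mode}")


record = {"user": "alice", "geo": {"country": "US"}}
none_table, none_row = shape_row("auth", record, "NONE")
single_table, single_row = shape_row("auth", record, "SINGLE_TABLE")
basic_table, basic_row = shape_row("auth", record, "BASIC")
```

With BASIC, top-level keys such as user become queryable columns directly, while the nested geo object still needs JSON functions to unpack.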