How can I ingest log events into Panther if they contain duplicate field names?
QUESTION
How can I ingest logs into Panther if the logs have duplicate field names? For example, if I have a log schema with the fields version and VERSION, how can I ingest them into Panther?
ANSWER
When working with logs which have duplicate fields, there are three options: use a copy transformation, drop one of the fields, or parse the parent object as JSON.
Option 1: Use the copy transformation
Panther supports using the copy transformation to move a field to the top level of the event, and give it a different name.
As an example, consider this log:
```json
{
  "date": "2023-04-21T00:00:00Z",
  "attributes": {
    "message": "I am a log!",
    "version": 0.1,
    "VERSION": 0.1
  }
}
```
For this log, you could write the following schema:
```yaml
fields:
  - name: date
    type: timestamp
  - name: attributes
    type: object
    fields:
      - name: message
        type: string
      - name: version
        type: float
  - name: attributes_version
    type: float
    copy:
      from: attributes.VERSION
```
This method has the benefit of preserving all data and not requiring changes to existing queries and mappings for this log type. Note, however, that the final format of your event in Panther has a different structure than the original event (since a field was moved and renamed). It's important to be aware of this change when using the copy transformation.
Option 2: Drop one of the fields
In the case where the duplicate fields will always hold the same data (i.e. event.version == event.VERSION), it may be sufficient to drop one of the fields. In this case, simply omit either version or VERSION from your schema, and Panther will ignore the omitted field, preventing any duplicate field errors.
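Continuing with the sample log from Option 1, a schema that drops VERSION might look like this (a sketch: only version is declared, so VERSION is never mapped to a column):

```yaml
fields:
  - name: date
    type: timestamp
  - name: attributes
    type: object
    fields:
      - name: message
        type: string
      # VERSION is intentionally omitted; Panther ignores
      # undeclared fields, so no duplicate column is created
      - name: version
        type: float
```

This only makes sense when the two fields are guaranteed to carry identical values; otherwise you are silently discarding data.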
Option 3: Parse the parent object as JSON
Typically, when a duplicate field error arises, the fields are nested within a parent object (as in event.some_object.version). In such a case, you can choose to set the data type of the parent to json, in which case the object won't be flattened upon Snowflake ingestion, and you'll be able to differentiate between version and VERSION.
Taking the sample log in Option 1: if in the schema we set attributes.type = object, then this event will fail at ingest due to two columns having the same name. This is because the object data type lowercases the names of all the fields it contains.
Alternatively, we can set attributes.type = json, in which case the object will ingest without issue, because JSON objects allow keys that differ only in casing.
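As a sketch, the Option 3 schema for the sample log would declare attributes as json and omit its nested fields entirely:

```yaml
fields:
  - name: date
    type: timestamp
  # Declaring the parent as json preserves the raw object as-is,
  # including both version and VERSION, instead of flattening it
  - name: attributes
    type: json
```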
Casting the object to JSON is flexible, and doesn't result in any lost data, but may require you to rewrite existing SQL queries or integrations with data models and lookup tables.