How can I ingest logs into Panther if the logs have duplicate field names? For example, if my logs contain both a `version` and a `VERSION` field, how can I ingest them into Panther?
When working with logs that have duplicate field names, there are four options: use a concat transformation, use a copy transformation, drop one of the fields, or parse the parent object as JSON.
If the fields are mutually exclusive but contain the same information, you can use the concat transformation to combine them under a single column name. For example, if your logs sometimes contain a field called `nodeID` and other times `nodeid`, this method consolidates the value under one name for both cases:

1. Rename the original fields (e.g. `nodeID` to `_nodeID1`, and `nodeid` to `_nodeID2`). This step is critical to prevent the name conflict.
2. Create a new field named either `nodeID` or `nodeid`, whichever capitalization you prefer.
3. In the new field, add the concat transformation, and make its value the combination of the other two fields.
Here's an example of this process in action:
```yaml
fields:
  - name: _nodeID1
    type: string
    rename:
      from: nodeID
  - name: _nodeID2
    type: string
    rename:
      from: nodeid
  - name: nodeID
    type: string
    concat:
      paths:
        - _nodeID1
        - _nodeID2
```
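The effect of this schema can be illustrated in plain Python. This is only a sketch of what the concat transformation produces for two mutually exclusive inputs, not Panther's implementation; the `apply_concat` helper and its default empty separator are assumptions for illustration.

```python
# Illustration only: simulates the *effect* of a concat transformation
# in plain Python. Not Panther's implementation.
def apply_concat(event, paths, separator=""):
    """Join whichever of the source fields are present, in path order."""
    parts = [str(event[p]) for p in paths if event.get(p) is not None]
    return separator.join(parts)

# Two mutually exclusive raw logs, after the rename step:
log_a = {"_nodeID1": "node-42"}   # originally had "nodeID"
log_b = {"_nodeID2": "node-43"}   # originally had "nodeid"

print(apply_concat(log_a, ["_nodeID1", "_nodeID2"]))  # node-42
print(apply_concat(log_b, ["_nodeID1", "_nodeID2"]))  # node-43
```

Because only one of the two source fields is ever present, the new `nodeID` column simply carries whichever value exists in each event.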
Panther supports using the copy transformation to move a field to the top level of the event, and give it a different name.
As an example, consider this log:
```json
{
  "date": "2023-04-21T00:00:00Z",
  "attributes": {
    "message": "I am a log!",
    "version": 0.1,
    "VERSION": 0.1
  }
}
```
For this log, you could write the following schema:
```yaml
fields:
  - name: date
    type: timestamp
    timeFormat: rfc3339
  - name: attributes
    type: object
    fields:
      - name: message
        type: string
      - name: version
        type: float
  - name: attributes_version
    type: float
    copy:
      from: attributes.VERSION
```
This method has the benefit of not losing data and not requiring changes to existing queries and mappings for this log type. Note, however, that the final format of your event in Panther has a different structure than the original event (since a field was moved and renamed); it's important to be aware of this change when using the copy transformation.
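The shape change described above can be sketched in plain Python. This is only an illustration of what the copy transformation does to the event structure, not Panther's implementation; the `apply_copy` helper is a hypothetical name, and for simplicity the sketch leaves the source field in place rather than modeling how undeclared fields are dropped.

```python
# Illustration only: shows the shape change the copy transformation
# produces. Not Panther's implementation; apply_copy is hypothetical.
def apply_copy(event, new_field, from_path):
    """Copy a nested value (dot-separated path) to a new top-level field."""
    value = event
    for key in from_path.split("."):
        value = value[key]
    out = dict(event)
    out[new_field] = value
    return out

raw = {
    "date": "2023-04-21T00:00:00Z",
    "attributes": {"message": "I am a log!", "version": 0.1, "VERSION": 0.1},
}

parsed = apply_copy(raw, "attributes_version", "attributes.VERSION")
print(parsed["attributes_version"])  # 0.1
```

Queries against this log type would then reference the top-level `attributes_version` column instead of the nested `attributes.VERSION` field.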
In the case where the duplicate fields will always have the same data (i.e. `event.version == event.VERSION`), it may be sufficient to drop one of the fields. Simply omit either `version` or `VERSION` from your schema, and Panther will ignore the omitted field, preventing any duplicate field errors.
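Conceptually, omitting a field from the schema means it is never parsed into a column. A minimal sketch of that behavior, assuming a hypothetical `parse_with_schema` helper that keeps only declared fields (not how Panther actually parses events):

```python
# Illustration only: omitting a field from the schema means it is not
# parsed. Modeled here as filtering an object to its declared fields.
def parse_with_schema(obj, declared_fields):
    """Keep only the keys that the schema declares."""
    return {k: v for k, v in obj.items() if k in declared_fields}

attributes = {"message": "I am a log!", "version": 0.1, "VERSION": 0.1}

# The schema declares only "version", so "VERSION" is ignored:
print(parse_with_schema(attributes, {"message", "version"}))
# {'message': 'I am a log!', 'version': 0.1}
```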
Typically, when a duplicate field error arises, the fields are nested within a parent object (as in `event.some_object.version`). In such a case, you can set the data type of the parent to `json`, in which case the object won't be flattened upon Snowflake ingestion, and you'll be able to differentiate between `version` and `VERSION`.
Taking the sample log from the copy transformation example above: if we set `attributes` to `type: object`, the event will fail at ingest due to two columns with the same name. This is because the `object` data type reduces all the field names it contains to lowercase.
Alternatively, if we set `attributes` to `type: json`, the object will ingest without issue, because JSON objects allow keys whose names differ only in casing.
Casting the object to JSON is flexible, and doesn't result in any lost data, but may require you to rewrite existing SQL queries or integrations with data models and lookup tables.
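The difference between the two types can be sketched in plain Python, where a dict stands in for a raw JSON object. This only illustrates why lowercasing column names causes a collision while raw JSON keys stay distinct; it is not how Snowflake or Panther process the data internally.

```python
# Illustration only: why type: object collides but type: json does not.
# Distinct values are used here purely to show the keys stay separate.
attributes = {"version": 0.1, "VERSION": 0.2}

# Flattening to columns lowercases the names, so the two keys collapse
# into one column name -- this is the duplicate-column error:
lowered = [k.lower() for k in attributes]
print(set(lowered))  # {'version'}

# Kept as raw JSON, the case-distinct keys remain individually addressable:
print(attributes["version"], attributes["VERSION"])  # 0.1 0.2
```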