How can I ingest log events into Panther if they contain duplicate field names?
QUESTION
How can I ingest logs into Panther if the logs have duplicate field names? For example, if I have a log schema with the fields version and VERSION, how can I ingest them into Panther?
ANSWER
When working with logs which have duplicate fields, there are three options: use a copy transformation, drop one of the fields, or parse the parent object as JSON.
Option 1: Use the copy transformation
Panther supports using the copy transformation to move a field to the top level of the event, and give it a different name.
As an example, consider this log:
```json
{
  "date": "2023-04-21T00:00:00Z",
  "attributes": {
    "message": "I am a log!",
    "version": 0.1,
    "VERSION": 0.1
  }
}
```
For this log, you could write the following schema:
```yaml
fields:
  - name: date
    type: timestamp
  - name: attributes
    type: object
    fields:
      - name: message
        type: string
      - name: version
        type: float
  - name: attributes_version
    type: float
    copy:
      from: attributes.VERSION
```
This method has the benefit of preserving all data and not requiring changes to existing queries and mappings for this log type. Note, however, that the final format of your event in Panther has a different structure than the original event (since a field was moved and renamed). It's important to be aware of this change when using the copy transformation.
Option 2: Drop one of the fields
In the case where the duplicate fields will always hold the same data (i.e. event.version == event.VERSION), it may be sufficient to drop one of the fields. In this case, simply omit either version or VERSION from your schema, and Panther will ignore the omitted field, preventing any duplicate field errors.
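Continuing with the sample log from Option 1, a schema that drops VERSION might look like this (a sketch: only version is declared, so VERSION is never mapped to a column):

```yaml
fields:
  - name: date
    type: timestamp
  - name: attributes
    type: object
    fields:
      - name: message
        type: string
      # VERSION is intentionally omitted; Panther ignores
      # undeclared fields, so no duplicate column is created
      - name: version
        type: float
```

This only makes sense when the two fields are guaranteed to carry identical values; otherwise you are silently discarding data.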
Option 3: Parse the parent object as JSON
Typically, when a duplicate field error arises, the fields are nested within a parent object (as in event.some_object.version). In such a case, you can choose to set the data type of the parent to json, in which case the object won't be flattened upon Snowflake ingestion, and you'll be able to differentiate between version and VERSION.
Taking the sample log in Option 1: if in the schema we set attributes.type = object, then this event will fail at ingest due to two columns having the same name. This is because the object data type lowercases the names of all the fields it contains.
Alternatively, we can set attributes.type = json, in which case the object will ingest without issue, because JSON objects allow keys that differ only in casing.
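As a sketch, the Option 3 schema for the sample log would declare attributes as json and omit its nested fields entirely:

```yaml
fields:
  - name: date
    type: timestamp
  # Declaring the parent as json preserves the raw object as-is,
  # including both version and VERSION, instead of flattening it
  - name: attributes
    type: json
```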
Casting the object to JSON is flexible, and doesn't result in any lost data, but may require you to rewrite existing SQL queries or integrations with data models and lookup tables.