Is there a way to handle variable column counts in logs with Panther?

Last updated: November 11, 2025

QUESTION

How do I parse CSV logs that have a variable number of columns, where some logs contain extra fields at the end that should be ignored?

ANSWER

When dealing with CSV logs that have variable column counts (such as AWS ELB access logs, which have gained additional fields over time), the recommended solution is to use a Regex parser.

Use a Regex parser with the trailing expression: (?:\s+\S+)*$

This pattern ensures that logs are successfully parsed whether or not they include extra trailing fields. Here's an example configuration:

parser:
  regex:
    patternDefinitions:
      TIMESTAMP: '\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z'
    match:
      - '^%{DATA:field1}'
      - '\s+%{TIMESTAMP:time}'
      - '\s+%{DATA:field2}'
      # ... other fields
      - '\s+%{DATA:last_known_field}'
      - '(?:\s+\S+)*$'  # This handles any additional trailing fields
    trimSpace: true
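To illustrate why the trailing expression works, here is a minimal Python sketch of an equivalent regex. The field names and sample log lines are hypothetical stand-ins, not actual ELB output or Panther internals; the point is only that (?:\s+\S+)*$ lets the same pattern match lines with and without extra trailing columns, without capturing the extras:

```python
import re

# Simplified equivalent of the match pattern above: two known fields and a
# timestamp, followed by (?:\s+\S+)*$ to absorb any extra trailing columns.
PATTERN = re.compile(
    r'^(?P<field1>\S+)'
    r'\s+(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z)'
    r'\s+(?P<field2>\S+)'
    r'(?:\s+\S+)*$'  # extra trailing fields: matched, but never captured
)

# One line with exactly the known columns, one with two extra trailing fields.
short_line = 'http 2023-01-01T00:00:00Z value'
long_line = 'http 2023-01-01T00:00:00Z value extra1 extra2'

for line in (short_line, long_line):
    match = PATTERN.match(line)
    # Both lines parse, and both yield the same named fields.
    print(match.groupdict() if match else None)
```

In both cases the named groups come out identical; the extra columns on the longer line are consumed by the non-capturing trailing group, which is exactly the behavior the Panther config above relies on.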

Note: When using the Regex parser for this use case, Field Discovery is not necessary. The regex parser only extracts fields that are explicitly named in the match patterns, so enabling Field Discovery provides no additional benefit for handling variable trailing columns.

For more information on regex parsers, see Panther's regex parser documentation.