How can I determine which S3 object caused a classification failure in Panther?

Last updated: September 3, 2024

QUESTION

When Panther notifies me that I have events which failed to classify from my S3 log source, how can I determine what S3 object those events came from?

ANSWER

You can determine the source file by joining the classification_failures and data_audit tables on the parse time of the classification failure:

Open the Alert generated by the classification failure. Navigate to the Events tab, and copy the p_event_time and p_source_label attributes.
Run the following query in the Data Explorer, replacing my_s3_log_source with the p_source_label value you copied:

with cf as
  (select * from panther_monitor.public.classification_failures where p_occurs_since('30 days') and p_source_label = 'my_s3_log_source'),
da as
  (select * from panther_monitor.public.data_audit where p_occurs_since('30 days'))
select CONCAT('s3://',da.s3bucket,'/',da.s3key) as filepath
from cf join da on TIMEDIFF(MILLISECOND, cf.p_event_time, da.p_event_time) = 0
limit 10

This returns the full S3 path (bucket name and object key) of each matching file. Because the query compares timestamps with TIMEDIFF rather than requiring exact equality, it performs a "fuzzy match" and may return more than one result. You can adjust the unit passed to the TIMEDIFF function if needed, or (if the result list is small) simply check each of the returned S3 objects.

Note: In the panther_monitor tables, p_event_time refers to the time the data was ingested or parsed by Panther, not the time of the event Panther is ingesting.

Note: The use of TIMEDIFF is critical in this query, since there is a slight delay between Panther ingesting a file and attempting to parse each event. In most cases, this delay is less than 1 millisecond, but in some extreme cases you may need to alter the TIMEDIFF statement, for example by using a coarser unit.
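
If the millisecond match returns no rows, one option is to compare at a coarser unit. The query below is a sketch only: it reuses the tables and columns from the query in the answer, but the one-second tolerance, the ABS wrapper, and the two extra output columns (failure_time, audit_time) are assumptions added for illustration, not values recommended by Panther.

```sql
-- Sketch: same join as the original query, but allowing up to
-- 1 second between the data_audit record and the classification
-- failure. The 1-second tolerance is an assumed value; tune as needed.
with cf as
  (select * from panther_monitor.public.classification_failures where p_occurs_since('30 days') and p_source_label = 'my_s3_log_source'),
da as
  (select * from panther_monitor.public.data_audit where p_occurs_since('30 days'))
select
  CONCAT('s3://',da.s3bucket,'/',da.s3key) as filepath,
  cf.p_event_time as failure_time,  -- parse time of the classification failure
  da.p_event_time as audit_time     -- time recorded for the file in data_audit
from cf join da on ABS(TIMEDIFF(SECOND, cf.p_event_time, da.p_event_time)) <= 1
limit 10
```

A wider window is likely to match more files. Comparing failure_time with the p_event_time value copied from the alert can help narrow down which of the returned rows corresponds to the failure you are investigating.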