How can I determine which S3 object caused a classification failure in Panther?
When Panther notifies me that I have events which failed to classify from my S3 log source, how can I determine what S3 object those events came from?
You can determine the source file by searching the data_audit table using the parse time of the classification failure:
- Open the Alert generated by the classification failure. Navigate to the Events tab, and copy the
- Run the following query in the Data Explorer:
with cf as (
    select *
    from panther_monitor.public.classification_failures
    where p_occurs_since('30 days')
      and p_source_label = 'my_s3_log_source'
),
da as (
    select *
    from panther_monitor.public.data_audit
    where p_occurs_since('30 days')
)
select CONCAT('s3://', da.s3bucket, da.s3key) as filepath
from cf
join da
  on TIMEDIFF(MILLISECOND, cf.p_event_time, da.p_event_time) = 0
limit 10
This returns the bucket name and the path to the file. Because the query joins on TIMEDIFF rather than on an exact key, it performs a "fuzzy match" and may return more than one result. You can adjust the unit passed to the TIMEDIFF function if needed, or (if the result list is small) simply check each of the returned S3 objects.
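If the exact-millisecond match returns nothing, one possible adjustment is to keep the MILLISECOND unit but accept a small difference in either direction. The query below is a minimal sketch of that idea, not an official Panther query: the 5 ms tolerance and the delta_ms column are illustrative assumptions, and 'my_s3_log_source' is still a placeholder for your own source label.

```sql
-- Sketch only: the same join as the original query, with a wider match window.
-- The 5 ms tolerance is an illustrative value, not a Panther recommendation.
with cf as (
    select *
    from panther_monitor.public.classification_failures
    where p_occurs_since('30 days')
      and p_source_label = 'my_s3_log_source'  -- placeholder source label
),
da as (
    select *
    from panther_monitor.public.data_audit
    where p_occurs_since('30 days')
)
select distinct
    CONCAT('s3://', da.s3bucket, da.s3key) as filepath,
    -- absolute gap between the two timestamps, in milliseconds
    ABS(TIMEDIFF(MILLISECOND, cf.p_event_time, da.p_event_time)) as delta_ms
from cf
join da
  on ABS(TIMEDIFF(MILLISECOND, cf.p_event_time, da.p_event_time)) <= 5
order by delta_ms
limit 10
```

Sorting by delta_ms puts the closest matches first, so the most likely S3 object appears at the top even when the wider window returns several candidates.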
Note: In the
p_event_time refers to the time the data was ingested or parsed by Panther, not the time of the event Panther is ingesting.
Note: The use of TIMEDIFF is critical in this query, since there is a very small delay between Panther ingesting a file and attempting to parse each event. In most cases, this delay is less than 1 millisecond, but in some extreme cases, you may need to alter the