When Panther notifies me that I have events which failed to classify from my S3 log source, how can I determine what S3 object those events came from?
You can determine the source file by searching the data_audit table using the parse time of the classification failure:
Open the Alert generated by the classification failure. Navigate to the Events tab, and copy the p_event_time and p_source_label attributes.
Run the following query in the Data Explorer:
-- Replace 'my_s3_log_source' with the p_source_label value copied from the alert.
with cf as (
  select * from panther_monitor.public.classification_failures
  where p_occurs_since('30 days') and p_source_label = 'my_s3_log_source'
),
da as (
  select * from panther_monitor.public.data_audit
  where p_occurs_since('30 days')
)
select CONCAT('s3://', da.s3bucket, '/', da.s3key) as filepath
from cf
join da on TIMEDIFF(MILLISECOND, cf.p_event_time, da.p_event_time) = 0
limit 10
This will return the bucket name and the path to the file. Because this query matches rows with TIMEDIFF rather than an exact key, it performs a "fuzzy match", so it may return more than one result. You can adjust the unit used in the TIMEDIFF function if needed, or (if the result list is small) simply check each of the returned S3 objects.
Note: In the panther_monitor tables, p_event_time refers to the time the data was ingested or parsed by Panther, not the time of the event Panther is ingesting.
Note: The use of TIMEDIFF is critical in this query, since there is a very small delay between Panther ingesting a file and attempting to parse each event. In most cases, this delay is less than 1 millisecond, but in some extreme cases, you may need to alter the TIMEDIFF condition.
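If the exact-millisecond match returns no rows, one possible way to loosen the TIMEDIFF condition is to compare the absolute difference against a small tolerance. The query below is a sketch under assumptions, not a query taken from Panther's documentation: it reuses the same tables and columns as the query above, 'my_s3_log_source' is still a placeholder, the 5 ms tolerance is an arbitrary illustrative value, and it assumes the Data Explorer backend supports ABS as Snowflake does.

```sql
-- Sketch only: same tables and columns as the original query, with a wider match window.
-- 'my_s3_log_source' is a placeholder for your p_source_label value.
-- The 5 ms tolerance is an arbitrary illustrative choice; tune it as needed.
with cf as (
  select * from panther_monitor.public.classification_failures
  where p_occurs_since('30 days') and p_source_label = 'my_s3_log_source'
),
da as (
  select * from panther_monitor.public.data_audit
  where p_occurs_since('30 days')
)
select
  CONCAT('s3://', da.s3bucket, '/', da.s3key) as filepath,
  -- Absolute gap between the failure's parse time and the file's audit time.
  ABS(TIMEDIFF(MILLISECOND, cf.p_event_time, da.p_event_time)) as diff_ms
from cf
join da
  on ABS(TIMEDIFF(MILLISECOND, cf.p_event_time, da.p_event_time)) <= 5
order by diff_ms
limit 10
```

Sorting by diff_ms puts the closest candidates first, which helps when the wider window returns several S3 objects. Switching the unit (for example to SECOND) is another way to coarsen the comparison.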