What are the fields in the panther_monitor.public.data_audit table, and what is it used for?

Last updated: May 7, 2026

QUESTION

I'd like to understand the schema of panther_monitor.public.data_audit, what each field means and what the table represents. Is it only for S3-based log sources, or does it cover other ingestion paths too? Can I use it as a shortcut for per-source event counts instead of running counts across panther_views.public.all_logs?

ANSWER

The panther_monitor.public.data_audit table is part of Panther's data pipeline monitoring infrastructure and provides an audit trail of S3 object processing events. It is essentially a log of every file Panther ingests from your S3-based log sources: each object read produces one row in data_audit.

The table contains the columns below, in addition to the standard Panther fields:

Column             Type     Description
operation          variant  The pipeline operation that occurred
status             variant  Outcome of the operation (SUCCESS or FAILURE)
error              variant  The error that occurred; populated only if status is FAILURE
s3bucket           variant  The S3 bucket containing the object
s3key              variant  The S3 object key (file path) that was processed
s3objectsize       variant  The size of the S3 object in bytes; -1 if unknown, empty if status is FAILURE
s3objectversionid  variant  The S3 version ID of the object; empty if status is FAILURE
recordcount        variant  The number of records parsed from the S3 object; zero if status is FAILURE
assumedrolearn     variant  The AWS IAM role ARN Panther assumed to access the S3 object

What this table is good for

  • Confirming that Panther read a specific object from your source, useful when troubleshooting "is my data reaching Panther?" questions.

  • Spotting failed reads (status = 'FAILURE') for a particular bucket, key prefix, or source.

  • Comparing recordcount totals against the event volume you expect from a source.

  • Investigating ingestion filtering or compression issues.

Example query (failed reads for a specific source label in the last 24 hours):

SELECT s3bucket, s3key, p_event_time, recordcount, status
FROM panther_monitor.public.data_audit
WHERE p_occurs_since('1 day') AND status = 'FAILURE' AND p_source_label = 'My Log Source'
ORDER BY p_event_time DESC
LIMIT 100;
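As a sketch of the first use case above (confirming that Panther read a specific object), you can filter on s3key; the key prefix below is a placeholder for your own path:

SELECT p_event_time, operation, status, s3bucket, s3key, recordcount
FROM panther_monitor.public.data_audit
WHERE p_occurs_since('7 days') AND s3key LIKE 'logs/my-prefix/%'
ORDER BY p_event_time DESC
LIMIT 100;

If the object appears with status = 'SUCCESS' and a nonzero recordcount, Panther read and parsed it; if it is absent entirely, the object was never picked up from the source.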

Please note that recordcount reflects what Panther parsed at read time from the input object. Each row represents one object read, and one object can contain many records, so count(*) on data_audit counts read operations, not events.
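To make that distinction concrete, the following sketch contrasts the number of object reads (count(*)) with the number of parsed records (sum(recordcount)) per source; the two totals will generally differ:

SELECT p_source_label,
       count(*) AS object_reads,
       sum(recordcount) AS records_parsed
FROM panther_monitor.public.data_audit
WHERE p_occurs_since('1 day') AND status = 'SUCCESS'
GROUP BY p_source_label
ORDER BY records_parsed DESC;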

For per-source and per-schema event counts, see: How do I query daily log collection volume by source and schema in Panther?