What are the fields in the panther_monitor.public.data_audit table, and what is it used for?

Last updated: May 7, 2026

QUESTION

I'd like to understand the schema of panther_monitor.public.data_audit, what each field means and what the table represents. Is it only for S3-based log sources, or does it cover other ingestion paths too? Can I use it as a shortcut for per-source event counts instead of running counts across panther_views.public.all_logs?

ANSWER

The panther_monitor.public.data_audit table is part of Panther's data pipeline monitoring infrastructure and provides an audit trail of S3 object processing events. It is essentially a log of every file Panther ingests from your S3-based log sources: each object read produces one row in data_audit.

The table contains the columns below, in addition to the standard Panther fields:

Column             Type     Description
operation          variant  The pipeline operation that occurred
status             variant  Outcome of the operation (SUCCESS or FAILURE)
error              variant  The error that occurred; populated only if status is FAILURE
s3bucket           variant  The S3 bucket containing the object
s3key              variant  The S3 object key (file path) that was processed
s3objectsize       variant  The size of the S3 object in bytes; -1 if unknown, empty if status is FAILURE
s3objectversionid  variant  The S3 version ID of the object; empty if status is FAILURE
recordcount        variant  The number of records parsed from the S3 object; zero if status is FAILURE
assumedrolearn     variant  The AWS IAM role ARN Panther assumed to access the S3 object

What this table is good for

  • Confirming that Panther read a specific object from your source, useful when troubleshooting "is my data reaching Panther?" questions.

  • Spotting failed reads (status = 'FAILURE') for a particular bucket, key prefix, or source.

  • Comparing recordcount totals against the event volume you expect from a source.

  • Investigating ingestion filtering or compression issues.

Example query (failed reads for a specific source label in the last 24 hours):

SELECT s3bucket, s3key, p_event_time, recordcount, status
FROM panther_monitor.public.data_audit
WHERE p_occurs_since('1 day') AND status = 'FAILURE' AND p_source_label = 'My Log Source'
ORDER BY p_event_time DESC
LIMIT 100;
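As a sketch of the first use case above (confirming that Panther read a specific object), you can filter on s3key; the key prefix below is a placeholder for your own path:

SELECT p_event_time, operation, status, s3bucket, s3key, recordcount
FROM panther_monitor.public.data_audit
WHERE p_occurs_since('7 days') AND s3key LIKE 'logs/my-prefix/%'
ORDER BY p_event_time DESC
LIMIT 100;

If the object appears with status = 'SUCCESS' and a nonzero recordcount, Panther read and parsed it; if it is absent entirely, the object was never picked up from the source.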

Please note that recordcount reflects what Panther parsed at read time from the input object. Each row represents one object read, and one object can contain many records, so count(*) on data_audit counts read operations, not events.
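To make that distinction concrete, the following sketch contrasts the number of object reads (count(*)) with the number of parsed records (sum(recordcount)) per source; the two totals will generally differ:

SELECT p_source_label,
       count(*) AS object_reads,
       sum(recordcount) AS records_parsed
FROM panther_monitor.public.data_audit
WHERE p_occurs_since('1 day') AND status = 'SUCCESS'
GROUP BY p_source_label
ORDER BY records_parsed DESC;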

For per-source and per-schema event counts, see: How do I query daily log collection volume by source and schema in Panther?