QUESTION

How do I obtain the aggregate size of ingested events in Panther? I'm trying to minimize the intake of certain logs and want to know the amount of data ingested into a specific table.

ANSWER

You can estimate ingested size in Panther's Data Explorer with a query like the one below. It strips the Panther-added p_ fields from each event, sums the length of what remains, and groups the result by one field (here ClientRequestHost; substitute a field from your own log type). This query can take a long time to run. To cover a long time period, consider running several queries, each restricted to a short interval with p_occurs_between in place of p_occurs_since, rather than a single query over the entire period.

SELECT
  JSON_EXTRACT_PATH_TEXT(data, 'ClientRequestHost') as client_req_host,
  COUNT(1) as number_events,
  SUM(LENGTH(TO_VARIANT(OBJECT_DELETE(
    AS_OBJECT(data), 'p_event_time', 'p_parse_time', 'p_row_id',
    'p_any_ip_addresses', 'p_any_domain_names', 'p_any_sha256_hashes',
    'p_any_md5_hashes', 'p_any_trace_ids', 'p_log_type', 'p_schema_version'))))
        as total_estimated_event_size,
  total_estimated_event_size/number_events as average_size_per_event
FROM panther_logs.public.<your_desired_table>
WHERE p_occurs_since('1 days')
GROUP BY client_req_host
ORDER BY number_events DESC
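
To split a long period into smaller queries, the time filter can be swapped for p_occurs_between. The following is a minimal sketch of one such window, assuming the macro takes a start and an end timestamp. The timestamps are placeholder values, and the grouping is dropped so that each run returns a single total for the window.

```sql
-- Sketch only: the same size estimate as above, limited to one 6-hour window.
-- The two timestamps are illustrative placeholders. Shift the window and
-- rerun until the whole target period is covered, then add up the totals.
SELECT
  COUNT(1) as number_events,
  SUM(LENGTH(TO_VARIANT(OBJECT_DELETE(
    AS_OBJECT(data), 'p_event_time', 'p_parse_time', 'p_row_id',
    'p_any_ip_addresses', 'p_any_domain_names', 'p_any_sha256_hashes',
    'p_any_md5_hashes', 'p_any_trace_ids', 'p_log_type', 'p_schema_version'))))
        as total_estimated_event_size
FROM panther_logs.public.<your_desired_table>
WHERE p_occurs_between('2024-01-01 00:00:00Z', '2024-01-01 06:00:00Z')
```

Choose windows that meet end to start without overlapping, so that no event is counted in two runs.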