My query using p_event_time in Panther runs slowly

Last updated: September 3, 2024

Issue

When trying to query data in the Data Explorer when my Panther instance uses Snowflake (as opposed to Athena), with a query like the following, it takes a long time to run:

SELECT * FROM <reasonably_large_table>_variant
ORDER BY p_event_time DESC
LIMIT 10;

The query contains an ORDER BY and LIMIT clause, but no WHERE constraint on p_event_time

Resolution

To accelerate queries like this, constrain the p_event_time search as follows:

SELECT * FROM <reasonably_large table>_variant
WHERE DATE_TRUNC('MINUTE', p_event_time) in (select MAX(DATE_TRUNC('MINUTE', p_event_time)) FROM <reasonably_large_table>_variant)
ORDER BY p_event_time DESC
LIMIT 10;

or even something simpler like:

SELECT * FROM <reasonably_large table>_variant
WHERE p_occurs_since('24 hours')
ORDER BY p_event_time DESC
LIMIT 10;

Adding a WHERE clause will limit the results that are needed to be ordered. When you use a WHERE clause along with a cluster key like p_event_time, it is already pre-ordered, which means it only has to look in a few partitions versus having to look in every partition.

Cause

Row level, transactional DBs like MySQL or Postgres will execute queries like these relatively quickly if you have an index on p_event_time because the index also serves as a sort key. However, in Snowflake, which is an OLAP DB, even though we cluster on p_event_time , the micro partitions on that column themselves aren't sorted. As a result, the above query still needs to execute a full table scan, which can take very long.

For more information about why Snowflake doesn't sort micro-partitions, see Snowflake's documentation here.