When trying to query data in the Data Explorer when my Panther instance uses Snowflake (as opposed to Athena or a different backend database), with a query like the following, it takes a long time to run:
SELECT * FROM <reasonably_large_table>_variant ORDER BY p_event_time DESC LIMIT 10;
The query contains an
ORDER BY and
LIMIT clause, but no
WHERE constraint on
To accelerate queries like this, constrain the
p_event_time search as follows:
SELECT * FROM <reasonably_large table>_variant WHERE DATE_TRUNC('MINUTE', p_event_time) in (select MAX(DATE_TRUNC('MINUTE', p_event_time)) FROM <reasonably_large_table>_variant) ORDER BY p_event_time DESC LIMIT 10;
or even something simpler like:
SELECT * FROM <reasonably_large table>_variant WHERE p_occurs_since('24 hours') ORDER BY p_event_time DESC LIMIT 10;
Adding a WHERE clause will limit the results that are needed to be ordered. When you use a WHERE clause along with a cluster key like p_event_time, it is already pre-ordered, which means it only has to look in a few partitions versus having to look in every partition.
Row level, transactional DBs like MySQL or Postgres will execute queries like these relatively quickly if you have an index on
p_event_time because the index also serves as a sort key. However, in Snowflake, which is an OLAP DB, even though we cluster on
p_event_time , the micro partitions on that column themselves aren't sorted. As a result, the above query still needs to execute a full table scan, which can take very long.
For more information about why Snowflake doesn't sort micro-partitions, see Snowflake's documentation here.