I run a Cloud Connected deployment of Panther. From a cost perspective, when is it better to use a Scheduled Rule instead of a real-time rule?
The answer varies, but the general rules of thumb are:
Rules using simple Python logic are more cost-efficient as real-time rules.
Rules that use the KV store (or DynamoDB cache) for aggregations and counting are often more cost-efficient as Scheduled Rules, assuming the query can run infrequently.
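The difference in how the two styles touch the cache can be sketched in Python. CountingKVStore, realtime_rule, scheduled_rule and THRESHOLD are hypothetical stand-ins, not Panther's caching helpers. The store lives in memory and only counts round trips. Time windows and key expiration are left out for brevity.

```python
from collections import Counter

THRESHOLD = 5  # hypothetical: alert at 5 or more failed logins per user


class CountingKVStore:
    """Hypothetical in-memory stand-in for a DynamoDB-backed KV cache.

    It mimics a counter increment and records how many round trips a rule
    makes; it is not Panther's API.
    """

    def __init__(self):
        self.counts = Counter()
        self.round_trips = 0

    def increment(self, key):
        self.round_trips += 1  # each call would be a network request to DynamoDB
        self.counts[key] += 1
        return self.counts[key]


def realtime_rule(event, store):
    # Real-time style: every failed login costs one cache round trip.
    if event["outcome"] != "failure":
        return False
    return store.increment(f"failed-logins:{event['user']}") >= THRESHOLD


def scheduled_rule(row):
    # Scheduled style: the SQL query has already aggregated the counts
    # (e.g. SELECT user, COUNT(*) AS failures ... GROUP BY user),
    # so the rule inspects one pre-computed row per user, with no cache calls.
    return row["failures"] >= THRESHOLD


# 1,000 failed logins spread across 10 users.
events = [{"user": f"u{i % 10}", "outcome": "failure"} for i in range(1000)]
store = CountingKVStore()
realtime_alerts = [realtime_rule(e, store) for e in events]
print(store.round_trips)  # 1000

rows = [{"user": f"u{i}", "failures": 100} for i in range(10)]
scheduled_alerts = [scheduled_rule(r) for r in rows]  # 10 rows, zero cache calls
```

In the real-time version, cache traffic grows with event volume. In the scheduled version, the work grows with the number of result rows and with how often the query runs.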
There are several things to consider when performing a cost-benefit analysis for real-time vs. Scheduled Rules:
Detection Lambda overhead
For a standard rule (i.e., one that makes no external API calls and no DynamoDB calls), the main contributor to runtime cost is the overhead of initializing the detection engine environment. The volume of logs being processed does not contribute significantly to the total cost of a Lambda execution. This overhead applies to both real-time rules and Scheduled Rules. For real-time rules, however, Panther batches the work and runs multiple detections in the same Lambda invocation, so those detections share the overhead. This is not necessarily true for Scheduled Rules.
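A rough cost model shows why batching matters. It assumes a fixed initialization overhead per invocation plus a small, constant time per detection. The function name, timings and price parameter are hypothetical, so substitute your own measured durations and your region's Lambda price per GB-second. The model also ignores Lambda's per-request charge.

```python
def cost_per_detection(init_overhead_s, per_detection_s, detections_per_invocation,
                       memory_gb, price_per_gb_s):
    """Rough Lambda cost attributable to each detection in one invocation.

    Hypothetical model: a fixed environment-initialization overhead per
    invocation plus a constant amount of time per detection. Lambda bills
    duration x memory (GB-seconds); the per-request fee is ignored here.
    """
    duration_s = init_overhead_s + per_detection_s * detections_per_invocation
    invocation_cost = duration_s * memory_gb * price_per_gb_s
    # Batching spreads the fixed overhead across every detection in the invocation.
    return invocation_cost / detections_per_invocation


# Hypothetical numbers: 1 s of overhead, 10 ms per detection, 1 GB of memory,
# and a unit price so the result reads directly in GB-seconds.
batched = cost_per_detection(1.0, 0.01, 50, 1.0, 1.0)    # ~0.03 GB-s per detection
unbatched = cost_per_detection(1.0, 0.01, 1, 1.0, 1.0)  # ~1.01 GB-s per detection
print(f"batched: {batched:.3f} GB-s, unbatched: {unbatched:.3f} GB-s")
```

Under these assumptions, a detection that runs alone pays for roughly the whole initialization overhead itself. A detection batched with 49 others pays about a fiftieth of it.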
Caching (DynamoDB) calls take time and cost money
Calls to your DynamoDB cache increase costs in two ways:
They incur DynamoDB charges directly for the reads and writes made against the cache.
Each read and write takes time, and that time adds up over the many events being evaluated. Because Lambda bills by execution duration, the waiting also drives up Lambda costs.
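Both cost drivers can be estimated with a quick calculation. This sketch assumes cache calls are made sequentially (not overlapped) with a fixed average latency. The function name and all inputs are hypothetical.

```python
def cache_cost_drivers(events, cache_calls_per_event, latency_ms, memory_gb):
    """Hypothetical estimate of the two ways cache calls add cost.

    Returns (DynamoDB requests, extra Lambda GB-seconds spent waiting),
    assuming each call blocks for latency_ms and calls are not overlapped.
    """
    requests = events * cache_calls_per_event   # billed by DynamoDB
    waiting_s = requests * latency_ms / 1000     # Lambda is idle but still billed
    return requests, waiting_s * memory_gb


# Hypothetical: 10 million events/day, one read and one write each, 5 ms per call, 1 GB.
requests, extra_gb_s = cache_cost_drivers(10_000_000, 2, 5, 1.0)
print(requests, extra_gb_s)  # 20000000 100000.0
```

With these inputs, the cache adds 20 million DynamoDB requests and about 100,000 GB-seconds of Lambda time per day. That is roughly 28 hours of billed waiting at 1 GB of memory.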
API calls take time
Similar to DynamoDB calls, external API calls take time to return, which slows down rule evaluation and increases the runtime (and cost) of a Lambda invocation.
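One common way to limit this in Python is to memoize lookups, so repeated values within a warm process skip the network. The sketch uses functools.lru_cache. Here _slow_api_lookup is a hypothetical enrichment call that sleeps to simulate latency. It also assumes, without confirmation, that cached state carries across events because Panther reuses the same Python process. Cached responses can go stale, so this suits slowly changing data.

```python
import functools
import time

api_calls = 0  # counts simulated network round trips


def _slow_api_lookup(ip):
    """Hypothetical enrichment API; sleeps to mimic a 50 ms round trip."""
    global api_calls
    api_calls += 1
    time.sleep(0.05)
    return {"ip": ip, "reputation": "unknown"}


@functools.lru_cache(maxsize=4096)
def cached_lookup(ip):
    # Repeat lookups for an IP already seen in this process skip the network call.
    return _slow_api_lookup(ip)


def rule(event):
    return cached_lookup(event["src_ip"])["reputation"] == "malicious"


# 200 events from only 4 distinct source IPs.
events = [{"src_ip": f"10.0.0.{i % 4}"} for i in range(200)]
start = time.monotonic()
results = [rule(e) for e in events]
elapsed = time.monotonic() - start
print(api_calls, f"{elapsed:.2f}s")  # 4 calls, about 0.2 s of waiting
```

Without the cache, the same 200 events would make 200 calls and wait about 10 seconds, and all of that time is billed Lambda duration.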
Snowflake queries have a minimum cost
Snowflake warehouses suspend when they are not in use. While suspended, a warehouse is "asleep" and consumes no credits. Running a query "wakes" it up, and it consumes credits for as long as it stays awake. Once awake, a warehouse stays up for a minimum period (typically around one minute) before suspending again. As a result, you are charged for at least one minute even if your query finishes sooner. This matters little for infrequent queries or queries that take a long time to return. It can be very noticeable for short queries that run frequently. In the worst case, a short query run every five minutes wakes the warehouse 12 times an hour and is billed for 12 of every 60 minutes, which is 20% of the day.
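The 20% figure is simple arithmetic. There are 86,400 / 300 = 288 runs per day. Each is billed for at least 60 seconds, which gives 17,280 billed seconds, or 20% of 86,400. The small helper below assumes the worst case, where each run wakes the warehouse on its own, and a 60-second minimum. The function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def awake_fraction(query_seconds, interval_seconds, min_billed_seconds=60):
    """Worst-case share of the day one recurring query keeps a warehouse billed.

    Assumes every run wakes the warehouse by itself, so each run is billed
    for max(query time, minimum billed time). Capped at the whole day.
    """
    runs_per_day = SECONDS_PER_DAY / interval_seconds
    billed_per_run = max(query_seconds, min_billed_seconds)
    return min(1.0, runs_per_day * billed_per_run / SECONDS_PER_DAY)


print(f"{awake_fraction(5, 300):.0%}")   # 5-second query every 5 minutes: 20%
print(f"{awake_fraction(5, 3600):.1%}")  # same query run hourly: 1.7%
```

Running the same short query hourly instead of every five minutes cuts its worst-case billed time by a factor of 12.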
Panther mitigates this issue by running all Scheduled Searches on a single warehouse. In theory, when one query finishes in under a minute, another query can use the rest of that billed time. However, this is situational: it only helps when two or more queries run at or near the same time. If your queries don't line up, you won't see the benefit. For example, two 30-second queries that each run every 10 minutes but are offset from each other by five minutes wake the warehouse separately, so they can't take advantage of this batching.
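The effect of lining queries up can be simulated with a simplified model of one shared warehouse. The warehouse wakes when a query starts and stays awake for at least 60 seconds, or until the last overlapping query finishes. Any query that starts while it is awake joins that session. Real Snowflake behavior also depends on the auto-suspend setting and on query queuing, and this sketch ignores both. The function name billed_seconds is hypothetical.

```python
def billed_seconds(queries, min_awake=60):
    """Total billed awake time for (start_s, duration_s) queries on one warehouse.

    Simplified model: a query that starts while the warehouse is awake joins
    the current session; otherwise it wakes the warehouse, and the new session
    lasts at least min_awake seconds. Queuing and auto-suspend delays are ignored.
    """
    total = 0
    session_start = session_end = None
    for start, duration in sorted(queries):
        end = start + duration
        if session_end is not None and start <= session_end:
            # Warehouse is already awake: this query shares the billed session.
            session_end = max(session_end, end)
        else:
            if session_end is not None:
                total += session_end - session_start  # close the previous session
            session_start = start
            session_end = max(end, start + min_awake)
    if session_end is not None:
        total += session_end - session_start
    return total


day = range(0, 86_400, 600)  # one run every 10 minutes
offset = [(t, 30) for t in day] + [(t + 300, 30) for t in day]  # 5 minutes apart
aligned = [(t, 30) for t in day] * 2                             # same start times
print(billed_seconds(offset), billed_seconds(aligned))  # 17280 8640
```

In this model, aligning the two 30-second queries halves the billed warehouse time, because both fit inside a single one-minute session.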