Panther is missing column(s) when querying the data lake

Last updated: September 3, 2024

QUESTION

When I run a query in the data lake, it seems like columns which I expect to be present in the results are missing. How can I determine the cause?

ANSWER

There are several potential causes; please check the following possibilities to determine if they explain your situation. If none of the following solve your problem, contact our support team.

Possible causes include:

Your query doesn't reference the specified columns.
Your query is checking a view, instead of the correct log table.
The column is a data model alias.
The column is enrichment.

More details for each case are found below.

1. Your query doesn't reference the specified columns.

This possibility applies only to hand-written queries. If your query was built through our UI, disregard this explanation.

When writing queries, you must be explicit about which columns are referenced, since Panther retrieves only the data requested and nothing more. Search your query for any SELECT statements, and ensure that they select all the columns you expect. To troubleshoot, you can use SELECT * to retrieve all columns (though this is not recommended for queries in general, as it slows down performance).
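
As a minimal sketch of the difference, assume a log table named panther_logs.public.aws_cloudtrail with a column called eventName. Both names are illustrative assumptions; substitute the table and columns from your own environment.

```sql
-- Only the columns listed after SELECT appear in the results.
-- Any column not named here (for example sourceIPAddress) is omitted.
SELECT p_event_time, eventName
FROM panther_logs.public.aws_cloudtrail
LIMIT 10;

-- Troubleshooting only: return every column to confirm the one you
-- expect actually exists in this table. Avoid SELECT * in regular queries.
SELECT *
FROM panther_logs.public.aws_cloudtrail
LIMIT 10;
```

If the column shows up with SELECT * but not in your original results, the original SELECT list was the cause.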

2. Your query is checking a view, instead of the correct log table.

If your query is searching the panther_views database, not all columns from the original log or rule match will be included. The generic panther_views tables contain only Panther's standard fields, such as event time, UDM, and indicator fields. Columns specific to a certain log type, for example, are not included. Instead, adjust your query to check the specific log type's table in either the panther_logs database or the panther_rule_matches database (depending on which record you're trying to query).
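
The sketch below illustrates the change. The view name all_logs, the table name aws_cloudtrail, and the column eventName are assumptions used for illustration and may not match your deployment.

```sql
-- Querying a generic view: only Panther standard fields are available,
-- so a log-type-specific column such as eventName cannot be selected here.
SELECT p_event_time, p_log_type
FROM panther_views.public.all_logs
LIMIT 10;

-- Querying the log type's own table instead: log-specific columns
-- are available alongside the standard fields.
SELECT p_event_time, p_log_type, eventName
FROM panther_logs.public.aws_cloudtrail
LIMIT 10;
```

For rule match records, the same pattern would apply with the panther_rule_matches database in place of panther_logs.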

3. The column is a data model alias.

Data models provide a way to alias columns/fields for ease of use. For example, Panther provides several data models that abstract log properties from several logs into a consistent set of field names, which can make writing cross-log detections much easier. However, data models in Panther are currently supported only for detections, and data model field names are not valid as columns in the data lake. Review your columns and ensure that you're not searching for any fields that belong to the log type's data model.
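
To illustrate, suppose a data model maps the alias source_ip to the log field sourceIPAddress for a table named aws_cloudtrail. The alias, field, and table names here are assumptions for the sake of the example; check the data model for your own log type to find the real mapping.

```sql
-- The data model alias is not a data lake column, so selecting it
-- would not return the data you expect:
--   SELECT source_ip FROM panther_logs.public.aws_cloudtrail;

-- Select the underlying log field that the alias maps to instead.
SELECT p_event_time, sourceIPAddress
FROM panther_logs.public.aws_cloudtrail
LIMIT 10;
```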

4. The column is enrichment.

The p_enrichment field is a special field that Panther provides during detection runtime, and as extra context for rule matches, but it is not present in the standard panther_logs database. You can read more context on this behaviour in the following article: "Enrichment data field p_enrichment missing from log database".
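
As a rough sketch, if you need the enrichment context for events that matched a rule, you could query the rule match record rather than the log record. The table name aws_cloudtrail is an assumption for illustration, and whether p_enrichment is populated depends on your enrichment configuration.

```sql
-- p_enrichment is attached to rule matches, so look for it in
-- panther_rule_matches rather than panther_logs.
SELECT p_event_time, p_enrichment
FROM panther_rule_matches.public.aws_cloudtrail
WHERE p_enrichment IS NOT NULL
LIMIT 10;
```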