QUESTION

How do I pull in my CloudWatch logs and keep the extra info like logGroup and logStream while still parsing the message field with fastmatch or regex? Can I transform or choose one field of my JSON file to use as my log input?

ANSWER

There are a few ways to parse your text fields with grok patterns (similar to how Panther's fastmatch log parsing works): either with Cribl or from inside your detection code. If you're specifically interested in an individual field parsing feature, reach out to the support team to share your upvote for this functionality!

Cribl

Cribl's Grok function can take in the log field and parse it into separate fields.
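As a rough sketch only (assuming a Cribl Grok function whose source field is set to the log field), a pattern like the one below would break the sample line shown further down into named fields. The field names are placeholders you would adjust to your data:

%{TIMESTAMP_ISO8601:timestamp} %{WORD:field1} %{WORD:field2} %{NOTSPACE:field3} %{NOTSPACE:field4} %{GREEDYDATA:field5}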

Python

You can use Python's string split method to parse the log in your detection if your fields are separated consistently by whitespace or commas (CSV). Otherwise, we recommend using the Cribl parsing method with grok.

The following examples are designed to work with a field shaped like this, where '-' means no value:

{
    "log": "2023-05-10 16:03:05:094 field1 field2 - field4 field5 is all the rest of the line pulled in greedily"
}
String Split Method

Example:

log = {}
log_fields = event.get('log').split(' ', 6)
# Reassemble the timestamp (the date and time were split apart above)
log['timestamp'] = "{} {}".format(log_fields[0], log_fields[1])
for index in range(2, len(log_fields)):
    # Fields can be named individually here for a better naming scheme.
    # index - 1 lines the names up with field1 through field5 in the sample log.
    field_name = "field{}".format(index - 1)
    if log_fields[index] == "-":
        log[field_name] = None
    else:
        log[field_name] = log_fields[index]
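With the sample log above, this sketch would leave log holding roughly the following dictionary (field3 becomes None because of the '-' placeholder):

{
    "timestamp": "2023-05-10 16:03:05:094",
    "field1": "field1",
    "field2": "field2",
    "field3": None,
    "field4": "field4",
    "field5": "field5 is all the rest of the line pulled in greedily"
}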
PyGrok

You can instead create a custom global helper function using pygrok. (You will need to request that this library be installed on your Panther instance by following the instructions in this article: 📄 How do I add a Python library or a module to my runtime environment in Panther?) When supplied with the correct pattern definition, this returns the parsed fields as a dictionary and keeps the timestamp together.

See the full list of grok patterns here.


Example:

from pygrok import Grok

def grokparse_text_field(event):
    text = event.get('log')
    # NOTSPACE is used for field3 so the '-' placeholder in the sample log still matches.
    pattern = '%{TIMESTAMP_ISO8601:timestamp} %{WORD:field1} %{WORD:field2} %{NOTSPACE:field3} %{NOTSPACE:field4} %{GREEDYDATA:field5}'
    grok = Grok(pattern)
    # Returns a dict of the named captures, or None if the pattern does not match.
    return grok.match(text)
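As a usage sketch (the helper and field names are just the ones from this article, and the 'error' keyword check is purely hypothetical), a detection's rule function could call the helper and work with the parsed fields:

def rule(event):
    parsed = grokparse_text_field(event)
    if parsed is None:
        # The log line did not match the pattern, so there is nothing to evaluate.
        return False
    # Alert when the greedily captured tail mentions a hypothetical keyword.
    return 'error' in parsed.get('field5', '')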