AWS Kinesis: Firehose delivery streams combine events onto one line in S3. How can Panther ingest the logs?
QUESTION
I have a Kinesis Firehose delivery stream sending data to an S3 bucket that I want Panther to ingest from. The problem is that Firehose is concatenating my events onto a single line, like so:
{"customer_id": 1}{"customer_id": 2}
Panther cannot ingest this because each event needs to be on its own line. How can I change my delivery stream so that each event ends up on its own line?
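For reference, Panther expects newline-delimited JSON, i.e., the same two events formatted as:
{"customer_id": 1}
{"customer_id": 2}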
ANSWER
Firehose delivery streams have two features that let you transform these events so Panther can ingest them:
- Transformation via a Lambda function
- Firehose can be configured to send your data to an AWS Lambda function in which you define custom logic to transform the records however you'd like. In this case, you can write your Lambda function to split events at each JSON boundary, i.e., at each occurrence of
}{
- This is the easiest option; a sketch of such a function appears after this list.
- Dynamic Partitioning
- Firehose can append a newline to each event when Dynamic Partitioning is used. To take advantage of this feature, you must enable Dynamic Partitioning when creating your Firehose delivery stream (note: this is only possible at creation time; it can't be enabled on an existing stream).
- This option involves less infrastructure to configure, but it has a few extra settings that make it tricky to configure properly.
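Below is a minimal sketch of such a transformation function in Python. It assumes the standard Firehose data-transformation contract (base64-encoded records in; records with a recordId, result, and data out) and that the character sequence }{ never appears inside a string value of your events; the function name and split logic are illustrative, not prescribed by Panther.

```python
import base64

def lambda_handler(event, context):
    """Firehose transformation: put each concatenated JSON event on its own line."""
    output = []
    for record in event["records"]:
        payload = base64.b64decode(record["data"]).decode("utf-8")
        # Insert a newline at each }{ boundary between concatenated events.
        # Assumption: "}{" never occurs inside a string value of an event.
        ndjson = payload.replace("}{", "}\n{")
        # End with a newline so consecutive Firehose records don't
        # concatenate back together in the delivered S3 object.
        if not ndjson.endswith("\n"):
            ndjson += "\n"
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(ndjson.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```

Attach the function in the delivery stream's data-transformation settings, and size its timeout and memory for your record volume.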
Dynamic Partitioning
Before setting up Dynamic Partitioning, it is important to call out that this option only works if each record arriving at Firehose is already a single JSON event. If the data you are sending to Firehose already consists of multiple JSON events concatenated onto one line, this option won't work, and you will instead need the Lambda transformation described above.
Also note: if you are using AWS EventBridge to send data to your Firehose delivery stream, the concatenation issue might be happening on the EventBridge side. Take a look at this article for a guide on how to have EventBridge format records on new lines.
To enable Dynamic Partitioning:
- Add a jq-style query for your JSON events to specify the field(s) you wish to partition by.
- For example, given events like
{"customer_id": 1}
and
{"customer_id": 2}
arriving as separate records, we can try a jq configuration such as a key named customer_id with the expression
.customer_id
- Set the "New Line Delimiter" option to enabled.
- Now, when events arrive in your delivery stream, they will be partitioned into your S3 bucket by the key you specified. Any events that share the same key value will end up in the same S3 object, but with a newline between them thanks to the "New Line Delimiter" option. A programmatic sketch of these settings follows.
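The same settings map onto the Firehose API. Below is a minimal, hypothetical boto3 sketch (the stream name, bucket ARN, and role ARN are placeholders): the MetadataExtraction processor carries the jq query, and AppendDelimiterToRecord is the API-level counterpart of the "New Line Delimiter" console option.

```python
import boto3

firehose = boto3.client("firehose")

# All names and ARNs below are placeholders -- substitute your own resources.
firehose.create_delivery_stream(
    DeliveryStreamName="panther-ingest-stream",
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
        "BucketARN": "arn:aws:s3:::my-panther-ingest-bucket",
        # Write objects under the extracted partition key.
        "Prefix": "customer_id=!{partitionKeyFromQuery:customer_id}/",
        "ErrorOutputPrefix": "errors/",
        # Dynamic partitioning requires a buffer size of at least 64 MB.
        "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 60},
        "DynamicPartitioningConfiguration": {"Enabled": True},
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [
                {
                    # The jq query that extracts the partition key.
                    "Type": "MetadataExtraction",
                    "Parameters": [
                        {"ParameterName": "MetadataExtractionQuery",
                         "ParameterValue": "{customer_id: .customer_id}"},
                        {"ParameterName": "JsonParsingEngine",
                         "ParameterValue": "JQ-1.6"},
                    ],
                },
                {
                    # Equivalent of enabling "New Line Delimiter": append a
                    # newline (passed as a literal backslash-n) to each record.
                    "Type": "AppendDelimiterToRecord",
                    "Parameters": [
                        {"ParameterName": "Delimiter", "ParameterValue": "\\n"},
                    ],
                },
            ],
        },
    },
)
```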
After configuring your Firehose delivery stream, your final settings should look similar to the screenshot below:
[Screenshot: delivery stream settings with Dynamic Partitioning enabled, the inline JSON parsing query, and the New Line Delimiter option enabled]