How can I get Panther to ingest old data from an S3 bucket?


QUESTION

When I create an S3 source in Panther, any new logs saved to that bucket are ingested into Panther. How can I get Panther to retroactively ingest the logs that were already in the bucket before I created the S3 source?

ANSWER

Panther Labs provides a command-line tool designed specifically to backfill old S3 data into Panther. The tool is called s3sns. It manually triggers notifications to Panther, telling it to parse and ingest the S3 data you specify.

1. Ensure your environment is configured to run AWS CLI commands.
   - In practice, this means s3sns needs a way to receive your AWS account credentials, either as environment variables or as explicit parameters.
2. Download the s3sns tool as described in our main documentation.
3. Assuming your AWS credentials are stored as environment variables, run:
   ./s3sns -s3path s3://<bucket_name>/<key_prefix>
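If you plan to script several backfill runs, a small wrapper can confirm that credentials are present before calling the tool. The Python sketch below is illustrative only: it assumes the binary sits in the current directory as ./s3sns, checks only the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables (it does not account for named profiles, session tokens, or instance roles), and passes no s3sns option other than the -s3path flag from the command above. The helper names are hypothetical.

```python
import os
import subprocess
import sys

# Hypothetical check: only the static key-pair variables read by AWS tooling.
REQUIRED_ENV_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credentials(env=None):
    """Return the required AWS credential variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def build_command(s3path, binary="./s3sns"):
    """Build the argument list for a single s3sns run over one S3 path."""
    if not s3path.startswith("s3://"):
        raise ValueError(f"expected an s3:// path, got {s3path!r}")
    return [binary, "-s3path", s3path]


def run_backfill(s3path):
    """Run s3sns for one path after confirming credentials are in the environment."""
    missing = missing_credentials()
    if missing:
        sys.exit("missing AWS credentials: " + ", ".join(missing))
    # The child process inherits this process's environment, including the AWS variables.
    # check=True raises CalledProcessError if s3sns exits with a non-zero status.
    subprocess.run(build_command(s3path), check=True)
```

For example, run_backfill("s3://<bucket_name>/<key_prefix>") performs the same invocation as the command above, but stops early with a readable message if the credential variables are not set.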

Running this command tells Panther to ingest all log files stored under the specified prefix. To ingest every object in the bucket, omit the key prefix from the s3path value (for example, s3://<bucket_name>).
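To check in advance which objects a given s3path would cover, the sketch below splits the path into a bucket and a prefix and filters a list of object keys. It assumes s3sns matches keys the way S3 prefixes normally work: as a literal string prefix with no directory boundaries. Under that assumption, a prefix of logs/2023 also covers logs/2023-archive/, so ending the prefix with a slash is the safer choice. The key list and function names are hypothetical examples.

```python
def split_s3path(s3path):
    """Split 's3://bucket/prefix' into (bucket, prefix); prefix is '' if omitted."""
    scheme = "s3://"
    if not s3path.startswith(scheme):
        raise ValueError(f"expected an s3:// path, got {s3path!r}")
    bucket, _, prefix = s3path[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"no bucket name in {s3path!r}")
    return bucket, prefix


def keys_covered(s3path, bucket_keys):
    """Return the keys a run over s3path would cover, assuming literal prefix matching."""
    _, prefix = split_s3path(s3path)
    return [key for key in bucket_keys if key.startswith(prefix)]


# Hypothetical object keys in a bucket named my-bucket.
keys = [
    "logs/2023/01/a.json.gz",
    "logs/2023-archive/b.json.gz",
    "logs/2024/01/c.json.gz",
    "other/d.log",
]

print(keys_covered("s3://my-bucket/logs/2023", keys))   # also matches logs/2023-archive/
print(keys_covered("s3://my-bucket/logs/2023/", keys))  # only keys under logs/2023/
print(len(keys_covered("s3://my-bucket", keys)))        # no prefix: every key in the list
```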

Note the following:

- Any data ingested this way still counts toward your monthly data quota. If you have a large amount of low-priority data to backfill, consider splitting the load over several months.
- All log files under the prefix you pass to s3sns are ingested, even if they have been ingested before. To avoid duplicate events, do not run the command more than once with overlapping S3 paths.
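Before starting several runs, it can help to check the planned paths for overlaps. Assuming plain string-prefix matching, two paths in the same bucket can select a common object exactly when one prefix starts with the other, and a path with no prefix overlaps every other path in its bucket. The Python sketch below applies that rule to every pair; the paths and helper names are hypothetical.

```python
from itertools import combinations


def split_s3path(s3path):
    """Split 's3://bucket/prefix' into (bucket, prefix); prefix is '' if omitted."""
    scheme = "s3://"
    if not s3path.startswith(scheme):
        raise ValueError(f"expected an s3:// path, got {s3path!r}")
    bucket, _, prefix = s3path[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"no bucket name in {s3path!r}")
    return bucket, prefix


def paths_overlap(a, b):
    """True if two S3 paths can select at least one common object."""
    bucket_a, prefix_a = split_s3path(a)
    bucket_b, prefix_b = split_s3path(b)
    # A key matching both prefixes must start with the longer one, so the
    # paths overlap exactly when one prefix is a prefix of the other.
    return bucket_a == bucket_b and (
        prefix_a.startswith(prefix_b) or prefix_b.startswith(prefix_a)
    )


def overlapping_pairs(s3paths):
    """Return every pair of planned paths that could ingest the same objects twice."""
    return [(a, b) for a, b in combinations(s3paths, 2) if paths_overlap(a, b)]


# Hypothetical backfill plan.
plan = [
    "s3://logs-bucket/cloudtrail/2023/",
    "s3://logs-bucket/cloudtrail/",
    "s3://logs-bucket/vpc/",
    "s3://other-bucket/cloudtrail/",
]
for a, b in overlapping_pairs(plan):
    print(f"overlap: {a} and {b}")
```

Any pair it reports should be merged or narrowed before you run s3sns, since objects under both paths would otherwise be ingested twice.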