I work with AWS data in Splunk, and I have a few tips.
First, I agree with everything that Rich Galloway said. That's great advice for all Splunk apps.
AWS can generate a massive amount of data, depending on how many accounts you have, which services you use, and the specifics of your use case. Here are my suggestions:
Account Lookup Table
One thing that I've found incredibly useful is a lookup table that maps accounts to human-friendly names. I don't know what account ID 123456789 is just by the number, so I use the lookup table to enrich the account ID with something like "prod_application_stack." This is very useful for creating metrics and dashboards that you can present to people, and my infrastructure admins are far more responsive to a human-friendly name than to a raw account ID.
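As a sketch, this can be as simple as a two-column CSV plus a lookup definition; the file name and field names here are made up, so adapt them to your environment:

```
# aws_account_names.csv
account_id,account_name
123456789012,prod_application_stack
210987654321,dev_sandbox
```

Then enrich at search time (or wire it up as an automatic lookup on the sourcetype):

```
... | lookup aws_account_names account_id OUTPUT account_name
```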
AWS Description Data
When you set up your ingestion for AWS Description data (sourcetype=aws:description), make sure to configure aws_description_tasks.conf to pull ALL available services in ALL available regions, not just the ones you think you use. I've identified instances where developers forgot to switch to the correct region and ended up putting resources somewhere they shouldn't. You can also detect account compromise this way (if you have a security inclination), because bad guys will sometimes create resources where they think you aren't looking.
I think the default polling interval for description data is five minutes, which we throttled back to reduce Splunk license utilization; we just don't need the data updated that often. Figure out the interval that balances freshness against your license usage.
If you wanted it to pull every 15 minutes, your stanza should look like this:
[desc:account_id]
account = Instance profile for your heavy-weight forwarder
aws_iam_role = IAM role that has access to the account (if using multiple accounts)
apis = ec2_instances/900,ec2_reserved_instances/900,ebs_snapshots/900,ec2_volumes/900,ec2_security_groups/900,ec2_key_pairs/900,ec2_images/900,ec2_addresses/900,elastic_load_balancers/900,classic_load_balancers/900,application_load_balancers/900,vpcs/900,vpc_subnets/900,vpc_network_acls/900,cloudfront_distributions/900,rds_instance/900,lambda_functions/900,s3_buckets/900,iam_users/900
index = AWS INDEX
regions = us-west-2,us-west-1,us-east-1,us-east-2,eu-central-1,ap-northeast-1,ap-northeast-2,ap-northeast-3,ap-south-1,ap-southeast-1,ap-southeast-2,ca-central-1,cn-north-1,cn-northwest-1,eu-west-1,eu-west-2,eu-west-3,sa-east-1
sourcetype = aws:description
Field Parsing
The field parsing out of the box with the app is okay, but I've had to customize it a bit for my purposes. Validate that the fields work the way you want them to; if they don't, there's no real harm in creating your own props.conf file and sticking it in a "local" directory inside the AWS app.
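For example, a local override might look something like this; the extraction itself is purely illustrative (the field name and regex are made up), so substitute whatever you actually need:

```
# local/props.conf inside the AWS app directory
[aws:description]
# Hypothetical extraction override; adjust the regex to your data
EXTRACT-instance_name = "Name":\s*"(?<instance_name>[^"]+)"
```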
Guard Duty
If you're not already using Amazon GuardDuty, you should be. AWS can detect things that you can't, because they have access to underlying data that they don't expose to you. It will be a real lifesaver if anything funky happens. Trust me.
CloudTrail
CloudTrail is awesome, but it's riddled with deeply nested JSON. This is one of the big areas where I forked from the main AWS app: I don't like working with dot notation in my field names, and I don't want to impose it on my users; it makes SPL unnecessarily long and ugly. At a minimum, I recommend using props.conf to alias fields out of the JSON dot notation if you can.
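A sketch of what those aliases might look like in a local props.conf; the alias names here are my own, and note that dotted source field names should be quoted:

```
# local/props.conf inside the AWS app directory
[aws:cloudtrail]
# Hypothetical aliases; pick the dotted fields your users actually search on
FIELDALIAS-user_arn  = "userIdentity.arn" AS user_arn
FIELDALIAS-user_type = "userIdentity.type" AS user_type
```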
Miscellany
I've found creating asset inventories to be invaluable. I currently have a scheduled search that runs every few hours and outputs my EC2 inventory data to a lookup table. I'm in the process of rewriting it to use the Inventory datamodel instead. I recommend you do something similar.
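A minimal sketch of that kind of scheduled search; the lookup file name is made up, and the field list will depend on how your description data is parsed:

```
index=* sourcetype=aws:description source=*ec2_instances
| dedup id
| table account_id, region, id, instance_type, private_ip_address
| outputlookup ec2_inventory.csv
```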
If you know for a fact that you only use certain regions, create some dashboards (or, even better, scheduled searches that notify someone) that flag new resources or CloudTrail API activity in the regions you don't use. Some things happen in every region by default, but you should be able to tune out that noise and identify when something that shouldn't be there gets created.
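Something along these lines could serve as the scheduled search, assuming us-west-2 and us-east-1 are your sanctioned regions (adjust the list for your environment, and expect some baseline noise, e.g. global services that log with us-east-1 as the region):

```
index=* sourcetype=aws:cloudtrail NOT awsRegion IN ("us-west-2", "us-east-1")
| stats count by awsRegion, eventSource, eventName
```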
Audit your security groups. It's kind of a pain because of the nested JSON. This will give you a good view of your ingress security group rules:
index=* sourcetype=aws:description source=*ec2_security_groups
| spath path=rules{} output=a
| table account_id, description, id, name, owner_id, region, vpc_id, a
| fields - _raw
| mvexpand a
| spath input=a
| where from_port!="null" AND to_port!="null" AND 'grants{}.cidr_ip'!="null"
| fields - a
This will give you the same view of your egress security group rules:
index=* sourcetype=aws:description source=*ec2_security_groups
| spath path=rules_egress{} output=a
| table account_id, description, id, name, owner_id, region, vpc_id, a
| fields - _raw
| mvexpand a
| spath input=a
| where from_port!="null" AND to_port!="null" AND 'grants{}.cidr_ip'!="null"
| fields - a
And if you want to specifically look for potentially insecure rules, I'd add a match for "/0" to the final where clause:
index=* sourcetype=aws:description source=*ec2_security_groups
| spath path=rules_egress{} output=a
| table account_id, description, id, name, owner_id, region, vpc_id, a
| fields - _raw
| mvexpand a
| spath input=a
| where from_port!="null" AND to_port!="null" AND 'grants{}.cidr_ip'!="null" AND match('grants{}.cidr_ip', "/0")
| fields - a
That will check for rules set to "0.0.0.0/0", which matches any IP address. If you see that on something other than, say, port 443 or port 80, you might want to investigate why the rule exists.
That's off the top of my head. I'll comment again if I think of anything else.