I am using the AWS Add-On for Splunk to pull in VPC flow logs via CloudWatch. The problem is that Splunk incorrectly identifies the account ID in each log as an epoch timestamp, placing all of our logs at the exact same time. I verified that the field extractions for this sourcetype specifically extract the AccountId; however, I don't know how to tell Splunk not to automatically treat that field as an epoch timestamp. Has anyone run into this before, where Splunk finds a timestamp where none exists? Any thoughts on how to resolve this?
Timestamp extraction is done at parsing time. You do not set this in inputs.conf. Instead, do this in props.conf on the indexer:
[source::/path/to/the/source/file]
# your timestamp settings here
This will allow you to use the sourcetype that you choose, and change only the timestamp processing for this particular file (or set of files). The Splunk docs page for props.conf describes the various settings. In particular, I think you should consider using these two settings:
TIME_PREFIX = <regular expression>
MAX_TIMESTAMP_LOOKAHEAD = <integer>
TIME_PREFIX tells Splunk where to start looking for the timestamp. If the timestamp is at the beginning of the event, you don't need this. But it can be useful to make Splunk skip over fields (like the AccountId).
MAX_TIMESTAMP_LOOKAHEAD tells Splunk how many characters to examine for the timestamp. Usually a number like 25 is enough. Splunk starts at the beginning of the line, or at the end of the TIME_PREFIX match when one is specified, and looks only at the number of characters that you specify. Again, this keeps Splunk from moving past the region where it should find a timestamp and picking up data from the wrong parts of the event.
These settings will also make the event parsing a little faster.
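To see how the two settings interact on a VPC flow log line, here is a small Python simulation. It is only a rough model of the behavior described above, not Splunk's actual timestamp processor. The sample event is hypothetical, and the prefix regex ^(?:\S+\s+){10} (skip the first ten whitespace-separated fields so scanning starts at start_time) is an assumption based on the format posted later in this thread. In props.conf, the matching attempt would be TIME_PREFIX = ^(?:\S+\s+){10}, MAX_TIMESTAMP_LOOKAHEAD = 10 and TIME_FORMAT = %s.

```python
import re
from datetime import datetime, timezone

# Hypothetical VPC flow log record (version 2 format); all values are made up.
SAMPLE = ("2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 "
          "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")

def extract_epoch(event, time_prefix=None, lookahead=10):
    """Rough model of TIME_PREFIX + MAX_TIMESTAMP_LOOKAHEAD for epoch-seconds values."""
    start = 0
    if time_prefix is not None:
        match = re.search(time_prefix, event)
        if match is None:
            return None  # prefix not found: this rule yields no timestamp
        start = match.end()  # scanning begins right after the prefix match
    window = event[start:start + lookahead]  # only this many characters are examined
    digits = re.match(r"\d+", window)
    if digits is None:
        return None
    return datetime.fromtimestamp(int(digits.group()), tz=timezone.utc)

# Skip ten fields: scanning starts at start_time.
print(extract_epoch(SAMPLE, r"^(?:\S+\s+){10}"))  # 2014-12-14 04:06:50+00:00
# Skip only the version field: the first ten digits of the account ID are read instead.
print(extract_epoch(SAMPLE, r"^\S+\s+"))          # 2009-02-13 23:31:30+00:00
```

The second call shows why an unanchored or too-short prefix is risky here: the 12-digit account ID starts with ten digits that parse as a perfectly valid epoch value.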
I don't have a local props.conf set up on our indexers specific to this sourcetype. The only props.conf files specific to this sourcetype are in the AWS add-on on the forwarder and search head, and both of those are the default props file that comes with the app.
I added the TIME_PREFIX to the props.conf on the forwarder and this appears to have done the trick! Thank you so much!
That's great, but there is still a bug in the add-on, or rather a behavior we want to support. It would be good to get that sample anyway, even if a customization serves as a workaround.
Also, hopefully you added this in local/props.conf, because if it is in default/props.conf, it WILL get overwritten on upgrade.
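The reason local/ is the safe place: Splunk merges settings per attribute, and a value in local/ wins over the same attribute in default/, while an upgrade replaces only the default/ files. Below is a simplified Python model of that per-attribute merge for one app directory (it ignores the other directories Splunk also considers in its precedence rules). The file contents are hypothetical and are not the add-on's real default/props.conf.

```python
import configparser

def load_conf(text):
    """Parse INI-like .conf text into {stanza: {attribute: value}}."""
    # interpolation=None keeps values such as "%s" verbatim; optionxform=str keeps
    # attribute names case-sensitive (configparser lowercases them by default).
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str
    parser.read_string(text)
    return {stanza: dict(parser.items(stanza)) for stanza in parser.sections()}

def merge_layers(default, local):
    """Per-attribute merge: a value in local replaces the same attribute from default."""
    merged = {stanza: dict(attrs) for stanza, attrs in default.items()}
    for stanza, attrs in local.items():
        merged.setdefault(stanza, {}).update(attrs)
    return merged

# Hypothetical default/props.conf contents (replaced on upgrade).
DEFAULT_PROPS = r"""
[aws:cloudwatchlogs:vpcflow]
SHOULD_LINEMERGE = false
MAX_TIMESTAMP_LOOKAHEAD = 128
"""

# Hypothetical local/props.conf contents (left alone on upgrade).
LOCAL_PROPS = r"""
[aws:cloudwatchlogs:vpcflow]
TIME_PREFIX = ^(?:\S+\s+){10}
MAX_TIMESTAMP_LOOKAHEAD = 10
TIME_FORMAT = %s
"""

effective = merge_layers(load_conf(DEFAULT_PROPS), load_conf(LOCAL_PROPS))
```

In the merged result, SHOULD_LINEMERGE still comes from default/, while MAX_TIMESTAMP_LOOKAHEAD, TIME_PREFIX and TIME_FORMAT come from local/, so the override survives even if default/ is rewritten.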
I have a similar case, where I want to filter the incoming data from CloudWatch logs. I am trying to configure props.conf and transforms.conf. What should the value of source be in this case?
It should be cloudwatchlogs. I think it's getting confused about the format.
I tried both cloudwatchlogs and cloudwatchlogs:vpcflow for the sourcetype and both had the same issue.
Did you configure through UI or conf files? Basically, please share the config.
I configured through the UI. Here is the resulting config for this input:
[input_name]
account = [account_name]
delay = 1800
groups = vpcflowlogs
index = taxhubprod
interval = 30
only_after = 1971-01-01T00:00:00
region = us-east-1
sourcetype = aws:cloudwatchlogs:vpcflow
stream_matcher = .*
However, nothing there seems to indicate how Splunk would interpret the timestamp.
Timestamp extraction is not handled by the input; it comes either from native Splunk behavior or from a props/transforms override. For VPC flow logs it's the former.
The props for this sourcetype extract in the following format:
[aws:cloudwatchlogs:vpcflow]
EXTRACT-all=^\s*(?P[^\s]+)\s+(?P[^\s]+)\s+(?P[^\s]+)\s+(?P[^\s]+)\s+(?P[^\s]+)\s+(?P[^\s]+)\s+(?P[^\s]+)\s+(?P[^\s]+)\s+(?P[^\s]+)\s+(?P[^\s]+)\s+(?P[^\s]+)\s+(?P[^\s]+)\s+(?P[^\s]+)\s+(?P[^\s]+)
This is the format it expects:
2 000000000000 eni-00000000 #ipv4-1# #ipv4-2# #port-1# #port-2# #protocol# #packets# #bytes# #timestamp# #timestamp# #action# OK
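To make the field positions concrete, here is a Python sketch that applies a regex of the same shape as EXTRACT-all (14 whitespace-separated tokens) to a sample line. The capture-group names in the posted regex did not survive, so the names below are hypothetical and simply follow the order of the expected format; the sample record is made up.

```python
import re

# Hypothetical field names, in the order of the expected format line.
FIELDS = ["version", "account_id", "interface_id", "src_ip", "dest_ip",
          "src_port", "dest_port", "protocol", "packets", "bytes",
          "start_time", "end_time", "action", "log_status"]

# Same shape as EXTRACT-all: optional leading whitespace, then 14 runs of
# non-whitespace separated by whitespace, each captured into a named group.
VPCFLOW_RE = re.compile(
    r"^\s*" + r"\s+".join(rf"(?P<{name}>[^\s]+)" for name in FIELDS))

def parse_vpcflow(event):
    """Return {field: value} for a VPC flow log line, or None if it has too few fields."""
    match = VPCFLOW_RE.match(event)
    return match.groupdict() if match else None

# Hypothetical record; all values are made up.
SAMPLE = ("2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 "
          "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")

fields = parse_vpcflow(SAMPLE)
print(fields["account_id"], fields["start_time"])  # 123456789010 1418530010
```

The account ID is the second token and start_time is the eleventh, so a timestamp prefix that skips exactly ten tokens lands on start_time rather than on the account ID.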
It seems Splunk is getting confused about the timestamp in your vpcflow events: instead of grabbing the start_time, it finds the account ID, which also looks like a timestamp.
This does seem like a bug, could you share a sample event for me to try?
I am trying to get hold of one, but I can no longer see the events in Splunk because too many events share the same timestamp (Splunk won't show any of them). I am trying to get one directly from AWS instead.