I've searched all over and haven't found an answer to this one. My summary index holds a subset of events from another index, which I collect every 5 minutes. I see the _raw events in the summary index, which is great, but how can I store the original host / source / sourcetype fields in the summary? I've tried the eval command to store the host value in a new field, but it doesn't show up in my summary index. What gives? I don't want to go back to shell commands and grep 🙂
Example search populating the summary:
index="other" | head 3 | eval orig_host=host | fields orig_host host _raw
Thanks,
Rob
You can only preserve these by renaming them as "orig_*" or by overriding them on the "collect" command line with:
... | collect ... host="X" sourcetype="Y" source="Z"
Hey!
Since I was searching for this topic/solution, I'll just add what I think is the right solution for this case.
To preserve the _time, host, source and sourcetype:
(...)
| collect index=main output_format=hec
Just remember that if you collect with a sourcetype different than "stash", your collected data will get counted against your license.
After further testing, this is my favorite solution.
Just add the following after your base search and orig_host, orig_sourcetype, orig_source and orig_index will all be in your summary index :-)
| rename _raw as orig_raw
# a much simpler solution that I got from Splunk guru "D" :-)
# turns out renaming the _raw field fixes the problem of some "orig" fields going missing, e.g. orig_sourcetype
# this approach is probably not as relevant to Splunk 6, which has many automatic acceleration features
# note: the "| collect" command is optional; it is not needed if you are using the summary index checkbox in a saved search
index=other | rename _time as time | rename _raw as raw | stats count by time raw index host sourcetype source | collect index=collect
# I was having trouble recording the raw event, original host, sourcetype and source fields and putting them into a summary index as they were always overridden with the values of the host which runs the search populating the summary index - here's one solution
# step 1 - populate summary index
# search events from an index named "other", prepend the _time, host, sourcetype and source fields to the _raw field with "|" as a delimiter, and put the result into a summary index named "collect"
index=other | eval _raw=_time."|".host."|".sourcetype."|".source."|"._raw | collect index=collect
# step 2 - read from summary index named "collect"
# extract time, host, sourcetype and source fields that are stashed in the _raw field in the summary index named "collect"
index=collect | rex "(?s)^(?<time1>[^|]+)\|(?<host1>[^|]+)\|(?<sourcetype1>[^|]+)\|(?<source1>[^|]+)\|(?<raw1>.*)"
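For anyone who wants to check the delimiter trick outside Splunk, here is a rough Python sketch of steps 1 and 2. It is only an illustration, not how Splunk does it internally: the sample values (web01, access_combined, the log path) are made up, and Python writes named groups as (?P<name>...) where rex uses (?<name>...). It also shows one thing to watch for: a last group that excludes "|" stops at the first pipe inside the event, while a match-all group keeps the whole event.

```python
import re

DELIM = "|"


def pack(time, host, sourcetype, source, raw):
    """Prepend the metadata to the raw event, mirroring the step 1 eval."""
    return DELIM.join([str(time), host, sourcetype, source, raw])


# Same shape as the step 2 rex. The last group excludes "|",
# so it stops at the first pipe that appears inside the event text.
NARROW = re.compile(
    r"^(?P<time1>[^|]+)\|(?P<host1>[^|]+)\|(?P<sourcetype1>[^|]+)"
    r"\|(?P<source1>[^|]+)\|(?P<raw1>[^|]+)"
)

# Variant: the last group takes everything that remains, newlines included.
GREEDY = re.compile(
    r"^(?P<time1>[^|]+)\|(?P<host1>[^|]+)\|(?P<sourcetype1>[^|]+)"
    r"\|(?P<source1>[^|]+)\|(?P<raw1>.*)",
    re.DOTALL,
)


def unpack(pattern, packed):
    """Return the captured fields as a dict, or None if nothing matched."""
    m = pattern.match(packed)
    return m.groupdict() if m else None


# Hypothetical event whose raw text itself contains a pipe.
event = pack(1700000000, "web01", "access_combined",
             "/var/log/access.log", "GET /a|b HTTP/1.1 200")

narrow = unpack(NARROW, event)
greedy = unpack(GREEDY, event)

print(narrow["raw1"])  # GET /a
print(greedy["raw1"])  # GET /a|b HTTP/1.1 200
```

The metadata fields come back the same either way; only the raw event differs when it contains the delimiter, so it is worth picking a delimiter that cannot appear in host, sourcetype or source.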
The "collect" summary indexing operation should handle host -> orig_host and index -> orig_index, but may not do so for source. Personally, I would use a different summarizing search, calling out different fields other than _raw, etc. What's happening when you search those summarized events is that the default field extractions are being applied, and the host is where the summary ran, the index field is the summary index itself, and the _raw is the base summarized event.
Try calling out the fields you really want to summarize. Note that collect may not properly remap sourcetype -> orig_sourcetype, and will probably ignore eventtype as well.
But also, why are you just cherry-picking events without actually doing any summarization? The search against the raw indexed events should handle that without issue.
Ok. I couldn't get the collect command to preserve orig_host, orig_sourcetype, orig_source, etc. However, the sistats and sitimechart commands seem to preserve orig_host, which you could then send to the collect command.
Thanks, I'll give that a try. Later on I want to match text in the raw field with various other searches against the summary. For now I'm trying to populate the summary with a subset of raw events that I want to search against.