Knowledge Management

In a summary index, how can I preserve/capture the original source / sourcetype / host from the event?

bandit
Motivator

I've searched all over and haven't found an answer to this one. My summary index holds a subset of events from another index, which I collect every 5 minutes. I can see the _raw events in the summary index, which is great, but how can I store the original host / source / sourcetype fields in the summary? I've tried using the eval command to store the host value in a new field, but it doesn't show up in my summary index. What gives? I don't want to go back to shell commands and grep 🙂

Example search populating the summary:

index="other" | head 3 | eval orig_host=host | fields orig_host host _raw

Thanks,

Rob

0 Karma

woodcock
Esteemed Legend

You can only preserve these by renaming them as "orig_*" or by overriding them on the "collect" command line with:

... | collect ... host="X" sourcetype="Y" source="Z"
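
As a rough sketch of the override variant, assuming a summary index named "summary" and made-up host, source and sourcetype values (all hypothetical). Lines starting with # are notes, not part of the search. The override stamps one fixed value on every collected event, so it only fits searches whose events all share the same origin:

```spl
# hypothetical index name and field values; replace with your own
# a sourcetype other than "stash" may count against your license (see the reply further down)
index=other host=web01
| collect index=summary host="web01" source="/var/log/app.log" sourcetype="app_log"
```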

0 Karma

glc_slash_it
Path Finder

Hey!

Since I was searching for this topic/solution, I'll just add what I think is the right solution for this case.

To preserve the _time, host, source and sourcetype:

(...)

| collect index=main output_format=hec

0 Karma

PickleRick
SplunkTrust

Just remember that if you collect with a sourcetype different than "stash", your collected data will get counted against your license.

0 Karma

bandit
Motivator

After further testing, this is my favorite solution.

Just add the following after your base search and orig_host, orig_sourcetype, orig_source and orig_index will all be in your summary index :-)

 | rename _raw as orig_raw
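
Put together, this might look like the following sketch, which assumes a summary index named "summary" (hypothetical) and reuses the example search from the question. Lines starting with # are notes, not part of the search. Which orig_* fields actually show up may depend on your Splunk version, so it's worth checking the output of the second search:

```spl
# populate the summary (hypothetical index name "summary")
index=other | head 3 | rename _raw as orig_raw | collect index=summary

# read it back and check which original fields were kept
index=summary | table _time orig_host orig_sourcetype orig_source orig_index orig_raw
```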

bandit
Motivator
# a much simpler solution that I got from Splunk guru "D" :-)
# it turns out that renaming the _raw field fixes the problem of some of the "orig" fields going missing, e.g. orig_sourcetype
# this approach is probably not as relevant to Splunk 6, which has many automatic acceleration features
# note: the "| collect" command is not needed if you are using the summary index checkbox in a saved search
index=other | rename _time as time | rename _raw as raw | stats count by time raw index host sourcetype source | collect index=collect

bandit
Motivator
# I was having trouble recording the raw event and the original host, sourcetype and source fields in a summary index: they were always overridden with the values from the host that runs the search populating the summary index. Here's one solution.

# step 1 - populate summary index
# search events from an index named "other", prepend the _time, host, sourcetype and source fields to the _raw field with "|" as a delimiter, and put the result into a summary index named "collect"
index=other | eval _raw=_time+"|"+host+"|"+sourcetype+"|"+source+"|"+_raw | collect index=collect

# step 2 - read from summary index named "collect"
# extract time, host, sourcetype and source fields that are stashed in the _raw field in the summary index named "collect"
index=collect | rex "(?s)^(?<time1>[^|]+)\|(?<host1>[^|]+)\|(?<sourcetype1>[^|]+)\|(?<source1>[^|]+)\|(?<raw1>.*)"
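
# step 3 (optional) - a sketch of using the extracted fields
# This assumes the time1 value written in step 1 is the epoch _time of the original event; converting it back with tonumber() restores the original timestamp for display. The last capture group takes the rest of the line with .* so that pipes inside the original event are not cut off. Lines starting with # are notes, not part of the search.

```spl
# field names time1, host1, sourcetype1, source1 and raw1 are the ones extracted in step 2
index=collect
| rex "(?s)^(?<time1>[^|]+)\|(?<host1>[^|]+)\|(?<sourcetype1>[^|]+)\|(?<source1>[^|]+)\|(?<raw1>.*)"
| eval _time=tonumber(time1)
| table _time host1 sourcetype1 source1 raw1
```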

sowings
Splunk Employee

The "collect" summary indexing operation should handle host -> orig_host and index -> orig_index, but may not do so for source. Personally, I would use a different summarizing search that calls out specific fields rather than _raw. When you search those summarized events, the default field extractions are applied: the host is the one where the summary search ran, the index field is the summary index itself, and the _raw is the base summarized event.

Try calling out the fields you really want to summarize. Note that collect may not properly remap sourcetype -> orig_sourcetype, and will probably ignore eventtype as well.

But also, why are you just cherry-picking events without actually doing any summarization? The search against the raw indexed events should handle that without issue.

0 Karma

bandit
Motivator

OK. I couldn't get the collect command to preserve orig_host, orig_sourcetype, orig_source, etc. However, the sistats and sitimechart commands seem to preserve orig_host, and you could then send their output to the collect command.
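
A minimal sketch of that idea, assuming a summary index named "summary" (hypothetical) and a simple count split by the fields of interest. Lines starting with # are notes, not part of the search. The second search only inspects which orig_* fields ended up in the summary, since that seems to vary:

```spl
# populate the summary with sistats output (hypothetical index name "summary")
index=other | sistats count by host sourcetype source | collect index=summary

# inspect which orig_* fields were preserved
index=summary | head 5 | table _time orig_*
```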

0 Karma

bandit
Motivator

Thanks, I'll give that a try. Later on, I want to match text in the raw field using various other searches against the summary. For now, I'm trying to populate the summary with a subset of the raw events that I want to search against.

0 Karma