Splunk Search

How to map _introspection data.search_props.sid to the SPL of the search?

efavreau
Motivator

I have a search that uses index=_introspection to return searches and their memory consumption. For an event of interest, I receive a data.search_props.sid of: subsearch_scheduler__tdurden__FTCB__RMD50e8af33aa7072735_at_1541509200_79403_61D4784A-AD2B-4D4D-8A8C-172D292069A3_1541509365.1

This string appears to be a concatenation of other fields, but which ones is not clear. It does not appear to be the same as search_id, and I have ruled out using this sid wholesale. There must be a way to use part of it to find the exact SPL that was executed.
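For context, the sid above looks like a scheduler sid of the general form scheduler__&lt;user&gt;__&lt;app&gt;__&lt;saved-search hash&gt;_at_&lt;scheduled epoch&gt;_&lt;run number&gt;, wrapped with a subsearch_ prefix and trailing peer/timestamp components. A minimal Python sketch that pulls out the embedded user, app, and scheduled-run epoch (the field layout is an assumption inferred from this one example, not a documented contract):

```python
import re
from datetime import datetime, timezone

sid = ("subsearch_scheduler__tdurden__FTCB__RMD50e8af33aa7072735"
       "_at_1541509200_79403_61D4784A-AD2B-4D4D-8A8C-172D292069A3_1541509365.1")

# Assumed layout: scheduler__<user>__<app>__<hash>_at_<scheduled epoch>_<nnn>...
m = re.search(r"scheduler__(?P<user>[^_]+)__(?P<app>[^_]+)__\w+_at_(?P<epoch>\d+)_", sid)
if m:
    user, app, epoch = m["user"], m["app"], int(m["epoch"])
    print(user, app, datetime.fromtimestamp(epoch, tz=timezone.utc))
```

If the layout holds, the epoch segment (1541509200 here) is the scheduled run time, which is why time-window matching against _audit (as in the accepted answer below) is a workable strategy.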

I attempted the following, which found events close to the time frame in index=_audit, but the rest of the search_id string didn't match.

index=_audit search_id='scheduler__tdurden__FTCB__*' search=*
| table user timestamp savedsearch_name search search_id *

What part of this sid can be used to return the SPL executed? Is there a different field to use to get to the SPL?

###

If this reply helps you, an upvote would be appreciated.
1 Solution

dmarling
Builder

This can be a little tricky because of how subsearches and remote searches handle their search IDs, and because a very long-running search may have started several hours before the introspection log you are viewing. In your example, there is a subsearch_ prefix at the beginning of the sid; the sid on the corresponding audit event will not have that prefix. You therefore want a search that trims off the parts of the sid that are appended when subsearches and remote searches execute. I have found that the query below will get you from the introspection log to the audit search event. Using the sid in your example:

[ search index=_introspection subsearch_scheduler__tdurden__FTCB__RMD50e8af33aa7072735_at_1541509200_79403_61D4784A-AD2B-4D4D-8A8C-172D292069A3_1541509365.1
    | rename "data.*" as "*" 
    | eval searchstart=round(_time,0)-elapsed 
    | eval earliest=relative_time(searchstart, "-5m@m") 
    | eval latest=relative_time(searchstart, "+5m@m") 
    | rename "search_props.*" as "*" 
    | rex mode=sed field=sid "s/remote_[^\_]+_//g" 
    | rex mode=sed field=sid "s/^subsearch_//g" 
    | stats min(earliest) as earliest max(latest) as latest by sid 
    | eval search="earliest=".earliest." latest=".latest." search_id='*".sid."*'" 
    | table search] index=_audit search=* action=search 
| rex "search='search (?<search>[^\e]+)" 
| rex mode=sed field=search "s/', autojoin=[^\e]+//g"

There is a subsearch at the beginning where you enter the sid from the introspection logs. It calculates the start time of the search by subtracting the elapsed time from the event time (_time). I then create a +/- 5 minute window around that start time to pass through to the main search as my earliest and latest search parameters. The two rex sed statements normalize the sid by removing the leading "remote_&lt;hostname&gt;_" and "subsearch_" parts before the sid is handed to the main search by the final eval search statement.

If this comment/answer was helpful, please upvote it. Thank you.


efavreau
Motivator

Thank you! This is cool on multiple levels. I appreciate the time and thoroughness of the response @dmarling.


efavreau
Motivator

Does anyone use the index=_introspection logs?
