What is the best way to find searches without sourcetype or index defined?

sloshburch
Splunk Employee

I know that indexed fields accelerate search performance. Many searches take advantage of this with host, source, and _time, but users new to Splunk often overlook index and sourcetype.

What is the best way you've found for identifying existing searches lacking such an index or sourcetype definition?

1 Solution

gjanders
SplunkTrust

In Alerts for SplunkAdmins (or on GitHub) I have a few alerts for this under Search Head Level - Non Best-Practices.

In particular:
SearchHeadLevel - User - Dashboards searching all indexes
SearchHeadLevel - Scheduled searches not specifying an index

They could be adapted to check for a missing sourcetype. The audit logs, as SloshBurch (Burch) mentioned, will work for ad-hoc searches as well.

SearchHeadLevel - User - Dashboards searching all indexes

| rest /servicesNS/-/-/data/ui/views 
| search `comment("A dashboard searching all indexes is an issue just like a scheduled search querying all indexes or using the index=* trick")` eai:data=*query*
| regex eai:data="<search.*" 
| rex field=eai:data "(?P<theSearch><search(?!String)[^>]*>[^<]*<query>.*?)<\/query>" max_match=200 
| mvexpand theSearch 
| rex field=theSearch "<search(?P<searchInfo>[^>]*)>[^<]*<query>(?P<theQuery>.*)" 
| search `comment("If we are seeing post process search then we don't want to check if it has index= because that is likely only in the base query. These are also various exclusions for legitimate searches that will not involve scanning all indexes, such as rest or a savedsearch or similar")` searchInfo!="*base*"
| rename eai:appName AS application, eai:acl.sharing AS sharing, eai:acl.owner AS owner, label AS name
| table theQuery, application, owner, sharing, name, splunk_server, title
| regex theQuery!="index\s*=(?!\s*\*)" 
| regex theQuery!="^(\()?\s*(\`|\$[^|]+\$|eventtype=|<!\[CDATA\[\s*\|\s*((acl)?inputlookup|rest) |\|)"
| rex field=theQuery "^(?P<exampleQueryToDetermineIndexes>[^\|]+)"
| eval exampleQueryToDetermineIndexes=exampleQueryToDetermineIndexes . "| stats values(index) AS index | format | fields search | eval search=replace(search,\"\\)\",\"\"), search=replace(search,\"\\(\",\"\"), search=if(search==\"NOT \",\"No indexes found\",search)"

SearchHeadLevel - Scheduled searches not specifying an index

| rest /servicesNS/-/-/saved/searches
| search `comment("Look over all scheduled searches and find those not specifying/narrowing down to an index, or using the index=* trick")`
| table title, eai:acl.owner, description, eai:acl.app, qualifiedSearch, next_scheduled_time
| search next_scheduled_time!="" 
| regex qualifiedSearch!=".*index\s*(!?)=\s*([^*]|\*\S+)" 
| regex qualifiedSearch="^\s*search "
| regex qualifiedSearch!="^\s*search\s*\[\s*\|\s*inputlookup"
| rex field=qualifiedSearch "^(?P<exampleQueryToDetermineIndexes>[^\|]+)"
| regex exampleQueryToDetermineIndexes!="\`"
| eval exampleQueryToDetermineIndexes=exampleQueryToDetermineIndexes . "| stats values(index) AS index | format | fields search | eval search=replace(search,\"\\)\",\"\"), search=replace(search,\"\\(\",\"\"), search=if(search==\"NOT \",\"No indexes found\",search)"
| rename eai:acl.owner AS owner, eai:acl.app AS Application

sloshburch
Splunk Employee

The Splunk Product Best Practices team provided this question. Read more about How Crowdsourcing is Shaping the Future of Splunk Best Practices.

sloshburch
Splunk Employee

Oh nice! And great plug for your app!

Would you be OK with sharing either of the specific searches here so we can learn from your expertise? I ask before I go and copy/paste it myself, as I wouldn't want to overstep...

gjanders
SplunkTrust

Updated to include them. Copying and pasting is fine; the app was designed to be shared.
One of the original goals was to get some of the searches back into the monitoring console, but I think they have gone beyond that level of complexity!

I have stripped the macros from the copy & paste, keeping only the comment macro.

sloshburch
Splunk Employee

Great addition! I think the most critical thing you captured is that, in reality, trying to pin down ad hoc searches might not be an effective use of your time compared to the saved items that surely will be run, like scheduled searches and dashboards.

Furthermore, your proper use of /servicesNS/-/-/saved/searches does right by working around the namespace constraints - I lazily overlooked that.

I'll likely come back with a new answer that extends this to check for index OR sourcetype.

But for now, switching the accepted answer to what you provided! Great job!

sloshburch
Splunk Employee

How's this revision for the Scheduled Searches part?

| rest /servicesNS/-/-/saved/searches
| fields qualifiedSearch, next_scheduled_time, title, eai:acl.owner, eai:acl.app
| where match( qualifiedSearch , "^\s*search\s*" )
| rex field=qualifiedSearch "^(?<base_search>search[^\|\[]+)"
| eval 
    check-sourcetype = if( match( base_search , "\s+sourcetype\s*=" ) , "defined" , "missing" ) ,
    check-index = if( match( base_search , "\s+index\s*=" ) , "defined" , "missing" ) ,
    check-hidden = if( match( base_search , "\s+((tag|eventtype)\s*=|\`)" ) , "defined" , "missing" ) ,
    check-scheduled = if( match( next_scheduled_time , ".+" ) , "defined" , "missing" )
| rename eai:acl.* AS namespace-*
| search ( check-sourcetype="missing" OR check-index="missing" ) check-hidden="missing" check-scheduled="missing" namespace-owner!="nobody"
| table check-index, check-sourcetype, base_search, namespace-*

The end of the search is where folks can tweak to add things back in. But the way it's written,

  • namespace-owner!="nobody" means it's limited to items saved by real users since it isn't effective to update 'nobody' owned searches that came with the product.
  • check-scheduled="missing" filters out unscheduled searches. This could be toggled since dashboards might reference a saved, but unscheduled, search. Albeit rare.
  • check-hidden="missing" filters out searches using tag, eventtypes, and macros - things where index or sourcetype might be defined
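The regex checks in this search can be tried against a few sample strings before pointing them at the REST endpoint. The following is only a sketch: the sample searches are made up for illustration, and the field names use underscores purely as a convention of this example.

```spl
| makeresults
| eval qualifiedSearch=split("search index=web sourcetype=access_combined error;search sourcetype=syslog failed;search eventtype=web_errors status=500;search error timeout", ";")
| mvexpand qualifiedSearch
| rex field=qualifiedSearch "^(?<base_search>search[^\|\[]+)"
| eval check_sourcetype=if(match(base_search, "\s+sourcetype\s*="), "defined", "missing"),
       check_index=if(match(base_search, "\s+index\s*="), "defined", "missing"),
       check_hidden=if(match(base_search, "\s+((tag|eventtype)\s*=|\`)"), "defined", "missing")
| table qualifiedSearch, check_index, check_sourcetype, check_hidden
```

The four samples are meant to exercise, in order: both fields present, index missing, an eventtype hiding the definition, and nothing defined at all.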

As of now, this doesn't do much for subsearches but I saw that logic in what you posted.

Whaddya think? Any suggestions to change?

gjanders
SplunkTrust

Perhaps something like:

| rest /servicesNS/-/-/saved/searches
| fields qualifiedSearch, next_scheduled_time, title, eai:acl.owner, eai:acl.app
| where match( qualifiedSearch , "^\s*search\s*" )
| rex field=qualifiedSearch "^(?<base_search>search[^\|\[]+)"
| eval 
    check-sourcetype = if( match( base_search , "\s+sourcetype\s*=" ) , "defined" , "missing" ) ,
    check-index = if( match( base_search , "\s+index\s*(=|IN)" ) , "defined" , "missing" ) ,
    check-index-contains-wildcard = if( match( base_search , "\s+index\s*(=\s*[^\*]+(\s|$)|IN\s*\([^\)\*]+\s*\))" ) , "missing" , "defined" ) ,
    check-index-starts-wildcard = if( match( base_search , "\s+index\s*(=\s*\*|IN\s*\(\s*\*)" ) , "defined" , "missing" ) ,
    check-hidden = if( match( base_search , "\s+((tag|eventtype)\s*=|\`)" ) , "defined" , "missing" ) ,
    check-scheduled = if( match( next_scheduled_time , ".+" ) , "defined" , "missing" )
| rename eai:acl.* AS namespace-*
| search ( check-sourcetype="missing" OR check-index="missing" ) check-hidden="missing" check-scheduled="missing" namespace-owner!="nobody"
| table title, check-index, check-sourcetype, base_search, namespace-*, check-index-contains-wildcard, check-index-starts-wildcard

This version includes the title (which is really useful) and works with the IN clause. I'm unsure whether you wanted to check for wildcards in indexes, so I added two checks for that; I found them useful in my last environment (and they will be useful in my current one).

Let me know what you think.

Regarding your comments:
"namespace-owner!="nobody" means it's limited to items saved by real users since it isn't effective to update 'nobody' owned searches that came with the product."

This is a nice place to toggle the setting as it can be useful for identifying poorly built addons.

"check-scheduled="missing" filters out unscheduled searches. This could be toggled since dashboards might reference a saved, but unscheduled, search. Albeit rare."

Yes, except that the example you posted filters for unscheduled searches (it keeps them rather than excluding them). My example does the same, but that can be changed easily...

"check-hidden="missing" filters out searches using tag, eventtypes, and macros - things where index or sourcetype might be defined"

I have another alert called "SearchHeadLevel - Scheduled searches not specifying an index macro version"; however, it's more complicated and needs a bit more work to get running than the version I've already pasted. Note that my current searches don't always cater for "IN": the app was started on 6.5.x and IN was added later, so only some of my searches have been updated. (After this discussion I might update the ones for finding the wildcarded index :))

sloshburch
Splunk Employee

Thanks for catching my mistakes. Here's an update. I chopped out some of the wildcard stuff. I agree that it's important; I just fear it makes this search too hard to understand, so I think it might be easier for fellow engineers to have a second search that focuses solely on wildcard usage for such key items.

| rest /servicesNS/-/-/saved/searches
| fields qualifiedSearch, next_scheduled_time, title, eai:acl.owner, eai:acl.app
| where match( qualifiedSearch , "^\s*search\s*" )
| rex field=qualifiedSearch "^(?<base_search>search[^\|\[]+)"
| eval 
    check-sourcetype = if( match( base_search , "\s+sourcetype\s*(=|IN)" ) , "defined" , "missing" ) ,
    check-index = if( match( base_search , "\s+index\s*(=|IN)" ) , "defined" , "missing" ) ,
    check-hidden = if( match( base_search , "\s+((tag|eventtype)\s*(=|IN)|\`)" ) , "defined" , "missing" ) ,
    check-scheduled = if( match( next_scheduled_time , ".+" ) , "defined" , "missing" )
| rename eai:acl.* AS namespace-*
| search ( check-sourcetype="missing" OR check-index="missing" ) check-hidden="missing" check-scheduled="defined" namespace-owner!="nobody"
| table title, check-index, check-sourcetype, base_search, namespace-*
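A separate search devoted to wildcard usage could look roughly like the sketch below. It is not taken from the Alerts for SplunkAdmins app. The "leading" pattern is borrowed from the check-index-starts-wildcard expression earlier in this thread, while the "embedded" pattern and the index_wildcard field name are assumptions of this example and have not been tested against a large set of saved searches.

```spl
| rest /servicesNS/-/-/saved/searches
| fields qualifiedSearch, next_scheduled_time, title, eai:acl.owner, eai:acl.app
| where match(qualifiedSearch, "^\s*search\s*")
| rex field=qualifiedSearch "^(?<base_search>search[^\|\[]+)"
| eval index_wildcard=case(
    match(base_search, "\s+index\s*(=\s*\*|IN\s*\(\s*\*)"), "leading",
    match(base_search, "\s+index\s*(=\s*[^\s\*]*\*|IN\s*\([^\)]*\*)"), "embedded",
    true(), "none")
| search index_wildcard!="none" next_scheduled_time!=""
| table title, index_wildcard, base_search, eai:acl.owner, eai:acl.app
```

Here "leading" is intended to catch forms such as index=* or index IN (*...), and "embedded" forms such as index=web*. Dropping the next_scheduled_time filter would include unscheduled saved searches as well.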

BTW: The toggles we introduced and the prospect of a wildcard audit make me think we might have another dashboard form in our future, one that covers the use case in a single UI while supporting the toggles we identified.

gjanders
SplunkTrust

Sure, looks good, although the title is still missing from the table command

sloshburch
Splunk Employee

Oh I'm a dummy - thanks for pointing that out - should be fixed now.

sloshburch
Splunk Employee

@gjanders - I'm hooking you up with some karma or something. You taught me about the IN Operator! That slipped past me so thank you for teaching me!

gjanders
SplunkTrust

IN is super useful, and it was mentioned when it was released, but it took me a while to start really using it in practice!
The ability to copy and paste a list of items with IN (a, b, c, d) is very useful, and it often looks nicer than many "OR" entries.
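As a small illustration of the difference, here is the same filter written both ways. The index and sourcetype names are hypothetical placeholders, not taken from this thread.

```spl
index IN (web, app, db) sourcetype IN (access_combined, syslog)
| stats count BY index, sourcetype
```

The equivalent using OR:

```spl
(index=web OR index=app OR index=db) (sourcetype=access_combined OR sourcetype=syslog)
| stats count BY index, sourcetype
```

Note that any audit regex looking only for index\s*= will miss the first form, which is why the checks above were extended with (=|IN).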

sloshburch
Splunk Employee

Ha ha. I feel silly for forgetting the title and filtering to unscheduled searches. Good catches!

sloshburch
Splunk Employee

The Splunk Product Best Practices team provided this response. Read more about How Crowdsourcing is Shaping the Future of Splunk Best Practices.

I've had luck running a search for searches that don't already refer to a sourcetype or index.

index=_audit sourcetype=audittrail action=search search="search *" ( search!="* sourcetype*" OR search!="* index*" ) search_type="ad hoc"
| stats first(search) AS search BY search_id

Or, you can explore your saved searches just the same with:

| rest /services/saved/searches
| search ( search!="*sourcetype=*" OR search!="*index=*" ) search!="|*"
| table search

In either approach, remember that sometimes the sourcetype or index IS defined, but is abstracted because it is defined within a macro or as part of an eventtype or tag.
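One way to keep that caveat visible is to flag, rather than silently include, the saved searches that reference a macro, eventtype, or tag, so they can be reviewed separately. The sketch below makes a few assumptions: the uses_abstraction field name is invented for this example, the regex is adapted from the check-hidden expression discussed elsewhere in this thread, and the endpoint is the namespace-wide /servicesNS/-/-/ form.

```spl
| rest /servicesNS/-/-/saved/searches
| fields title, search, eai:acl.app, eai:acl.owner
| eval uses_abstraction=if(match(search, "(tag|eventtype)\s*(=|IN)|\`"), "yes", "no")
| search ( search!="*sourcetype=*" OR search!="*index=*" ) search!="|*"
| table title, uses_abstraction, search, eai:acl.app, eai:acl.owner
```

Rows with uses_abstraction="yes" are not necessarily problems. The index or sourcetype may simply be defined inside the macro, eventtype, or tag, so those definitions need to be checked by hand.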
