I am curious: does including an index actually help when writing a search?
This came up because a friend and I are arguing over whether one is more necessary than the other. For example, let's say I run one search with just a sourcetype, and then another search where I also include an index.
While I know this "limits" the data, Splunk still has to search data either way. Would including the index cause any substantial gain in search efficiency, or would leaving it out be just as effective as specifying a particular index? What are your thoughts?
You should use both whenever possible. The more precise you are with your search, the faster you'll get your results, because Splunk might be able to look into a smaller amount of data to retrieve what you are looking for.
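As a rough illustration, the two searches from the question would look something like this (the index name web and the sourcetype access_combined are only placeholders, not taken from the question):

```spl
sourcetype=access_combined

index=web sourcetype=access_combined
```

The first one has to look in every index your role searches by default, while the second goes straight to the one index you named.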
Also, both index and sourcetype (along with host, source and _time) are indexed at index time, which means finding data using any of these fields will get you results quite fast. You can even use the | tstats command to benefit from these indexed fields (and others, in case you're using indexed extractions).
Say you want to know which hosts are sending data to a specific index. You could search:
index = blah | stats count by host
OR
| tstats count where index=blah by host
Just give it a try on a big index and check the difference in the time taken to complete the search.
If you want to know more about search best practices and how to write really performant searches, look for the search-related presentations from .conf:
https://conf.splunk.com/files/2016/slides/behind-the-magnifying-glass-how-search-works.pdf
https://conf.splunk.com/files/2016/slides/search-optimization.pdf
There are two aspects here.
Sourcetype names need not be unique. For example, theoretically I can upload any CSV with the sourcetype csv into several different indexes. If I then search for sourcetype=csv, it will match that sourcetype in every index the search covers
BUT
when I add index="aaaa" sourcetype=csv, it will search for the csv sourcetype ONLY inside the index aaaa. Index names are unique.
In real life there are many cases where sourcetype names overlap, say for _json, cisco or catalina.
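If you want to see how widely a sourcetype is spread, something along these lines should show it. Treat it as a sketch: csv and aaaa are just the placeholder names from above, and index=* only covers the indexes your role is allowed to search.

```spl
| tstats count where index=* sourcetype=csv by index
```

Once you know which index you actually care about, the narrowed search is simply:

```spl
index=aaaa sourcetype=csv
```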
Inclusion is always better than exclusion, or than not specifying a more exact match 🙂
I rest my case