I need a way to find unused data and disable the corresponding input/scheduled_search/summary_index in order to keep the cost of the Splunk environment under control. I figured out how to pull the sourcetypes from the access logs as well as from the saved searches. Now I just need to figure out a way to dynamically disable the inputs.
START: #!/bin/sh
echo "name, search" > /opt/isv/splunkshared/var/run/splunk/savedsearches_list.csv && /opt/isv/splunk/bin/splunk btool savedsearches list | grep -E '\[|search =|disabled =' | grep -v auto_summarize.command | sed -e 's/"//g' -e "s/'//g" | sed 's/^\[/BOOGER/' | tr -d '\r\n' | sed 's/BOOGER/\n/g' | sed -e 's/^/"/' -e 's/]/", "/' -e 's/$/"/' >> /opt/isv/splunkshared/var/run/splunk/savedsearches_list.csv
END:
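To make the transformation easier to follow, here is a minimal sketch that runs the same grep/sed/tr pipeline over a small made-up sample instead of the real `splunk btool savedsearches list` output. The stanza names, the searches, and the function names `sample` and `flatten_stanzas` are hypothetical, and GNU sed is assumed (for `\n` in the replacement).

```shell
#!/bin/sh
# Hypothetical sample standing in for `splunk btool savedsearches list` output.
sample() {
cat <<'EOF'
[Errors by host]
disabled = 0
search = index=web sourcetype=access_combined error | stats count by host
[License report]
disabled = 1
search = index=_internal source=license_usage.log | stats sum(b) by st
EOF
}

# Same steps as the script above: keep stanza headers plus the search and
# disabled keys, strip quotes, mark each stanza start, join everything into
# one line, split again at the markers, then wrap name and body in quotes.
flatten_stanzas() {
  grep -E '\[|search =|disabled =' \
    | sed -e 's/"//g' -e "s/'//g" \
    | sed 's/^\[/BOOGER/' \
    | tr -d '\r\n' \
    | sed 's/BOOGER/\n/g' \
    | sed -e 's/^/"/' -e 's/]/", "/' -e 's/$/"/'
}

sample | flatten_stanzas
```

Each stanza ends up on one line as `"name", "body"`. Because the newlines are deleted before the split, the `disabled` and `search` keys are glued together with no separator, and the first output line is an empty pair of quotes.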
Using this lookup, I will join it with the data sets we are using in the scheduled searches below.
*Search that recursively pulls the search or saved search from the bash script output and extracts the sourcetype, then joins it to the web_access logs.*
index=_internal source=license_usage.log type=Usage pool=default_pool earliest=-1d@d latest=@d | eval GB=b/1024/1024/1024 | stats sum(GB) as GB by st | stats avg(GB) as avg_GB by st | rename st as "sourcetype" | rename avg_GB as "Avg GB/Day" | eval sourcetype=lower(sourcetype)
| join type=outer sourcetype [
| inputcsv savedsearches_list
| rename search as search1
| rex field=search1 mode=sed "s/\"//g" | rex field=search1 "index=(?
| rex field=search1 "savedsearch (?
| join type=outer name1 [
| inputcsv savedsearches_list
| rename name as name1
| rename search as search2
| rex field=search2 mode=sed "s/\"//g" | rex field=search2 "index=(?
| rex field=search2 "disabled = (?
| table name1 search2 index
]
| table name* index* source* search* disable*
| eval index="" | eval disabled="" | eval source="" | eval sourcetype="" | eval name=""
| eval index=case(isnull(index1), index2, isnull(index2), index1)
| eval disabled=case(isnull(disabled1), disabled2, isnull(disabled2), disabled1)
| eval source=case(isnull(source1), source2, isnull(source2), source1)
| eval sourcetype=case(isnull(sourcetype1), sourcetype2, isnull(sourcetype2), sourcetype1)
| table index source sourcetype disabled
| rex field=sourcetype mode=sed "s/\*//g"
| rex field=sourcetype mode=sed "s/\)//g"
| rex field=sourcetype mode=sed "s/\(//g"
| eval sourcetype=lower(sourcetype)
]
| join type=outer sourcetype [search
index=_internal sourcetype=* tag::host=webserver sourcetype earliest=-30d@d latest=now referer=* | eval decode=urldecode(referer) | rex field=decode mode=sed "s/\"//g" | rex field=decode mode=sed "s/&/ /g" | rex field=decode mode=sed "s/\%/ /g"
| stats latest(_time) as _time by user decode referer
| rex field=decode "index=(?<index>\S+)" | rex field=decode "sourcetype=(?<sourcetype>\S+)" | rex field=decode "source=(?<source>\S+)"
| fillnull value="n/a"
| eval sourcetype=lower(sourcetype)
| rex field=sourcetype mode=sed "s/\*//g"
| rex field=sourcetype mode=sed "s/\)//g"
| rex field=sourcetype mode=sed "s/\(//g"
| stats latest(user) as latest_user latest(_time) as latest_time by sourcetype
| eval web_accessed="latest_user=".latest_user." latest_time=".latest_time." sourcetype=".sourcetype | fields - latest_time latest_user
]
| rename disabled as is_scheduled
| eval value=case((isnull(web_accessed) AND isnull(is_scheduled)), "not being searched", (isnull(web_accessed) AND isnotnull(is_scheduled)), "saved search only", (isnotnull(web_accessed) AND isnull(is_scheduled)), "web search only", (isnotnull(web_accessed) AND isnotnull(is_scheduled)), "both saved and web searched")
| fields - index source is_scheduled
| sort - value, "Avg GB/Day"
I see a possible limitation in your approach.
The license usage log only contains metadata such as source, sourcetype, host, and index.
For a precise search (like "index=A sourcetype=B"), you can match the search terms against the license usage.
But for searches with broad conditions (like "index=*" or "keyword"), you will not be able to tell what the scope of the data was.
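One way to see how many searches fall into the broad category is to flag them before doing any join. This is only a rough sketch, assuming input lines in the `"name", "search"` form produced by the script in the question. The function name `classify_scope` and the sample rows are hypothetical, and the pattern is a heuristic: it does not catch partial wildcards such as `index=web*`.

```shell
#!/bin/sh
# classify_scope: reads lines of the form "name", "search text" on stdin and
# prefixes each with "broad" or "precise", separated by a tab.
classify_scope() {
  awk '
    # A wildcard index, or no index= or sourcetype= term at all, means the
    # scope cannot be matched against license usage metadata.
    /index=\*/ || !/(index|sourcetype)=/ { print "broad\t" $0; next }
    { print "precise\t" $0 }
  '
}

printf '%s\n' \
  '"Errors by host", "disabled = 0search = index=web sourcetype=access_combined error"' \
  '"Everything", "disabled = 0search = index=* error"' \
  '"Keyword only", "disabled = 0search = timeout"' \
  | classify_scope
```

Searches flagged as broad would need to be reviewed by hand, since nothing in their text ties them to a specific sourcetype.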