Knowledge Management

How to identify unused data in order to dynamically disable inputs that are not being used?

Kyle_Jackson
Explorer

I need a way to find unused data and disable the corresponding input, scheduled search, or summary index in order to keep the cost of the Splunk environment under control. I figured out how to pull the sourcetypes from the access logs as well as from the saved searches. Now I just need to figure out a way to dynamically disable the inputs.

Try this bash script

#!/bin/sh
# Author: Kyle_Jackson
# Pulls the saved searches from splunk btool (savedsearches.conf) and writes the results to /opt/isv/splunkshared/var/run/splunk
# VERSION 2

echo "name, search" > /opt/isv/splunkshared/var/run/splunk/savedsearches_list.csv && /opt/isv/splunk/bin/splunk btool savedsearches list | grep -E '\[|search =|disabled =' | grep -v auto_summarize.command | sed -e 's/"//g' -e "s/'//g" | sed 's/^\[/BOOGER/' | tr -d '\r\n' | sed 's/BOOGER/\n/g' | sed -e 's/^/"/' -e 's/]/", "/' -e 's/$/"/' >> /opt/isv/splunkshared/var/run/splunk/savedsearches_list.csv
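For anyone who finds the one-liner hard to follow, here is a rough sketch of the same reshaping done with awk. It is not the script above: the function name `btool_to_csv` is made up for illustration, it writes `disabled` as its own column instead of leaving it glued to the search text, and it assumes every `search =` value fits on a single line of btool output.

```shell
#!/bin/sh
# Sketch only: reshape `splunk btool savedsearches list` style text (stdin)
# into CSV rows of name, disabled, search (stdout).
# Assumption: each stanza's search is on one line; multi-line searches are not handled.
btool_to_csv() {
    awk '
        # Print one CSV row for the stanza collected so far, if any.
        function emit() {
            if (name != "") printf "\"%s\",\"%s\",\"%s\"\n", name, disabled, search
        }
        BEGIN { print "name,disabled,search" }
        # Drop a trailing carriage return, if present.
        { sub(/\r$/, "") }
        # A stanza header such as [My Search] starts a new row.
        /^\[.*\]$/ {
            emit()
            name = substr($0, 2, length($0) - 2)
            gsub(/"/, "", name)
            disabled = ""; search = ""
            next
        }
        /^disabled = / { disabled = substr($0, 12); next }
        # Double quotes are stripped so the CSV quoting stays valid.
        /^search = /   { search = substr($0, 10); gsub(/"/, "", search); next }
        END { emit() }
    '
}
```

It could be wired up as `/opt/isv/splunk/bin/splunk btool savedsearches list | btool_to_csv > /opt/isv/splunkshared/var/run/splunk/savedsearches_list.csv`. Because the column layout differs from the original CSV, the `disabled = (\d)search` extraction in the search would have to read the `disabled` column directly instead.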
I then use this lookup in the search below to combine it with the data sets our scheduled searches use.

Scheduled Search Audit Lookup

*Search that pulls the search, or the saved search it references, from the bash script output, extracts the sourcetype, and joins it to the web_access logs.*

index=_internal source=license_usage.log type=Usage pool=default_pool earliest=-1d@d latest=@d | eval GB=b/1024/1024/1024 | stats sum(GB) as GB by st | stats avg(GB) as avg_GB by st | rename st as "sourcetype" | rename avg_GB as "Avg GB/Day" | eval sourcetype=lower(sourcetype)
| join type=outer sourcetype [
| inputcsv savedsearches_list
| rename search as search1
| rex field=search1 mode=sed "s/\"//g" | rex field=search1 "index=(?<index1>\S+)" | rex field=search1 "sourcetype=(?<sourcetype1>\S+)" | rex field=search1 "source=(?<source1>\S+)"
| rex field=search1 "savedsearch (?<name1>\S+)" | rex field=search1 "disabled = (?<disabled1>\d)search"
| join type=outer name1 [
| inputcsv savedsearches_list
| rename name as name1
| rename search as search2
| rex field=search2 mode=sed "s/\"//g" | rex field=search2 "index=(?<index2>\S+)" | rex field=search2 "sourcetype=(?<sourcetype2>\S+)" | rex field=search2 "source=(?<source2>\S+)"
| rex field=search2 "disabled = (?<disabled2>\d)search"
| table name1 search2 index* source* disable*
]
| table name* index* source* search* disable*
| eval index="" | eval disabled="" | eval source="" | eval sourcetype="" | eval name=""
| eval index=case(isnull(index1), index2, isnull(index2), index1)
| eval disabled=case(isnull(disabled1), disabled2, isnull(disabled2), disabled1)
| eval source=case(isnull(source1), source2, isnull(source2), source1)
| eval sourcetype=case(isnull(sourcetype1), sourcetype2, isnull(sourcetype2), sourcetype1)
| table index source sourcetype disabled
| rex field=sourcetype mode=sed "s/\*//g"
| rex field=sourcetype mode=sed "s/\)//g"
| rex field=sourcetype mode=sed "s/\(//g"
| eval sourcetype=lower(sourcetype)
]

| join type=outer sourcetype [search
index=_internal sourcetype=* tag::host=webserver sourcetype earliest=-30d@d latest=now referer=* | eval decode=urldecode(referer) | rex field=decode mode=sed "s/\"//g" | rex field=decode mode=sed "s/&/ /g" | rex field=decode mode=sed "s/\%/ /g"
| stats latest(_time) as _time by user decode referer
| rex field=decode "index=(?<index>\S+)" | rex field=decode "sourcetype=(?<sourcetype>\S+)" | rex field=decode "source=(?<source>\S+)"
| fillnull value="n/a"
| eval sourcetype=lower(sourcetype)

| rex field=sourcetype mode=sed "s/\*//g"
| rex field=sourcetype mode=sed "s/\)//g"
| rex field=sourcetype mode=sed "s/\(//g"
| stats latest(user) as latest_user latest(_time) as latest_time by sourcetype
| eval web_accessed="latest_user=".latest_user." latest_time=".latest_time." sourcetype=".sourcetype | fields - latest_time latest_user
]

| rename disabled as is_scheduled

| eval value=case((isnull(web_accessed) AND isnull(is_scheduled)), "not being searched", (isnull(web_accessed) AND isnotnull(is_scheduled)), "saved search only", (isnotnull(web_accessed) AND isnull(is_scheduled)), "web search only", (isnotnull(web_accessed) AND isnotnull(is_scheduled)), "both saved and web searched")
| fields - index source is_scheduled
| sort - value, "Avg GB/Day"
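The final `case()` is a four-way decision on two flags: whether the sourcetype showed up in web access logs, and whether it showed up in a saved search. As a quick way to sanity-check the labels outside Splunk, the same table can be sketched in shell. The function name `classify_usage` is hypothetical, and an empty string stands in for a null field.

```shell
#!/bin/sh
# Sketch only: mirrors the four branches of the case() above.
# $1 = web_accessed value, $2 = is_scheduled value; empty string means "null".
classify_usage() {
    web=$1
    sched=$2
    if [ -z "$web" ] && [ -z "$sched" ]; then
        echo "not being searched"
    elif [ -z "$web" ]; then
        echo "saved search only"
    elif [ -z "$sched" ]; then
        echo "web search only"
    else
        echo "both saved and web searched"
    fi
}
```

Only the first label, "not being searched", marks a candidate for disabling; the other three all indicate some form of use.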

Please let me know if there is a better way to do this, ideally with a **possible solution** to the problem at hand. Also, if you happen to try this, let me know if you have any ideas for improving it.

yannK
Splunk Employee

I see a possible limit to your approach.
The license usage log only contains metadata such as source, sourcetype, host, and index.
For a precise search (like "index=A sourcetype=B"), you can match the search terms against the license usage.
But for searches with broad conditions (like "index=*" or "keyword"), you will not be able to tell which data was in scope.
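To make the distinction concrete, here is a minimal sketch that lists the sourcetypes a search string names explicitly. The function name `explicit_sourcetypes` is made up for illustration, and dropping wildcarded values is a choice of this sketch (the search above strips the `*` instead). A search that prints nothing here is one whose scope cannot be recovered from its text.

```shell
#!/bin/sh
# Sketch only: print the sourcetype values a search string names explicitly.
# Prints nothing when the search has no sourcetype= term or only wildcarded ones.
explicit_sourcetypes() {
    printf '%s\n' "$1" |
        tr -d '"()' |                     # drop quotes and parentheses
        tr ' ' '\n' |                     # one term per line
        sed -n 's/^sourcetype=//p' |      # keep only sourcetype= terms
        grep -v '\*' |                    # a wildcard value is not a precise match
        tr '[:upper:]' '[:lower:]'        # license usage side is lowercased too
}
```

For example, `explicit_sourcetypes 'index=A sourcetype=B | stats count'` prints `b`, while `explicit_sourcetypes 'index=* keyword'` prints nothing.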
