Splunk Search

How to determine if logs are not being used?

JimSchlaker
New Member

Is there a way to determine which logs are not being used anymore, and therefore can be deleted? For example, maybe a team started logging something a year ago, but the team no longer uses that log for any reports/dashboards/etc... Is there a way to find these unused logs?

0 Karma

woodcock
Esteemed Legend
0 Karma

kellewic
Path Finder

In addition to the links mentioned by adonio, this search might get you some of the way there. However, things like macros can hide indexes/sourcetypes so it's not 100% but does also include data models/nodenames being used.

The search filters out all "*" and "_*" references as those aren't very useful. It prefixes data models with "DM-" and nodenames with "ND-" and treats those as an index/sourcetype combo. Macros are prefixed with "MC-" to easily identify and look at manually.

You could compare this against a REST call to the indexes or indexes-extended endpoint to get a starting point. BUT, you will want to confirm with data owners the indexes aren't actually being used since, again, this search is not 100%.

index=_internal sourcetype=splunkd_remote_searches
|dedup search

|eval
  search=replace(search, "(datamodel\s*=[\s\"]*)(.*?)([\|\s\"\)])", "\1DM-\2\3"),
  search=replace(search, "(eval\s+datamodel\s*=[\s\"]*)DM-", "\1"),
  search=replace(search, "(\|\s*pivot\s+)(.*?)(\s)", "\1DM-\2\3"),
  search=replace(search, "(nodename\s*=[\s\"]*)(.*?)([\|\s\"\)])", "\1ND-\2\3"),
  search=replace(search, "(eval\s+nodename\s*=[\s\"]*)ND-", "\1"),
  search=replace(search, "(search\s*`)(.*?)([`\(])", "\1MC-\2\3")

|rex field=search max_match=0 "index\s*=[\s\"]*(?<idx>.*?)[\|\s\"\)]"
|rex field=search max_match=0 "sourcetype\s*=[\s\"]*(?<st>.*?)[\|\s\"\)]"
|rex field=search max_match=0 "search\s*`(?<macro_index>MC-.*?)[`\(]"
|rex field=search max_match=0 "datamodel\s*=[\s\"]*(?<dm>DM-.*?)[\|\s\"\)]"
|rex field=search max_match=0 "nodename\s*=[\s\"]*(?<node>ND-.*?)[\|\s\"\)]"
|rex field=search max_match=0 "\|\s*pivot\s+(?<pv>.*?)\s"

|eval
  idx=mvdedup(mvappend(idx, macro_index, dm, pv)),
  idx=mvfilter(idx!="*" AND idx!="_*" AND NOT match(idx, "^_") AND NOT match(idx, "^\d+[\*_]")),
  st=mvdedup(mvappend(st, node))

|where isnotnull(idx) AND isnotnull(st)
|stats c by idx, st

|table idx, st

kellewic
Path Finder

One comment - I missed it when posting - in the mvfilter(), remove the last condition as that was specific to my use case when I made this - AND NOT match(idx, "^\d+[*_]"). We have indexes that start with a numeric ID for each customer and I wanted to ignore those.

0 Karma

adonio
Ultra Champion

hello JimSchlaker,
there are answers here around which indexes are used for reports / saved searches / dashboards and more that you can relay on. for example:
https://answers.splunk.com/answers/273176/how-can-i-determine-how-much-an-index-is-being-sea.html
https://answers.splunk.com/answers/186268/how-to-search-for-and-remove-indexes-in-splunk-tha.html
considering you mention also time span, meaning they might look at that particular index / source /sourcetype but not utilizing the old data, i would suggest an opposite way of approaching that challenge.
will suggest to either check the timerange on searches using | rest or the _audit index to determine. or verify with teams, how far back they need their data and set a hard time limit in indexes.conf on the index contains that data
hope it helps

Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...