Splunk Search

Finding devices that are "off" at a date, and then counting how many of them turn "on" on each subsequent day

ALevin123
New Member

Hi,

I'm a first time splunk user trying to figure out how to do the following:
I have data describing devices, the devices are either "on" or "off" on any given day. I want to search for the ID's of the devices that are "off" on a specific day (let's call them my "new_set") and then track those (and only those) devices over time and output a count of how many of those devices in new_set are "on" and how many are "off" on each subsequent day.

Illustration, on day 0 I might find 200,000 devices that are "off" out of a total of 500,000 devices; then I want to count on each subsequent day how many of those specific 200,000 devices are "on" and how many are "off". It may look like this
Day 0: 200,000 off; 300,000 on
Then, I find the ID's of those 200,000 and call this new_set and then track only them ( I no longer care about the devices that are not in "new_set"):
Day 0: 200,000 off; 0 on
Day 1: 199,000 off: 1,000 on
Day 2: 197,000 off: 3,000 on
Day n: .....

I have tried using a Join command (inner), which eventually works, however the search is very slow / inefficient and I think there must be a more efficient way?

Does anyone have some recommendations or thoughts?

Tags (1)
0 Karma
1 Solution

sundareshr
Legend

I think you have couple of option

*Option 1 (preferred/more efficient)*
Get a list of all devices off on Day 0 and store them in a lookup csv (http://docs.splunk.com/Documentation/Splunk/6.4.3/SearchTutorial/Usefieldlookups). Your search for this will look something like this

index=xyz earliest=day0 latest=day0 state="off" | stats count by id | fields - count | outputlookup listofdevicesoffonday0.csv

Once you have this file, you can use it to filter, like this

index=xyz earliest=day1 [| inputlookup listofdevicesoffonday0.csv ] | timechart span=1d count by state | fields - on

*Option 2, less efficient*
Use sub-search (http://docs.splunk.com/Documentation/Splunk/6.4.3/SearchTutorial/Useasubsearch) to get the filter list. So your search will look like this

index=xyz earliest=day1 [search  index=xyz earliest=day0 latest=day0 state="off" | stats count by id | table id ] | timechart span=1d count by state | fields - on

View solution in original post

0 Karma

sundareshr
Legend

I think you have couple of option

*Option 1 (preferred/more efficient)*
Get a list of all devices off on Day 0 and store them in a lookup csv (http://docs.splunk.com/Documentation/Splunk/6.4.3/SearchTutorial/Usefieldlookups). Your search for this will look something like this

index=xyz earliest=day0 latest=day0 state="off" | stats count by id | fields - count | outputlookup listofdevicesoffonday0.csv

Once you have this file, you can use it to filter, like this

index=xyz earliest=day1 [| inputlookup listofdevicesoffonday0.csv ] | timechart span=1d count by state | fields - on

*Option 2, less efficient*
Use sub-search (http://docs.splunk.com/Documentation/Splunk/6.4.3/SearchTutorial/Useasubsearch) to get the filter list. So your search will look like this

index=xyz earliest=day1 [search  index=xyz earliest=day0 latest=day0 state="off" | stats count by id | table id ] | timechart span=1d count by state | fields - on
0 Karma

ALevin123
New Member

Hey,

Thank you for the help, that definitely got me further. However, it appears that my output is giving me the total "On"/"Off" counts from the original population ( the 500,000 in the above example) rather than from the sample of "new_set" (where "new_set" are those devices that were "off" on day=0 i.e. the 200,000). Any idea what I'm doing wrong?

Also, in my data there appear to be duplicates of the device ID's - so I would need to remove duplicates, would I use a "dedup" command and if so, where in the query?

0 Karma

sundareshr
Legend

Can you share your query

0 Karma

ALevin123
New Member

Thank you!

index=xyz source=abc mac!=pqr
[search index=xyz source=abc mac!=pqr timeStamp="day0" state= “off”
[ search source=uvw product=lmn timeStamp="day0"
| dedup Id
| table Id]
| stats count by Id
| dedup time,mac
| table Id]
| bin_time span=1d
| dedup_time,mac
| stats count by _time,operStatus
| sort_time

I changed some of your recommendations; for example, timechart was giving me "0" for weekend days, so I used the bin and sort instead. And "fields - on" wasn't giving me the comparison I needed of seeing "off" and "on" together for a day.

0 Karma

sundareshr
Legend

To limit your search, you need at add 'earliest=' in your base search. So in your example you should add day1 as the earliest OR timestamp>day0

Also, I notice you have two subsearches, not sure you need both. Combine them for faster searches

To eliminate duplicates, you can do dedup, or stats dc(id) as distinc_ids

0 Karma

ALevin123
New Member

Can I ask why you recommended field lookup.csv rather than a KV store in this scenario?

0 Karma

sundareshr
Legend

KV store will work as well. .csv is the first thing that came to mind.

If this helped you with your query, please mark it as answered to close it out.

0 Karma

ALevin123
New Member

Thank you.

0 Karma

ALevin123
New Member

Thank you, appreciate your help.

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...