Splunk Search

How to find records that only exist in a particular time range

vjverma
Explorer

I need to query the logs to find records with names that only exists in the current month but not in the previous month (or ever).

The SQL equivalent would be something on these lines

select * FROM visitors where datevisit >= '10/1/2014' and datevisit <= '10/31/2014' AS Oct_Visits
Outer join visitors where datevisit >= '9/1/2014' and datevisit <= '9/30/2014' as Sep_Visits
ON Oct_Visits.name = Sep_Visits.name AND Sep_Visits.name = null

Appreciate your help!

Thanks,
Vijay

Tags (2)
0 Karma

somesoni2
Revered Legend

Give this a try

source=visitors earliest=-4mon@mon latest=@mon | eval Month=strftime(_time,"%Y-%m") | stats values(Month) as Month by name | where mvcount(Month)=1 AND Month=[|gentimes start=-1 | addinfo | eval search="\"".strftime(relative_time(info_max_time,"-1mon"),"%Y-%m")."\"" | return $search]
0 Karma

aweitzman
Motivator

This is a little crazy, but it does most of the job. I think some tweaking might be in order, though:

  1. Replacing info_max_time with info_search_time will make it dependent on when you actually execute the search, not the latest time in the time picker window. Otherwise, if your time picker has a range in it that doesn't extend to a day in the current month, your results will be off.
  2. Replacing stats values(Month) with stats count values(Month) will add the event count information you seem to be looking for.
  3. Adding | fields - Month will remove the Month column from the result table, so you'll just have event count by name. Alternatively, if you just want a total count of events rather than a count by name, you can add | stats sum(count) as count instead.
0 Karma

aweitzman
Motivator

Something like this:

sourcetype=visitors earliest=@mon NOT [search sourcetype=visitors earliest=-1mon@mon latest=@mon | table name | dedup name]

aweitzman
Motivator

Add the following to your main search:

earliest=@mon

This tells your search to "snap" the earliest time in your search to the beginning of the current month.

See http://docs.splunk.com/Documentation/Splunk/6.1.4/Search/Specifytimemodifiersinyoursearch for more details on this subject.

0 Karma

vjverma
Explorer

I tried:

source=visitors earliest=-1mon@mon latest=@mon NOT [search source=visitors earliest=-4mon@mon latest=-1mon@mon | table name | dedup name] | stats count

The result is same with or without the NOT clause. It appears that when the query runs, the time frame is set to last month and hence the inner search does not kick in.

0 Karma

aweitzman
Motivator

Shouldn't be. The inner search is actually run first. As a subsearch, this creates something like ((name="a") OR (name="b") OR (name="c")) . By putting a NOT in front of this, you should be negating the contents of that, so that you only see things that don't match it.

What results do you get when you just do the following:

source=visitors earliest=-4mon@mon latest=-1mon@mon | table name | dedup name | format

By adding the format command to the end, you should get the string that is generated as the subsearch for your main search above. Is it what you think it is supposed to be?

(Also note, this may not work with a real-time search. Choose any other kind of search with the time picker to see the results properly.)

0 Karma

vjverma
Explorer

When I run this individually, the result list looks okay. I haven't tried the format that you suggested but I see results come back when I run just "source=visitors earliest=-4mon@mon latest=-1mon@mon | table name | dedup name" with "All time" option.

It is only when I execute "source=visitors earliest=-1mon@mon latest=@mon NOT [search source=visitors earliest=-4mon@mon latest=-1mon@mon | table name | dedup name] | stats count" that the result doesn't look accurate.

Also, I see the following lines above the results window that tell me the timeframe is not being picked up correctly.

[subsearch]: Your timerange was substituted based on your search string
[subsearch]: No matching fields exist
≥ 1 result during September 2014

If I run the sub-query on its own, the timeframe indicated is correct "June through August"

0 Karma

aweitzman
Motivator

Right, because the main search is during September. That is what you would expect to see.

What makes you believe the result doesn't "look accurate?" By adding a stats count to the end of it, you're not inspecting the data but guessing based on a statistic. You need to inspect the data in order to see whether it's actually doing what you want or not.

Instead of doing a stats count at the end, why not replace it with a table name | dedup name there, too? That should at least get you some data to inspect: it should be a list of names that logged in during September that did not log in between June and August. Once you have that list, do some searches between June and August on those names, and they shouldn't show up there.

0 Karma

vjverma
Explorer

These two queries return the same result as a list and count.

  1. source=visitors earliest=-1mon@mon latest=@mon NOT [search source=visitors earliest=-4mon@mon latest=-1mon@mon | table name | dedup name]

  2. source=visitors earliest=-1mon@mon latest=@mon

I did a spot check for some of the results that show up. They exist in the June-aug range as well.

0 Karma

aweitzman
Motivator

You still haven't run the format version of the subsearch, as listed above. What does it return?

(Also, is the field name you're looking for really "name"? If not, what is it really?)

0 Karma

vjverma
Explorer

Sorry, I wasn't explicit. I ran the query with format turned on as well. If I run the inner query on its own, then it returns accurate data.

However, in following syntax, it seems to have no bearing on the results. The following two queries return identical result (count, format, table, ...)

source=visitors earliest=-1mon@mon latest=@mon NOT [search source=visitors earliest=-4mon@mon latest=-1mon@mon | table name | dedup name]

source=visitors earliest=-1mon@mon latest=@mon

The actual field is "Email Address".

source=emailusers earliest=-1mon@mon latest=@mon NOT [search source=visitors earliest=-4mon@mon latest=-1mon@mon | table "Email Address" | dedup "Email Address"]

0 Karma

aweitzman
Motivator

I hate to ask the "is it plugged in" question, but in that last search in your response with the "Email Address" field, you're not really passing source=visitors in the subsearch, are you? If your real source is emailusers then that needs to be emailusers also.

Assuming that's the case, it is possible that you could get the same response from your first and second searches if, in fact, every one of your visitors from September did not visit you between June and August. But you are asserting that there are visitors from September that show up between June and August. I'm wondering if you aren't running into a subsearch limit problem...

Given that I can't seem to come up with a way to make this work for you, you should try @somesoni2's answer below (with my suggested tweaks).

0 Karma

vjverma
Explorer

Thanks but how should I go about joining the results from two different time ranges.

0 Karma
Get Updates on the Splunk Community!

Announcing Scheduled Export GA for Dashboard Studio

We're excited to announce the general availability of Scheduled Export for Dashboard Studio. Starting in ...

Extending Observability Content to Splunk Cloud

Watch Now!   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to leverage ...

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...