Splunk Search

Data count issues using | where _time>=info_min_time AND (_time<=info_max_time OR info_max_time="+Infinity") on a .csv to be able to search based on time

Bentash
Explorer

Using | where _time>=info_min_time AND (_time<=info_max_time OR info_max_time="+Infinity") on a .csv to be able to search based on time.
I get a wrong count of data everytime i run

| eval _time=strptime(date,"%m/%d/%Y")
| sort - _time
| addinfo
| where _time>=info_min_time AND (_time<=info_max_time OR info_max_time="+Infinity")
| stats count as "total status" by "status"

returns a count of = 9

But::
| eval _time=strptime(date,"%m/%d/%Y")
| sort - _time
| addinfo
| stats count as "total status" by "status"

returns a count of 11.

This appears to be on not all .csv's but most

Tags (2)
0 Karma
1 Solution

elliotproebstel
Champion

What you're describing sounds like the expected behavior to me, so maybe you can clarify. Let's say you set your timepicker to search for events "Yesterday". Then when you run this:

your base search
| eval _time=strptime(date,"%m/%d/%Y") 
| sort - _time
| addinfo
| where _time>=info_min_time AND (_time<=info_max_time OR info_max_time="+Infinity")
| stats count as "total status" by "status"

You should get back two columns: total status and status, and the number will reflect the number of events that took place yesterday. When you run this:

your base search
| eval _time=strptime(date,"%m/%d/%Y") 
| sort - _time
| addinfo
| stats count as "total status" by "status"

you aren't filtering down to events that match your timepicker, so it makes complete sense that you would often get fewer events returned from the first search than from the second.

Let's take it step by step:

| eval _time=strptime(date,"%m/%d/%Y") 
| sort - _time
| addinfo

The eval statement converts a field called date into a field called _time by using the strptime function. The sort statement is sorting your events into reverse chronological order. The addinfo statement is arguably the most complex of the three lines. As per documentation, this is adding four fields: info_min_time (the earliest time defined by your timepicker), info_max_time (the latest time defined by your timepicker), info_sid (the search ID number of this search), and info_search_time (the epoch time at which the search was run).

After these, in the first search, you do some filtering:

| where _time>=info_min_time AND (_time<=info_max_time OR info_max_time="+Infinity")

This discards all events with a timestamp before or after the window defined by your timepicker. So if your timepicker says "Yesterday", then this will discard events from before or after yesterday.

Finally, you call:

| stats count as "total status" by "status"

This is just a counting function, but of course it will yield different counts based on whether or not you've discarded events based on when they took place. If you use the | where... statement, then you will retain fewer events for this count.

View solution in original post

Bentash
Explorer

alt text

alt text

@elliotproebstel check it, same issue

0 Karma

elliotproebstel
Champion

Yes, that's what I expected. But can you determine which two events are present in the second set that aren't present in the first? Let's look at those (particularly the date field) to see if we can tell what's different about those than the others.

0 Karma

Bentash
Explorer

alt text

alt text

@elliotproebstel this i what i am currently seeing and i am running both all time. Count is OFF

0 Karma

elliotproebstel
Champion

I'd recommend running both searches without the stats command at the end and comparing the results to see if you can identify the two events that are getting filtered out in the search using the time filters. Can you post them here (redacting sensitive info, if needed), along with some that are not getting filtered out? Maybe we can find the pattern of why they're being treated differently. I suspect the date field isn't formatted exactly the same on all events. My gut instinct is that maybe the two events that are getting trashed have a date with a single digit day of the month, because %d will match only if the day is two digits (requiring a leading 0 if the day is 01-09).

0 Karma

Bentash
Explorer

@elliotproebstel check below

0 Karma

elliotproebstel
Champion

What you're describing sounds like the expected behavior to me, so maybe you can clarify. Let's say you set your timepicker to search for events "Yesterday". Then when you run this:

your base search
| eval _time=strptime(date,"%m/%d/%Y") 
| sort - _time
| addinfo
| where _time>=info_min_time AND (_time<=info_max_time OR info_max_time="+Infinity")
| stats count as "total status" by "status"

You should get back two columns: total status and status, and the number will reflect the number of events that took place yesterday. When you run this:

your base search
| eval _time=strptime(date,"%m/%d/%Y") 
| sort - _time
| addinfo
| stats count as "total status" by "status"

you aren't filtering down to events that match your timepicker, so it makes complete sense that you would often get fewer events returned from the first search than from the second.

Let's take it step by step:

| eval _time=strptime(date,"%m/%d/%Y") 
| sort - _time
| addinfo

The eval statement converts a field called date into a field called _time by using the strptime function. The sort statement is sorting your events into reverse chronological order. The addinfo statement is arguably the most complex of the three lines. As per documentation, this is adding four fields: info_min_time (the earliest time defined by your timepicker), info_max_time (the latest time defined by your timepicker), info_sid (the search ID number of this search), and info_search_time (the epoch time at which the search was run).

After these, in the first search, you do some filtering:

| where _time>=info_min_time AND (_time<=info_max_time OR info_max_time="+Infinity")

This discards all events with a timestamp before or after the window defined by your timepicker. So if your timepicker says "Yesterday", then this will discard events from before or after yesterday.

Finally, you call:

| stats count as "total status" by "status"

This is just a counting function, but of course it will yield different counts based on whether or not you've discarded events based on when they took place. If you use the | where... statement, then you will retain fewer events for this count.

Bentash
Explorer

So @elliotproebstel, how did you fix it?

0 Karma

Bentash
Explorer

it was a data issue

0 Karma

elliotproebstel
Champion

@Bentash - I don't think I've yet understood your goal. If you want to only count entries that fall within the time range specified by your timepicker selection, then the first query will do the trick. If you want to count all entries, regardless of their timestamp, then the second query is what you want.

If none of this is helping, perhaps you can state again what your goal is and why you think neither of these queries is getting you to your goal. I want to help, but I'm still pretty confused about what isn't working for you.

0 Karma

Bentash
Explorer

@elliotproebstel, the plan here is to be able to run a search based on time from .csv. If I run the search without | where ... I get an accurate count of field values from the csv but when I add the |where .... I’m getting a -1 or -2 On the total count .. hope this helps clarify

0 Karma

elliotproebstel
Champion

Let's dig into the real details. You have a csv file, and this file contains lines with a column named _time.

Question #1: How many lines are in the csv overall?
Question #2: How many lines in the csv have a timestamp that falls within the window you're using to search? For example, if the timepicker says "Yesterday", how many lines in the csv have a timestamp from yesterday?
Question #3: When you run the following code snippet, do you see all events from the csv:

your base search
| eval _time=strptime(date,"%m/%d/%Y") 
| sort - _time
| addinfo

Question #4: When you run the following code snippet, do you see all events from the csv that fall within the time defined by the timepicker?

your base search
| eval _time=strptime(date,"%m/%d/%Y") 
| sort - _time
| addinfo
| where _time>=info_min_time AND (_time<=info_max_time OR info_max_time="+Infinity")

If #3 and/or #4 are a "no", then try to provide as much detail as you can, and I'll help dig into the weeds with you.

0 Karma

Bentash
Explorer

please see images below @elliotproebstel

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...