How to create a search for multiple earliest dates for 7day output?

kpavan · 09-07-2022

Hi All,

I am looking for a query that can use multiple earliest dates.

index=something sourcetype=something earliest=-7d@d latest=@d
| timechart span=1d dc(id) as total

It gives this output:

2022-08-31	13548
2022-09-01	13438
2022-09-02	13782
2022-09-03	9831
2022-09-04	13602
2022-09-05	12856
2022-09-06	12849

But the actual figure per day is somewhere above 25k. Because the data is being split by day, the numbers shown per day are much lower, as in the table above.

If I use

index=something sourcetype=something earliest=-7d@d latest=@d
| stats dc(id) as total

the output is 26894. With

index=something sourcetype=something earliest=-8d@d latest=-1@d
| stats dc(id) as total

the output is 27099,

and so on. If I change earliest and latest to cover the last 7 days, I get above 25k or 26k, but if I use timechart, the number is about half of that.

It would be a great help if anyone has a way to get the correct output within a single query.

Thanks in advance!

yuanliu · 09-07-2022

@kpavan wrote:
But the actual figure per day is somewhere above 25k. Because the data is being split by day, the numbers shown per day are much lower, as in the table above.
... and so on. If I change earliest and latest to cover the last 7 days, I get above 25k or 26k, but if I use timechart, the number is about half of that.

I think you misunderstood what timechart span=1d does. The problem does not exist. Let me break it down a little.

First, if you perform

index=something sourcetype=something earliest=-1d@d latest=-0d@d
| stats max(_time) as _time dc(id) as total

you'll get something like

2022-09-06	12849

Then, perform

index=something sourcetype=something earliest=-2d@d latest=-1d@d
| stats max(_time) as _time dc(id) as total

output will look like

2022-09-05	12856

and so on. The point is, this sequence is exactly what timechart span=1d does. Timechart does not cut the counts in half; it simply performs the count separately for each day.

Secondly, why does

index=something sourcetype=something earliest=-7d@d latest=@d
| stats dc(id) as total

end up at only 26894 instead of the sum of the 7 days of timechart, i.e., ~90,000? That's because you are performing a distinct count (dc). There is a large amount of overlap in the field id from day to day. For example, if ids A, B, and C appear on day one, and A, C, and D appear on day two, your dc(id) will be 3 on each day individually; that's what timechart span=1d will show. But if you set earliest=-2d and perform dc(id) over both days, the output will be 4, not 6.
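The same arithmetic can be checked outside Splunk. The following is only a rough sketch in plain Python (not SPL); the ids are the hypothetical A to D values from the example above, and `ids_by_day`, `daily_dc`, and `window_dc` are names made up for illustration.

```python
# Hypothetical ids per day, taken from the A/B/C/D example above.
ids_by_day = {
    "day 1": ["A", "B", "C"],
    "day 2": ["A", "C", "D"],
}

# Distinct count per day: the analogue of one timechart span=1d bucket each.
daily_dc = {day: len(set(ids)) for day, ids in ids_by_day.items()}

# Distinct count over the whole window: the analogue of stats dc(id).
window_dc = len(set().union(*ids_by_day.values()))

print(daily_dc)                # {'day 1': 3, 'day 2': 3}
print(sum(daily_dc.values()))  # 6
print(window_dc)               # 4
```

Summing the daily values counts A and C twice, which is why the daily numbers add up to more than the distinct count for the whole window.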

Do another experiment:

index=something sourcetype=something earliest=-7d@d latest=@d
| timechart span=7d dc(id) as total

This will give you that magical number ~27,000.

All this is a long way to say that timechart span=1d is really giving correct results (as far as dc is concerned).

ITWhisperer · 09-07-2022

What were you expecting as the correct result? The values you have shown are conceivably consistent with each other.

richgalloway · 09-07-2022

Please explain what is meant by "multiple earliest days". A query can have only one earliest_time setting.

I think the distinct_count (dc) function may be confusing the matter. It may be normal for a span of 7 or 8 days to have 26000+ unique values for a field, but for each day in that same range to have far fewer. It merely means id values are repeated across days. If you use count instead of distinct_count, then you should see the totals for each day add up to the count for all 7 or 8 days.
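To make the count versus distinct_count difference concrete, here is a small sketch in plain Python rather than SPL. The events are invented for illustration (two days, with some ids repeated both within and across days), and all variable names are hypothetical.

```python
# Invented events: one id per event, grouped by day.
events_by_day = {
    "day 1": ["A", "A", "B", "C"],
    "day 2": ["A", "C", "C", "D"],
}
all_events = [i for ids in events_by_day.values() for i in ids]

# count: daily totals add up to the total for the whole range.
daily_count = {day: len(ids) for day, ids in events_by_day.items()}
range_count = len(all_events)

# distinct count: daily totals do not add up, because ids repeat across days.
daily_distinct = {day: len(set(ids)) for day, ids in events_by_day.items()}
range_distinct = len(set(all_events))

print(sum(daily_count.values()), range_count)        # 8 8
print(sum(daily_distinct.values()), range_distinct)  # 6 4
```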

---
If this reply helps you, Karma would be appreciated.
