Hi All,
I am looking for a query that can use multiple earliest days.
index=something sourcetype=something earliest=-7d@d latest=@d
| timechart span=1d dc(id) as total
It gives this output:
2022-08-31 | 13548 |
2022-09-01 | 13438 |
2022-09-02 | 13782 |
2022-09-03 | 9831 |
2022-09-04 | 13602 |
2022-09-05 | 12856 |
2022-09-06 | 12849 |
But the actual data per day is above 25k. Because the data is being split by day, the numbers shown per day are much lower, as in the table above.
If I use
index=something sourcetype=something earliest=-7d@d latest=@d
| stats dc(id) as total
the output is 26894. With
index=something sourcetype=something earliest=-8d@d latest=-1d@d
| stats dc(id) as total
the output is 27099.
And so on: if I change earliest and latest to cover any 7-day window, I get above 25k or 26k, but if I use timechart, the number is about half that.
It would be a great help if anyone has a query that gets the correct output in a single query.
Thanks in advance!
@kpavan wrote: But actual data per day is something above 25k, but because of data is getting split so number showing very less per day wise as above table.
... so on, if I change earliest and latest to get last 7 days i get above 25k or 26k but if use timechart then its half the number.
I think you misunderstood what timechart span=1d does. The problem does not exist. Let me break it down a little.
First, if you perform
index=something sourcetype=something earliest=-1d@d latest=-0d@d
| stats max(_time) as _time dc(id) as total
you'll get something like
2022-09-06 | 12849 |
Then, perform
index=something sourcetype=something earliest=-2d@d latest=-1d@d
| stats max(_time) as _time dc(id) as total
output will look like
2022-09-05 | 12856 |
and so on. The point is, this sequence is exactly what timechart span=1d does. Timechart does not cut the counts in half; it simply performs the count separately for each day.
Secondly, why does
index=something sourcetype=something earliest=-7d@d latest=@d
| stats dc(id) as total
end up at only 26894 instead of the sum of the 7 days of timechart, i.e., ~90,000? That's because you are performing a distinct count (dc). There is a large amount of overlap in field id from day to day. For example, if ids A, B, and C appear on day one, and A, C, and D appear on day two, your dc(id) will be 3 on each day individually; that's what timechart span=1d will show. But if you set earliest=-2d and perform dc(id) over both days together, the output will be 4, not 6.
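To make the A/B/C/D example concrete, here is a minimal sketch in plain Python (not SPL). The `events` list is made-up sample data, and the two calculations are only analogies for what timechart span=1d dc(id) and stats dc(id) compute, not how Splunk implements them.

```python
# Hypothetical sample data mirroring the example above: (day, id) pairs.
events = [
    ("day1", "A"), ("day1", "B"), ("day1", "C"),
    ("day2", "A"), ("day2", "C"), ("day2", "D"),
]

# Per-day distinct count: analogous to `timechart span=1d dc(id)`.
ids_by_day = {}
for day, id_ in events:
    ids_by_day.setdefault(day, set()).add(id_)
per_day = {day: len(ids) for day, ids in ids_by_day.items()}

# Distinct count over the whole window: analogous to `stats dc(id)`.
whole_window = len({id_ for _, id_ in events})

print(per_day)       # {'day1': 3, 'day2': 3}
print(whole_window)  # 4
```

The per-day values sum to 6, but the window total is 4 because A and C are counted once each, no matter how many days they appear on.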
Do another experiment:
index=something sourcetype=something earliest=-7d@d latest=@d
| timechart span=7d dc(id) as total
This will give you that magical number ~27,000.
All this is a long way to say that timechart span=1d is really giving correct results (as far as dc is concerned).
What were you expecting as the correct result? The values you have shown are plausibly consistent with each other.
Please explain what is meant by "multiple earliest days". A query can have only one earliest_time setting.
I think the distinct_count (dc) function may be confusing the matter. It may be normal for a span of 7 or 8 days to have 26000+ unique values for a field, but for each day in that same range to have far fewer. It merely means id values are repeated across the days. If you use count instead of distinct_count, then you should see the totals for each day add up to the count for all 7 or 8 days.
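The difference between count and distinct_count can be sketched in plain Python (not SPL). The `events` list below is invented sample data, used only to show that daily counts add up to the window total while daily distinct counts generally do not.

```python
from collections import Counter

# Hypothetical sample data: (day, id) pairs, with id "A" repeating.
events = [
    ("2022-09-05", "A"), ("2022-09-05", "A"), ("2022-09-05", "B"),
    ("2022-09-06", "A"), ("2022-09-06", "C"),
]

# Plain count per day: every event is counted, repeats included.
count_per_day = Counter(day for day, _ in events)

# Distinct count per day: each id is counted once per day.
ids_per_day = {}
for day, id_ in events:
    ids_per_day.setdefault(day, set()).add(id_)
dc_per_day = {day: len(ids) for day, ids in ids_per_day.items()}

total_count = len(events)                   # count over the whole window
total_dc = len({id_ for _, id_ in events})  # distinct count over the window

print(sum(count_per_day.values()), total_count)  # 5 5
print(sum(dc_per_day.values()), total_dc)        # 4 3
```

Daily counts are additive, so they match the window total. Daily distinct counts overcount any id that shows up on more than one day, so their sum is at least as large as the window's distinct count.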