Running a report on a large amount of data

yuwtennis
Communicator

Hi!

When you are creating a report from millions of events, I believe summary indexing is a good solution.

However, if the report has to be available on demand, is summary indexing still a good solution? In my case, the report needs a mixture of averages, sums, most frequent values, and so on, which makes it complicated.

I would appreciate any advice.

ShaneNewman
Motivator

I have several summary indexes that do this. The most important step, which should be fairly easy, is to settle on a time span. This is a saved search template I use to populate summary indexes with the kind of data you described:

[savedsearchname]
    enableSched = 1
    cron_schedule = */5 * * * *
    dispatch.earliest_time = -8m@m
    dispatch.latest_time = -3m@m
    action.summary_index = 1
    action.summary_index._name = sum_index
    action.summary_index.stat_tag = statistics
    search = index=your_index sourcetype=your_sourcetype | bucket _time span=1m | sistats avg(your_field) AS your_field_avg, median(your_field) AS your_field_median, mode(your_field) AS your_field_mode, count(your_field) AS your_field_count, dc(your_field) AS your_field_dc, max(your_field) AS your_field_max, min(your_field) AS your_field_min, stdev(your_field) AS your_field_stdev, var(your_field) AS your_field_var by _time

You can use macros.conf to make the search look cleaner, as I do. I wrote it all out here to show how to set up the values you need using sistats. As you can see above, the data lags real time by up to 8 minutes. You can adjust the delay by changing the dispatch.earliest_time and dispatch.latest_time parameters.
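As a rough sketch of the macros.conf idea, the long sistats clause could be moved into a macro that takes the field name as an argument. The macro name summary_stats and the argument name field are hypothetical and chosen only for illustration. The list of functions is shortened here.

```ini
# macros.conf
# Hypothetical macro name and argument; adjust to your own naming.
[summary_stats(1)]
args = field
definition = sistats avg($field$) AS $field$_avg, mode($field$) AS $field$_mode, count($field$) AS $field$_count, max($field$) AS $field$_max, min($field$) AS $field$_min by _time

# savedsearches.conf: the search line of the stanza would then shrink to
# search = index=your_index sourcetype=your_sourcetype | bucket _time span=1m | `summary_stats(your_field)`
```

The macro is called with backticks, and the number in parentheses in the stanza name is the argument count.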

Also, when you read the data back out of the summary index after using sistats, you need to run the same stats command again to "reheat" the data for use.
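A minimal sketch of such a retrieval search, reusing the index and saved search names from the template above. It assumes the summary events carry the saved search name in the search_name field, and the one-hour span is only an example of rolling the one-minute buckets up to a coarser range.

```spl
index=sum_index search_name="savedsearchname"
| bucket _time span=1h
| stats avg(your_field) AS your_field_avg, mode(your_field) AS your_field_mode, count(your_field) AS your_field_count by _time
```

Because sistats stores the intermediate values rather than the finished averages, running stats over the summary events should give the same result as running it over the raw events, only much faster. The time range picker or earliest and latest modifiers can be applied to this search as usual.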


ShaneNewman
Motivator

Easy enough: just use subsearches in your search string. There is no real reason to create a temporary index; you would only be adding another point of failure.
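One way to read this suggestion is to combine summaries computed along different dimensions with append and a subsearch. The following is only a sketch: the saved search names and the host and user fields are hypothetical, and it assumes each saved search populated the summary index with a matching sistats clause.

```spl
index=sum_index search_name="hypothetical_summary_by_host"
| stats avg(your_field) AS your_field_avg by host
| append
    [ search index=sum_index search_name="hypothetical_summary_by_user"
      | stats sum(your_field) AS your_field_sum by user ]
```

Subsearches are subject to result-count and runtime limits, so this approach fits best when each subsearch reads already summarized data rather than raw events.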


yuwtennis
Communicator

Hello ShaneNewman.

Thank you for the reply.
I did not mention it earlier, but we have 3 indexes to summarize, and
each has approximately:

index A : 50,000,000 events (150,000 events indexed per day)
index B : 10,000,000 events (50,000 events indexed per day)
index C : 50,000 events (few events indexed per day)

and 31 summary items to calculate.
Some summary items need to be calculated along different dimensions, so we need to create separate searches.

I believe I would need to create temporary summary indexes and then concatenate them into a single daily summary index, where users can apply time modifiers.
