Solved: POST PROCESS worth it?

subtrakt · 07-07-2014

I am experimenting with post process, but it seems to make the dashboard reload more slowly than the same dashboard using only scheduled searches. Is it safe to say that for a dashboard with a high number of panels, I should use scheduled searches and avoid post process? Post process seems more organized, but if I have to choose between post process and performance, I am going to go with performance.

sideview · 07-07-2014

Postprocess and scheduled searches are separate techniques that do quite different things, so you can't really compare one to the other. Nor are you forced to pick one over the other, for that matter; you can use postprocess on scheduled jobs just fine.

So let me back up and explain the usefulness of scheduled saved searches versus postprocess.

1) Scheduled Saved Searches
You're taking a search or a report and telling Splunk to run it on a schedule. When a user then loads a dashboard that references that search or report, by default the dashboard will not re-run the search "on demand"; instead it grabs the most recently run results. Those results load almost instantly, whereas running the search on demand would take a number of seconds roughly proportional to the number of events Splunk has to read off disk.
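To make the caching behavior concrete, here is a toy Python model. It is not a Splunk API: `ScheduledSearch`, `run_on_schedule` and `results_for_dashboard` are hypothetical names. The only point it shows is that the expensive search runs when the scheduler fires, and dashboard loads just reuse the last results. The fallback to an on-demand run when no scheduled result exists yet is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Rows = List[Dict[str, object]]


@dataclass
class ScheduledSearch:
    """Toy model of a scheduled saved search; hypothetical, not a Splunk API."""

    run_search: Callable[[], Rows]       # the expensive part: reading events off disk
    last_results: Optional[Rows] = None  # the most recently run results
    runs: int = 0                        # how many times the search actually executed

    def run_on_schedule(self) -> None:
        # Invoked by the scheduler, not by dashboard viewers.
        self.last_results = self.run_search()
        self.runs += 1

    def results_for_dashboard(self) -> Rows:
        # A dashboard load reuses the most recent scheduled results.
        if self.last_results is None:
            # Assumption for this sketch: with no prior run, execute on demand.
            self.run_on_schedule()
        return self.last_results


def expensive_search() -> Rows:
    return [{"username": "alice", "count": 42}]


report = ScheduledSearch(expensive_search)
report.run_on_schedule()           # e.g. the scheduled run at the top of the hour
for _ in range(5):                 # five users load the dashboard afterwards
    report.results_for_dashboard()
print(report.runs)                 # the search itself ran only once
```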

2) Postprocess
Postprocess is a way to use the Splunk Search API, and by extension the term covers a range of modules and techniques built on that API. The basic idea is that, given some search result (a scheduled search result or a result from a search run "on demand"), you can use postprocess to carve up the results in more than one way, so that the same search results serve more than one panel or more than one "thing".

Let's take a simple example and walk through the benefits of both!

Let's say you have a very simple dashboard where you want to show the user two things.

A) A graph of traffic over time split by the top users (in SPL, let's say `* | timechart count by username`)

B) A table showing the 10 most active users overall (in SPL, let's say `* | top 10 username`)

1) Enter "scheduled saved searches" -

You can either let these both be ad-hoc searches, OR you can schedule one or both as separate scheduled saved searches. If you schedule one or both, the panels served by the scheduled results will load almost instantly. Any ad-hoc panel will take a while to populate.

2) OK. Now enter "Postprocess" -
This is a very different thing. With postprocess you can refactor this picture so that only one search result is needed to feed both panels. The base search becomes:

* | bin _time span=1h | stats count by _time username

That search is then carved up by two different postprocess searches:

| timechart sum(count) by username

and

| stats sum(count) as count by username | sort 10 -count

And that's it. The postprocess machinery has nothing to do with how that search result came into existence; it might be a scheduled search result, or it might be ad-hoc.
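Here is a rough Python simulation of that refactoring, only to show the data flow: one pass over the raw events builds the hourly per-user counts, and two cheap functions derive both panels from that same intermediate table. The function names and the tuple-shaped events are hypothetical stand-ins, not Splunk APIs. The timechart analogue skips timechart's zero-filling and its top-N/OTHER series limit, and the top-users analogue assumes panel B keeps only the 10 largest totals.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Event = Tuple[int, str]  # hypothetical raw event: (epoch seconds, username)


def base_search(events: List[Event], span: int = 3600) -> Counter:
    """Rough analogue of `* | bin _time span=1h | stats count by _time username`."""
    rows: Counter = Counter()
    for t, user in events:
        rows[(t - t % span, user)] += 1  # floor _time to the hour, count per user
    return rows


def postprocess_timechart(base: Counter) -> Dict[int, Dict[str, int]]:
    """Rough analogue of `| timechart sum(count) by username` (no zero-fill, no OTHER)."""
    chart: Dict[int, Dict[str, int]] = defaultdict(dict)
    for (bucket, user), count in base.items():
        chart[bucket][user] = chart[bucket].get(user, 0) + count
    return dict(chart)


def postprocess_top_users(base: Counter, limit: int = 10) -> List[Tuple[str, int]]:
    """Rough analogue of `| stats sum(count) as count by username`, keeping the top 10."""
    totals: Counter = Counter()
    for (_, user), count in base.items():
        totals[user] += count
    return totals.most_common(limit)


events = [(0, "alice"), (10, "bob"), (3700, "alice"), (3800, "alice"), (7300, "carol")]
base = base_search(events)            # the only pass over the raw events
print(postprocess_timechart(base))    # feeds panel A
print(postprocess_top_users(base))    # feeds panel B
```

The expensive work happens once in `base_search`; each panel is a cheap aggregation over a table that is much smaller than the raw events.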

