Splunk Search

Issue with count -- How can I search a large data set without Splunk truncating the data?

closeset
New Member

Hi,

I would like to query all data over the past year and then use "stats count by <some fields>" to calculate the counts.

However, the data set is too large (at least a few million events), and Splunk truncates the results when querying, so the counts are inaccurate.

Does anyone know a good way to fix it?

PS. I tried 'sistats' and set up a report that runs every hour to query data from the previous year.
Ideally, I want the report to collect data accurately over a smaller time interval and then aggregate the results.
However, each hour the report queries all of the previous data inaccurately and then adds up all the counts as the result.

0 Karma

jkat54
SplunkTrust

I think you have several options.

  1. Create and accelerate a data model.
  2. Create summary indexes, using searches that run every day or more frequently (every 5-15 minutes), and then use a backfill script.
  3. Do both of the above: create an accelerated DM and build summary indexes from the DM using the tstats command.

Number one is the easiest approach, number two is faster, and number three becomes necessary if you need to correlate data from more than one really large data set; see the sketch below.
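For example, a minimal sketch of option three, assuming an accelerated data model named My_Events whose root dataset is also called My_Events and carries the fields event_id, field1 and field2 (all of these names are placeholders, not from the original post):

| tstats summariesonly=true count from datamodel=My_Events by My_Events.event_id My_Events.field1 My_Events.field2

Because tstats reads from the acceleration summaries instead of scanning raw events, a year of data aggregates far faster, and the output can then be written into a summary index with collect.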

0 Karma

skoelpin
SplunkTrust

Are you referring to the number of rows getting truncated?

If so, I had a similar problem a while back where it would truncate anything more than 50,000 rows, leading to inaccurate results. Luckily this is a simple fix in limits.conf:

maxresultrows = <integer>
* Configures the maximum number of events generated by search commands which
  grow the size of your result set (such as multikv) or that create events.
  Other search commands are explicitly controlled in specific stanzas below.
* This limit should not exceed 50000. Setting this limit higher than 50000
  causes instability.
* Defaults to 50000.

http://docs.splunk.com/Documentation/Splunk/6.2.1/Admin/Limitsconf
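If you do decide to raise it, the setting lives under the [searchresults] stanza; a minimal example, e.g. in $SPLUNK_HOME/etc/system/local/limits.conf (the value 100000 is only an illustration, not a recommendation):

[searchresults]
# Raise the cap on result rows; test outside production first.
maxresultrows = 100000

Splunk generally needs a restart to pick up limits.conf changes.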

0 Karma

closeset
New Member

Thank you skoelpin!
This is one possible solution for me. Since increasing the limit might cause some instability, do you happen to know of any other methods?

0 Karma

skoelpin
SplunkTrust

Is the number of rows getting truncated at 50k? If so, then this may be your only solution.

I've increased the limit before and haven't seen any instability issues. I would contact support and get their opinion before trying this in production.
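To confirm which value is actually in effect before and after the change, btool can help:

$SPLUNK_HOME/bin/splunk btool limits list searchresults --debug

The --debug flag also shows which configuration file each setting comes from.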

0 Karma

jkat54
SplunkTrust

The limit is there to protect your browser from locking up (among other reasons... or at least that's what I believe). When you load that much into memory, things can get funny. "Unstable," even!

0 Karma

cmerriman
Super Champion

Can you provide the original query that ended up being truncated, as well as the query you're using for summary indexing? Replace any sensitive information. This will help the community answer your question more accurately.

closeset
New Member

Hi, thanks for reminding me. The searches are here:

The search for the summary indexing report:
sourcetype=my_source event_id=*
| sistats count by event_id field1 field2

The name of the report is "my_report_name".

The search to retrieve the result:
index=summary search_name="my_report_name"
| stats count by event_id field1 field2
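For context, the report is scheduled hourly. A variant I am considering, which summarizes only the hour just completed instead of re-querying the whole year on every run (the time modifiers are shown inline for clarity; in practice they would be set in the report's schedule):

sourcetype=my_source event_id=* earliest=-1h@h latest=@h
| sistats count by event_id field1 field2

Older hours could then be backfilled with Splunk's fill_summary_index.py script.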

0 Karma