Splunk Search

Issue with count -- How can I search a large data set without Splunk truncating the data?

closeset
New Member

Hi,

I would like to query all data over the past year and then use "stats count by <some fields>" to calculate the counts.

However, the data set is too large (at least a few million events), and Splunk truncates the results when querying, so the counts are inaccurate.

Does anyone know a good way to fix it?

PS. I tried 'sistats' and set up a report that runs every hour to query data from the previous year.
Ideally, I want the report to collect data accurately over a smaller time interval and then aggregate the results.
However, each hour the report queries all of the previous data inaccurately and then adds up all the counts as the result.

0 Karma

jkat54
SplunkTrust

I think you have several options.

  1. Create and accelerate a data model.
  2. Create summary indexes, using searches that run every day or more frequently (every 5-15 minutes), and then use a backfill script.
  3. Do both of the above: create an accelerated DM and build summary indexes from the DM using the tstats command.

Number one is the easiest approach, number two is faster, and number three becomes necessary if you need to correlate data from more than one really large data set; see the sketch below.
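For example, a minimal sketch of option three, assuming an accelerated data model named My_Events whose root dataset is also called My_Events and carries the fields event_id, field1 and field2 (all of these names are placeholders, not from the original post):

| tstats summariesonly=true count from datamodel=My_Events by My_Events.event_id My_Events.field1 My_Events.field2

Because tstats reads from the acceleration summaries instead of scanning raw events, a year of data aggregates far faster, and the output can then be written into a summary index with collect.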

0 Karma

skoelpin
SplunkTrust

Are you referring to the number of rows getting truncated?

If so, I had a similar problem a while back where it would truncate anything more than 50,000 rows, leading to inaccurate results. Luckily this is a simple fix in limits.conf:

maxresultrows = <integer>
* Configures the maximum number of events generated by search commands which
  grow the size of your result set (such as multikv) or that create events.
  Other search commands are explicitly controlled in specific stanzas below.
* This limit should not exceed 50000. Setting this limit higher than 50000
  causes instability.
* Defaults to 50000.

http://docs.splunk.com/Documentation/Splunk/6.2.1/Admin/Limitsconf
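If you do decide to raise it, the setting lives under the [searchresults] stanza; a minimal example, e.g. in $SPLUNK_HOME/etc/system/local/limits.conf (the value 100000 is only an illustration, not a recommendation):

[searchresults]
# Raise the cap on result rows; test outside production first.
maxresultrows = 100000

Splunk generally needs a restart to pick up limits.conf changes.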

0 Karma

closeset
New Member

Thank you skoelpin!
This is one possible solution for me. Since increasing the limit might cause some instability, do you happen to know of any other methods?

0 Karma

skoelpin
SplunkTrust

Is the number of rows getting truncated at 50k? If so, then this may be your only solution.

I've increased the limit before and haven't seen any instability issues. I would contact support and get their opinion before trying this in production.
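To confirm which value is actually in effect before and after the change, btool can help:

$SPLUNK_HOME/bin/splunk btool limits list searchresults --debug

The --debug flag also shows which configuration file each setting comes from.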

0 Karma

jkat54
SplunkTrust

The limit is there to protect your browser from locking up (among other reasons... or at least that's what I believe). When you load that much into memory, things can get funny. "Unstable," even!

0 Karma

cmerriman
Super Champion

Can you provide the original query that ended up being truncated, as well as the query you're using for summary indexing? Replace any sensitive information. This will help the community answer your question more accurately.

closeset
New Member

Hi, thanks for reminding me. The searches are here:

The search for the summary indexing report:
sourcetype=my_source event_id=*
| sistats count by event_id field1 field2

The name of the report is "my_report_name".

The search to retrieve the result:
index=summary search_name="my_report_name"
| stats count by event_id field1 field2
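For context, the report is scheduled hourly. A variant I am considering, which summarizes only the hour just completed instead of re-querying the whole year on every run (the time modifiers are shown inline for clarity; in practice they would be set in the report's schedule):

sourcetype=my_source event_id=* earliest=-1h@h latest=@h
| sistats count by event_id field1 field2

Older hours could then be backfilled with Splunk's fill_summary_index.py script.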

0 Karma