Splunk Search

Why does my search lose events as the number of events increases?

jpawloski
Path Finder

I have a search that compares an expanded multivalue field against a lookup table and returns those events where at least one of the field values was not found. My thinking is: if a singleColumns value is not found, I'll have at least two events with a shared _cd value in my results, which I then dedup to ensure my counts are correct.

base search
| eval UID = _cd
| eval singleColumns=split(column_name, " ")
| mvexpand singleColumns
| search NOT [|inputlookup Known_Bad_Columns | rename bad_columns as singleColumns ]
| dedup UID
| stats count by field1, field2
| sort by count desc

I ran this against some known events (roughly 7 million prior to the expand) and some (not all) of my event counts were lower than expected. I then reran this search filtering to those specific event values (500 thousand prior to expand) and my counts came back correct. Can someone explain my loss of precision and possibly suggest a correction?

1 Solution

jpawloski
Path Finder

This issue actually came down to the use of _cd as my unique identifier. I opted to use the tuple detailed here and it is returning all events: https://answers.splunk.com/answers/49/does-each-splunk-event-have-a-unique-identifier.html
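
A possible explanation, offered as an assumption rather than something confirmed in this thread: _cd appears to be unique only within a single index on a single indexer, so the same value can occur on unrelated events, and dedup on it alone would silently drop them. A larger search is more likely to hit such collisions. The tuple in the linked answer is assumed here to be splunk_server, index, and _cd; in SPL that might look like eval UID = splunk_server . ":" . index . ":" . _cd (untested). The Python sketch below only simulates the effect on made-up events; the server names, index names, and _cd values are hypothetical.

```python
from collections import namedtuple

# Hypothetical event shape: only the fields relevant to uniqueness.
Event = namedtuple("Event", ["splunk_server", "index", "cd", "field1"])


def dedup(events, key):
    """Keep the first event seen for each key, similar in spirit to SPL's dedup."""
    seen = set()
    kept = []
    for ev in events:
        k = key(ev)
        if k not in seen:
            seen.add(k)
            kept.append(ev)
    return kept


# Made-up data: the same _cd value appears on two indexers and in two indexes.
events = [
    Event("idx01", "main", "12:3456", "a"),
    Event("idx02", "main", "12:3456", "b"),
    Event("idx01", "web", "12:3456", "c"),
    Event("idx01", "main", "12:9999", "d"),
]

# Dedup on _cd alone merges three distinct events into one.
by_cd = dedup(events, key=lambda ev: ev.cd)

# Dedup on the assumed (splunk_server, index, _cd) tuple keeps all four.
by_tuple = dedup(events, key=lambda ev: (ev.splunk_server, ev.index, ev.cd))

print(len(by_cd), len(by_tuple))
```

If this assumption holds, it would also explain why the narrower 500-thousand-event search returned correct counts: fewer events means fewer chances for two of them to share a _cd.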

KailA
Contributor

Hi,

If mvexpand is not showing a warning message (for example, about a memory limit), the loss might be caused by the default limit of the sort command.

Can you try this:

 base search 
| eval UID = _cd 
| eval singleColumns=split(column_name, " ") 
| mvexpand  singleColumns 
| search NOT [|inputlookup Known_Bad_Columns | rename bad_columns as  singleColumns ] 
| dedup UID 
| stats count by field1, field2 
| sort 0 -count

The documentation for sort (http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Sort) notes that it returns at most 10000 results by default; sort 0 removes that limit.

I hope this will solve the problem.

Kail


jpawloski
Path Finder

Appreciate the help but it turned out to be my UID. I'll post the answer shortly.
