Splunk Search

Results are missing from this search

bj6192
Explorer

Hi,
I use the below search to get the row with max value;

(index="indexa" OR index="indexb") sourcetype="sourceA"  | table _time,money,user | eventstats max(_time) as mtime by user |where _time=mtime

but some user can not find in result. And When I add user in below search, it exists in result.

(index="indexa" OR index="indexb") sourcetype="sourceA"  user="XXX" | table _time,money,user | eventstats max(_time) as mtime by user |where _time=mtime

How can I know what different in above search? Thanks
ps.above search has 1 million row in first phase and the final result should has 220000 row output

Tags (1)
0 Karma
1 Solution

DalJeanis
Legend

Updated with feedback from @woodcock -

(index="indexa" OR index="indexb") sourcetype="sourceA" 
| fields money,user 
| rename _time as time
| fields - _*
| rename time as _time
| dedup user

Everything before the dedup will run on the indexers, leaving only three fields to be transmitted to the search head, at which point the records are collated in descending _time order and due to the dedup command, only the latest record for each user will be retained.


I believe you are just running out of time (updated - time and/or RAM). Try this...

(index="indexa" OR index="indexb") sourcetype="sourceA" 
| fields money,user 
| eventstats max(_time) as mtime by user 
| where _time=mtime

Explanation - fields is a distributed streaming command, table is not. Because table has some aggregate effects (limiting the total number of records, for example) it cannot run at the individual indexers. The table command requires ALL the data be held and transmitted to the search head, where the aggregate effects can be effected.

(owww - did I really just type that sentence? ...let's pretend I typed "where the search head can make the aggregate effects happen.")

The fields command will strip out the unneeded fields, streamlining the process so that it should run faster and require less resources. Hopefully, that will make your user information reappear.

After you get through testing that, try this...

(index="indexa" OR index="indexb") sourcetype="sourceA" 
| fields money,user 
| dedup user

It should achieve the same result slightly faster. dedup keeps the first event, and events are returned with the latest first, so a simple dedup on user should get you the latest records for each.

View solution in original post

woodcock
Esteemed Legend

Your problem is the use of the table command, which is a finalizing command (or something like that). It has the disastrous effect of ceasing all efficient map-reduction and sending the intermediate results as-is to the search head. When you limit this result set by adding user="XXX" it just so happens that you have reduced your totally absurd amount of intermediate results into a just-barely-manageable amount (maybe) so you don't run out of RAM on your search head (there is probably a log somewhere in index=_* that says your search blew up the RAM) and the rest of your pipeline has at least a partially complete result set to work on.

The Pro-Tip here is: Always use fields, never use table. Try this:

(index="indexa" OR index="indexb") sourcetype="sourceA"
| fields _time money user
| rename _time AS time
| fields - _*
| eventstats max(time) as mtime by user
| where time=mtime
| convert ctime(*time)
0 Karma

woodcock
Esteemed Legend

I like my explanation better but I didn't even look at what the overall search was trying to do and @DalJeanis has a better answer: dedup. I stopped reading his at first when he wrote "running out of time" because I knew that you were actually "running out of RAM". But really, it could be either. In the end, the root cause is the same: table. My search is still useful for others that are doing something that doesn't reduce down to dedup.

0 Karma

DalJeanis
Legend

Updated with feedback from @woodcock -

(index="indexa" OR index="indexb") sourcetype="sourceA" 
| fields money,user 
| rename _time as time
| fields - _*
| rename time as _time
| dedup user

Everything before the dedup will run on the indexers, leaving only three fields to be transmitted to the search head, at which point the records are collated in descending _time order and due to the dedup command, only the latest record for each user will be retained.


I believe you are just running out of time (updated - time and/or RAM). Try this...

(index="indexa" OR index="indexb") sourcetype="sourceA" 
| fields money,user 
| eventstats max(_time) as mtime by user 
| where _time=mtime

Explanation - fields is a distributed streaming command, table is not. Because table has some aggregate effects (limiting the total number of records, for example) it cannot run at the individual indexers. The table command requires ALL the data be held and transmitted to the search head, where the aggregate effects can be effected.

(owww - did I really just type that sentence? ...let's pretend I typed "where the search head can make the aggregate effects happen.")

The fields command will strip out the unneeded fields, streamlining the process so that it should run faster and require less resources. Hopefully, that will make your user information reappear.

After you get through testing that, try this...

(index="indexa" OR index="indexb") sourcetype="sourceA" 
| fields money,user 
| dedup user

It should achieve the same result slightly faster. dedup keeps the first event, and events are returned with the latest first, so a simple dedup on user should get you the latest records for each.

niketn
Legend

@bj6192, would the following query work for you?

(index="indexa" OR index="indexb") sourcetype="sourceA" user=*
| fields _time user
| stats max(_time) as _time by user

Based on your query, both indexa and indexb have sourcetype sourceA containing user info. Is this correct?
What is the time span you are using, how do the result look if you reduce the timespan to say 1 hour or 15 min?

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma
Get Updates on the Splunk Community!

More Control Over Your Monitoring Costs with Archived Metrics!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...

New in Observability Cloud - Explicit Bucket Histograms

Splunk introduces native support for histograms as a metric data type within Observability Cloud with Explicit ...

Updated Team Landing Page in Splunk Observability

We’re making some changes to the team landing page in Splunk Observability, based on your feedback. The ...