Hi,
I use the below search to get the row with max value;
(index="indexa" OR index="indexb") sourcetype="sourceA" | table _time,money,user | eventstats max(_time) as mtime by user |where _time=mtime
but some user can not find in result. And When I add user in below search, it exists in result.
(index="indexa" OR index="indexb") sourcetype="sourceA" user="XXX" | table _time,money,user | eventstats max(_time) as mtime by user |where _time=mtime
How can I know what different in above search? Thanks
ps.above search has 1 million row in first phase and the final result should has 220000 row output
Updated with feedback from @woodcock -
(index="indexa" OR index="indexb") sourcetype="sourceA"
| fields money,user
| rename _time as time
| fields - _*
| rename time as _time
| dedup user
Everything before the dedup
will run on the indexers, leaving only three fields to be transmitted to the search head, at which point the records are collated in descending _time
order and due to the dedup
command, only the latest record for each user
will be retained.
I believe you are just running out of time (updated - time and/or RAM). Try this...
(index="indexa" OR index="indexb") sourcetype="sourceA"
| fields money,user
| eventstats max(_time) as mtime by user
| where _time=mtime
Explanation - fields
is a distributed streaming command, table
is not. Because table has some aggregate effects (limiting the total number of records, for example) it cannot run at the individual indexers. The table command requires ALL the data be held and transmitted to the search head, where the aggregate effects can be effected.
(owww - did I really just type that sentence? ...let's pretend I typed "where the search head can make the aggregate effects happen.")
The fields
command will strip out the unneeded fields, streamlining the process so that it should run faster and require less resources. Hopefully, that will make your user information reappear.
After you get through testing that, try this...
(index="indexa" OR index="indexb") sourcetype="sourceA"
| fields money,user
| dedup user
It should achieve the same result slightly faster. dedup
keeps the first event, and events are returned with the latest first, so a simple dedup
on user should get you the latest records for each.
Your problem is the use of the table
command, which is a finalizing
command (or something like that). It has the disastrous effect of ceasing all efficient map-reduction and sending the intermediate results as-is to the search head. When you limit this result set by adding user="XXX"
it just so happens that you have reduced your totally absurd amount of intermediate results into a just-barely-manageable amount (maybe) so you don't run out of RAM on your search head (there is probably a log somewhere in index=_*
that says your search blew up the RAM) and the rest of your pipeline has at least a partially complete result set to work on.
The Pro-Tip here is: Always use fields
, never use table
. Try this:
(index="indexa" OR index="indexb") sourcetype="sourceA"
| fields _time money user
| rename _time AS time
| fields - _*
| eventstats max(time) as mtime by user
| where time=mtime
| convert ctime(*time)
I like my explanation better but I didn't even look at what the overall search was trying to do and @DalJeanis has a better answer: dedup
. I stopped reading his at first when he wrote "running out of time" because I knew that you were actually "running out of RAM". But really, it could be either. In the end, the root cause is the same: table
. My search is still useful for others that are doing something that doesn't reduce down to dedup
.
Updated with feedback from @woodcock -
(index="indexa" OR index="indexb") sourcetype="sourceA"
| fields money,user
| rename _time as time
| fields - _*
| rename time as _time
| dedup user
Everything before the dedup
will run on the indexers, leaving only three fields to be transmitted to the search head, at which point the records are collated in descending _time
order and due to the dedup
command, only the latest record for each user
will be retained.
I believe you are just running out of time (updated - time and/or RAM). Try this...
(index="indexa" OR index="indexb") sourcetype="sourceA"
| fields money,user
| eventstats max(_time) as mtime by user
| where _time=mtime
Explanation - fields
is a distributed streaming command, table
is not. Because table has some aggregate effects (limiting the total number of records, for example) it cannot run at the individual indexers. The table command requires ALL the data be held and transmitted to the search head, where the aggregate effects can be effected.
(owww - did I really just type that sentence? ...let's pretend I typed "where the search head can make the aggregate effects happen.")
The fields
command will strip out the unneeded fields, streamlining the process so that it should run faster and require less resources. Hopefully, that will make your user information reappear.
After you get through testing that, try this...
(index="indexa" OR index="indexb") sourcetype="sourceA"
| fields money,user
| dedup user
It should achieve the same result slightly faster. dedup
keeps the first event, and events are returned with the latest first, so a simple dedup
on user should get you the latest records for each.
@bj6192, would the following query work for you?
(index="indexa" OR index="indexb") sourcetype="sourceA" user=*
| fields _time user
| stats max(_time) as _time by user
Based on your query, both indexa and indexb have sourcetype sourceA containing user info. Is this correct?
What is the time span you are using, how do the result look if you reduce the timespan to say 1 hour or 15 min?