Thank you very much; that worked and is a viable solution to my problem. I have marked your answer accordingly. If I understand correctly, this effectively runs two separate queries in parallel and then merges the results with stats, using the normalized field as the common key. The DNS requests I am looking for are likely to number in the thousands of events, whereas the process information as a whole will be in the hundreds of millions depending on the timescale, with the majority being of no value. As you can imagine, the query runs quite slowly on large datasets because it gathers all process information.
Is it possible to build on this foundation and optimize it further for larger datasets?
For example, perform the DNS query to ascertain the IDs we care about, then search the process events for just those IDs.
I have tested this concept and it works, albeit as a manual two-stage process, which I would ideally like to streamline as much as possible.
Stage 1
Initially, identify the process event IDs for the DNS entries I care about (this is very quick). Export the list, comma-delimit it in an editor, then use these IDs as a filter for stage 2.
event_simpleName=DNSRequest | table ContextProcessId_decimal
Stage 2
Run the query below (based on the query you provided), specifying the process IDs identified above for both the DNS requests and the process information.
(event_simpleName=DNSRequest ContextProcessId_decimal IN (10348240759135,1939925450819)) OR (event_simpleName=ProcessRollup2 TargetProcessId_decimal IN (10348240759135,1939925450819))
| eval NormalizedProcessId_decimal=coalesce(ContextProcessId_decimal, TargetProcessId_decimal)
| stats values(ProcessName) as ProcessName, values(DomainName) as DomainName by NormalizedProcessId_decimal
Notes:
Whilst the above executes much faster, it is a manual two-stage process. Ideally I would like to automate this concept to keep the speed benefit, if possible.
Although I have only specified two IDs in the queries above to prove the concept, in reality there will be thousands. I suspect there will be a finite number.
Is it possible to output stage 1 into a list then use that list in stage 2 seamlessly?
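As a rough sketch of what I have in mind, and assuming standard Splunk subsearches are available in this environment, something like the following might fold the two stages into one query. The rename inside the subsearch is my own assumption about how to make the stage 1 IDs line up with TargetProcessId_decimal, and I have not tested this at scale. Subsearches are also normally capped (by default at roughly 10,000 results and 60 seconds of runtime), so a list in the thousands may only just fit.

```
(event_simpleName=DNSRequest) OR (event_simpleName=ProcessRollup2
    [ search event_simpleName=DNSRequest
      | dedup ContextProcessId_decimal
      | rename ContextProcessId_decimal AS TargetProcessId_decimal
      | fields TargetProcessId_decimal ])
| eval NormalizedProcessId_decimal=coalesce(ContextProcessId_decimal, TargetProcessId_decimal)
| stats values(ProcessName) as ProcessName, values(DomainName) as DomainName by NormalizedProcessId_decimal
```

If stage 1 also filters on specific domains, I assume the same filter would need to appear both in the outer DNSRequest clause and inside the subsearch so that the two sides stay consistent.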
Any assistance would be appreciated.
Thanks