I have two indexes that I want to create a summary from every hour.
Index1
request_type, request_guid, request_timestamp, meta_field1, meta_field2, ...
Index1 contains log entries from each processing steps in each service request. Each service request is assigned a unique request_guid and all ~10 logs for the processing of a request have that id. The time the request was made is stored in request_timestamp and also remains the same through all logs for a request..
index2
request_guid, meta_fieldA, meta_fieldB, ...
index2 contains more data for the logs, but is in a separate index so that it can be secured differently from index1. The request_guid is the same value as in index1
I want to summarize by collecting stats for each request type by hour.
The approach I have taken is to select all the logs from Index1 where the request_timestamp is in the hour. I cannot use the log time directly as a request logs might span into the next hour ( as in started at 9:59:59 and ended at 10:00:01)
index=index1 earliest=0
| addinfo | eval timemillis=strftime(strptime(request_timestamp,"%Y-%m-%dT%H:%M:%S.%3N%z"),"%s")
| where timemillis>=info_min_time AND timemillis<info_max_time
This gives me a list of all the logs from index1 that I want to summarize. I then join the logs from index2
| join usetime=false type=left overwrite=false request_guid max=0
[search index=index2]
I then create a summary with the following
| sort _time desc
| transaction request_guid maxspan=10m mvlist=true
| stats
count as requests_in_period
by request_type
What results is elevated counts per period. They all are multiples of 2, which tells me that a cross product seems to be happening. I think the problem is in the join command. There are multiple entries selected from index1 that have duplicate values for request_guid that I think are over selecting from index2, yet I want all the matching entries from index2 with the request_guid.
When I view the events right after the join, all the logs are from index1 and none are from index2
Background
I started with a subselect of the request_guids that should be summarized, but a subselect is limited to 10500 entries, which will not be enough for an hours of transactions logs.
... View more