I have a report which I'm trying to enhance to use a lookup report. The existing query is doing a simple agg to count by date and identifier:
type="REST" resource="Order" status="FINISHED" | bucket span=1d _time | stats count by client_id, _time
_time client_id count
6/7/12 12:00:00.000 AM 20006 123
6/8/12 12:00:00.000 AM 20006 117
6/7/12 12:00:00.000 AM 20008 36
6/7/12 12:00:00.000 AM 20009 101
I'm using a lookup table to enhance the data but I also want to see records from the lookup table which had no events. Something like this:
_time client_id Name count
6/7/12 12:00:00.000 AM 20006 Client A 123
6/8/12 12:00:00.000 AM 20006 Client A 117
6/7/12 12:00:00.000 AM 20008 Client B 36
6/8/12 12:00:00.000 AM 20008 Client B 0
6/7/12 12:00:00.000 AM 20009 Client C 101
6/8/12 12:00:00.000 AM 20009 Client C 0
6/7/12 12:00:00.000 AM 20010 Client D 0
6/8/12 12:00:00.000 AM 20010 Client D 0
I thought I'd be able to accomplish this by using inputlookup and left joining to the results from the original query
| inputlookup client_lookup.csv | JOIN type=left client_id [type="REST" resource="Order" status="FINISHED" | bucket span=1d _time | stats count by client_id, _time]
When I do this, the date from the subquery is lost from the output:
client_id Name count
20006 Client A 123
20008 Client B 36
20009 Client C 101
20010 Client D
I obviously have my head stuck in RDBMS land! I'm sure there's probably a better way of doing this or I'm missing something obvious but I'm a little stuck at the moment. Any help would be greatly appreciated.
Assuming a client_lookup.csv like:
client_id,Name 20006,Client A
You could do something like:
type="REST" resource="Order" status="FINISHED" | bucket span=1d _time | stats count by client_id, _time | lookup client_lookup.csv client_id OUTPUT Name
No join needed.
OK.. I think I have it now. There is a max option to the join command which defaults to 1. I adjusted it to match the number of days I was searching for and the all of the records are showing up.
Instead of:
*sent client_id Name count *
6/7/12 12:00:00.000 AM 20006 Client A 123
6/8/12 12:00:00.000 AM 20006 Client A 117
6/7/12 12:00:00.000 AM 20008 Client B 36
6/8/12 12:00:00.000 AM 20008 Client B 0
6/7/12 12:00:00.000 AM 20009 Client C 101
6/8/12 12:00:00.000 AM 20009 Client C 0
6/7/12 12:00:00.000 AM 20010 Client D
6/8/12 12:00:00.000 AM 20010 Client D
It appears that it probably doesn't work exactly like it would in SQL. Maybe there's a better approach to accomplish the same end.
Hmmmm, renaming the field helps a little. The date is available now but something is weird in how it's joining the records. It's finding the first match for each client and then dumping the other dates.
Example:
*sent client_id Name count *
6/7/12 12:00:00.000 AM 20006 Client A 123
6/7/12 12:00:00.000 AM 20008 Client B 36
6/7/12 12:00:00.000 AM 20009 Client C 101
6/7/12 12:00:00.000 AM 20010 Client D
Join is really not preferred, but if you want to mess with your original query, then rename the _time
field to something not beginning with an _
, then name it back after the join.
The reason for the join is so that I also display the clients which don't have any matching events. Without the left join, they drop off the report.
Ultimately, I want to draw attention to clients which have low or no activity. I realize this is sort of the reverse of what splunk is meant to do but I've never been shy of driving a square peg through a round hole 😉