Time from subquery lost when joining to lookup tab...

mcantrell · ‎06-08-2012

I have a report which I'm trying to enhance to use a lookup report. The existing query is doing a simple agg to count by date and identifier:

type="REST" resource="Order" status="FINISHED" | bucket span=1d _time | stats count by client_id, _time

_time client_id count 6/7/12 12:00:00.000 AM 20006 123 6/8/12 12:00:00.000 AM 20006 117 6/7/12 12:00:00.000 AM 20008 36 6/7/12 12:00:00.000 AM 20009 101

I'm using a lookup table to enhance the data but I also want to see records from the lookup table which had no events. Something like this:

_time client_id Name count 6/7/12 12:00:00.000 AM 20006 Client A 123 6/8/12 12:00:00.000 AM 20006 Client A 117 6/7/12 12:00:00.000 AM 20008 Client B 36 6/8/12 12:00:00.000 AM 20008 Client B 0 6/7/12 12:00:00.000 AM 20009 Client C 101 6/8/12 12:00:00.000 AM 20009 Client C 0 6/7/12 12:00:00.000 AM 20010 Client D 0 6/8/12 12:00:00.000 AM 20010 Client D 0

I thought I'd be able to accomplish this by using inputlookup and left joining to the results from the original query

| inputlookup client_lookup.csv | JOIN type=left client_id [type="REST" resource="Order" status="FINISHED" | bucket span=1d _time | stats count by client_id, _time]

When I do this, the date from the subquery is lost from the output:

client_id Name count 20006 Client A 123 20008 Client B 36 20009 Client C 101 20010 Client D

I obviously have my head stuck in RDBMS land! I'm sure there's probably a better way of doing this or I'm missing something obvious but I'm a little stuck at the moment. Any help would be greatly appreciated.

sowings · ‎06-08-2012

Assuming a client_lookup.csv like:

client_id,Name
20006,Client A

You could do something like:

type="REST" resource="Order" status="FINISHED" | bucket span=1d _time 
| stats count by client_id, _time
| lookup client_lookup.csv client_id OUTPUT Name

No join needed.

mcantrell · ‎06-11-2012

OK.. I think I have it now. There is a max option to the join command which defaults to 1. I adjusted it to match the number of days I was searching for and the all of the records are showing up.

mcantrell · ‎06-09-2012

Instead of:

*sent client_id Name count *
6/7/12 12:00:00.000 AM 20006 Client A 123
6/8/12 12:00:00.000 AM 20006 Client A 117
6/7/12 12:00:00.000 AM 20008 Client B 36
6/8/12 12:00:00.000 AM 20008 Client B 0
6/7/12 12:00:00.000 AM 20009 Client C 101
6/8/12 12:00:00.000 AM 20009 Client C 0
6/7/12 12:00:00.000 AM 20010 Client D

6/8/12 12:00:00.000 AM 20010 Client D

It appears that it probably doesn't work exactly like it would in SQL. Maybe there's a better approach to accomplish the same end.

mcantrell · ‎06-09-2012

Hmmmm, renaming the field helps a little. The date is available now but something is weird in how it's joining the records. It's finding the first match for each client and then dumping the other dates.

Example:

*sent client_id Name count *
6/7/12 12:00:00.000 AM 20006 Client A 123
6/7/12 12:00:00.000 AM 20008 Client B 36
6/7/12 12:00:00.000 AM 20009 Client C 101
6/7/12 12:00:00.000 AM 20010 Client D

gkanapathy · ‎06-08-2012

Join is really not preferred, but if you want to mess with your original query, then rename the _time field to something not beginning with an _, then name it back after the join.

mcantrell · ‎06-08-2012

The reason for the join is so that I also display the clients which don't have any matching events. Without the left join, they drop off the report.

Ultimately, I want to draw attention to clients which have low or no activity. I realize this is sort of the reverse of what splunk is meant to do but I've never been shy of driving a square peg through a round hole 😉

Time from subquery lost when joining to lookup table

Adoption of RUM and APM at Splunk

Routing logs with Splunk OTel Collector for Kubernetes

Welcome to the Splunk Community!