Splunk Search

Can't run "lookup" on a lookup table created by "outputlookup"

samjenk_2
Explorer

About my Environment

Everything here is run using Splunk 6.4.2.

The Problem

I need to correlate session IDs and IP addresses between two sets of
data. It involves:

  • Finding the session IDs (sid) and source IPs (src_ip) from the first
    set of data.

  • Finding those same session IDS (sid) in the second set of data, even
    if they don't match the src_ip from the first set of data.

  • Yielding events from the second dataset with the fields sid, src_ip
    and first_ip, where sid is the same between both data sets, src_ip
    is unique to the second data set, and first_ip is the value of the
    sid's src_ip from the first data set.

I've come very close to a resolution using the following pipeline:

(a) pull the sid from the first set as a subsearch on the second set
(b) create a sid/first_ip pairing in a lookup table "mylookup.csv"
(c) perform a lookup on mylookup.csv to match sid to first_ip

Here's the search I've tried:

(01) {query for second set} [search {query for first set} | rename src_ip as first_ip | table sid,first_ip | outputlookup mylookup.csv | fields sid] | lookup mylookup.csv sid OUTPUT first_ip

Search (01) runs fine without the "lookup" clause, in that it returns
all of the events from the second data set with the same sid as those
in the first. When I run the search as written, though, I get the
error:

(02) Error in 'lookup' command: The lookup table 'mylookup.csv' does not exist or is not available.

What's strange is that I know the lookup must exist, because after
running search (01), I can retrieve the table's contents using the
following command [1]:

(03) | inputlookup mylookup.csv

Things I've Tried

I tried using mylookup.csv to lookup sid as another field [2], like this:

(05) {query} | lookup mylookup.csv sid AS my_sid.

And that returned the same does not exists/not available error.

I've even tried first running a search that creates mylookup.csv, then
running a search to perform a lookup on mylookup.csv, like this:

(04a) {query for first set} | rename src_ip as first_ip | table sid,first_ip | outputlookup mylookup.csv

(04b) {query for second set} [ search {query for first set} | fields sid ] | lookup mylookup.csv sid OUTPUT first_ip

Search (04a) completes, but I still get the same error at (02) when I
run the search at (04b).

I've checked the "Exploring Splunk" book, my Splunk training material,
and answers.splunk.com and haven't found anything else explictly
talking about using the lookup table created by outputlookup, just how
to create the lookup table.

Questions

(A) Is there a canonical way of referencing lookups that you've
created using outputlookup that I'm missing? Do I need to create a
lookup definition for the lookup table I create, or is
mylookup.csv sufficient?

(B) Is there a better way to perform the kind of correlation I want? I
haven't tried the KV Store yet, as I'd like to know that I can use
the output of outputlookup first.

Thanks!

[1] https://answers.splunk.com/answers/144139/how-do-i-search-a-csv-file-created-via-outputlookup.html
[2] https://answers.splunk.com/answers/54165/lookup-use-without-lookup-definition.html

0 Karma

samjenk_2
Explorer

Thanks again to maciep and sundareshr for their input. I've gotten a little closer on this issue, this time revisiting the notion of breaking my query into parts. I can get the behavior I want by doing the following:

(1) {query for first set} | rename src_ip as first_ip | table sid,first_ip | outputlookup mylookup.csv

(2) {query for second set} [| inputlookup mylookup.csv | fields sid] | lookup mylookup.csv sid OUTPUT first_ip

This approach is a variation on the one I tried in my first post, but with the subsearch based on the lookup table itself, rather than running (1) as a subsearch. This allows me to make first_ip a field for every event in the second set matching on sid, with which I can then treat as any other field with stats, timechart, and the like. It'd still be nice to be able to do this in one search, though.

0 Karma

sundareshr
Legend

You may be able to accomplish this without any subsearches or lookup. Try something like this

(index=firstset OR index=secondset) | chart values(src_ip) as ip over sid by index | where <<however you want to filter it>>
0 Karma

samjenk_2
Explorer

Thanks for the suggestion, sundareshr. The query you suggest is about 3 times slower than maciep's, but I like how it rolls up all src_ip values for a given index, sort of like the list() function.

0 Karma

maciep
Champion

Not sure why your lookup isn't working, but you might be able to accomplish the same thing with join. Something like this:

{your first search} | rename src_ip as first_ip | table sid first_ip | join type=left sid [search {your second search} | table sid src_ip]

This should join your two data sets together by sid

0 Karma

samjenk_2
Explorer

Thank you, maciep, for the suggestion. This search does display the information I want as a chart, so thanks for that.

I'm still chasing how to do this with a lookup table so that the first_ip field would be available for piping to stats and other summary commands. Have you been able to run queries on dynamically-generated lookups with 'outputlookup' before?

0 Karma

maciep
Champion

hey samjenk, glad that worked. I have been able to use a lookup table created by outputlookup. I'm not sure I have ever tried using one from the same search where I created it, but seems like it should work. Of course, not sure if Splunk does any optimizing behind the scenes resulting in the lookup search running before the outputlookup runs?

But it looks like you tried to do them in separate searches too. And you were able to display it with inputlookup. So I have no idea why that wouldn't work. When I first saw your post, I tried on my home machine (6.4.x). And was able to successfully use the lookup command against a table I had just created with outputlookup.

I would suggest abandoning the search you're working on and see if you can just do it with a simple example. And then troubleshoot from there. For example, I think I used something like this to create mine:

index=_internal earliest=-60m | stats count by component | outputlookup test.csv
0 Karma
Get Updates on the Splunk Community!

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...

Welcome to the Splunk Community!

(view in My Videos) We're so glad you're here! The Splunk Community is place to connect, learn, give back, and ...

Tech Talk | Elevating Digital Service Excellence: The Synergy of Splunk RUM & APM

Elevating Digital Service Excellence: The Synergy of Real User Monitoring and Application Performance ...