More Elegant way to do a Lookup?

stakor
Path Finder

I am trying to find a better way of doing the following search:

<Search_all_proxies>
[|inputlookup list_of_naughty_urls.csv|table URL | rename URLS as proxy_url] 

I have looked at:
http://docs.splunk.com/Documentation/Splunk/6.5.3/SearchReference/Lookup
But I think there may be a difference between the way transforms.conf is set up on my system and the way it is set up elsewhere. I am not an admin, so modifying that file is not an option for me at the moment.

I ran into:
https://answers.splunk.com/answers/455463/how-to-match-my-search-with-a-lookup.html?utm_source=typea...

Which is where I got the idea for the above search. And, it does work. But there has to be a more elegant way of doing this. I have tried:

<Search_all_proxies>
[| lookup list_of_naughty_urls.csv URLS as proxy_url]

But that did not return any results, although based on the documentation I would have thought it would. I thought that having two lists of values in proxy_url would use the default AND, resulting in a list that only has URLs that are in both lists. (I have checked, and there are values in both lists that correlate: the first search listed does produce results over the same time frame, etc.)

Does anyone see where the second search might be going wrong? Or is the first search a good way to go? (It is functional...)

jwiedemann_splu
Splunk Employee

There are two general ways to achieve what you are trying to do: the "normal lookup" way and the "subsearch" way. Either is fine, but the more Splunky solution is probably the "normal lookup" way. Let's break down how each way works and how they differ.

The Subsearch way
What you're doing with "[| inputlookup list_of_naughty_urls.csv | table URL]" is a subsearch whose results are pulled from the lookup file (as opposed to, say, another index or dataset). When a subsearch is used the way you are using it, it runs first and returns its results as a giant OR statement, which is then fed to the outer search. Because your subsearch results are a table with a field named URL and a set of values, the resulting OR statement will look something like this:

( (URL="some value") OR (URL="some other value") OR (URL="again some other value") )

To see this behavior in action, you can use the | format command. In a separate search window, run this:

| inputlookup list_of_naughty_urls.csv | table URL | format

and you'll see the big old OR statement that is created and fed to the main search. For the main search to work as you'd expect, your events had better have a field called URL, or the OR statement won't match anything.
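
To make the subsearch mechanics concrete, here is a small Python model of them. This is not Splunk code: the lookup contents are hypothetical, format_or only approximates the shape of what | format produces (the exact spacing Splunk emits may differ), and matching uses plain string equality, which is stricter than Splunk's own field matching.

```python
import csv
import io

# Hypothetical contents standing in for list_of_naughty_urls.csv.
LOOKUP_CSV = """URL
www.botnet.badguys.net
evil.example.org
"""

def subsearch_rows(csv_text, field):
    """Rough model of `| inputlookup ... | table <field>`: one dict per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{field: row[field]} for row in reader]

def format_or(rows):
    """Rough model of `| format`: each row becomes a clause, clauses joined by OR."""
    clauses = []
    for row in rows:
        terms = " AND ".join(f'{k}="{v}"' for k, v in row.items())
        clauses.append(f"({terms})")
    return "( " + " OR ".join(clauses) + " )"

def or_filter(events, rows):
    """Keep events that match every field/value pair of at least one row.
    Plain string equality here; real Splunk matching is more lenient."""
    return [e for e in events
            if any(all(e.get(k) == v for k, v in row.items()) for row in rows)]

rows = subsearch_rows(LOOKUP_CSV, "URL")
print(format_or(rows))
# ( (URL="www.botnet.badguys.net") OR (URL="evil.example.org") )

# Events that carry the value in a field named URL can match...
print(or_filter([{"URL": "www.botnet.badguys.net"}], rows))
# ...but events that only have proxy_url never do.
print(or_filter([{"proxy_url": "www.botnet.badguys.net"}], rows))
```

The last call returns an empty list because the generated clauses only ever mention URL, which is exactly why the outer search needs a field with that name.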

The Normal Lookup way
The answers above are basically telling you to do a normal lookup with the | lookup command, where Splunk iterates through each event and appends one or more fields to it, depending on whether the event matches a row in the lookup and on what the lookup file contains. Let's assume for a moment your lookup looks something like this:

URL,is_bad
www.google.com,0
www.botnet.badguys.net,1

A normal lookup adds the is_bad field to every event in your proxy dataset whose URL matches a URL in the lookup file, assuming your search is structured something like this:

some_proxy_search
| lookup list_of_naughty_urls.csv URL as url_field_name_from_event OUTPUT is_bad
| search is_bad=1

The | search is_bad=1 step then filters down to the events that have a naughty URL, as defined by your lookup CSV.
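
The lookup-then-filter pattern can be modeled in a few lines of Python as well. Again this is only a toy model, not how Splunk implements lookups: the proxy events are hypothetical, and the event field is assumed to be called proxy_url (standing in for url_field_name_from_event). Events with no matching lookup row pass through without an is_bad field, so the final filter drops them.

```python
import csv
import io

# The example lookup from above.
LOOKUP_CSV = """URL,is_bad
www.google.com,0
www.botnet.badguys.net,1
"""

def load_lookup(csv_text, key_field):
    """Index the lookup rows by their key field."""
    return {row[key_field]: row for row in csv.DictReader(io.StringIO(csv_text))}

def lookup(events, table, event_field, outputs):
    """Rough model of `| lookup <file> URL AS <event_field> OUTPUT <outputs>`:
    events whose event_field matches a lookup row get the output fields added;
    events with no match pass through unchanged."""
    enriched = []
    for event in events:
        new_event = dict(event)
        match = table.get(event.get(event_field))
        if match is not None:
            for name in outputs:
                new_event[name] = match[name]
        enriched.append(new_event)
    return enriched

def search(events, field, value):
    """Rough model of `| search field=value` (exact string match only)."""
    return [e for e in events if e.get(field) == value]

# Hypothetical proxy events; the URL lives in a field called proxy_url.
events = [
    {"proxy_url": "www.google.com"},
    {"proxy_url": "www.botnet.badguys.net"},
    {"proxy_url": "news.example.com"},
]

table = load_lookup(LOOKUP_CSV, "URL")
enriched = lookup(events, table, "proxy_url", ["is_bad"])
naughty = search(enriched, "is_bad", "1")
print(naughty)  # [{'proxy_url': 'www.botnet.badguys.net', 'is_bad': '1'}]
```

Unlike the subsearch approach, the field names do not have to agree here: the AS clause maps the lookup's URL column onto whatever the event field is called.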

s2_splunk
Splunk Employee

If you are using lookup, you need to specify the input field (the field in the lookup and the corresponding field in your events). Then you can add OUTPUT URL as proxy_url to rename the looked-up field.

gcusello
SplunkTrust

If you want to know whether a URL from the lookup is present in your events, use something like this:

Your_search [| inputlookup your_lookup | fields url ] | stats count by url

Make sure the url field has the same name in both the lookup and the logs.
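
As a rough illustration of what that search returns, here is a Python toy model (not Splunk code; all values are hypothetical). The subsearch keeps only events whose url is in the lookup, and stats count by url then counts them per value, so lookup entries that never appear in the events simply get no row.

```python
from collections import Counter

# Hypothetical values: the lookup's url column and some events that also use url.
lookup_urls = {"www.botnet.badguys.net", "evil.example.org"}
events = [
    {"url": "www.botnet.badguys.net"},
    {"url": "www.google.com"},
    {"url": "www.botnet.badguys.net"},
    {"url": "evil.example.org"},
]

def count_matching(events, lookup_urls, field="url"):
    """Keep events whose field value is in the lookup (the subsearch filter),
    then count them per value (the `| stats count by url` step)."""
    return Counter(e[field] for e in events if e.get(field) in lookup_urls)

counts = count_matching(events, lookup_urls)
print(dict(counts))  # {'www.botnet.badguys.net': 2, 'evil.example.org': 1}
```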
Bye.
Giuseppe

stakor
Path Finder

That does not appear to work. When I use:

<Search_all_proxies>
[| inputlookup list_of_naughty_urls.csv | table URL]

I do not get any results. Also, shouldn't I be trying to check against a specific field, and not against _raw?

For clarity, when I run:

| inputlookup list_of_naughty_urls.csv | table URL

I do get results.

I also created a list_of_naughty_urls_test.csv, to which I added google.com so I could make sure I was getting results, and I am.

jwiedemann_splu
Splunk Employee

Also, shouldn't I be trying to check against a specific field, and not against _raw?
Yes, you should be. The field you will be checking against will be the name of the field from your subsearch, which in your example above is URL. I'm also fairly certain that field must be present and extracted via props/transforms for your subsearch results filtering to work.
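
If your events carry the URL in a field named proxy_url rather than URL, one way to line the names up is to rename the field inside the subsearch, for example by ending it with | rename URL AS proxy_url, so the generated OR clauses mention proxy_url instead. The Python toy model below (not Splunk code; the rows and events are hypothetical) shows the effect of that rename on the filter.

```python
# Hypothetical subsearch output: one row per lookup entry, field named URL.
subsearch_rows = [{"URL": "www.botnet.badguys.net"}, {"URL": "evil.example.org"}]

# Hypothetical proxy events: the URL lives in a field called proxy_url.
events = [
    {"proxy_url": "www.botnet.badguys.net"},
    {"proxy_url": "www.google.com"},
]

def rename(rows, old, new):
    """Rough model of `| rename old AS new` applied to every row."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in rows]

def or_filter(events, rows):
    """Keep events matching every field/value pair of at least one row."""
    return [e for e in events
            if any(all(e.get(k) == v for k, v in row.items()) for row in rows)]

before = or_filter(events, subsearch_rows)
after = or_filter(events, rename(subsearch_rows, "URL", "proxy_url"))
print(before)  # []
print(after)   # [{'proxy_url': 'www.botnet.badguys.net'}]
```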

gcusello
SplunkTrust

Hi stakor,
Sorry, but I don't understand your question.
Is your problem that you want to populate your lookup automatically?
If so, try something like this:

your search
| table proxy_url
| outputlookup your_lookup
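
The table-then-outputlookup pattern amounts to writing the selected column out as a CSV file with a header row, one line per result. The Python sketch below only models that CSV shape; the result values are hypothetical, and it returns the text instead of writing a real lookup file.

```python
import csv
import io

# Hypothetical search results after `| table proxy_url`.
results = [{"proxy_url": "www.example.com"}, {"proxy_url": "news.example.org"}]

def output_lookup(rows, fields):
    """Rough model of `| outputlookup`: rows written as CSV with a header line.
    Returns the CSV text rather than saving a lookup file."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

csv_text = output_lookup(results, ["proxy_url"])
print(csv_text)
```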

If instead you want to modify a lookup manually, use the Lookup Editor app.

Bye.
Giuseppe

stakor
Path Finder

I am not looking to create a new lookup. I am attempting to use an existing lookup table to see whether anything listed in it is being seen by Splunk. At some point in the future, I may also want to alert if traffic is seen going to one of the sites listed in the lookup table.