I am trying to find a better way of doing the following search:
<Search_all_proxies>
[|inputlookup list_of_naughty_urls.csv|table URL | rename URLS as proxy_url]
I have looked at:
http://docs.splunk.com/Documentation/Splunk/6.5.3/SearchReference/Lookup
But I think there is an issue with the way that transforms.conf is set up on my system, compared with the way it is set up elsewhere. I am not an admin, so modifying that file is not an option for me at the moment.
I ran into:
https://answers.splunk.com/answers/455463/how-to-match-my-search-with-a-lookup.html?utm_source=typea...
Which is where I got the idea for the above search. And, it does work. But there has to be a more elegant way of doing this. I have tried:
<Search_all_proxies>
[| lookup list_of_naughty_urls.csv URLS as proxy_url]
But that did not return any results. I would have thought that it would, based on the documentation. I thought that having two lists of values in proxy_url would use the default AND, resulting in a list that only has urls that are in both lists. (I have checked, and there are values in both lists that correlate, as the first search listed does produce results. [same time frame, etc])
Anyone see where the second search might be going wrong? Or is the first search a good way to go? (It is functional...)
There are two general ways to achieve what you are trying to do: the "normal lookup" way and the "subsearch" way. Either is fine, but the more Splunky solution is probably the "normal lookup" way. Let's break down how each one works and how they differ.
The Subsearch way
What you're doing with "[| inputlookup list_of_naughty_urls.csv | table URL]" is running a subsearch whose results are pulled from the lookup file (as opposed to, say, another index or dataset). When a subsearch is used this way, it runs first and returns its results as one giant OR statement, which is then fed to the outer search. Because your final subsearch result is a table with the field URL and a set of values, the resulting OR statement will look something like this:
( (URL="some value") OR (URL="some other value") OR (URL="again some other value") )
To see this behavior in action, you can use the | format command. In a separate search window, run this:
| inputlookup list_of_naughty_urls.csv | table URL | format
and you'll see the big old OR statement that is created and fed to the main search. For the main search to work as you'd expect, the events in your dataset must have a field called URL; otherwise the OR statement won't match anything.
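Since the proxy events in the question appear to carry the URL in a field called proxy_url rather than URL, one way to line the names up is to rename the field inside the subsearch. This is only a sketch: it assumes the CSV column is named URL and the event field is named proxy_url, and <Search_all_proxies> stands in for the base search from the question.

```
<Search_all_proxies>
    [| inputlookup list_of_naughty_urls.csv
     | fields URL
     | rename URL as proxy_url]
```

With the rename in place, the expanded filter should be built from proxy_url="some value" terms instead of URL="some value" terms. You can confirm that by running the subsearch on its own with | format appended.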
The Normal Lookup way
The answers above are basically telling you to do a normal lookup using the | lookup command, where Splunk iterates through each event and appends one or more fields to it, depending on whether the event matches a row in the lookup and on what the lookup file contains. Let's assume for a moment that your lookup looks something like this:
URL, is_bad
www.google.com, 0
www.botnet.badguys.net, 1
A normal lookup would add the is_bad field to every event in your proxy dataset whose URL matches a URL in the lookup file, presuming your search was structured something like this:
some_proxy_search
| lookup list_of_naughty_urls.csv URL as url_field_name_from_event OUTPUT is_bad
| search is_bad=1
The | search is_bad=1 lets you filter down to events that have a naughty URL, as defined by your lookup CSV.
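If the CSV in the question has only a URL column and no is_bad flag, a similar filter can be sketched by writing the matched value to a new field and keeping only the events where that field exists. The names here are assumptions: proxy_url is taken to be the event field, URL the CSV column, and matched_url is a made-up output field used purely for illustration.

```
<Search_all_proxies>
| lookup list_of_naughty_urls.csv URL as proxy_url OUTPUT URL as matched_url
| where isnotnull(matched_url)
```

Events whose proxy_url has no row in the CSV get no matched_url field, so the where clause drops them. Note that this pulls every proxy event before filtering, so it may be slower than the subsearch on a large time range.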
If you are using lookup, you need to specify the input field (the lookup field and the related event field). Then you can add OUTPUT URL as proxy_url to rename the looked-up field.
If you want to know whether a lookup URL is present in your events, use something like this:
Your_search [| inputlookup your_lookup | fields url ] | stats count by url
Note that the url field name must be the same in both the lookup and the logs.
Bye.
Giuseppe
That does not appear to work. When I use:
<Search_all_proxies>
[| inputlookup list_of_naughty_urls.csv | table URL]
I do not get any results. Also, shouldn't I be trying to check against a specific field, and not against _raw?
For clarity, when I run:
| inputlookup list_of_naughty_urls.csv | table URL
I do get results.
I also created a list_of_naughty_urls_test.csv, where I added google.com - so I could make sure I was getting results, and I am.
Also, shouldn't I be trying to check against a specific field, and not against _raw?
Yes, you should be. The field you will be checking against will be the name of the field from your subsearch, which in your example above is URL. I'm also fairly certain that field must be present and extracted via props/transforms for your subsearch results filtering to work.
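One rough way to check which field is actually extracted on the proxy events is to count the events that carry each candidate field. This sketch assumes the two candidates are URL and proxy_url; substitute whatever field names your events really use.

```
<Search_all_proxies>
| stats count as total_events count(URL) as events_with_URL count(proxy_url) as events_with_proxy_url
```

Whichever count is non-zero points to the field name the subsearch should return.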
Hi stakor,
Sorry, but I don't understand your question.
Is your problem that you want to populate your lookup automatically?
If that is what you need, try something like this:
your search
| table proxy_url
| outputlookup your_lookup
If instead you want to modify a lookup manually, use the Lookup Editor app.
Bye.
Giuseppe
I am not looking to create a new lookup. I am attempting to use an existing lookup table to see whether anything listed in it is being seen by Splunk. At some point in the future, I may also want to alert if traffic is seen going to one of the sites listed in the lookup table.