I have a file (servers.csv) with a set of server addresses, e.g.
1.2.3.4
4.3.2.1
5.6.7.8
I uploaded the file, and I am trying to use an inputlookup to find relevant logs to any address. My query does not work:
index="palo_logs" [|inputlookup servers.csv | return src_ip ]
The columns on my csv file are:
src_ip,servername
I can't for the life of me have the query populate, but when I search for the src_ip independently, it shows up.
As already mentioned return by default only returns the first result, but it can take a parameter of the maximum number of results to process. Additionally there are three different field formats it can take... if you use return src_ip
this returns sets of src_ip="value"
in your parent search, or'ed together. You can call the field something else in the parent search with aliasing syntax return ip=src_ip
this results in ip="value"
or even return raw strings with return $src_ip
which results in just value
and these can be used together:
So a few options for you:
index="palo_logs" [inputlookup servers.csv | return 100000000000 $src_ip]
Most basic... just return a bunch of ored values... and I really hope your lookup isn't really that big.
index="palo_logs" [inputlookup servers.csv | eval src_ip="\"".src_ip."\"" | return 100000000000 $src_ip]
Adding some string quoting to the raw field values...
index="palo_logs" [inputlookup servers.csv | return 100000000000 ip=src_ip]
Assuming the field you have is actually an extracted field in your palo_logs called ip instead of src_ip
index="palo_logs" [inputlookup servers.csv| eval OR="OR" | return 100000000000 src_ip $OR dest_ip=src_ip]
Assuming you have two fields, src_ip and dest_ip extracted in your palo logs that you want to match against.
As mentioned before you can also just run the subsearch as a search (remember leading pipe characters) to see what it would insert into your parent search, for some random simulated values though, makeresults and eval random do in a pinch... e.g.
| makeresults count=100 | eval src_ip=random().".".random().".".random().".".random()| eval OR="OR" | return 100000000000 src_ip $OR dest_ip=src_ip
EDIT TO ADD: As I think about it, instead of arbitrary large number in the return you could use what we know, channel Xzibit, and put a subsearch in your subsearch in true meme form... For example the first option would be instead:
index="palo_logs" [inputlookup servers.csv | return [inputlookup servers.csv | stats count | return $count] $src_ip]
All of these answers are excellent, but just to add a little more colour to subsearches ...
Any time you use a subsearch, think of it like backticks in the unix shell. (If you don't get that reference then look at https://unix.stackexchange.com/questions/27428/what-does-backquote-backtick-mean-in-commands ) So the thing inside the subsearch runs, and then its output textually replaces the subsearch itself. (Commands like foreach
, apppend
, join
, etc that use subsearch syntax don't necessarily apply here. This is the context of using a subsearch as part of search criteria)
Subsearches normally return field-value pairs. You can cheat to see what a subsearch is going to return by running it in conjunction with the format
command.
Like suppose I had a lookup file like your example:
src_ip,servername
1.2.3.4,server1
5.6.7.8,server2
9.10.11.12,server3
If I do a | inputlookup serverlist.csv
then I see the CSV file itself as a search result. But if I do a | inputlookup serverlist.csv | format
then I get something different / more interesting ...
( ( servername="server1" AND src_ip="1.2.3.4" ) OR ( servername="server2" AND src_ip="5.6.7.8" ) OR ( servername="server3" AND src_ip="9.10.11.12" ) )
This gives me a picture of what a search like:
sourcetype=pan:traffic index=firewalls [ | inputlookup serverlist.csv ]
Will look more like this after the subsearch has returned:
sourcetype=pan:traffic index=firewalls ( ( servername="server1" AND src_ip="1.2.3.4" ) OR ( servername="server2" AND src_ip="5.6.7.8" ) OR ( servername="server3" AND src_ip="9.10.11.12" ) )
Ultimately, this probably won't return ANYTHING because pan:traffic
likely does not have a field named servername
. What if I instead ran:
sourcetype=pan:traffic index=firewalls [ | inputlookup serverlist.csv | fields src_ip ]
The return
command works similarly to format
and outside of a subsearch will give you an idea of what it is up to as well. Compare:
| inputlookup serverlist.csv | return src_ip
to
| inputlookup serverlist.csv | return 1000 src_ip
One other thing that is interesting about subsearches is that return
is not required! If you don't call return
then as I demonstrated above, some default functionality happens. Another piece of default functionality includes specially named fields. If your subsearch outputs a field named query
then the field name disappears from the subsearch output., like so:
| inputlookup serverlist.csv | fields src_ip | rename src_ip as query | format
will return
( ( "1.2.3.4" ) OR ( "5.6.7.8" ) OR ( "9.10.11.12" ) )
"Look ma, no field names!"
Hopefully this helps you peer a little more into how your subsearches are working....
As already mentioned return by default only returns the first result, but it can take a parameter of the maximum number of results to process. Additionally there are three different field formats it can take... if you use return src_ip
this returns sets of src_ip="value"
in your parent search, or'ed together. You can call the field something else in the parent search with aliasing syntax return ip=src_ip
this results in ip="value"
or even return raw strings with return $src_ip
which results in just value
and these can be used together:
So a few options for you:
index="palo_logs" [inputlookup servers.csv | return 100000000000 $src_ip]
Most basic... just return a bunch of ored values... and I really hope your lookup isn't really that big.
index="palo_logs" [inputlookup servers.csv | eval src_ip="\"".src_ip."\"" | return 100000000000 $src_ip]
Adding some string quoting to the raw field values...
index="palo_logs" [inputlookup servers.csv | return 100000000000 ip=src_ip]
Assuming the field you have is actually an extracted field in your palo_logs called ip instead of src_ip
index="palo_logs" [inputlookup servers.csv| eval OR="OR" | return 100000000000 src_ip $OR dest_ip=src_ip]
Assuming you have two fields, src_ip and dest_ip extracted in your palo logs that you want to match against.
As mentioned before you can also just run the subsearch as a search (remember leading pipe characters) to see what it would insert into your parent search, for some random simulated values though, makeresults and eval random do in a pinch... e.g.
| makeresults count=100 | eval src_ip=random().".".random().".".random().".".random()| eval OR="OR" | return 100000000000 src_ip $OR dest_ip=src_ip
EDIT TO ADD: As I think about it, instead of arbitrary large number in the return you could use what we know, channel Xzibit, and put a subsearch in your subsearch in true meme form... For example the first option would be instead:
index="palo_logs" [inputlookup servers.csv | return [inputlookup servers.csv | stats count | return $count] $src_ip]
Hi @splunkybutfunky,
Which of the field from your csv file is mapped in your events logs? I am assuming its src_ip.
If yes then can you try below query if it helps?
index=palo_logs | lookup servers.csv src_ip AS OUTPUT servername AS servername
Try:
index=palo_logs [| inputlookup servers.csv | stats values(src_ip) AS src_ip | eval search=mvjoin(src_ip, “ OR “)]
Because the subsearch returns a field named search
, the contents of that field will be dropped inline into your search string, yielding something like index=palo_logs <ip1> OR <ip2>...
.
It’s worth noting that this may yield an inefficient search, and you may want to try this to make it more efficient:
index=palo_logs [| inputlookup servers.csv | eval src_ip=“TERM(“.src_ip.”) | stats values(src_ip) AS src_ip | eval search=mvjoin(src_ip, “ OR “)]
This was written up on a phone, so it probably has a typo or two or more.
Thanks I tried both. At this point (late at night) seeing double will revisit this.
I went back, renamed my CSV column names and got this working now
return
only returns the first result. To see this in action try running the portion of your search within the square brackets by itself.
Instead try
index=palo_logs [| inputlookup servers.csv | table src_ip]
The subsearch has a limit of 10k rows, which some work around by piping to format
. You may not need that, but adding it for completeness.
Thank you for the response. Ran the query and it shows no results:
0 of 25,791,763 events matched
But if I search any individual address, I do get back results. Servers file is:
1.1.1.1,servername1,
2.2.2.2,servername2,
Just odd because I have tried everything (Splunk Enterprise 6.5.3)
What iis your search string when you search individual addresses?
If it doesn’t include src_ip=<ip>
and instead is just index=palo_logs <ip1> OR <ip2>
then it makes sense why my search didn’t work. If that’s the case I’ll post a different answer to handle that.
index="palo_logs" 1.1.1.1
Works
index="palo_logs" 1.1.1.1 2.2.2.2 3.3.3.3
Works
Also index-"palo_logs" src_ip [|inputlookup servers.csv | table src_IP ]
Doesn't work
What does this return:
| inputlookup servers.csv | table src_ip
"Error in 'inputlookup' command. This command must be the first command of a search
index="palo_logs"
| inputlookup servers.csv | table src_ip
Run it exactly as I specified it in the previous comment, don’t add the index=
portion.
Shows everything I have in my csv file