Query with 1 subquery optimization ideas

justinjohn83
Explorer

I'm looking for ideas on how to optimize this query. Right now I see two options: (A) get faster hardware, or (B) index the extracted fields (but that would require re-indexing all my data, and the solution is brittle).

Here is the search query:

index=dmca 
tag::eventtype="dmca_login" 
[ search index=dmca 
         tag::eventtype="dmca_traffic" AND 
         dmca_src_ip=155.123.64.18 AND 
         latest=02/28/2011:13:20:11 
  | head limit=1 
  | fields + dmca_priv_ip ] 
latest=02/28/2011:13:20:11 
| head limit=1 
| fields + dmca_cnet + dmca_src_ip + dmca_priv_ip + dmca_mac

The tag::eventtype="dmca_login" term matches a tag on an eventtype for logins with a private IP address. The tag::eventtype="dmca_traffic" term matches a tag on an eventtype identifying PAT traffic events, which translate a public IP address + port to a private IP address.

The current hardware is a VM:

64-bit Linux, 2 CPUs @ 2.66 GHz each, 4 GB RAM, 50 GB of network storage

The search appears to be CPU bound: splunkd runs at ~160% CPU (200% would be both cores fully used) and uses 337 MB RAM. The system has 160 MB free with no swap usage.

Any help is appreciated!

Thanks,

Justin


gkanapathy
Splunk Employee

The above search, for most data sets, and assuming nothing "weird" in the eventtypes or the tags, should run pretty quickly. The only thing that really stands out to me is the network storage. If you can move your index to a local disk, it may help. Otherwise, you may have a poorly configured index, but even that shouldn't make it that bad. Also, I'd be interested in hearing just how slowly this runs. I can't imagine it takes more than a few seconds to return results, even including browser rendering (again, except for the caveats I've mentioned).


justinjohn83
Explorer

tags.conf

######### cvpn

[eventtype=cvpn-login]
dmca_login = enabled
dmca_traffic = enabled

#radius

[eventtype=radius-login]
dmca_login = enabled

[eventtype=radius-logout]
dmca_logout = enabled

[eventtype=pat]
dmca_traffic = enabled

#### captive portal

[eventtype=captiveportal-login]
dmca_login = enabled
dmca_traffic = enabled

[eventtype=captiveportal-logout]
dmca_logout = enabled
dmca_traffic = enabled

#### dhcp

[eventtype=dhcp-ack]
dmca_mac = enabled
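
To make it easier to see which eventtypes each tag pulls in, here is a minimal Python sketch that groups the stanzas above by tag. It assumes the tags.conf text can be read as INI by Python's configparser, which only approximates Splunk's own .conf parsing. The function name eventtypes_by_tag is made up for illustration.

```python
import configparser

# tags.conf as posted above (comment lines omitted).
TAGS_CONF = """
[eventtype=cvpn-login]
dmca_login = enabled
dmca_traffic = enabled

[eventtype=radius-login]
dmca_login = enabled

[eventtype=radius-logout]
dmca_logout = enabled

[eventtype=pat]
dmca_traffic = enabled

[eventtype=captiveportal-login]
dmca_login = enabled
dmca_traffic = enabled

[eventtype=captiveportal-logout]
dmca_logout = enabled
dmca_traffic = enabled

[eventtype=dhcp-ack]
dmca_mac = enabled
"""


def eventtypes_by_tag(conf_text):
    """Return {tag: [eventtype, ...]} for every enabled tag (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    result = {}
    for section in parser.sections():
        # Stanza names look like "eventtype=<name>".
        field, _, value = section.partition("=")
        if field != "eventtype":
            continue
        for tag, state in parser[section].items():
            if state.strip().lower() == "enabled":
                result.setdefault(tag, []).append(value)
    return result


if __name__ == "__main__":
    for tag, eventtypes in eventtypes_by_tag(TAGS_CONF).items():
        print(tag, "->", ", ".join(eventtypes))
```

With the configuration above, dmca_traffic resolves to cvpn-login, pat, captiveportal-login and captiveportal-logout, while dmca_login resolves to cvpn-login, radius-login and captiveportal-login.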


justinjohn83
Explorer

eventtypes.conf

[pat]
search = sourcetype="pat"

[radius-login]
search = sourcetype = "radius" AND ";login;"

[radius-logout]
search = sourcetype = "radius" AND ";logout;"

[captiveportal-login]
search = sourcetype = "perfigo" AND (";login;" OR (NOT ";login" AND NOT ";logout;"))

[captiveportal-logout]
search = sourcetype = "perfigo" AND ";logout;"

[cvpn-login]
search = sourcetype = "cvpn"

[dhcp-ack]
search = sourcetype = "dhcp"


justinjohn83
Explorer

The total number of indexed events is about 2 billion, with about 95% of them tagged "dmca_traffic". The search is run on demand through the Splunk web service API, with the actual search parameters substituted in. I put my own front-end web service in front of the Splunk one. The search itself takes at least 20 seconds.


gkanapathy
Splunk Employee

Also, some information about the size of the index and the total number of events would help.


gkanapathy
Splunk Employee

It would be helpful if you also posted the definitions of the eventtypes. Also, I will assume that the given dmca_src_ip and latest values are variable, and that these searches are just run on demand (e.g., when you get a DMCA investigation request).


proctorgeorge
Path Finder

Hey Justin,

A few questions about the search:

Does it need to be optimized because the result set is very large, or because it is slow to produce a few results?

Is this a scheduled search, part of a dashboard, or just an occasionally used saved search? In other words, what's the use case?

When are you running this search?

Possible Solution:

If it is a search that runs once a day and traverses a large data set, you could try using a summary index to spread the load throughout the day.

Summary Indexing Wiki


justinjohn83
Explorer

Basically, the returned result set is either 0 or 1 records because I limit it to 1, but the number of matching events can be quite large, especially for the subsearch. The total number of indexed events is about 2 billion, with about 95% of them tagged "dmca_traffic". The search is run on demand through the Splunk web service API, with the actual search parameters substituted in. I put my own front-end web service in front of the Splunk one. The search itself takes at least 20 seconds.
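
For readers wondering what "search parameters substituted in" could look like, here is a rough Python sketch that fills the two variable parts (the public source IP and the latest timestamp) into the query from the original post. It is an illustration only, not the poster's actual front end: the function name build_dmca_search is hypothetical, and the timestamp layout is simply copied from the posted query (month/day/year:hour:minute:second).

```python
import ipaddress
from datetime import datetime

# Timestamp layout taken from the posted query, e.g. 02/28/2011:13:20:11.
TIME_FORMAT = "%m/%d/%Y:%H:%M:%S"


def build_dmca_search(src_ip, latest):
    """Return the posted search with src_ip and latest filled in (hypothetical helper).

    src_ip is validated so that arbitrary text cannot be injected into
    the search string; an invalid address raises ValueError.
    """
    ip = ipaddress.ip_address(src_ip)
    ts = latest.strftime(TIME_FORMAT)
    return (
        'index=dmca tag::eventtype="dmca_login" '
        '[ search index=dmca tag::eventtype="dmca_traffic" AND '
        f"dmca_src_ip={ip} AND latest={ts} "
        "| head limit=1 | fields + dmca_priv_ip ] "
        f"latest={ts} "
        "| head limit=1 "
        "| fields + dmca_cnet + dmca_src_ip + dmca_priv_ip + dmca_mac"
    )


if __name__ == "__main__":
    print(build_dmca_search("155.123.64.18", datetime(2011, 2, 28, 13, 20, 11)))
```

Validating the IP address and formatting the timestamp from a datetime object, rather than pasting raw request strings into the query, keeps a front-end service from passing malformed or hostile input through to Splunk.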
