Splunk Search

Query with 1 subquery optimization ideas

justinjohn83
Explorer

I'm looking for ideas on how to optimize this query. Right now I see two options: A) get faster hardware, or B) index the extracted fields (but that would require re-indexing all my data, and the solution is brittle).

Here is the search query:

index=dmca 
tag::eventtype="dmca_login" 
[ search index=dmca 
         tag::eventtype="dmca_traffic" AND 
         dmca_src_ip=155.123.64.18 AND 
         latest=02/28/2011:13:20:11 
  | head limit=1 
  | fields + dmca_priv_ip ] 
latest=02/28/2011:13:20:11 
| head limit=1 
| fields + dmca_cnet + dmca_src_ip + dmca_priv_ip + dmca_mac

The tag::eventtype="dmca_login" term matches a tag on the eventtypes for logins that record a private IP address. The tag::eventtype="dmca_traffic" term matches a tag on the eventtypes identifying PAT traffic events, which translate a public IP address + port to a private IP address.
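To make the shape of the search concrete: the subsearch runs first and returns at most one dmca_priv_ip value, which Splunk then substitutes into the outer search as a field filter. Assuming the subsearch found a match, the outer search effectively becomes something like the sketch below. PRIVATE_IP_FROM_SUBSEARCH is only a placeholder for whatever value the subsearch returns, not real data.

```
index=dmca
tag::eventtype="dmca_login"
dmca_priv_ip="PRIVATE_IP_FROM_SUBSEARCH"
latest=02/28/2011:13:20:11
| head limit=1
| fields + dmca_cnet + dmca_src_ip + dmca_priv_ip + dmca_mac
```

So there are really two searches to time: the dmca_traffic lookup by dmca_src_ip, and the dmca_login lookup by dmca_priv_ip. Running each on its own should show which one accounts for most of the delay.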

The current hardware is a VM:

64-bit Linux, 2 CPUs @ 2.66 GHz each, 4 GB RAM, 50 GB of network storage

The search appears to be CPU-bound: splunkd runs at ~160% CPU (200% would be both cores) and uses 337 MB RAM. The system has 160 MB free with no swap usage.

Any help is appreciated!

Thanks,

Justin

0 Karma

gkanapathy
Splunk Employee

For most data sets, and assuming nothing "weird" in the eventtypes or the tags, the above search should run pretty quickly. The only thing that really stands out to me is the network storage. If you can move your index to a local disk, it may help. Otherwise, you may have a poorly configured index, but even that shouldn't make it this bad. I'd also be interested in hearing just how slowly this runs. I can't imagine it taking more than a few seconds to return results, even including browser rendering (again, apart from the caveats I've mentioned).

0 Karma

justinjohn83
Explorer

tags.conf

######### cvpn

[eventtype=cvpn-login]
dmca_login = enabled
dmca_traffic = enabled

#radius

[eventtype=radius-login]
dmca_login = enabled

[eventtype=radius-logout]
dmca_logout = enabled

[eventtype=pat]
dmca_traffic = enabled

#### captive portal

[eventtype=captiveportal-login]
dmca_login = enabled
dmca_traffic = enabled

[eventtype=captiveportal-logout]
dmca_logout = enabled
dmca_traffic = enabled

#### dhcp

[eventtype=dhcp-ack]
dmca_mac = enabled

0 Karma

justinjohn83
Explorer

eventtypes.conf

[pat]
search = sourcetype="pat"

[radius-login]
search = sourcetype = "radius" AND ";login;"

[radius-logout]
search = sourcetype = "radius" AND ";logout;"

[captiveportal-login]
search = sourcetype = "perfigo" AND (";login;" OR (NOT ";login" AND NOT ";logout;"))

[captiveportal-logout]
search = sourcetype = "perfigo" AND ";logout;"

[cvpn-login]
search = sourcetype = "cvpn"

[dhcp-ack]
search = sourcetype = "dhcp"
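Reading the tags.conf and eventtype definitions posted in this thread together, the two tag filters in the search expand to roughly the terms below. This is only a sketch: it assumes no other eventtypes carry these tags, and it simply copies the posted eventtype search strings.

The tag::eventtype="dmca_login" filter (cvpn-login, radius-login, captiveportal-login) is roughly:

```
index=dmca
( ( sourcetype="cvpn" )
  OR ( sourcetype="radius" AND ";login;" )
  OR ( sourcetype="perfigo" AND ( ";login;" OR ( NOT ";login" AND NOT ";logout;" ) ) ) )
```

The tag::eventtype="dmca_traffic" filter (cvpn-login, pat, captiveportal-login, captiveportal-logout) is roughly:

```
index=dmca
( ( sourcetype="cvpn" )
  OR ( sourcetype="pat" )
  OR ( sourcetype="perfigo" AND ( ";login;" OR ( NOT ";login" AND NOT ";logout;" ) ) )
  OR ( sourcetype="perfigo" AND ";logout;" ) )
```

Written out this way, the perfigo login eventtype includes a NOT clause. That may be worth checking as a possible "weird" eventtype, since excluded terms generally cannot narrow a search through the index the way included terms can.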

0 Karma

justinjohn83
Explorer

The total number of indexed events is about 2 billion, with about 95% of them identified as "dmca_traffic." The search is run on demand through the Splunk web service API with the actual search parameters substituted in. I put my own front-end web service in front of the Splunk one. The search itself takes at least 20 seconds.

0 Karma

gkanapathy
Splunk Employee

Some information about the size of the index and the total number of events would also help.

0 Karma

gkanapathy
Splunk Employee

It would be helpful if you also posted the definitions of the eventtypes. I will assume that the given dmca_src_ip and latest values are variable, and that these searches are just run on demand (e.g., when you get a DMCA investigation request).

0 Karma

proctorgeorge
Path Finder

Hey Justin,

A few questions about the search:

Does it need to be optimized because the result set is very large, or because it is slow to produce a few results?

Is this a scheduled search, part of a dashboard, or just a randomly used saved search? I.e., what's the use case?

When are you running this search?

Possible Solution:

If it is a search that runs once a day and traverses a large data set, you could try using a summary index to split the load throughout the day.

Summary Indexing Wiki
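As a rough sketch of what that could look like here: a scheduled search copies only the login fields into a much smaller index, and the lookup then runs against that index instead of the full one. The index name dmca_login_summary is made up for illustration and would have to be created first; the schedule and time range are left out because they depend on your setup.

```
index=dmca tag::eventtype="dmca_login"
| table _time dmca_cnet dmca_src_ip dmca_priv_ip dmca_mac
| collect index=dmca_login_summary
```

The login lookup could then search the summary, with PRIVATE_IP_FROM_SUBSEARCH standing in for the value returned by the traffic subsearch:

```
index=dmca_login_summary
dmca_priv_ip="PRIVATE_IP_FROM_SUBSEARCH"
latest=02/28/2011:13:20:11
| head limit=1
```

This only helps the login half of the search. The dmca_traffic half would still run against the main index.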

0 Karma

justinjohn83
Explorer

Basically, the returned result set is either 0 or 1 record, since I limit it to 1, but the number of matching results can be quite large, especially for the subquery. The total number of indexed events is about 2 billion, with about 95% of them identified as "dmca_traffic." The search is run on demand through the Splunk web service API with the actual search parameters substituted in. I put my own front-end web service in front of the Splunk one. The search itself takes at least 20 seconds.

0 Karma