Splunk Search

Join command for big data

amit97ee354
Explorer

I am trying to perform the join on different multi search for a set of time called "Before" and set of time called "After" to find out the request names counts that are only found in the "After" search. While running the query, the result set goes beyond 50,000 and whole query provides incorrect output. Is there any way to fix this issue?

index=xxx  earliest="-15m" latest="now"
| rex field=_raw "<RequestName>(?<Request_Name>.*?)</RequestName>" 
| rex field=_raw "<RequestSequenceNumber>(?<RequestSequence_Number>.*?)</RequestSequenceNumber>" 
| join Request_Name  [ |multisearch [search index=xxx earliest="-7d@w0" latest="@w0"|eval type="Before" ]
                 [search index=xxx  earliest="-15m" latest="now"|eval type="After"] 
    | rex field=_raw "<RequestName>(?<Request_Name>.*?)</RequestName>" 
    |dedup Request_Name,type
    | stats values(type) as Report_Key by Request_Name
    | mvcombine delim=" " Report_Key 
    | search Report_Key="After_Release"|table Request_Name]
| dedup  Request_Name | table Request_Name | stats count
Tags (2)

lguinn2
Legend

Thanks for your explanation. First, try this

 index=xxx earliest="-7d@w0" latest="now"
 | eval type=case(_time>=relative_time(now(),"-7d@w0") AND _time<=relative_time(now(),"@w0"),"Before",
                  _time>=relative_time(now(),"-15m","After",
                  1==1,"Other")
 | where type!="Other"
 | rex field=_raw "<RequestName>(?<Request_Name>.*?)</RequestName>" 
 | stats count(eval(type=="Before")) as Before_Count count(eval(type=="After")) as After_Count by Request_Name
 | where Before_Count = 0 | fields Request_Name

The above search will list only the requests where there were no incidents in the "before" category, only in the "after" category.
The second issue that you mention was that the time range of the search could vary. This is a different problem entirely. To deal with it, I suggest
1 - Enclose the search in a form. Use the form to force the user to pick/enter a particular date/time: the $release_timestamp$ token
Assume that the release_timestamp is a string in the form "11/12/2015:20:00:00"
2 - Restrict the search to the last 30 days overall.

 index=xxx earliest=-30d@d
 | eval release_epoch = strptime("$release_timestamp$","%m/%d/%Y:%H:%M:%S")
 | eval type=case( _time<=release_epoch AND _time>=release_epoch-(7*86400),"Before",
                  _time>=release_epoch AND _time<=release_epoch+(15*60),"After",
                  1==1,"Other")
 | where type!="Other"
 | rex field=_raw "<RequestName>(?<Request_Name>.*?)</RequestName>" 
 | stats count(eval(type=="Before")) as Before_Count count(eval(type=="After")) as After_Count by Request_Name
 | where Before_Count = 0 | fields Request_Name

The above search defines "Before" as "the 7 days before the release, and "After" as "the 15 minutes after the release." You can tweak the time ranges in the case function.

lguinn2
Legend

I'm not sure that I understand what you are trying to do, but I think this search might be close.

index=xxx earliest="-7d@w0" latest="now"
| eval type=case(_time>=relative_time(now(),"-7d@w0") AND _time<=relative_time(now(),"@w0"),"Before",
                 _time>=relative_time(now(),"-15m","After",
                 1==1,"Other")
| where type!="Other"
| rex field=_raw "<RequestName>(?<Request_Name>.*?)</RequestName>" 
| stats count(eval(type=="Before")) as Before_Count count(eval(type=="After")) as After_Count by Request_Name
| where After_Count > 0
| appendpipe [ stats count as Before_Count | eval Request_Name="TOTAL REQUEST COUNT" ]

The first part of the search examines the event time and categorizes the events into "Before", "After" and "Other" categories. Next, it eliminates all events in the "Other" category. It only extracts the Request_Name field, since your search didn't use the RequestSequence_Number field. Next, it counts all the events in the "Before" and "After" categories for each Request_Name. Because you said "to find out the request names counts that are only found in the 'After' search", the last command eliminates any requests that have no events in the "After" category. Finally, it appends a total count of the number of requests that meet the criteria.

If all you need is the total count, the search can be simplified to the following:

index=xxx earliest="-7d@w0" latest="now"
| eval type=case(_time>=relative_time(now(),"-7d@w0") AND _time<=relative_time(now(),"@w0"),"Before",
                 _time>=relative_time(now(),"-15m","After",
                 1==1,"Other")
| where type!="Other"
| rex field=_raw "<RequestName>(?<Request_Name>.*?)</RequestName>" 
| dedup Request_Name, type
| stats count(eval(type=="Before")) as Before_Count count(eval(type=="After")) as After_Count by Request_Name
| where After_Count > 0
| stats count

Neither of these searches uses a subsearch, so they will not be limited. They should also run much faster.

If this doesn't help, perhaps you could explain what you are trying to do, and help me understand better.
HTH

amit97ee354
Explorer

Lguinn

Sorry for the late reply.

I am basically trying to compare data within 2 time stamps, lets say 12/05/2015 to 12/10/2015 with 12/12/2015 to 12/14/2015, assuming our release is on 12/11/2015 night. What we need is the request name that are only found in the latter timeframe. So the normal logic is to get all the request names for former timestamps, get request names for latter timestamps, compare those, remove the ones that are matching with former timestamp, get the new ones (which are not matching) from the latter timestamp.

The search you provided is working up to certain limit but, I am finding following issues
1) What if the user selects a particular time frame or last 15 minutes. In that case, the "After" needs to be populated with AND _time<=relative_tim*e(now(),"now")* or AND _time<=relative_time(now(),"-10m") . The first one here does NOT work but the second one works fine.
2) The relative time field doesn't take EPOCH time -- following does not work

| eval type=case(_time>=relative_time(now(),"1449986400") AND _time<=relative_time(now(),"1450072800"),"Before",
_time>=relative_time(now(),"1450072800") AND _time<=relative_time(now(),"1450159200"),"After",
1==1,"Other")

Here is what I am doing to feed in the relative timestamps but it does not work as expected:
| eval type=case(_time>=relative_time(now(),"$field1.earliest$") AND _time<=relative_time(now(),"$field1.latest$"),"Before",
_time>=relative_time(now(),"$field2.earliest$") AND _time<=relative_time(now(),"$field2.latest$"),"After",
1==1,"Other")

Thanks for your help!

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...