Splunk Search

How can I do multiple lookups from 2 additional sources?

jcachosousa
Explorer

Hi,

I am looking for a way to efficiently set up multiple lookups (or ideally a more efficient function) within one search, and I am having a hard time figuring it out.

I have a key field between source A and source B, and I need to get a value from source C based on the lookup result from source B. I am also using append within my search. This is basically what my current search looks like without the lookup:

source=A earliest=-0mon@mon latest=now |eval ReportKey="This month" 
| append [search source=A earliest=-12mon@mon latest=-11mon@mon
        | eval ReportKey="Last year" 
        | eval _time=_time+60*60*24*365]
| append [search source=A earliest=-12mon@mon latest=-52w
        | eval ReportKey="Last year by now"
        | eval _time=_time+60*60*24*365]
| bucket _time span=1d
| chart count(variableX) by LookUpField,ReportKey
| sort -"Last year", -"This month"
| table LookUpField, "Last year", "This month", "Last year by now"

What I am trying to do is get LookUpField to be the relevant value from source C.

Any help would be greatly appreciated.

0 Karma
1 Solution

DalJeanis
Legend

1) You are doing WAY too much work manually with those appends. Just grab all the records you need and assign them multiple values depending on what they are.

 index=foo source=bar earliest=-12mon@mon
 | rename COMMENT as "limit to fields we need"
 | fields _time variableX ... fields needed for lookups....

 | rename COMMENT as "Throw away records in the middle months"
 | where _time <= relative_time(now(),"-11mon@mon") OR _time >= relative_time(now(),"@mon")  
 | bucket _time span=1d

 | rename COMMENT as "Your lookup code goes here, as long as it doesn't depend on the ReportKey"
 | lookup myfirstlookup keyFieldInLookup1 as keyField1inEvent OUTPUTNEW LookupResult
 | lookup mysecondlookup keyFieldInLookup2 as keyField2inEvent OUTPUTNEW LookupResult

 | rename COMMENT as "Create two records for the year ago month to date, one record otherwise"
 | eval ReportKey=case(_time <= relative_time(now(),"-52w"), mvappend("Last year","Last Year by now"),
      _time <= now() - 3000000, "Last year",
      true(),"this month"
  | mvexpand ReportKey
  | chart count(variableX) by LookupResult,ReportKey 

2) Technically that 3 million number could be 2678400 seconds, the number of seconds in a 31 day month, but since the middle 11 months are dropped, the exactness is not needed. If exactness was needed, then you'd actually need to calculate the date anyway.

3) You are binning the _time but not using it. I believe that may be left over from an attempted timechart, or other aggregation. I left it in and moved it where it should be.

4) Efficiency for where in the code the lookup needs to go will depend on what fields are being used for the lookups. The binning and aggregation should probably be done before the lookups. The last two sections would then look like this...

| fillnull value="" keyField1inEvent keyField2inEvent 
| stats count(variableX) as xCount  by _time keyField1inEvent keyField2inEvent 

 | rename COMMENT as "Your lookup code goes here, as long as it doesn't depend on the ReportKey"
 | lookup myfirstlookup keyFieldInLookup1 as keyField1inEvent OUTPUTNEW LookupResult
 | lookup mysecondlookup keyFieldInLookup2 as keyField2inEvent OUTPUTNEW LookupResult

 | rename COMMENT as "Create two records for the year ago month to date, one record otherwise"
 | eval ReportKey=case(_time <= relative_time(now(),"-52w"), mvappend("Last year","Last Year by now"),
      _time <= now() - 3000000, "Last year",
      true(),"this month"
  | mvexpand ReportKey
  | chart sum(xCount) by LookupResult,ReportKey 

Okay, now that we've straightened up your base search, here's the answer to your "how do I connect three different types of records that have two different keys" question...

 (your search that gets sourcetype=A) OR
 (your search that gets sourcetype=B) OR
 (your search that gets sourcetype=C)
| fields sourcetype ... keyAB keyBC fieldC ... list all the fields you need)

| rename COMMENT as "Roll the data from C to B based on keyBC, then throw away C"
| eventstats values(fieldC) as fieldC by keyBC
| where sourcetype!="C"

| rename COMMENT as "Roll the data from B to A based on keyAB, then throw away B"
| eventstats values(fieldC) as fieldC by keyAB
| where sourcetype="A"

How to connect these two sets of pseudocode will depend on the characteristics of the data in sourcetype B and C. If you need more help, you'll need to give us more information.

View solution in original post

DalJeanis
Legend

1) You are doing WAY too much work manually with those appends. Just grab all the records you need and assign them multiple values depending on what they are.

 index=foo source=bar earliest=-12mon@mon
 | rename COMMENT as "limit to fields we need"
 | fields _time variableX ... fields needed for lookups....

 | rename COMMENT as "Throw away records in the middle months"
 | where _time <= relative_time(now(),"-11mon@mon") OR _time >= relative_time(now(),"@mon")  
 | bucket _time span=1d

 | rename COMMENT as "Your lookup code goes here, as long as it doesn't depend on the ReportKey"
 | lookup myfirstlookup keyFieldInLookup1 as keyField1inEvent OUTPUTNEW LookupResult
 | lookup mysecondlookup keyFieldInLookup2 as keyField2inEvent OUTPUTNEW LookupResult

 | rename COMMENT as "Create two records for the year ago month to date, one record otherwise"
 | eval ReportKey=case(_time <= relative_time(now(),"-52w"), mvappend("Last year","Last Year by now"),
      _time <= now() - 3000000, "Last year",
      true(),"this month"
  | mvexpand ReportKey
  | chart count(variableX) by LookupResult,ReportKey 

2) Technically that 3 million number could be 2678400 seconds, the number of seconds in a 31 day month, but since the middle 11 months are dropped, the exactness is not needed. If exactness was needed, then you'd actually need to calculate the date anyway.

3) You are binning the _time but not using it. I believe that may be left over from an attempted timechart, or other aggregation. I left it in and moved it where it should be.

4) Efficiency for where in the code the lookup needs to go will depend on what fields are being used for the lookups. The binning and aggregation should probably be done before the lookups. The last two sections would then look like this...

| fillnull value="" keyField1inEvent keyField2inEvent 
| stats count(variableX) as xCount  by _time keyField1inEvent keyField2inEvent 

 | rename COMMENT as "Your lookup code goes here, as long as it doesn't depend on the ReportKey"
 | lookup myfirstlookup keyFieldInLookup1 as keyField1inEvent OUTPUTNEW LookupResult
 | lookup mysecondlookup keyFieldInLookup2 as keyField2inEvent OUTPUTNEW LookupResult

 | rename COMMENT as "Create two records for the year ago month to date, one record otherwise"
 | eval ReportKey=case(_time <= relative_time(now(),"-52w"), mvappend("Last year","Last Year by now"),
      _time <= now() - 3000000, "Last year",
      true(),"this month"
  | mvexpand ReportKey
  | chart sum(xCount) by LookupResult,ReportKey 

Okay, now that we've straightened up your base search, here's the answer to your "how do I connect three different types of records that have two different keys" question...

 (your search that gets sourcetype=A) OR
 (your search that gets sourcetype=B) OR
 (your search that gets sourcetype=C)
| fields sourcetype ... keyAB keyBC fieldC ... list all the fields you need)

| rename COMMENT as "Roll the data from C to B based on keyBC, then throw away C"
| eventstats values(fieldC) as fieldC by keyBC
| where sourcetype!="C"

| rename COMMENT as "Roll the data from B to A based on keyAB, then throw away B"
| eventstats values(fieldC) as fieldC by keyAB
| where sourcetype="A"

How to connect these two sets of pseudocode will depend on the characteristics of the data in sourcetype B and C. If you need more help, you'll need to give us more information.

jcachosousa
Explorer

Thank you so much! I am relatively new to Splunk and had assumed I probably wasn't doing this search very efficiently.

I am going to take a stab at this with what you recommend and hopefully I can get it to work.

In case it is useful, I was able to get my original search with the appends working with lookups. If you have any additional recommendations with this info they would be greatly appreciated. Here is the search:

source=A earliest=-1mon@mon latest=-0mon@mon | eval ReportKey="This month"
| append [search source=A earliest=-13mon@mon latest=-12mon@mon | eval ReportKey="Last year" | eval _time=_time+60*60*24*365]
| rename A-to-BKey AS AtoBKey
| join AtoBKey [search source=B earliest=-12y latest=now | dedup FieldX sortby -_time]
| join BtoCKey [search source=C earliest=-12y latest=now | dedup FieldY sortby -_time]
| chart count(variableX) by LookUpField,ReportKey
| sort -"Last year", -"This month"
| table LookUpField, "Last year", "This month", "Last year by now"

0 Karma

DalJeanis
Legend

Glad you got what you needed. As best practices, we would definitely have you specify using the fields command what fields are being returned by each leg of the search (the mainline, the append, and the subsearches). That will increase efficiency enormously in most cases. I believe you only need _time, ReportKey, variableX, and/or FieldX and/or FieldY.

You should be able to add variableX=* to every relevant leg of the search and get the same result with slightly less CPU. That's a bit data dependent, though, depending on whether you want to throw NULLs away after you've deduped to get the most recent record for each FieldX / FieldY, or whether you want to keep them.

jcachosousa
Explorer

Thanks a lot!

Based on your suggestions, I currently have this:

source=A earliest=-12mon@mon | rename COMMENT as "limit to fields we need" | fields _time status number id type code code2 rep name ReportKey Count | rename code AS code2
| search NOT (status=*request* OR status=*pending* OR status=Error OR status="Expired") | dedup number sortby -id | search NOT (status="Voided" OR type="nullified")
| rename COMMENT as "Throw away records in the middle months" | where _time <= relative_time(now(),"-11mon@mon") OR _time >= relative_time(now(),"@mon")
| rename COMMENT as "Your lookup code goes here, as long as it doesn't depend on the ReportKey" | join code2 [search source=B earliest=-12y latest=now | dedup code2 sortby -_time] | join id [search source=C earliest=-12y latest=now | dedup rep sortby -_time]
| fillnull value="" name ReportKey | stats count(no) as Count by _time code2 name
| rename COMMENT as "Create two records for the year ago month to date, one record otherwise" | eval ReportKey=case(_time <= relative_time(now(),"-52w"), mvappend("Last year","Last Year by now"), _time <= now() - 3000000, "Last year", true(),"This month") | mvexpand ReportKey
| chart sum(Count) by name,ReportKey | fillnull | sort -"Last year", -"This month"| table name, "Last year", "This month", "Last Year by now"

That version perform better than what I originally had, which is great.

I can't quite figure out how to incorporate what you suggested for the last section. Any other recommendations on getting this to perform better would be greatly appreciated.

Thanks!

0 Karma

DalJeanis
Legend

If you have full admin, you could look into accelerating those subsearches, or making them scheduled subsearches, or ... * putting the results daily into a summary index !!!! * .... , but for the most part, what you have is workable.

0 Karma

mstjohn_splunk
Splunk Employee
Splunk Employee

hi @jcachosousa,

Did you have a chance to try out DalJeanis's answer? If it worked, please resolve this post by approving it! If your problem is still not solved, keep us updated so that someone else can help ya.

Thanks for posting!

richgalloway
SplunkTrust
SplunkTrust

Source B and Source C are not mentioned in your current search. How are they related to Source A? What fields link them?

---
If this reply helps you, Karma would be appreciated.

woodcock
Esteemed Legend

I agree. Your description mentions Source B and Source C but your search does not. Please EITHER fix your description, or your SPL (and be careful not to destroy my reformatting).

jcachosousa
Explorer

Thanks for the help. Sorry, I had addressed richgalloway's comment but my comment seems to have been lost after editing it (sorry, still getting used to the forum).

I actually got the search providing the expected results with the search below. I am now going to take a stab at making it more efficient based on what DalJeanis recommended below. Any other feedback or tips would be appreciated.

source=A earliest=-1mon@mon latest=-0mon@mon | eval ReportKey="This month"
| append [search source=A earliest=-13mon@mon latest=-12mon@mon | eval ReportKey="Last year" | eval _time=_time+60*60*24*365]
| rename A-to-BKey AS AtoBKey
| join AtoBKey [search source=B earliest=-12y latest=now | dedup FieldX sortby -_time]
| join BtoCKey [search source=C earliest=-12y latest=now | dedup FieldY sortby -_time]
| chart count(variableX) by LookUpField,ReportKey
| sort -"Last year", -"This month"
| table LookUpField, "Last year", "This month", "Last year by now"

0 Karma

woodcock
Esteemed Legend

You should post your final version as a comment to the answer by @daljeanis and click Accept on his answer. Also UpVote anything else (comments even) that was helpful to you.

Get Updates on the Splunk Community!

Announcing Scheduled Export GA for Dashboard Studio

We're excited to announce the general availability of Scheduled Export for Dashboard Studio. Starting in ...

Extending Observability Content to Splunk Cloud

Watch Now!   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to leverage ...

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...