Hi,
I am looking for a way to efficiently set up multiple lookups (or ideally a more efficient function) within one search, and I am having a hard time figuring it out.
I have a key field
between source A
and source B
, and I need to get a value from source C
based on the lookup result from source B
. I am also using append
within my search. This is basically what my current search looks like without the lookup
:
source=A earliest=-0mon@mon latest=now |eval ReportKey="This month"
| append [search source=A earliest=-12mon@mon latest=-11mon@mon
| eval ReportKey="Last year"
| eval _time=_time+60*60*24*365]
| append [search source=A earliest=-12mon@mon latest=-52w
| eval ReportKey="Last year by now"
| eval _time=_time+60*60*24*365]
| bucket _time span=1d
| chart count(variableX) by LookUpField,ReportKey
| sort -"Last year", -"This month"
| table LookUpField, "Last year", "This month", "Last year by now"
What I am trying to do is get LookUpField
to be the relevant value
from source C
.
Any help would be greatly appreciated.
1) You are doing WAY too much work manually with those appends. Just grab all the records you need and assign them multiple values depending on what they are.
index=foo source=bar earliest=-12mon@mon
| rename COMMENT as "limit to fields we need"
| fields _time variableX ... fields needed for lookups....
| rename COMMENT as "Throw away records in the middle months"
| where _time <= relative_time(now(),"-11mon@mon") OR _time >= relative_time(now(),"@mon")
| bucket _time span=1d
| rename COMMENT as "Your lookup code goes here, as long as it doesn't depend on the ReportKey"
| lookup myfirstlookup keyFieldInLookup1 as keyField1inEvent OUTPUTNEW LookupResult
| lookup mysecondlookup keyFieldInLookup2 as keyField2inEvent OUTPUTNEW LookupResult
| rename COMMENT as "Create two records for the year ago month to date, one record otherwise"
| eval ReportKey=case(_time <= relative_time(now(),"-52w"), mvappend("Last year","Last Year by now"),
_time <= now() - 3000000, "Last year",
true(),"this month"
| mvexpand ReportKey
| chart count(variableX) by LookupResult,ReportKey
2) Technically that 3 million number could be 2678400 seconds, the number of seconds in a 31 day month, but since the middle 11 months are dropped, the exactness is not needed. If exactness was needed, then you'd actually need to calculate the date anyway.
3) You are binning the _time but not using it. I believe that may be left over from an attempted timechart, or other aggregation. I left it in and moved it where it should be.
4) Efficiency for where in the code the lookup needs to go will depend on what fields are being used for the lookups. The binning and aggregation should probably be done before the lookups. The last two sections would then look like this...
| fillnull value="" keyField1inEvent keyField2inEvent
| stats count(variableX) as xCount by _time keyField1inEvent keyField2inEvent
| rename COMMENT as "Your lookup code goes here, as long as it doesn't depend on the ReportKey"
| lookup myfirstlookup keyFieldInLookup1 as keyField1inEvent OUTPUTNEW LookupResult
| lookup mysecondlookup keyFieldInLookup2 as keyField2inEvent OUTPUTNEW LookupResult
| rename COMMENT as "Create two records for the year ago month to date, one record otherwise"
| eval ReportKey=case(_time <= relative_time(now(),"-52w"), mvappend("Last year","Last Year by now"),
_time <= now() - 3000000, "Last year",
true(),"this month"
| mvexpand ReportKey
| chart sum(xCount) by LookupResult,ReportKey
Okay, now that we've straightened up your base search, here's the answer to your "how do I connect three different types of records that have two different keys" question...
(your search that gets sourcetype=A) OR
(your search that gets sourcetype=B) OR
(your search that gets sourcetype=C)
| fields sourcetype ... keyAB keyBC fieldC ... list all the fields you need)
| rename COMMENT as "Roll the data from C to B based on keyBC, then throw away C"
| eventstats values(fieldC) as fieldC by keyBC
| where sourcetype!="C"
| rename COMMENT as "Roll the data from B to A based on keyAB, then throw away B"
| eventstats values(fieldC) as fieldC by keyAB
| where sourcetype="A"
How to connect these two sets of pseudocode will depend on the characteristics of the data in sourcetype B and C. If you need more help, you'll need to give us more information.
1) You are doing WAY too much work manually with those appends. Just grab all the records you need and assign them multiple values depending on what they are.
index=foo source=bar earliest=-12mon@mon
| rename COMMENT as "limit to fields we need"
| fields _time variableX ... fields needed for lookups....
| rename COMMENT as "Throw away records in the middle months"
| where _time <= relative_time(now(),"-11mon@mon") OR _time >= relative_time(now(),"@mon")
| bucket _time span=1d
| rename COMMENT as "Your lookup code goes here, as long as it doesn't depend on the ReportKey"
| lookup myfirstlookup keyFieldInLookup1 as keyField1inEvent OUTPUTNEW LookupResult
| lookup mysecondlookup keyFieldInLookup2 as keyField2inEvent OUTPUTNEW LookupResult
| rename COMMENT as "Create two records for the year ago month to date, one record otherwise"
| eval ReportKey=case(_time <= relative_time(now(),"-52w"), mvappend("Last year","Last Year by now"),
_time <= now() - 3000000, "Last year",
true(),"this month"
| mvexpand ReportKey
| chart count(variableX) by LookupResult,ReportKey
2) Technically that 3 million number could be 2678400 seconds, the number of seconds in a 31 day month, but since the middle 11 months are dropped, the exactness is not needed. If exactness was needed, then you'd actually need to calculate the date anyway.
3) You are binning the _time but not using it. I believe that may be left over from an attempted timechart, or other aggregation. I left it in and moved it where it should be.
4) Efficiency for where in the code the lookup needs to go will depend on what fields are being used for the lookups. The binning and aggregation should probably be done before the lookups. The last two sections would then look like this...
| fillnull value="" keyField1inEvent keyField2inEvent
| stats count(variableX) as xCount by _time keyField1inEvent keyField2inEvent
| rename COMMENT as "Your lookup code goes here, as long as it doesn't depend on the ReportKey"
| lookup myfirstlookup keyFieldInLookup1 as keyField1inEvent OUTPUTNEW LookupResult
| lookup mysecondlookup keyFieldInLookup2 as keyField2inEvent OUTPUTNEW LookupResult
| rename COMMENT as "Create two records for the year ago month to date, one record otherwise"
| eval ReportKey=case(_time <= relative_time(now(),"-52w"), mvappend("Last year","Last Year by now"),
_time <= now() - 3000000, "Last year",
true(),"this month"
| mvexpand ReportKey
| chart sum(xCount) by LookupResult,ReportKey
Okay, now that we've straightened up your base search, here's the answer to your "how do I connect three different types of records that have two different keys" question...
(your search that gets sourcetype=A) OR
(your search that gets sourcetype=B) OR
(your search that gets sourcetype=C)
| fields sourcetype ... keyAB keyBC fieldC ... list all the fields you need)
| rename COMMENT as "Roll the data from C to B based on keyBC, then throw away C"
| eventstats values(fieldC) as fieldC by keyBC
| where sourcetype!="C"
| rename COMMENT as "Roll the data from B to A based on keyAB, then throw away B"
| eventstats values(fieldC) as fieldC by keyAB
| where sourcetype="A"
How to connect these two sets of pseudocode will depend on the characteristics of the data in sourcetype B and C. If you need more help, you'll need to give us more information.
Thank you so much! I am relatively new to Splunk and had assumed I probably wasn't doing this search very efficiently.
I am going to take a stab at this with what you recommend and hopefully I can get it to work.
In case it is useful, I was able to get my original search with the appends working with lookups. If you have any additional recommendations with this info they would be greatly appreciated. Here is the search:
source=A earliest=-1mon@mon latest=-0mon@mon | eval ReportKey="This month"
| append [search source=A earliest=-13mon@mon latest=-12mon@mon | eval ReportKey="Last year" | eval _time=_time+60*60*24*365]
| rename A-to-BKey AS AtoBKey
| join AtoBKey [search source=B earliest=-12y latest=now | dedup FieldX sortby -_time]
| join BtoCKey [search source=C earliest=-12y latest=now | dedup FieldY sortby -_time]
| chart count(variableX) by LookUpField,ReportKey
| sort -"Last year", -"This month"
| table LookUpField, "Last year", "This month", "Last year by now"
Glad you got what you needed. As best practices, we would definitely have you specify using the fields
command what fields are being returned by each leg of the search (the mainline, the append, and the subsearches). That will increase efficiency enormously in most cases. I believe you only need _time, ReportKey, variableX, and/or FieldX and/or FieldY.
You should be able to add variableX=*
to every relevant leg of the search and get the same result with slightly less CPU. That's a bit data dependent, though, depending on whether you want to throw NULLs away after you've deduped to get the most recent record for each FieldX / FieldY, or whether you want to keep them.
Thanks a lot!
Based on your suggestions, I currently have this:
source=A earliest=-12mon@mon | rename COMMENT as "limit to fields we need" | fields _time status number id type code code2 rep name ReportKey Count | rename code AS code2
| search NOT (status=*request* OR status=*pending* OR status=Error OR status="Expired") | dedup number sortby -id | search NOT (status="Voided" OR type="nullified")
| rename COMMENT as "Throw away records in the middle months" | where _time <= relative_time(now(),"-11mon@mon") OR _time >= relative_time(now(),"@mon")
| rename COMMENT as "Your lookup code goes here, as long as it doesn't depend on the ReportKey" | join code2 [search source=B earliest=-12y latest=now | dedup code2 sortby -_time] | join id [search source=C earliest=-12y latest=now | dedup rep sortby -_time]
| fillnull value="" name ReportKey | stats count(no) as Count by _time code2 name
| rename COMMENT as "Create two records for the year ago month to date, one record otherwise" | eval ReportKey=case(_time <= relative_time(now(),"-52w"), mvappend("Last year","Last Year by now"), _time <= now() - 3000000, "Last year", true(),"This month") | mvexpand ReportKey
| chart sum(Count) by name,ReportKey | fillnull | sort -"Last year", -"This month"| table name, "Last year", "This month", "Last Year by now"
That version perform better than what I originally had, which is great.
I can't quite figure out how to incorporate what you suggested for the last section. Any other recommendations on getting this to perform better would be greatly appreciated.
Thanks!
If you have full admin, you could look into accelerating those subsearches, or making them scheduled subsearches, or ... * putting the results daily into a summary index !!!! * .... , but for the most part, what you have is workable.
hi @jcachosousa,
Did you have a chance to try out DalJeanis's answer? If it worked, please resolve this post by approving it! If your problem is still not solved, keep us updated so that someone else can help ya.
Thanks for posting!
Source B and Source C are not mentioned in your current search. How are they related to Source A? What fields link them?
I agree. Your description mentions Source B
and Source C
but your search does not. Please EITHER fix your description, or your SPL (and be careful not to destroy my reformatting).
Thanks for the help. Sorry, I had addressed richgalloway's comment but my comment seems to have been lost after editing it (sorry, still getting used to the forum).
I actually got the search providing the expected results with the search below. I am now going to take a stab at making it more efficient based on what DalJeanis recommended below. Any other feedback or tips would be appreciated.
source=A earliest=-1mon@mon latest=-0mon@mon | eval ReportKey="This month"
| append [search source=A earliest=-13mon@mon latest=-12mon@mon | eval ReportKey="Last year" | eval _time=_time+60*60*24*365]
| rename A-to-BKey AS AtoBKey
| join AtoBKey [search source=B earliest=-12y latest=now | dedup FieldX sortby -_time]
| join BtoCKey [search source=C earliest=-12y latest=now | dedup FieldY sortby -_time]
| chart count(variableX) by LookUpField,ReportKey
| sort -"Last year", -"This month"
| table LookUpField, "Last year", "This month", "Last year by now"
You should post your final version as a comment to the answer by @daljeanis and click Accept
on his answer. Also UpVote
anything else (comments even) that was helpful to you.