Getting Data In

query optimization without join

vikashperiwal
Path Finder

I am having multiple index and sources , initially we wrote query using join and we got desired output , but now our planners want the query to be optimized and not to use join and append, below is my query can any one help me with better solution without join and append

index=ndspr sourcetype=SUMMARY_CDR_VW  OPC_CLLI_CD=CLGRAB1201T OR OPC_CLLI_CD=CLGRAB12GT0 OR OPC_CLLI_CD=EDTNAB02GT0 OR OPC_CLLI_CD=KMLPBC01GT0 OR OPC_CLLI_CD=VANCBC01GT0 OR OPC_CLLI_CD=EDTNAB02GT1 
| fields TERM_SUBGRPG_CD ORIG_POINT_CD TERM_GRPG_CD DPC_CARRIER_LONG_NM CALLED_PARTY_NOA_CD DEST_POINT_CD DPC_TOLL_IND GENERIC_PORTED_SUBGRPG_CD OPC_CARRIER_LONG_NM OPC_CLLI_CD DPC_CLLI_CD SUM_BILL_TM_CNT SUM_NTWK_DURTN_CNT TOTAL_CALL_CNT DMS_FILE 
| stats sum(SUM_BILL_TM_CNT) as SUM_BILL_TM_CNT, sum(SUM_NTWK_DURTN_CNT) as SUM_NTWK_DURTN_CNT, sum(TOTAL_CALL_CNT) as TOTAL_CALL_CNT by DPC_CARRIER_LONG_NM, CALLED_PARTY_NOA_CD, ORIG_POINT_CD, DEST_POINT_CD, DPC_CLLI_CD, OPC_CLLI_CD, TERM_SUBGRPG_CD TERM_GRPG_CD 
| join type=left TERM_SUBGRPG_CD 
    [ search index=csvlookups source="*NRS_Lite_SNP*" earliest=1 
    | rename NPANXX as TERM_SUBGRPG_CD 
    | fields TERM_SUBGRPG_CD COMPANY LIR "Rate  Center"] 
| join type=left TERM_GRPG_CD 
    [ search index=csvlookups source="*npa_report.csv*" earliest=1
    | rename NPA_ID as TERM_GRPG_CD 
    | fields TERM_GRPG_CD COUNTRY ] 
| join type=left ORIG_POINT_CD 
    [ search index=csvlookups source="F:\\SplunkMonitor\\csvlookups\\Core_Network\\191008_CLGRAB1201T_C7NETWRK.csv" earliest=-20d@d latest=now() 
    | rex field=PC "(?<p1>[\d]*).(?<p2>[\d]*).(?<p3>[\d]*)" 
    | eval p3=case(len(p3)==1,"00".p3,len(p3)==2,"0".p3) 
    | eval ORIG_POINT_CD=p1.".".p2.".".p3 
    | rename NAME as NETNAME 
    | fields ORIG_POINT_CD NETNAME NETTYPE NODENUM NODETYPE] 
| join type=left NETNAME DEST_POINT_CD 
    [ search index=csvlookups source="F:\\SplunkMonitor\\csvlookups\\Core_Network\\191008_CLGRAB1201T_C7RTESET.csv" earliest=-20d@d
    | rex field=PC "(?<p1>[\d]*).(?<p2>[\d]*).(?<p3>[\d]*)" 
    | eval p3=case(len(p3)==1,"00".p3,len(p3)==2,"0".p3) 
    | eval DPC=p1.".".p2.".".p3 
    | rename DPC as DEST_POINT_CD DESTNAME as ROUTESET 
    | fields NETNAME DEST_POINT_CD NETTYPE ROUTESET] 
| join type=left ROUTESET 
    [ search index=csvlookups source="F:\\SplunkMonitor\\csvlookups\\Core_Network\\191008_CLGRAB1201T_ISUPDEST.csv" earliest=-20d@d
    | rename ISUPROUT as ROUTESET 
    | fields ROUTESET CLLI ] 
| join type=left CLLI 
    [ search index=csvlookups source="F:\\SplunkMonitor\\csvlookups\\Core_Network\\191008_CLGRAB1201T_CLLI.csv" earliest=-20d@d
    | rename TRUNKCLLI as CLLI 
    | fields CLLI ADMININF ADNUM TRKGRSIZ] 
| eval country=if(CALLED_PARTY_NOA_CD==3,COUNTRY, 0) 
| fillnull value=0 
| table NETNAME, ROUTESET, ADMININF, TRKGRSIZ, CLLI, COMPANY,country LIR, TERM_GRPG_CD "Rate  Center", OPC_CLLI_CD DPC_CARRIER_LONG_NM CALLED_PARTY_NOA_CD ORIG_POINT_CD DEST_POINT_CD DPC_CLLI_CD TERM_SUBGRPG_CD SUM_BILL_TM_CNT SUM_NTWK_DURTN_CNT TOTAL_CALL_CNT 
| sort 0 - TOTAL_CALL_CNT
0 Karma

to4kawa
Ultra Champion
(first search) OR (join sub search 1) OR (join sub search 2) ....
| eventstats values(COMPANY) as COMPANY values(LIR) as LIR  values("Rate  Center") as  Rate_Center by TERM_SUBGRPG_CD 
| eventstats .....

In this way, you can search for the first time and collect them based on the fields you want to join in eventstats.

0 Karma

vikashperiwal
Path Finder

any optimized solution for the above query, i do not expect the complete query, but if any suggestions that would be help ful

0 Karma

ololdach
Builder

Hi,
to me it appears as if you import csv system snapshots into an index on a regular basis. In your query, I assume that you only need the last snapshot result as a lookup in order to enrich your data with a current status from the last reports. If so, you should consider to replace all of your joins with regular lookups.
Best regards
Oliver

0 Karma

vikashperiwal
Path Finder

removing lookup was our requirement, initially we had look ups but then we indexed the data with multiple source.

0 Karma

ololdach
Builder

Hi, from a data point of view, join is one of the most expensive operations you can do. If you want to optimise your query without using join, you will have to restructure your data in such a way that it can be avoided. For enriching the events with data that is available and valid at index time, you could consider to include lookups at index time or automatic lookups at search time. If .csv files are too slow for like 1M+ rows of lookup data, check out the kvstore as an alternative or use filtering or advanced lookup features like time lookups. IMHO join was never meant to be a replacement for lookups as it is a totally different operation and unless you actually need the product of two sets of data, should be avoided due to the high complexity of the operation - Oliver

0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...