Splunk Search

rex field limit

theoneNeo
New Member

Hi,
I got the ff script working but putting in more rex field hangs splunk

index=xxx
| rex field=_raw "tel:001001(?9379|9370|9377|9378|35569|35567|35568|35566|213770|21350|213550|21366|168425|68577|3763|24492|124625|126878|124625|186966|5434|54320|5407|37491|37493|1876387|2975|61411|61418|61414|61415|61426|61430|42379|43664|43676|43650|43699|43681|43660|99450|99470|99455|1242|97339|97336|97333|88017|88018|88019|88016|124625|32475|32495|32486|375296|375297|50161|22993|22998|22997|22995|1441336|1441790|97517|97577|59172|59170|59177|38763|38761|38765|26773|26772|55005|55219499|55119101|55519111|55118801|55619201|55718101|55418801|55488822|55129157|55438804|55539111|55549111|55619207|55629294|55659243|55819444|55859139|55169173|55318401|55199101|55149101|55679244|55918401|55928401|55958401|55968401|55004|55002|55003|552181|552182|553191|553193|553491|557191|558191|558299|558499|558599|558899|559181|559182|559281|559381|559581|559681|559881|551181|551183|551184|551185|551281|551381|551681|551981|554199|554499|554799|554899|555181|556181|556281|559481|559981|555381|558799|558198|558897|55023|55319933|55031|55016|55006|55010|55011|128444|67387|67386|67389|67381|35988|35989|35987|2268|22676|22670|25779|85593|85512|85516|85515|85511|85513|85590|85593|85597|2379|33150|2377|164738|1403408|1438276|190271|160434|151499|1705896|170579|1807631|151442|164758|164721|143858|158758|177858|23899|23891|187681|23670|23672|23675|2356|86139|86138|86157|86135|86136|86159|86158|86150|86151|86152|86188|86147|86187|86183|86130|86131|86132|86145|86155|86156|86186|56982|56920|56916|57300|57310|57316|2693|24201|24206|24204|6825|506300|506600|22501|22503|22507|22505|38598|38597|38591|537264|35799|35796|420603|420733|420602|24399|24381|24389|45401|45609|45405|45311|4528|25377|124625|1829991|1809888|180985|182930|593994|593995|2012|4179991|4179995|2010|2011|503776|50387|2402|37256|37250|37255|25191|500|29824|29850|67970|6799|35840|35850|358451|3584570|33695|33689|33609|33611)\d+"
| rex field=_raw "tel:001001(?33660|33751|68977|59669690|59035|24105|24107|22077|2206|49151|49160|49170|49171|49175|49172|49177|49176|49176|99577|99593|99555|99599|23327|23324|23354|23320|23326|35058|3094|30693|30699|30690|3099|3097|2995|124625|1671788|1671987|1671488|502530|502520|447781|22464|22465|22462|22463|2456|59264|59266|50944|504899|50495|50439|852902|852921|852901|852949|852633|852923|852923|852920|852645|3630|3670|3620|35477|35489|35489|354611|354650|35469|919842|919851|919852|919853|919854|919856|919857|919858|919804|919716|919700|919722|919738|919762|919768|919782|919802|919806|919809|919890|919849|919898|919974|919896|919895|919893|919892|919815|919897|919810|919935|919845|919816|919831|919894|919829|919840|919030|919031|917796|919028|919029|919033|919034|918091)\d+"
| lookup roaming_db.csv HLR_ADDR OUTPUT Destination Operator_Name
| eval HLR_ADDR = HLR_ADDR + " - " + Destination + " - " + Operator_Name
| stats count by HLR_ADDR

Is there other ways to approach this?

Tags (1)
0 Karma

lguinn2
Legend

New Answer: I am still using a lookup, but to simplify things I have included a macro. Will my first answer work? Sure, but you will need to create hlr_addr4 hlr_addr5 hlr_addr6 hlr_addr7 hlr_addr8 - where will the madness end!!

The new solution still includes a lookup, defined as follows in transforms.conf

[roaming_lookup]
filename = roaming_db.csv
min_matches = 1
default_match =Unknown

It adds a macro, which is defined in macros.conf (or the GUI)

[check_hrl_addr(1)]
args = len
definition = "rex "tel:001001(?<HRL_ADDR$len$>\d{$len$}) | lookup roaming_lookup HLR_ADDR OUTPUT Destination$len$  Operator_Name$len$"
validation =isnum(len)
errormsg = len must be a number

Now the search is still pretty long, but not quite so bad.

index=xxx
| `check_hrl_addr(4)` | `check_hrl_addr(5)` | `check_hrl_addr(6)` | `check_hrl_addr(7)` | `check_hrl_addr(8)`
| eval Destination = case(Destination4!="Unknown", Destination4, Destination5!="Unknown", Destination5, 
    Destination6!="Unknown", Destination6,  Destination7!="Unknown", Destination7, 
    Destination8!="Unknown", Destination8, 1==1, "Unknown")
| eval Operator_Name = case(Operator_Name4!="Unknown", Operator_Name4, Operator_Name5!="Unknown",
     Operator_Name5, Operator_Name6!="Unknown", Operator_Name6,  
     Operator_Name7!="Unknown", Operator_Name7, Operator_Name8!="Unknown", Operator_Name8, 1==1, "Unknown")
| eval HRL_ADDR = case(HRL_ADDR4!="Unknown", HRL_ADDR4, HRL_ADDR5!="Unknown", HRL_ADDR5,
     HRL_ADDR6!="Unknown", HRL_ADDR6,  HRL_ADDR7!="Unknown", HRL_ADDR7, 
     HRL_ADDR8!="Unknown", HRL_ADDR8, 1==1, "Unknown")
| fields - HRL_ADDR4 HRL_ADDR5 HRL_ADDR6 HRL_ADDR7 HRL_ADDR8 Destination4 Destination5 Destination6 Destination7 
    Destination8 Operator_Name4 Operator_Name5 Operator_Name6 Operator_Name7 Operator_Name8
| stats count by HLR_ADDR Destination Operator_Name

Still, I think that there might be a nicer way to write the rex and do it just once! But without seeing the actual events, this is the best I could do.

0 Karma

lguinn2
Legend

UPDATE: It was pointed out that the hrl_addr can be 4 to 8 digits, not just 4 or 5. Therefore, I have updated this answer as a separate answer - it was just too hard to rewrite this one...
I would consider doing it this way - use your lookup table to validate your hrl_addr like this

index=xxx
| rex "tel:001001(?<hlr_addr4>\d{4})"
| lookup roaming_db.csv HLR_ADDR as hlr_addr4 OUTPUT Destination as Destination4 Operator_Name as Operator_Name4
| rex "tel:001001(?<hlr_addr5>\d{5})"
| lookup roaming_lookup HLR_ADDR as hlr_addr5 HR OUTPUT Destination as Destination4 Operator_Name as Operator_Name4
| eval Destination = if(Destination4=="Unknown", Destination5, Destination4)
| eval Operator_Name = if(Destination4=="Unknown",Operator_Name5,Operator_Name4)
| stats count by HLR_ADDR Destination Operator_Name

In order for this to work, you will need to setup your CSV file as a lookup. If you do it manually in transforms.conf, it will look like this:

[roaming_lookup]
filename = roaming_db.csv
min_matches = 1
default_match =Unknown

(This transforms.conf goes in the same app as the csv file.) Or you can set up the lookup using the GUI. Just be sure to set the minimum number of matches to 1, and the default match value to Unknown.

The nice thing about this solution is that it will behave consistently, even with a large CSV file.

lguinn2
Legend

rex "tel:001001(?<hrl_addr>\d{4,8})" will not work because the regular expression has nothing after the {4,8} - how would the regex know whether to choose 4, 5, 6, 7 or 8 digits?

IF we know that the number ended with a space, we could do this
rex "tel:001001(?<hrl_addr>\d+) "

which says "pick off the one or more digits up to the next space" - but we don't know that. We don't know how many digits we need until we check to see if it matches in the lookup.

0 Karma

laserval
Communicator

Right, but what I don't get is why you need separate fields for different lengths.

rex "tel:001001(?<hrl_addr>\d{4,8})"

Why would this not work?

0 Karma

lguinn2
Legend

Ouch - I didn't notice the ones with length 6-8! Still, I like my solution better than having to maintain these awful lists...

0 Karma

laserval
Communicator

Doesn't the hlr_addr field vary in length more than that?

The second rex checks for values of length 4 to 8:
33660|33751|68977|59669690|59035|24105|24107|22077|2206|49151...919802|919806

I don't get the need to split it into 4 or 5 digit numbers, but I might be missing something.

0 Karma

somesoni2
Revered Legend

Would you mind providing some data samples? This will help if there are any alternative rex commands.

linu1988
Champion

this is not a good approach, rather than writing this you can count the number of "|" pipes and group them in rex command e.g."tel:001001(?((\d+|)){60}.

But i agree splunk will be slow with this much regex processing.

0 Karma
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...