Getting Data In

Why can't I generate KV pairs from a nested field?

plynch52
Explorer

Here is a single record

Feb  9 12:17:35 dev-test USERstrng[Rule Hits Digest][2017-02-09T12:05:00-07:00,2017-02-09T12:09:59-07:00][354][100][99189=2,99190=2,99191=2,99147=2,99146=2,99145=2,99144=2,99151=2,99150=2,99149=2,99148=2,99139=2,99138=2,99137=2,99136=2,99143=2,99142=2,99141=2,99140=2,99162=2,99163=2,99160=2,99161=2,99166=2,99167=2,99164=2,99165=2,99154=2,99155=2,99152=2,99153=2,99158=2,99159=2,99156=2,99157=2,99236=2,99237=2,99238=2,99239=2,99232=2,99233=2,99234=2,99235=2,99244=2,99245=2,99246=2,99247=2,99240=2,99241=2,99242=2,99243=2,99253=2,99252=2,99255=2,99254=2,99249=2,99248=2,99251=2,99250=2,99261=2,99260=2,99263=2,99262=2,99257=2,99256=2,99259=2,99258=2,99206=2,99207=2,99204=2,99205=2,99202=2,99203=2,99200=2,99201=2,99214=2,99215=2,99212=2,99213=2,99210=2,99211=2,99208=2,99209=2,99223=2,99222=2,99221=2,99220=2,99219=2,99218=2,99217=2,99216=2,99231=2,99230=2,99229=2,99228=2,99227=2,99226=2,99225=2,99224=2,99296=2]

Inside the last [ ] are the KV pairs that I want to extract. So far, all I have been able to retrieve is FIELDn=string, where string is the whole "number=number" pair that I want broken into a key and a value.

I have tried a transforms.conf stanza (REGEX = ([0-9]+)=([0-9]+), FORMAT = $1::$2) referenced by a REPORT in props.conf.

The search string is:

index=* OR index=_* sourcetype=Shield | rex field=_raw "(?ms)(?=[^N]*(?:Rule Hits Digest|N.*NetDefender Rule Hits Digest))^(?P[^\\[]+)[^\\]\\n]*\\]\\[(?P[^\\]]+)\\]\\[(?P\\d+)\\]\\[(?P\\d+)[^\\]\\n]*\\]\\[(?P[^\\]]+)" offset_field=_extracted_fields_bounds 

It is the stats field that I need to break up into KV pairs. The number of KV pairs varies, and the keys range from 1 to 1 million. I want to sum the counts for each key.


DalJeanis
SplunkTrust

Over at regex101.com, that regex looks thoroughly broken. The double escaping may be needed in some circumstances, but I don't believe it is needed for rex within Splunk. The ^ after Digest asserts the beginning of the field... and you have the lookahead there... which seems needlessly complicated. In light of my results, it really IS overcomplicated.

I believe you are running into a limit on Splunk's ability to extract multiple matches into multivalue fields. After futzing around a bit, I realized the obvious... this regex works...

| rex field=george max_match=0 "(?<kvpair>\d*=\d*)"

However, it only succeeds when there are up to about 25-30 kvpairs; beyond that it fails without a message. I suspect there is some sort of catastrophic backtracking, but I can't see why that would be the case. In any case, one workable solution I found was to split the list into groups of up to 10 kvpairs, then split each group into individual kvpairs. You can take it from there.

| makeresults | eval _raw=" Feb  9 12:17:35 dev-test USERstrng[Rule Hits Digest][2017-02-09T12:05:00-07:00,2017-02-09T12:09:59-07:00][354][100][99189=2,99190=2,99191=2,99147=2,99146=2,99145=2,99144=2,99151=2,99150=2,99149=2,99148=2,99139=2,99138=2,99137=2,99136=2,99143=2,99142=2,99141=2,99140=2,99162=2,99163=2,99160=2,99161=2,99166=2,99167=2,99164=2,99165=2,99154=2,99155=2,99152=2,99153=2,99158=2,99159=2,99156=2,99157=2,99236=2,99237=2,99238=2,99239=2,99232=2,99233=2,99234=2,99235=2,99244=2,99245=2,99246=2,99247=2,99240=2,99241=2,99242=2,99243=2,99253=2,99252=2,99255=2,99254=2,99249=2,99248=2,99251=2,99250=2,99261=2,99260=2,99263=2,99262=2,99257=2,99256=2,99259=2,99258=2,99206=2,99207=2,99204=2,99205=2,99202=2,99203=2,99200=2,99201=2,99214=2,99215=2,99212=2,99213=2,99210=2,99211=2,99208=2,99209=2,99223=2,99222=2,99221=2,99220=2,99219=2,99218=2,99217=2,99216=2,99231=2,99230=2,99229=2,99228=2,99227=2,99226=2,99225=2,99224=2]"
| rex field=_raw max_match=0 "(?<kvgroup>(\d*=\d*[,\]]){1,10})"
| mvexpand kvgroup
| rex field=kvgroup max_match=10 "(?<kvpair>\d*=\d*)"
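
To get to what the question asks for (a sum of the counts for each key), the search above could be continued roughly as follows. This is only a sketch and not part of the original answer: the field names rule_id, hits, and total_hits are made up for illustration, the sample event is shortened to its first three pairs, and \d+ is used in place of \d* so that a bare "=" can never match.

```
| makeresults
| eval _raw="Feb  9 12:17:35 dev-test USERstrng[Rule Hits Digest][2017-02-09T12:05:00-07:00,2017-02-09T12:09:59-07:00][354][100][99189=2,99190=2,99191=2]"
| rex field=_raw max_match=0 "(?<kvgroup>(\d+=\d+[,\]]){1,10})"
| mvexpand kvgroup
| rex field=kvgroup max_match=10 "(?<kvpair>\d+=\d+)"
| mvexpand kvpair
| rex field=kvpair "(?<rule_id>\d+)=(?<hits>\d+)"
| stats sum(hits) AS total_hits BY rule_id
```

In a real search, the first two lines would be replaced by the base search (for example, the sourcetype=Shield search from the question), so that stats sums each key across all matching events. The second mvexpand turns each kvpair into its own row before the key and count are separated. mvexpand can truncate its output when it hits its memory limit, so it is worth checking for warnings on large result sets.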


plynch52
Explorer

Thanks,
as a newbie I figured there had to be some way. The original regex was generated by Splunk. It parsed the kv pairs.
