Solved: Repeating Field Extraction

tyronetv · ‎05-20-2014

I have a request that is sent out in the following format:

?doc=A0RF7S:36518:2;A0RET7:36254:1;A0REQ2:38161:2;A0REJ8:129192:6;A0RDYJ:35301:2

Or, it can be:

?doc=A0RDSZ:9865:2

Or any number of what we call 'doc triplets' separated by semi-colons.

What I want to do is extract the three component pieces whether it happens once or n+1 times.

The doc triplet is JOBID:OFFSET:PAGES

I would like to pull, at a minimum, the JOBID and PAGES. This way I can report on PAGES requested per JOBID, or unique JOBID's requested, or total pages per time-period, etc.

Any suggestions on how to write that regex?

Thanks!

martin_mueller · ‎05-20-2014

With an inline rex call you can do this:

| stats count | eval url = "?doc=A0RF7S:36518:2;A0RET7:36254:1;A0REQ2:38161:2;A0REJ8:129192:6;A0RDYJ:35301:2"
| rex field=url max_match=0 "(?:=|;)(?<JOBID>[^:]+):(?<OFFSET>[^:]+):(?<PAGES>[^;]+)"

The max_match=0 tells rex to re-apply the extraction over and over until it runs out of input.

With configured extractions you add a field transforms with that expression sans double quotes and check the multivalue box. Then you add a field extraction using transforms with the name of your transformation.

View solution in original post

martin_mueller · ‎05-20-2014

With an inline rex call you can do this:

| stats count | eval url = "?doc=A0RF7S:36518:2;A0RET7:36254:1;A0REQ2:38161:2;A0REJ8:129192:6;A0RDYJ:35301:2"
| rex field=url max_match=0 "(?:=|;)(?<JOBID>[^:]+):(?<OFFSET>[^:]+):(?<PAGES>[^;]+)"

The max_match=0 tells rex to re-apply the extraction over and over until it runs out of input.

With configured extractions you add a field transforms with that expression sans double quotes and check the multivalue box. Then you add a field extraction using transforms with the name of your transformation.

Repeating Field Extraction

Splunk Custom Visualizations App End of Life

Introducing Splunk Enterprise 9.2

Adoption of RUM and APM at Splunk