I have a request that is sent out in the following format:
?doc=A0RF7S:36518:2;A0RET7:36254:1;A0REQ2:38161:2;A0REJ8:129192:6;A0RDYJ:35301:2
Or, it can be:
?doc=A0RDSZ:9865:2
Or any number of what we call 'doc triplets' separated by semi-colons.
What I want to do is extract the three component pieces, whether the triplet occurs once or many times.
The doc triplet is JOBID:OFFSET:PAGES
I would like to pull, at a minimum, the JOBID and PAGES. That way I can report on PAGES requested per JOBID, unique JOBIDs requested, total pages per time period, etc.
Any suggestions on how to write that regex?
Thanks!
With an inline rex call you can do this:
| stats count | eval url = "?doc=A0RF7S:36518:2;A0RET7:36254:1;A0REQ2:38161:2;A0REJ8:129192:6;A0RDYJ:35301:2"
| rex field=url max_match=0 "(?:=|;)(?<JOBID>[^:]+):(?<OFFSET>[^:]+):(?<PAGES>[^;]+)"
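The max_match=0 option tells rex to re-apply the extraction over and over until it runs out of input, so JOBID, OFFSET, and PAGES each come back as a multivalue field with one value per triplet.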
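For the reporting mentioned in the question (pages per JOBID), the values usually have to be paired up and split into one row per triplet first, because JOBID and PAGES end up as parallel multivalue fields on a single event. A minimal sketch of one way to do that, assuming the fields were extracted as above; the field names triplet and total_pages and the ":" delimiter are my own choices for illustration:

```
| stats count | eval url = "?doc=A0RF7S:36518:2;A0RET7:36254:1;A0REQ2:38161:2;A0REJ8:129192:6;A0RDYJ:35301:2"
| rex field=url max_match=0 "(?:=|;)(?<JOBID>[^:]+):(?<OFFSET>[^:]+):(?<PAGES>[^;]+)"
| eval triplet = mvzip(JOBID, PAGES, ":")
| mvexpand triplet
| eval JOBID = mvindex(split(triplet, ":"), 0), PAGES = tonumber(mvindex(split(triplet, ":"), 1))
| stats sum(PAGES) AS total_pages by JOBID
```

Here mvzip glues the n-th JOBID to the n-th PAGES value, mvexpand turns each pair into its own result row, and the eval overwrites the multivalue fields with the single values for that row. Swapping the last line for something like stats dc(JOBID) or timechart sum(PAGES) should cover the unique-JOBID and pages-per-time-period reports, though I have not tried those against real data.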
With configured extractions, you add a field transformation using that expression without the surrounding double quotes and check the multivalue box. Then you add a field extraction that uses transforms and references your transformation by name.
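If you edit the configuration files directly instead of going through the UI, the same setup might look roughly like the following. Treat it as a sketch: the stanza names doc_triplets and your_sourcetype are placeholders I made up, and it assumes the transform runs against the raw event (the default source), where the pattern may need tightening so it does not pick up unrelated "=" or ";" characters.

```
# transforms.conf
[doc_triplets]
REGEX = (?:=|;)(?<JOBID>[^:]+):(?<OFFSET>[^:]+):(?<PAGES>[^;]+)
# MV_ADD keeps every match instead of only the first one
MV_ADD = true

# props.conf
[your_sourcetype]
REPORT-doc_triplets = doc_triplets
```

MV_ADD = true is the configuration-file counterpart of the multivalue checkbox, and the REPORT- line is what ties the transform to a sourcetype at search time.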