Hello,
I'm trying to extract a field in Splunk, but for some reason it's only extracting part of the field. For instance, here is the raw file that I am seeing:
[03/Dec/2012:13:11:51 +0000] "GET /profile-services/talent?src=document_patents&q=(%22pelvis%22)%20AND%20NOT%20(%22childbirth%22)&normalize=1&clusterxml=1&proj=f8c6ade298d8adc5b6a4346f6acd7580&sort=relevance&start=0&limit=10 HTTP/1.1" 200 6232 "https://pg.inno-360.com/projects/f8c6ade298d8adc5b6a4346f6acd7580/landscapes" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.19) Gecko/20110707 Firefox/3.6.19" pg.inno-360.com 636009
Out of that file, I am interested in the field 'q', which would be the following:
q=(%22pelvis%22)%20AND%20NOT%20(%22childbirth%22)
However, when I run the query "... | stats count by q" I am just given the following:
(%22pelvis%22)
It looks like for some reason Splunk is only extracting the first part of the field instead of the whole thing. I was curious if there was a way to change the field extraction to get the entire thing or if this would be possible in any way. Any help on this would be greatly appreciated, thanks in advance.
-Tyler
Using SPL, you can try the rex command and test the following regex:
q=(?P<q>[^\&]+)
So, once you have your results:
<yourGeneratingSearch> | rex field=_raw "q=(?P<q>[^\&]+)"
This looks in the _raw field and assigns to q everything from q= up to the next &.
If this works, you can save it as the extraction for q, and it will override Splunk's extraction.
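If you would rather group on the readable form of the query, here is a minimal sketch that reuses the same regex and then decodes the percent-encoding with eval's urldecode function. The field names q_full and q_decoded are placeholders chosen for illustration (not part of the original search), used so the result does not collide with the automatically extracted q.

```
<yourGeneratingSearch>
| rex field=_raw "q=(?P<q_full>[^&]+)"
| eval q_decoded=urldecode(q_full)
| stats count by q_decoded
```

For the sample event above, q_full should hold (%22pelvis%22)%20AND%20NOT%20(%22childbirth%22), and q_decoded the same string with %22 and %20 turned back into quotes and spaces.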
Hope this helps.
Glad it helped.
This worked perfectly, thanks!
It looks as if the field extractor is URL-decoding that line, and because the resulting string is q=("pelvis") AND NOT ("childbirth"), it stops at the space after the first ). You could run it through rex and see what works:
your_search | rex field=_raw "q=(?<query>[^&]*)&"
This assumes that every time your page is called, there is another parameter after the q parameter.
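If q can also be the last parameter in the URL, a pattern that requires a trailing & will not match. A possible variant, offered as a sketch rather than a tested extraction, drops the trailing & and stops at whitespace as well, since in this access-log format the request URL is followed by a space and HTTP/1.1. The leading [?&] is an added assumption so that the match starts at a parameter boundary and does not pick up other parameters whose names merely end in q.

```
your_search
| rex field=_raw "[?&]q=(?<query>[^&\s]*)"
| stats count by query
```

Because the value is still percent-encoded in _raw, it contains no literal spaces, so stopping at whitespace should not truncate it.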