Splunk doesn't parse this URL fully....

sunitachan · ‎02-20-2015

Hello my dear Splunkers,

I am trying to get data into Splunk via syslog. The events contain a request="url" field like the one below:

request=http://www.terracotta.org/kit/reflector?kitID=ehcache.default&pageID=update.properties&id=2130706433...

But Splunk parses it like this:
request=http://www.terracotta.org/kit/reflector?kitID=ehcache.default

Can someone help me with this please?
How can I get the full URL parsed correctly?
And where can I go in Splunk to tweak this field? As my data is already parsed...

Appreciate the help!!
Thanks
Sunita

esix_splunk · ‎02-22-2015

As CPetterborg mentions, it depends on how the event looks. Is this a space-delimited event, or is each field on its own line? I would use something like:

request=(?<url>[^\s]+)

That would capture everything up to the next space or Unix-style linefeed (it might need to be adjusted based on the sourcetype). One potential issue with using whitespace as the delimiter is a URL that contains a literal, unencoded space, which would cut the capture short. A percent-encoded space such as %20 contains no whitespace, so it does not stop the match.
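
To illustrate the space caveat, here is a minimal Python sketch, not Splunk itself. It assumes `\S+` as the "everything up to whitespace" capture, uses Python's `(?P<url>...)` spelling for the named group, and runs against two made-up events (the example.com URLs and the `extract_url` helper are hypothetical):

```python
import re

# Capture a run of non-whitespace characters after "request=".
# Python spells named groups (?P<name>...); Splunk also accepts (?<name>...).
URL_RE = re.compile(r"request=(?P<url>\S+)")

def extract_url(raw):
    """Return the captured URL, or None when the event has no request= field."""
    match = URL_RE.search(raw)
    return match.group("url") if match else None

# Hypothetical events: one with a percent-encoded space, one with a literal space.
encoded = "dpt=80 request=http://example.com/a%20b?x=1&y=2 act=passed"
literal = "dpt=80 request=http://example.com/a b?x=1&y=2 act=passed"

print(extract_url(encoded))  # full URL, since %20 is not whitespace
print(extract_url(literal))  # stops at the literal space after "/a"
```

The `&` characters in the query string do not end the capture in either case; only whitespace does.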

cpetterborg · ‎02-22-2015

I'm making some assumptions here...

Looks like you are relying on key/value pair parsing for automatic field extraction. You probably want to use a rex command or define a field extraction for your data. Since there are no spaces in your URL, you should be able to use the following regex to extract the request URL:

request=(?P<url>[^ ]+)

I'm assuming from the samples that there is really supposed to be a space between the fields in each event.
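
As a quick sanity check outside Splunk, the regex above can be tried in Python against the samples from this thread. This is only a sketch: the events are rebuilt as single space-delimited lines (my assumption, per the note above), and `extract_request` is a hypothetical helper name.

```python
import re

# Regex from this post: everything after "request=" up to the next space.
REQUEST_RE = re.compile(r"request=(?P<url>[^ ]+)")

def extract_request(raw):
    """Return the request URL from a raw event, or None if there is none."""
    match = REQUEST_RE.search(raw)
    return match.group("url") if match else None

# Samples from the thread, assumed to be space-delimited single-line events.
short_event = (
    "Feb 20 09:25:27|1.0.3|0|passed|0|src=x.x.x.x spt=40960 dst=34.23.12.3 "
    "dpt=80 deviceDirection=1 request=http://www.unikin.cd/ act=passed"
)
long_event = (
    "Feb 16 08:46:11|1.0.3|0|passed|0|src=x.x.x.x spt=55845 dst=199.11.1.1 "
    "dpt=80 deviceDirection=1 "
    "request=http://www.terracotta.org/kit/reflector?kitID=ehcache.default"
    "&pageID=update.properties&id=2130706433... act=passed"
)

# The same pattern handles both the short and the long URL.
print(extract_request(short_event))
print(extract_request(long_event))
```

In a search, the equivalent would be something like `| rex field=_raw "request=(?<url>[^ ]+)"`, which gives one `url` field for both kinds of events.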

somesoni2 · ‎02-20-2015

Could you provide some full sample events and also the definition of your URL2 field extraction?

sunitachan · ‎02-20-2015

Hi there,
here are a few samples:

Feb
20
09:25:27 |1.0.3|0|passed|0|src=x.x.x.x
spt=40960
dst=34.23.12.3
dpt=80
deviceDirection=1
request=http://www.unikin.cd/
act=passed
cn1Label=Risk_Score
cn1=0
cs5=-
cs5Label=Malware_Type
cs1=-
cs1Label=Category
cs2=-
cs2Label=Protocol

Feb
20 09:25:27|1.0.3|0|passed|0|src=x.x.x.x
spt=60657
dst=291.98.1.1
dpt=80
deviceDirection=1
request=http://mobile.orange.fr/
act=passed
cn1Label=Risk_Score
cn1=0
cs5=- cs5Label=Malware_Type
cs1=-
cs1Label=Category
cs2=- cs2Label=Protocol

Feb
16 08:46:11|1.0.3|0|passed|0|src=x.x.x.x
spt=55845
dst=199.11.1.1
dpt=80
deviceDirection=1
request=http://www.terracotta.org/kit/reflector?kitID=ehcache.default&pageID=update.properties&id=2130706433...
act=passed
cn1Label=Risk_Score
cn1=0
cs5=- cs5Label=Malware_Type
cs1=-
cs1Label=Category
cs2=- cs2Label=Protocol

And URL = request
URL2 = request with long url as in the 3rd sample above

Can I have just one field that includes both types of URLs?
The URL2 regex is ^(?:[^=\n]*=){6}(?P<URL2>[^ ]+)

Thanks

sunitachan · ‎02-20-2015

Hello all,
I actually used the built-in field extraction tool to extract this particular field, but the issue I now see is that the field extraction is also applied to all the other URLs, which are not this long. So I have:
URL
URL2

I want to apply this field extraction only to URL2.

Any suggestions, please?
Thanks
