Splunk doesn't parse this URL fully....

sunitachan · ‎02-20-2015

Hello my dear Splunkers,

I am trying to get data into Splunk via syslog. The events contain a request="url" field like the one below:

request=http://www.terracotta.org/kit/reflector?kitID=ehcache.default&pageID=update.properties&id=2130706433...

But Splunk parses it like this:
request=http://www.terracotta.org/kit/reflector?kitID=ehcache.default

Can someone help me with this please?
How can I get the full URL parsed correctly?
And where can I go in Splunk to tweak this field? As my data is already parsed...

Appreciate the help!!
Thanks
Sunita

esix_splunk · ‎02-22-2015

As cpetterborg mentions, it depends on how the event looks. Are the fields in the event delimited by spaces or by line feeds? I would use something like:

request=(?<url>\S+)

That captures everything up to the next space or line feed (it might need adjusting depending on the sourcetype). One caveat with using whitespace as the delimiter: a URL that contains a literal, unencoded space would be cut off at that space. An encoded space such as %20 is not whitespace and does not end the match.
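
A quick way to see where a whitespace-delimited capture stops is to try it outside Splunk. The sketch below is Python, not SPL, so it uses the `(?P<url>...)` spelling that Python's `re` module requires for named groups. The two events are made up for illustration and assume the fields are separated by single spaces.

```python
import re

# Capture everything after "request=" up to the next whitespace character.
# Python's re module needs the (?P<name>...) spelling for named groups.
REQUEST_RE = re.compile(r"request=(?P<url>\S+)")


def extract_url(event):
    """Return the captured URL, or None if the event has no request= field."""
    match = REQUEST_RE.search(event)
    return match.group("url") if match else None


# Hypothetical events: one URL with an encoded space, one with a literal space.
encoded = "dpt=80 request=http://example.com/a%20b?x=1&y=2 act=passed"
literal = "dpt=80 request=http://example.com/a b?x=1&y=2 act=passed"

print(extract_url(encoded))  # the whole URL, because %20 is not whitespace
print(extract_url(literal))  # stops at the literal space
```

The `&` characters in the query string do not end the match; only whitespace does.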

cpetterborg · ‎02-22-2015

I'm making some assumptions here...

It looks like you are relying on key/value pair parsing for automatic field extraction. You probably want to use the rex command or set up a field extraction for your data. Since there are no spaces in your URL, you should be able to use the following regex to parse the request URL:

request=(?P<url>[^ ]+)

I'm assuming from the samples that there is really supposed to be a space between the fields of each event.
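
To illustrate, here is a minimal Python sketch (not SPL) that applies that regex to the long URL from the question. It assumes the fields of the event are separated by single spaces, which is a guess. The event string is abbreviated, and the trailing `...` in the URL is kept exactly as it appears in the original post.

```python
import re

# The regex from the reply above: capture non-space characters after "request=".
URL_RE = re.compile(r"request=(?P<url>[^ ]+)")

# Part of the third sample event, with fields joined by single spaces (an assumption).
event = (
    "dpt=80 deviceDirection=1 "
    "request=http://www.terracotta.org/kit/reflector"
    "?kitID=ehcache.default&pageID=update.properties&id=2130706433... "
    "act=passed"
)

match = URL_RE.search(event)
url = match.group("url") if match else None
print(url)  # the URL including everything after the first "&"
```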

somesoni2 · ‎02-20-2015

Could you provide some full sample events and also the definition of your URL2 field extraction?

sunitachan · ‎02-20-2015

Hi there,
here are a few samples:

Feb
20
09:25:27 |1.0.3|0|passed|0|src=x.x.x.x
spt=40960
dst=34.23.12.3
dpt=80
deviceDirection=1
request=http://www.unikin.cd/
act=passed
cn1Label=Risk_Score
cn1=0
cs5=-
cs5Label=Malware_Type
cs1=-
cs1Label=Category
cs2=-
cs2Label=Protocol

Feb
20 09:25:27|1.0.3|0|passed|0|src=x.x.x.x
spt=60657
dst=291.98.1.1
dpt=80
deviceDirection=1
request=http://mobile.orange.fr/
act=passed
cn1Label=Risk_Score
cn1=0
cs5=- cs5Label=Malware_Type
cs1=-
cs1Label=Category
cs2=- cs2Label=Protocol

Feb
16 08:46:11|1.0.3|0|passed|0|src=x.x.x.x
spt=55845
dst=199.11.1.1
dpt=80
deviceDirection=1
request=http://www.terracotta.org/kit/reflector?kitID=ehcache.default&pageID=update.properties&id=2130706433...
act=passed
cn1Label=Risk_Score
cn1=0
cs5=- cs5Label=Malware_Type
cs1=-
cs1Label=Category
cs2=- cs2Label=Protocol

And URL = request
URL2 = request with long url as in the 3rd sample above

Can I have just one field which could include both types of URLs?
The URL2 regex is ^(?:[^=\n]*=){6}(?P<URL2>[^ ]+)
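
A positional pattern like this one matches any event that has six `=` signs ahead of the value, whatever the length of the URL. The Python sketch below (not SPL) shows that on two of the samples. The group name `URL2` is assumed from the field name, and joining the fields with single spaces is also an assumption.

```python
import re

# Positional pattern: skip six "key=" prefixes, then capture up to the next space.
# The group name URL2 is assumed from the field name used in the thread.
POSITIONAL_RE = re.compile(r"^(?:[^=\n]*=){6}(?P<URL2>[^ ]+)")

# Two of the sample events, abbreviated, with fields joined by spaces (an assumption).
short_event = (
    "Feb 20 09:25:27|1.0.3|0|passed|0|src=x.x.x.x spt=60657 dst=291.98.1.1 "
    "dpt=80 deviceDirection=1 request=http://mobile.orange.fr/ act=passed"
)
long_event = (
    "Feb 16 08:46:11|1.0.3|0|passed|0|src=x.x.x.x spt=55845 dst=199.11.1.1 "
    "dpt=80 deviceDirection=1 "
    "request=http://www.terracotta.org/kit/reflector"
    "?kitID=ehcache.default&pageID=update.properties&id=2130706433... "
    "act=passed"
)


def extract_url2(event):
    """Return the value after the sixth '=' sign, or None if there is no match."""
    match = POSITIONAL_RE.match(event)
    return match.group("URL2") if match else None


print(extract_url2(short_event))  # the short URL matches too
print(extract_url2(long_event))   # the long URL, captured in full
```

Since the same pattern captures both the short and the long URL, a single field could in principle hold both kinds.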

Thanks

sunitachan · ‎02-20-2015

Hello all,
I actually used the built-in field extraction tool to parse this particular field, but the issue I now see is that the field extraction is applied to all the other URLs, which are not this long. So I have:
URL
URL2

I want to apply this field extraction only to URL2.

Any suggestions, please?
Thanks
