Splunk doesn't parse this URL fully....

sunitachan
New Member

Hello, my dear Splunkers!

I am trying to get data into Splunk via syslog. The events contain a request="url" field like the one below:

request=http://www.terracotta.org/kit/reflector?kitID=ehcache.default&pageID=update.properties&id=2130706433...

But Splunk parses it like this:
request=http://www.terracotta.org/kit/reflector?kitID=ehcache.default

Can someone help me with this please?
How can I get the full URL parsed correctly?
And where can I go in Splunk to tweak this field? As my data is already parsed...

Appreciate the help!!
Thanks
Sunita


esix_splunk
Splunk Employee

As cpetterborg mentions, it depends on how the event looks: is it space-delimited or newline-delimited? I would use something like:

request=(?<url>\S+)

That captures everything after request= up to the first whitespace character, which covers both a space and a Unix-style line feed (this might need to be adjusted based on the sourcetype). One potential issue with using whitespace as the delimiter is that a URL containing a literal space would be cut off at that space.
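
To check the whitespace-delimited approach outside Splunk, here is a minimal Python sketch. It assumes the fields in each event are separated by spaces; the event string is rebuilt from the third sample in this thread, and the example.com URLs and the extract_url helper are hypothetical. Python only accepts the (?P<name>...) spelling for named groups, so that form is used here.

```python
import re

# Named group written as (?P<url>...) because Python requires that spelling.
REQUEST_RE = re.compile(r"request=(?P<url>\S+)")


def extract_url(event):
    """Return the request URL from one event, or None if there is none."""
    match = REQUEST_RE.search(event)
    return match.group("url") if match else None


# Assumption: fields are separated by single spaces.
event = (
    "dpt=80 deviceDirection=1 "
    "request=http://www.terracotta.org/kit/reflector"
    "?kitID=ehcache.default&pageID=update.properties&id=2130706433... "
    "act=passed"
)
full_url = extract_url(event)

# A literal space inside the URL cuts the capture short;
# a percent-encoded space (%20) is not whitespace and is kept.
literal_space = extract_url("request=http://example.com/a b act=passed")
encoded_space = extract_url("request=http://example.com/a%20b act=passed")

print(full_url)
print(literal_space)
print(encoded_space)
```

The & characters are ordinary non-whitespace characters to this pattern, so the query string is kept whole.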


cpetterborg
SplunkTrust

I'm making some assumptions here...

It looks like you are relying on automatic key/value field extraction. You probably want to use the rex command or define a field extraction for your data. Since there are no spaces in your URL, you should be able to use the following regex to extract the request URL:

request=(?P<url>[^ ]+)

I'm assuming from the samples that the fields in each event are really supposed to be separated by spaces.
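
The space assumption matters, as this small Python sketch suggests. It runs the pattern above against a short and a long URL, then against an event whose fields are separated by newlines instead of spaces. The event strings are rebuilt from the samples in this thread (how the fields are really delimited is an assumption), and extract_url is a hypothetical helper.

```python
import re

# The pattern as posted: the capture stops only at a literal space.
SPACE_ONLY = re.compile(r"request=(?P<url>[^ ]+)")


def extract_url(event):
    """Return the request URL from one event, or None if there is none."""
    match = SPACE_ONLY.search(event)
    return match.group("url") if match else None


# Assumption: space-delimited events, as the regex expects.
short_event = "dpt=80 deviceDirection=1 request=http://www.unikin.cd/ act=passed"
long_event = (
    "dpt=80 deviceDirection=1 "
    "request=http://www.terracotta.org/kit/reflector"
    "?kitID=ehcache.default&pageID=update.properties&id=2130706433... "
    "act=passed"
)
short_url = extract_url(short_event)
long_url = extract_url(long_event)

# If the fields are newline-delimited instead, [^ ] also matches the
# newline, so the capture runs on into the following field.
newline_event = "deviceDirection=1\nrequest=http://www.unikin.cd/\nact=passed"
overrun = extract_url(newline_event)

print(repr(short_url))
print(repr(long_url))
print(repr(overrun))
```

With space-delimited events, one pattern handles both the short and the long URL. With newline-delimited events, a class that also excludes newlines, such as [^ \r\n]+, would be needed.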


somesoni2
Revered Legend

Could you provide some sample full events and also definition of your URL2 field extraction?


sunitachan
New Member

Hi there,
Here are a few samples:

Feb
20
09:25:27 |1.0.3|0|passed|0|src=x.x.x.x
spt=40960
dst=34.23.12.3
dpt=80
deviceDirection=1
request=http://www.unikin.cd/
act=passed
cn1Label=Risk_Score
cn1=0
cs5=-
cs5Label=Malware_Type
cs1=-
cs1Label=Category
cs2=-
cs2Label=Protocol

Feb
20 09:25:27|1.0.3|0|passed|0|src=x.x.x.x
spt=60657
dst=291.98.1.1
dpt=80
deviceDirection=1
request=http://mobile.orange.fr/
act=passed
cn1Label=Risk_Score
cn1=0
cs5=- cs5Label=Malware_Type
cs1=-
cs1Label=Category
cs2=- cs2Label=Protocol

Feb
16 08:46:11|1.0.3|0|passed|0|src=x.x.x.x
spt=55845
dst=199.11.1.1
dpt=80
deviceDirection=1
request=http://www.terracotta.org/kit/reflector?kitID=ehcache.default&pageID=update.properties&id=2130706433...
act=passed
cn1Label=Risk_Score
cn1=0
cs5=- cs5Label=Malware_Type
cs1=-
cs1Label=Category
cs2=- cs2Label=Protocol

The fields are:
URL = request
URL2 = request with a long URL, as in the third sample above

Can I have just one field that covers both types of URL?
The URL2 regex is ^(?:[^=\n]*=){6}(?P<URL2>[^ ]+)

Thanks


sunitachan
New Member

Hello all,
I actually used the built-in field extraction tool to parse this particular field, but the issue I see now is that the extraction is also applied to all the other URLs, which are not this long. So I now have two fields:
URL
URL2

I want this field extraction to apply only to URL2.

Any suggestions, please?
Thanks
