Getting Data In

How to edit my props and transforms to target certain web page logs using REGEX for URL strings and forward to a certain index?

cord_thomas
Explorer

Hi

We are looking at Splunk as a way to log specific activities on our website.

I think in writing this, I see what I am missing, but I cannot figure out where to place it.

From our web server, we have a universal forwarder sending access_log to the Splunk index "newweb".

I have system/local/transforms.conf with 3 stanzas, shown below. You will see I have attempted two different versions of filtering; the string patterns in the REGEX settings are parts of URLs we are interested in.

Currently, no requests are being filtered. We want only the URLs matched by the two setparsing stanzas to make it into the index.

[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[setparsing1]
REGEX = GET\s/about/people/cord_thomas
DEST_KEY = queue
FORMAT = indexQueue

[setparsing2]
REGEX = about/history/company_at_a_glance
DEST_KEY = queue
FORMAT = indexQueue

I have system/local/props.conf with a single stanza:

[apache_log]
TRANSFORMS-set=setnull,setparsing1, setparsing

I see that in no way am I telling Splunk to apply the transforms to my index, newweb. Where might I do that?

Thank you

1 Solution

maciep
Champion

I'm not positive but it looks like you're applying the setnull transform first. And since the regex for that stanza is "." then it will match everything...so everything is getting sent to the nullQueue (i.e. getting trashed). Maybe try putting it last in the list? Or possibly in another setting below the first. Maybe something like this?

[apache_log]
TRANSFORMS-set1=setparsing1, setparsing2
TRANSFORMS-set2=setnull

Also, you may still need to escape the slashes (/) in the regex for your other stanzas with a backslash. For example:

about\/history\/company_at_a_glance

I recommend going to a site like regex101 to verify that your regex matches your data first.

Hope that helps a little.

cord_thomas
Explorer

Thank you both - each of you contributed a part of the answer.

I got it working. It was a combination of needing to understand the role of the sourcetype in the inputs.conf and then getting the right regular expression. In the end, my regex looks like this:

REGEX = "GET\s/pubs/research_reports/RR604.html HTTP/1.1"

somesoni2
SplunkTrust

You should have transforms.conf and props.conf on the indexer instance (a restart/reload will be required after adding them). On the universal forwarder side, update the inputs.conf entry for this file to set the sourcetype to apache_log, so that it matches the [apache_log] stanza in props.conf.
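
As a rough sketch of that layout, the forwarder's inputs.conf might look something like the following. The monitor path is a placeholder and is not taken from this thread; the index and sourcetype names come from the posts above. Note that props.conf stanzas are keyed by sourcetype, source, or host rather than by index, so the sourcetype set here is what ties the events to the [apache_log] stanza on the indexer.

```ini
# inputs.conf on the universal forwarder
# NOTE: the path below is a hypothetical placeholder; use the real location of access_log.
[monitor:///path/to/access_log]
index = newweb
sourcetype = apache_log
```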

cord_thomas
Explorer

Thank you - you added a bit of insight and I feel I am closer.

Now, I am not seeing ANY data being added to the index.

Using tcpdump, I am certain the universal forwarder is still forwarding.

I did as you suggested and set sourcetype=apache_log to match what is in props.conf.

I wonder whether my regular expression is wrong. Any thoughts on ways to validate that? I don't see anything in the splunkd.log file indicating any problems with my conf file, but that may be the case.

As a concrete example, I have a log entry of:

... host.domain.org - [11/Mar/2015:17:22:54 -0400] "GET /pubs/perspectives/PE113.html HTTP/1.1" 200 13968 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.76 Safari/537.36" "c4kaqS2x=1; ic=; __utmz=145273911.1421782255.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none); lira-8001-PORTAL-PSJSESSION ...

And I have this as an entry in transforms.conf

[setparsing4]
REGEX = /pubs/perspectives/PE113
DEST_KEY = queue
FORMAT = indexQueue

I also tried (with escape slashes, which don't appear to show up here):
REGEX = GET\s\/pubs\/perspectives\/PE113
and
\/pubs\/perspectives\/PE113

Still no luck. Are there ways to debug the indexer?
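
One thing that may be worth checking, assuming props.conf still follows the layout suggested earlier in the thread: Splunk's documented pattern for keeping specific events and discarding the rest lists the null-queue transform first and the keep transforms after it, because when several transforms set the same DEST_KEY the last match wins. The new stanza also has to be named in the TRANSFORMS list. A minimal sketch using the stanza names from this thread (the class name "set" is arbitrary):

```ini
# props.conf on the indexer
[apache_log]
# setnull runs first and sends every event to nullQueue;
# setparsing4 runs afterwards and re-routes matching events to indexQueue.
TRANSFORMS-set = setnull, setparsing4

# transforms.conf on the indexer
[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[setparsing4]
# Forward slashes need no escaping here; REGEX is matched against the raw event text.
REGEX = GET\s/pubs/perspectives/PE113
DEST_KEY = queue
FORMAT = indexQueue
```

With the order reversed (setnull listed last), every event, including the ones matched by setparsing4, would end up in nullQueue, which would look like no data reaching the index at all.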

cord_thomas
Explorer

And to be clear, somesoni2, I was not being sarcastic; I feel I am headed in the right direction...
