Can you help me extract fields from apache:access logs?

mrtolu6
Path Finder

Regex Experts!
I need help extracting the src, http_method, uri_path, and status fields.

Below is an example of a log with the fields that I would like to extract:

10.10.10.22 - - [12/Oct/2012:14:22:41 -0400] "GET /etc/team/transport/tRoom?serlet=jpsSSGenerator HTTP/1.1" 200 26494
src=10.10.10.22, http_method=GET, uri_path=/etc/team/transport/tRoom?serlet=jpsSSGenerator, status=200

Below are examples of the different types of events that come from our Apache access logs. I'm looking for a regex that can extract the fields from all of them. Thanks in advance for any help.

example logs

127.0.0.1 - - [104/Oct/2018:11:22:47 -0700] "GET /directory/directory/test?seat=ShowPage&tese=calendar.js&IP=444.444.1.444 HTTP/1.1" 304 - "htttps://testwebsite.com/m/see/yup/Union?selpt=stepReportFilter.jsp" "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko"

10.10.10.10 - - [10/Oct/2018:11:22:47 -0700] "POST /nba/nfl/nhl/ufc HTTP/1.1" 200 470 "-" "Mozilla/4.0 (Windows 8.1 6.3) Java/1.2.0_181" "10.10.10.02"

dnsname..cod.blackops.com:80 10.10.10.02 - - [16/Oct/2018:11:22:22 -0700] "GET /scripts/form_registry.js HTTP/1.1" 200 2504 "htttp://10.10.10.03lnba/cruisehtml?&swf_version=ezboard052614_1&serverUrl=110.10.10.03&boardId=19-153970030&isPreview=0&update052109=1" "Mozilla/5.0 (Windows NT 6.1; Trident/2.0; rv:11.0) like Gecko"

10.10.10.02 - - [12/Oct/2018:13:22:41 -0500] "POST /yup/zillow/server.php?a=c7355 HTTP/1.1" 200 - "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/515.2 (KHTML, like Gecko) Chrome/15.0.200.200 Safari/535.2"

10.10.10.02 - - [11/Oct/2018:13:22:41 -0500] "POST /yup/zillow/server.php?a=c7355 HTTP/1.1" 200 - 

10.10.10.22 - - [12/Oct/2012:14:22:41 -0400] "GET /etc/team/transport/tRoom?serlet=jpsSSGenerator HTTP/1.1" 200 26494

sudosplunk
Motivator

Hi @mrtolu6,

Give this regex a try:

your base search | rex field=_raw "(?<src>\d+\.\d+\.\d+\.\d+).+\]\s\"(?<http_method>\w+)\s(?<uri_path>.+)\"\s(?<status>\d+)"

Tested the regex against your sample events.
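To check the pattern outside Splunk, here is a minimal Python sketch that runs it over two of the sample events from the question. It assumes Python's `re` module behaves closely enough to the engine behind `rex` for this particular pattern; the only change is the named-group spelling, `(?P<name>...)` instead of `(?<name>...)`. The `extract` helper and the `SAMPLES` list are illustrative names, not part of Splunk.

```python
import re

# Same pattern as the rex command, with Python's (?P<name>...) group syntax.
PATTERN = re.compile(
    r'(?P<src>\d+\.\d+\.\d+\.\d+).+\]\s"(?P<http_method>\w+)\s'
    r'(?P<uri_path>.+)"\s(?P<status>\d+)'
)

# Two sample events copied from the question.
SAMPLES = [
    '10.10.10.22 - - [12/Oct/2012:14:22:41 -0400] '
    '"GET /etc/team/transport/tRoom?serlet=jpsSSGenerator HTTP/1.1" 200 26494',
    '10.10.10.10 - - [10/Oct/2018:11:22:47 -0700] '
    '"POST /nba/nfl/nhl/ufc HTTP/1.1" 200 470 "-" '
    '"Mozilla/4.0 (Windows 8.1 6.3) Java/1.2.0_181" "10.10.10.02"',
]


def extract(line):
    """Return the named groups as a dict, or None if the event does not match."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None


for sample in SAMPLES:
    print(extract(sample))
```

Because `uri_path` is matched with a greedy `.+`, it runs up to the closing quote of the request, so the captured value also contains the protocol token (for example `/nba/nfl/nhl/ufc HTTP/1.1`).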

adonio
Ultra Champion

Why not use the pre-built sourcetype access_combined?

See here:
https://docs.splunk.com/Documentation/Splunk/7.2.0/Data/Listofpretrainedsourcetypes

mrtolu6
Path Finder

That worked, but it adds extra details to the uri_path field. I would also like to create a field called uri_query that holds anything after the "?", and a field called version for the HTTP/1.1 part.

For example:
10.10.10.04 - - [07/Oct/23:08:30:59 -0400] "POST /OndnForm/drag_Form?images/ HTTP/1.1" 400 226 "-" "Hello, People"

uri_query=images/
version=HTTP/1.1
src=10.10.10.04
status=400
bytes=226

sudosplunk
Motivator

Try this:

your base search | rex field=_raw "(?<src>\d+\.\d+\.\d+\.\d+).+\]\s\"(?<http_method>\w+)\s(?<uri_path>\S+)\s(?<uri_query>\S+)\"\s(?<status>\d+)\s(?<bytes>[\d-]+)"

Updated regex: https://regex101.com/r/CpQ56P/2
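In the pattern above, `uri_path` is a single `\S+`, so it still includes the query string, and the group named `uri_query` appears to capture the protocol token (`HTTP/1.1`) instead of the text after the "?". The following Python sketch shows one possible way to split the request into `uri_path`, `uri_query`, and `version`, as asked in the follow-up. It is an illustration under assumptions, not the pattern behind the regex101 link: the `version` group name comes from the question, the `extract` helper is hypothetical, and the pattern assumes the request line always ends with an `HTTP/x.y` token.

```python
import re

# Python uses (?P<name>...); for rex, write (?<name>...) and escape the
# double quotes inside the quoted rex string.
PATTERN = re.compile(
    r'(?P<src>\d+\.\d+\.\d+\.\d+).+?\]\s"'   # client IP, then skip to the request
    r'(?P<http_method>\w+)\s'                # GET, POST, ...
    r'(?P<uri_path>[^?\s"]+)'                # path up to "?" or whitespace
    r'(?:\?(?P<uri_query>[^\s"]*))?\s'       # optional query string after "?"
    r'(?P<version>HTTP/[\d.]+)"\s'           # protocol token, e.g. HTTP/1.1
    r'(?P<status>\d+)\s(?P<bytes>\d+|-)'     # status code and byte count ("-" if none)
)


def extract(line):
    """Return the named groups as a dict, or None if the event does not match."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None


# Event from the follow-up question.
with_query = (
    '10.10.10.04 - - [07/Oct/23:08:30:59 -0400] '
    '"POST /OndnForm/drag_Form?images/ HTTP/1.1" 400 226 "-" "Hello, People"'
)
# Event from the original question, without a query string.
without_query = (
    '10.10.10.10 - - [10/Oct/2018:11:22:47 -0700] '
    '"POST /nba/nfl/nhl/ufc HTTP/1.1" 200 470 "-" '
    '"Mozilla/4.0 (Windows 8.1 6.3) Java/1.2.0_181" "10.10.10.02"'
)

print(extract(with_query))
print(extract(without_query))
```

For the first event this yields `uri_path=/OndnForm/drag_Form`, `uri_query=images/`, `version=HTTP/1.1`, `status=400`, and `bytes=226`. For events without a "?", `uri_query` is simply left unset.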

mrtolu6
Path Finder

Thanks for your help!
