I have a report that groups webpage requests from an IIS log by SC_STATUS. The results are really bad because Splunk appears to be getting confused about which line, and which part of a line, it's reading, resulting in data like "myurl.com" showing up where "200" for sc_status should be.
I have Splunk set up to monitor the folder where log files are stored in real time and I manually selected IIS logs when identifying the format of the files.
This is what Splunk has stored for one request:
2015-12-30 15:06:54 W3SVC3 MYWEBSERVER 192.111.11.11 GET /App_Themes/Blue/Blue.css - 80 - 54.69.58.243 HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) stuff_id=stuff;+user=stuff;+persistcookie=True;+stuffSelection=STUFF1,STUFF2,STUFF3,STUFF4,STUFF5,;+MYWEBSITE=R285025761;+ASP.NET_SessionId=3sgbsssgrvbwizta31fcynmx;+MyWebSite.ASPXAUTH=D2E24F7A75F2114DCF6AFB5DA65C739A2972D39870A74C1735EF0B3A819F27D5E743DE70EB6C5D7ADF944507DA71042D235483889FEA3A736EFBA2E81AB02F47A08BA93D51C6563422CE17055236EA5BBDCC03A03B4389CE042ADDFB89AA7A7D6C7246376DB20045AD709BE50444332F048A79BD65269C0919B0A5ADA4EE415EE1E96BCFBF3D5D33507D663A5671DE9E https://m5.0+(Macintosh;+Intel+Mac+OS+X+10_10)+AppleWebKit/600.1.25+(KHTML,+like+Gecko)+Version/8.0+... MYWEBSITE=R285025761;+ASP.NET_SessionId=o2hgz2wa34vj2v0i2c5zdmis https://mywebsite.thisisawesome.com/Logon.aspx?ReturnUrl=%2f mywebsite.thisisawesome 200 0 0 24916 515 31
This request appears to be a mashup of two or more requests:
Part 1: 2015-12-30 15:06:54 W3SVC3 MYWEBSERVER 192.111.11.11 GET /App_Themes/Blue/Blue.css - 80 - 54.69.58.243 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+Trident/7.0;+rv:11.0)+like+Gecko stuff_id=stuff;+user=stuff;+persistcookie=True;+datalistSelection=OFAC,PEP_FO,;+MYWEBSITE=R285025761;+ASP.NET_SessionId=ykvwd2cgbhjcjck45jcy1w13 https://mywebsite.thisisawesome.com/logon.aspx mywebsite.thisisawesome.com 304 0 0 92 593 62
Part 2: 2015-12-30 15:06:38 W3SVC3 MYWEBSERVER 192.111.11.11 GET /Includes/jquery-1.4.2.min.js - 80 - 209.15.236.88 HTTP/1.1 Mozilla/5.0+(Macintosh;+Intel+Mac+OS+X+10_10)+AppleWebKit/600.1.25+(KHTML,+like+Gecko)+Version/8.0+Safari/600.1.25 MYWEBSITE=R285025761;+ASP.NET_SessionId=o2hgz2wa34vj2v0i2c5zdmis https://mywebsite.thisisawesome.com/Logon.aspx?ReturnUrl=%2f mywebsite.thisisawesome.com 200 0 0 24916 515 31
and part of another request in the middle.
I can see at least one place where the lines were mashed together. In this snippet, "5671DE9E https://m5.0+(Macintosh;+Int", you can see that "https://m" is part of a URL and "5.0+" is part of a user agent, but they're joined without a space as if they were one field.
Other than that, I'm not sure where the data is coming from in the log file to put that one request together in Splunk.
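One rough way to find which events are affected is to count the space-separated fields in each event and compare that against the number of fields the log is supposed to have. The sketch below is only an illustration: the function name, the shortened sample events, and the six-field layout are made up for the example, and the real count should come from the #Fields: header in the IIS log itself.

```python
def find_mashed_events(lines, expected_fields):
    """Return (line_number, field_count) for events with an unexpected field count."""
    suspects = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        # Skip blank lines and W3C directive lines such as "#Fields: ...".
        if not line or line.startswith("#"):
            continue
        # IIS W3C logs are space-delimited; spaces inside values are written as "+".
        count = len(line.split(" "))
        if count != expected_fields:
            suspects.append((number, count))
    return suspects


# Shortened, made-up events with 6 fields each (hypothetical layout).
# The third line carries two extra tokens, as a mashed-up event would.
sample = [
    "#Fields: date time cs-method cs-uri-stem cs-host sc-status",
    "2015-12-30 15:06:54 GET /a.css example.com 304",
    "2015-12-30 15:06:54 GET /a.css https://m5.0+(Macintosh) example.com 200 0",
    "2015-12-30 15:06:38 GET /b.js example.com 200",
]

suspects = find_mashed_events(sample, expected_fields=6)
print(suspects)
```

Any event flagged this way has its later fields shifted, which would explain a host name or URL landing in the sc_status position.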
My question is, how do I get Splunk to read my IIS logs properly and not mash up multiple lines into one line?
Thanks!
It appears you have a line-merge / line-breaker problem. Check your inputs.conf for the sourcetype you're using to consume these logs, then find the matching stanza in props.conf, make sure SHOULD_LINEMERGE = false, and configure a LINE_BREAKER. The date at the start of each event looks like the best thing to break on.
inputs.conf:
[<input stanza>]
...
sourcetype=sourcetypeName
props.conf:
[sourcetypeName]
...
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}
Check the docs for reference. If you're sending from a universal forwarder, you'll need to put the props on the forwarder.
http://docs.splunk.com/Documentation/Splunk/latest/admin/Propsconf
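Splunk's LINE_BREAKER expects a capturing group: the text matched by the first group is discarded as the delimiter, and the next event starts right after it. Before touching props.conf, it can help to try the timestamp pattern on a couple of raw lines. The snippet below is only an approximation in Python (its re module is not Splunk's regex engine), it uses a lookahead to mimic "drop the newlines, keep the timestamp", and the sample events and helper name are invented for the example.

```python
import re

# Timestamp pattern at the start of each IIS event, e.g. "2015-12-30 15:06:54".
TIMESTAMP = r"\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}"

# Break on newlines only when a timestamp follows. The lookahead keeps the
# timestamp in the next event, roughly like Splunk discarding only group 1.
BOUNDARY = re.compile(r"[\r\n]+(?=" + TIMESTAMP + r")")


def split_events(raw):
    """Split raw log text into events at newline-plus-timestamp boundaries."""
    return [event.strip() for event in BOUNDARY.split(raw) if event.strip()]


# Two shortened, made-up events separated by a Windows line ending.
raw = (
    "2015-12-30 15:06:54 W3SVC3 GET /App_Themes/Blue/Blue.css 304\r\n"
    "2015-12-30 15:06:38 W3SVC3 GET /Includes/jquery-1.4.2.min.js 200\r\n"
)

events = split_events(raw)
for event in events:
    print(event)
```

If the pattern splits the sample into one event per request here, it is at least a reasonable candidate to test in props.conf.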
I believe I should be looking for a sourcetype of "iis", since that's what I configured the data input as.
In the inputs.conf file, I do not see anything for iis, so I'm not sure if any changes are necessary there.
I see this in the props.conf file. SHOULD_LINEMERGE is already set to false.
[iis]
pulldown_type = true
MAX_TIMESTAMP_LOOKAHEAD = 32
SHOULD_LINEMERGE = False
INDEXED_EXTRACTIONS = w3c
detect_trailing_nulls = auto
category = Web
description = W3C Extended log format produced by the Microsoft Internet Information Services (IIS) web server
I'll add this below the description line: LINE_BREAKER=([\r\n]+)\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}
... and see what happens.
Make it a lowercase false, but yeah, with SHOULD_LINEMERGE = false you need LINE_BREAKER to define where events break. (BREAK_ONLY_BEFORE, MUST_BREAK_AFTER and the related settings only take effect when SHOULD_LINEMERGE = true.)
The inputs.conf will be located on the forwarder on the IIS servers, or wherever Splunk is reading the log files from.
You can run $SPLUNK_HOME/bin/splunk cmd btool inputs list --debug to see which inputs.conf stanzas are loaded and which app they're loaded from.