Splunk Search

Blackboard bb-access-logs not parsing correctly

cyndiback
Path Finder

Blackboard has changed the format of the bb-access-logs to include session information. With the new data the logs are being split into several logs and some are being indexed with the wrong timestamp. Is anyone else working with these logs and successfully indexing them?

Splunk universal forwarder, inputs.conf:

[monitor:///usr/local/blackboard/apps/tomcat/logs/bb-access-log.*.txt]
index=oc_dev
sourcetype=bb-access-log
ignoreOlderThan=1h

Splunk indexer props.conf:

[bb-access-log]
TIME_FORMAT = %d/%b/%Y:%H:%M:%S %z
MAX_TIMESTAMP_LOOKAHEAD = 85
SHOULD_LINEMERGE = False
TRUNCATE = 25000
LINE_BREAKER = ([\n\r]+)(\b(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])){3}\b)\s((\b(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])){3}\b)|\-)\s
EXTRACT-webcampus_access-fields = (?<src_ip>\b(?:(?:2(?:[0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9])\.){3}(?:(?:2([0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9]))\b)\s(?<dest_ip>.*?)\s(?<HTTPComponent>.*?)\s(?<duid>.*?)\s(?<dateInternal>\[(.*?)\])\s(?<HTTP>\"(.*?)\")\s(?<http_status_code>.*?)\s(?<unknownDigits2>.*?)\s(?<http_user_agent>\".*?\")\s

Example of possible log formatting:

ip_address ip_address http_code duid [%d/%b/%Y:%H:%M:%S %z] "http activity" digit digit "browser information" "session information" digit digit
ip_address - http_code duid [%d/%b/%Y:%H:%M:%S %z] "http activity" digit digit "browser information" "session information" digit digit
ip_address - http_code {unset id} [%d/%b/%Y:%H:%M:%S %z] "http activity" digit digit "-" "-" digit digit
- - - {unset id} [%d/%b/%Y:%H:%M:%S %z] "http activity" digit digit "-" "-" digit digit

Example of possible logs:

1.1.1.1 1.1.1.1 http-nio-8081-exec-3 _1_2 [06/Oct/2015:13:58:22 -0700] "POST /webapps/bb-social-learning-BBLEARN/dwr_open/call/plaincall/ToolActivityService.getActivityForAllTools.dwr HTTP/1.1" 200 116 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36" "JSESSIONID=123456789123456879; BIGipServer~Part-Apps~blackboard-8081=155738122.37151.0000; cookies_enabled=yes; JSESSIONID=123456789123456879; web_client_cache_guid=57a5cf06-3530-48d4-a15f-24f7b7ad0c86; __utma=180787233.108259193.1444163123.1444163123.1444163123.1; __utmb=180787233.17.10.1444163123; __utmc=180787233; __utmz=180787233.1444163123.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); session_id=123456789123456879; s_session_id=123456789123456879" 14 116

1.1.1.1 - http-nio-8081-exec-7 {unset id} [06/Oct/2015:14:45:52 -0700] "GET / " 200 975 "-" "-" 78 975

1.1.1.1 1.1.1.1 http-nio-8081-exec-39 _1_2 [06/Oct/2015:14:45:50 -0700] "POST /webapps/blackboard/dwr_open/call/plaincall/ToolActivityService.getActivityForAllTools.dwr HTTP/1.1" 200 116 "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36" "JSESSIONID=123456789123456879; AMCV_774C31DD5342CAF40A490D44%40AdobeOrg=793872103%7CMCIDTS%7C16709%7CMCMID%7C54023606677090264832941195089008334173%7CMCAAMLH-1444243530%7C7%7CMCAAMB-1444243530%7CNRX38WO0n5BH8Th-nqAG_A%7CMCAID%7CNONE; s_vnum=1446231162345%26vn%3D1; optimizelySegments=%7B%22183805944%22%3A%22none%22%2C%22183871552%22%3A%22direct%22%2C%22183916464%22%3A%22false%22%2C%22184368091%22%3A%22gc%22%7D; optimizelyEndUserId=123456789123456879.123456789123456879; optimizelyBuckets=%7B%7D; s_fid=123456789123456879-1F5618C69F6F5C95; mbox=session#1443638730148-106896#1443641446|PC#1443638730148-106896.17_60#1444849186; _ga=GA1.2.1929637067.1443278384; BIGipServer~Part-Apps~blackboard-8081=155738122.37151.0000; xythosdrive=0; web_client_cache_guid=6251af67-d5e0-4752-a2f7-0a05da0b8839; __utma=184114681.1929637067.1443278384.1443715769.1444002370.6; __utmc=184114681; __utmz=184114681.1443278384.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); JSESSIONID=123456789123456879; __utma=180787233.1929637067.1443278384.1444154819.1444159011.35; __utmc=180787233; __utmz=180787233.1443320404.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); session_id=123456789123456879; s_session_id=123456789123456879" 3 116

Original post did not update successfully.

1 Solution

cyndiback
Path Finder

Actual solution - The issue was Blackboard does not write complete events but instead queues data, writes, pauses to queue again and writes. Splunk support advised - As Splunk's tailing process can see the pause and interprets the complete event has been written and then tries to ingest the data. A majority of the time the data is written as complete events but not always. Easy way to check for bb-access-log accuracy is have search look if there are logs that do not contact source ip (e.g. NOT src_ip=*)

To resolve add time_before_close to forwarder inputs.

time_before_close = 30

Doc: http://docs.splunk.com/Documentation/Splunk/6.2.6/Admin/Inputsconf

time_before_close = integer

  • Modtime delta required before Splunk can close a file on EOF.
  • Tells the system not to close files that have been updated in past seconds.
  • Defaults to 3.

View solution in original post

0 Karma

cyndiback
Path Finder

Actual solution - The issue was Blackboard does not write complete events but instead queues data, writes, pauses to queue again and writes. Splunk support advised - As Splunk's tailing process can see the pause and interprets the complete event has been written and then tries to ingest the data. A majority of the time the data is written as complete events but not always. Easy way to check for bb-access-log accuracy is have search look if there are logs that do not contact source ip (e.g. NOT src_ip=*)

To resolve add time_before_close to forwarder inputs.

time_before_close = 30

Doc: http://docs.splunk.com/Documentation/Splunk/6.2.6/Admin/Inputsconf

time_before_close = integer

  • Modtime delta required before Splunk can close a file on EOF.
  • Tells the system not to close files that have been updated in past seconds.
  • Defaults to 3.
0 Karma

cyndiback
Path Finder

Found the issue - The logs are being written in incomplete chunks by blackboard. Splunk is indexing logs as it sees it.
Hard learned lesson to tail your files to see how they are being written.

woodcock
Esteemed Legend

I agree with @ssievert. Try this:

[bb-access-log]
TIME_PREFIX=^[^\[]+\[
TIME_FORMAT = %d/%b/%Y:%H:%M:%S %z
MAX_TIMESTAMP_LOOKAHEAD = 26
SHOULD_LINEMERGE = false
TRUNCATE = 25000
# LINE_BREAKER <- leave as default
EXTRACT-webcampus_access-fields = (?<src_ip>\b(?:(?:2(?:[0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9])\.){3}(?:(?:2([0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9]))\b)\s(?<dest_ip>.*?)\s(?<HTTPComponent>.*?)\s(?<duid>.*?)\s(?<dateInternal>\[(.*?)\])\s(?<HTTP>\"(.*?)\")\s(?<http_status_code>.*?)\s(?<unknownDigits2>.*?)\s(?<http_user_agent>\".*?\")\s
0 Karma

s2_splunk
Splunk Employee
Splunk Employee

I don't know the format of these logs, but aren't the log entries one event per line anyway? In other words, wouldn't the default line breaking on \r\n work just fine?

0 Karma

cyndiback
Path Finder

My original confs did not contain line breaker and the issue started.

0 Karma

cyndiback
Path Finder

Other LINE_BREAKER regex tried:

()\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}\s\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}
(\b(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])){3}\b)\s((\b(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])){3}\b)|\-)\s
0 Karma

cyndiback
Path Finder

Updated

LINE_BREAKER = (\b(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])){3}\b)\s((\b(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])){3}\b)|\-)\s

We've tried variations of the regex including

()\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}\s\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}
0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...