Blackboard has changed the format of the bb-access-logs to include session information. With the new data the logs are being split into several logs and some are being indexed with the wrong timestamp. Is anyone else working with these logs and successfully indexing them?
Splunk universal forwarder, inputs.conf:
[monitor:///usr/local/blackboard/apps/tomcat/logs/bb-access-log.*.txt]
index=oc_dev
sourcetype=bb-access-log
ignoreOlderThan=1h
Splunk indexer props.conf:
[bb-access-log]
TIME_FORMAT = %d/%b/%Y:%H:%M:%S %z
MAX_TIMESTAMP_LOOKAHEAD = 85
SHOULD_LINEMERGE = False
TRUNCATE = 25000
LINE_BREAKER = ([\n\r]+)(\b(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])){3}\b)\s((\b(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])){3}\b)|\-)\s
EXTRACT-webcampus_access-fields = (?<src_ip>\b(?:(?:2(?:[0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9])\.){3}(?:(?:2([0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9]))\b)\s(?<dest_ip>.*?)\s(?<HTTPComponent>.*?)\s(?<duid>.*?)\s(?<dateInternal>\[(.*?)\])\s(?<HTTP>\"(.*?)\")\s(?<http_status_code>.*?)\s(?<unknownDigits2>.*?)\s(?<http_user_agent>\".*?\")\s
Example of possible log formatting:
ip_address ip_address http_code duid [%d/%b/%Y:%H:%M:%S %z] "http activity" digit digit "browser information" "session information" digit digit
ip_address - http_code duid [%d/%b/%Y:%H:%M:%S %z] "http activity" digit digit "browser information" "session information" digit digit
ip_address - http_code {unset id} [%d/%b/%Y:%H:%M:%S %z] "http activity" digit digit "-" "-" digit digit
- - - {unset id} [%d/%b/%Y:%H:%M:%S %z] "http activity" digit digit "-" "-" digit digit
Example of possible logs:
1.1.1.1 1.1.1.1 http-nio-8081-exec-3 _1_2 [06/Oct/2015:13:58:22 -0700] "POST /webapps/bb-social-learning-BBLEARN/dwr_open/call/plaincall/ToolActivityService.getActivityForAllTools.dwr HTTP/1.1" 200 116 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36" "JSESSIONID=123456789123456879; BIGipServer~Part-Apps~blackboard-8081=155738122.37151.0000; cookies_enabled=yes; JSESSIONID=123456789123456879; web_client_cache_guid=57a5cf06-3530-48d4-a15f-24f7b7ad0c86; __utma=180787233.108259193.1444163123.1444163123.1444163123.1; __utmb=180787233.17.10.1444163123; __utmc=180787233; __utmz=180787233.1444163123.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); session_id=123456789123456879; s_session_id=123456789123456879" 14 116
1.1.1.1 - http-nio-8081-exec-7 {unset id} [06/Oct/2015:14:45:52 -0700] "GET / " 200 975 "-" "-" 78 975
1.1.1.1 1.1.1.1 http-nio-8081-exec-39 _1_2 [06/Oct/2015:14:45:50 -0700] "POST /webapps/blackboard/dwr_open/call/plaincall/ToolActivityService.getActivityForAllTools.dwr HTTP/1.1" 200 116 "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36" "JSESSIONID=123456789123456879; AMCV_774C31DD5342CAF40A490D44%40AdobeOrg=793872103%7CMCIDTS%7C16709%7CMCMID%7C54023606677090264832941195089008334173%7CMCAAMLH-1444243530%7C7%7CMCAAMB-1444243530%7CNRX38WO0n5BH8Th-nqAG_A%7CMCAID%7CNONE; s_vnum=1446231162345%26vn%3D1; optimizelySegments=%7B%22183805944%22%3A%22none%22%2C%22183871552%22%3A%22direct%22%2C%22183916464%22%3A%22false%22%2C%22184368091%22%3A%22gc%22%7D; optimizelyEndUserId=123456789123456879.123456789123456879; optimizelyBuckets=%7B%7D; s_fid=123456789123456879-1F5618C69F6F5C95; mbox=session#1443638730148-106896#1443641446|PC#1443638730148-106896.17_60#1444849186; _ga=GA1.2.1929637067.1443278384; BIGipServer~Part-Apps~blackboard-8081=155738122.37151.0000; xythosdrive=0; web_client_cache_guid=6251af67-d5e0-4752-a2f7-0a05da0b8839; __utma=184114681.1929637067.1443278384.1443715769.1444002370.6; __utmc=184114681; __utmz=184114681.1443278384.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); JSESSIONID=123456789123456879; __utma=180787233.1929637067.1443278384.1444154819.1444159011.35; __utmc=180787233; __utmz=180787233.1443320404.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); session_id=123456789123456879; s_session_id=123456789123456879" 3 116
Original post did not update successfully.
Actual solution - The issue was Blackboard does not write complete events but instead queues data, writes, pauses to queue again and writes. Splunk support advised - As Splunk's tailing process can see the pause and interprets the complete event has been written and then tries to ingest the data. A majority of the time the data is written as complete events but not always. Easy way to check for bb-access-log accuracy is have search look if there are logs that do not contact source ip (e.g. NOT src_ip=*)
To resolve add time_before_close to forwarder inputs.
time_before_close = 30
Doc: http://docs.splunk.com/Documentation/Splunk/6.2.6/Admin/Inputsconf
time_before_close = integer
Actual solution - The issue was Blackboard does not write complete events but instead queues data, writes, pauses to queue again and writes. Splunk support advised - As Splunk's tailing process can see the pause and interprets the complete event has been written and then tries to ingest the data. A majority of the time the data is written as complete events but not always. Easy way to check for bb-access-log accuracy is have search look if there are logs that do not contact source ip (e.g. NOT src_ip=*)
To resolve add time_before_close to forwarder inputs.
time_before_close = 30
Doc: http://docs.splunk.com/Documentation/Splunk/6.2.6/Admin/Inputsconf
time_before_close = integer
Found the issue - The logs are being written in incomplete chunks by blackboard. Splunk is indexing logs as it sees it.
Hard learned lesson to tail your files to see how they are being written.
I agree with @ssievert. Try this:
[bb-access-log]
TIME_PREFIX=^[^\[]+\[
TIME_FORMAT = %d/%b/%Y:%H:%M:%S %z
MAX_TIMESTAMP_LOOKAHEAD = 26
SHOULD_LINEMERGE = false
TRUNCATE = 25000
# LINE_BREAKER <- leave as default
EXTRACT-webcampus_access-fields = (?<src_ip>\b(?:(?:2(?:[0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9])\.){3}(?:(?:2([0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9]))\b)\s(?<dest_ip>.*?)\s(?<HTTPComponent>.*?)\s(?<duid>.*?)\s(?<dateInternal>\[(.*?)\])\s(?<HTTP>\"(.*?)\")\s(?<http_status_code>.*?)\s(?<unknownDigits2>.*?)\s(?<http_user_agent>\".*?\")\s
I don't know the format of these logs, but aren't the log entries one event per line anyway? In other words, wouldn't the default line breaking on \r\n work just fine?
My original confs did not contain line breaker and the issue started.
Other LINE_BREAKER regex tried:
()\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}\s\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}
(\b(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])){3}\b)\s((\b(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])){3}\b)|\-)\s
Updated
LINE_BREAKER = (\b(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])){3}\b)\s((\b(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[1-9])){3}\b)|\-)\s
We've tried variations of the regex including
()\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}\s\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}