Hi,
I'll cut straight to the chase. I have a sourcetype that contains two log sources. Both are broken into events correctly using this props entry:
TIME_PREFIX = ^
TIME_FORMAT= %Y-%m-%dT%H:%M:%S%:z
SHOULD_LINEMERGE = false
BREAK_ONLY_BEFORE_DATE = true
LINE_BREAKER = ([\r\n]+)
TRUNCATE = 999999
TRANSFORMS-changeSourcetype1 = psm-set-sourcetype, asm-set-sourcetype
However, one of the sources contains a lot of visible EOL terminators, e.g. source_NR=NR\r\n\r\n. It is now required for the visible EOL terminators to be parsed as actual EOLs.
I've tried to apply various types of multi regex on the LINE_BREAKER to no avail. From what I've read, it is possible, but anything I try fails and breaks any line breaking.
A few things I've tried:
([\r\n]+)|([\\r\\n]+)
([\r\n]+)|\\r\\n
([\r\n]+)(\\r)(\\n)
The list goes on.
Any advice would be greatly appreciated.
Cheers
Steve
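A note on the first attempt, as a small Python sketch rather than anything Splunk-specific: inside square brackets, \\r\\n is a character class, so it matches any run of backslash, "r" or "n" characters instead of the four-character sequence \r\n. The sample text is a shortened, made-up fragment of the event, and the assumption is that the regex engine used for LINE_BREAKER treats the class the same way Python's re module does.

```python
import re

# Shortened sample containing a literal backslash-r-backslash-n (not a real CR/LF)
text = r'Connection: keep-alive\r\nUser-Agent: x'

# Character class: matches runs of "\", "r" or "n" anywhere in the text
class_matches = re.findall(r"[\\r\\n]+", text)

# Plain sequence: matches only the literal four characters \r\n
literal_matches = re.findall(r"\\r\\n", text)

print(class_matches)
print(literal_matches)
```

If LINE_BREAKER behaves like this, the bracketed version would try to break on ordinary letters, which may be why line breaking fell apart altogether.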
This is the raw event:
2015-04-21T12:55:25+01:00 <hostname> ASM: unit_hostname="<hostname>",management_ip_address="<IP>",http_class_name="/Common/pl_sports_com_L1_prod",web_application_name="/Common/pl_sports_com_L1_prod",policy_name="/Common/pl_sports_com_L1_prod",policy_apply_date="2015-04-20 22:59:53",violations="Web scraping detected",support_id="16995741371944062892",request_status="blocked",response_code="0",ip_client="185.17.184.228",route_domain="0",method="GET",protocol="HTTPS",query_string="action=event&ev_id=7447953&version=1",x_forwarded_for_header_value="N/A",sig_ids="",sig_names="",date_time="2015-04-21 12:55:25",severity="Error",attack_type="Web Scraping",geo_location="NL",ip_address_intelligence="N/A",username="N/A",session_id="a27f9feb0b622a04",src_port="40567",dest_port="443",dest_ip="<IP>",sub_violations="",virus_name="N/A",uri="/bir_xml",request="GET /bir_xml?action=event&ev_id=7447953&version=1 HTTP/1.1\r\nHost: <URL>\r\nCookie: TS0158e29b=0148840b44c2771c7edfa9b3305f349c56fc28fecb0c11fc5c4b963f7e860ace7b26c42578; TS0197b840=0148840b4416795c008c4e4b7adc1e097f180a2c1e661ae566acbeafbd41be90e94db247b64d197bb6324d0ffd8ba54611a9e1ce03; sitePreference=DESKTOP\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nAccept-Language: en-us,en;q=0.5\r\nConnection: keep-alive\r\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:16.0) Gecko/20100101 Firefox/16.0\r\nAccept-Encoding: gzip, deflate\r\n\r\n"#015
This is how I thought the LINE_BREAKER would have changed it:
2015-04-21T12:55:25+01:00 <hostname> ASM: unit_hostname="<hostname>",management_ip_address="<IP>",http_class_name="/Common/pl_sports_com_L1_prod",web_application_name="/Common/pl_sports_com_L1_prod",policy_name="/Common/pl_sports_com_L1_prod",policy_apply_date="2015-04-20 22:59:53",violations="Web scraping detected",support_id="16995741371944062892",request_status="blocked",response_code="0",ip_client="185.17.184.228",route_domain="0",method="GET",protocol="HTTPS",query_string="action=event&ev_id=7447953&version=1",x_forwarded_for_header_value="N/A",sig_ids="",sig_names="",date_time="2015-04-21 12:55:25",severity="Error",attack_type="Web Scraping",geo_location="NL",ip_address_intelligence="N/A",username="N/A",session_id="a27f9feb0b622a04",src_port="40567",dest_port="443",dest_ip="<IP>",sub_violations="",virus_name="N/A",uri="/bir_xml",request="GET /bir_xml?action=event&ev_id=7447953&version=1 HTTP/1.1\r\n
Host: <URL>\r\n
Cookie: TS0158e29b=0148840b44c2771c7edfa9b3305f349c56fc28fecb0c11fc5c4b963f7e860ace7b26c42578; TS0197b840=0148840b4416795c008c4e4b7adc1e097f180a2c1e661ae566acbeafbd41be90e94db247b64d197bb6324d0ffd8ba54611a9e1ce03; sitePreference=DESKTOP\r\n
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n
Accept-Language: en-us,en;q=0.5\r\n
Connection: keep-alive\r\n
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:16.0) Gecko/20100101 Firefox/16.0\r\n
Accept-Encoding: gzip, deflate\r\n\r\n"#015
But I'm starting to lean towards using transforms to replace the \r\n so that the whole event is standardised?
Hm. I'd also suggest replacing those \r\n with an actual linebreak. Have a look here and see if it works for you.
nice 🙂 Thanks jeffland. very much appreciated.
Just for completeness 🙂
My props stanza is:
[f5]
TIME_PREFIX = ^
TIME_FORMAT= %Y-%m-%dT%H:%M:%S%:z
BREAK_ONLY_BEFORE_DATE = True
LINE_BREAKER = ([\r\n\$])
TRUNCATE = 999999
TRANSFORMS-changeSourcetype1 = psm-set-sourcetype, asm-set-sourcetype
SEDCMD-newline = s/\\r\\n/,/g
SEDCMD-eventend = s/#015//g
So now all the fields are correctly extracted and the annoying #015 is removed. Plus the other source is untouched.
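For anyone wanting to check the two SEDCMD rules outside Splunk, here is a rough Python equivalent. It is only a sketch: apply_sedcmds is a made-up helper name, the sample is a shortened version of the event, and it assumes the sed-style patterns behave like the same regexes in Python's re module.

```python
import re

def apply_sedcmds(raw):
    """Approximate the two SEDCMD rules from the props stanza (not Splunk's own code)."""
    # SEDCMD-newline = s/\\r\\n/,/g : each literal \r\n becomes a comma
    out = re.sub(r"\\r\\n", ",", raw)
    # SEDCMD-eventend = s/#015//g : drop the #015 marker
    out = re.sub(r"#015", "", out)
    return out

# Shortened sample with literal backslash sequences, as they appear in the raw event
sample = r'uri="/bet/ru",request="GET /bet/ru HTTP/1.1\r\nHost: sports.whgaming.com\r\n\r\n"#015'
cleaned = apply_sedcmds(sample)
print(cleaned)
# prints: uri="/bet/ru",request="GET /bet/ru HTTP/1.1,Host: sports.whgaming.com,,"
```

Note that the doubled \r\n\r\n at the end of the request turns into two commas, so the value ends in ",,".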
2015-04-21T10:51:26+01:00 <> ASM: unit_hostname="<>",management_ip_address="<>",http_class_name="/Common/pl_restricted_L0_prod",web_application_name="/Common/pl_restricted_L0_prod",policy_name="/Common/pl_restricted_L0_prod",policy_apply_date="2015-04-20 21:44:42",violations="Attack signature detected",support_id="16995741371937106148",request_status="blocked",response_code="0",ip_client="46.201.133.82",route_domain="0",method="GET",protocol="HTTP",query_string="",x_forwarded_for_header_value="N/A",sig_ids="300000002",sig_names="parimatchru",date_time="2015-04-21 10:51:25",severity="Error",attack_type="Abuse of Functionality",geo_location="UA",ip_address_intelligence="N/A",username="N/A",session_id="8f08ae0f2fbd5d82",src_port="55263",dest_port="80",dest_ip="<>",sub_violations="",virus_name="N/A",uri="/bet/ru",request="GET /bet/ru HTTP/1.1\r\nHost: sports.whgaming.com\r\nConnection: keep-alive\r\nAccept: image/webp,/;q=0.8\r\nUser-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1717.129 Amigo/32.0.1717.129 MRCHROME SOC Safari/537.36\r\nReferer: http://start.parimatchru.com/bonusnew/?btag=a_3615b_234c_231947&id=231947\r\nAccept-Encoding: gzip,deflate,sdch\r\nAccept-Language: ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4\r\nCookie: banner_click=aleshasavin,NA,NA,NA,admap:159955966FE625989E443CA9CEA4BE36CCBBFCB%3Bsource:[var1]%3Bzone:1487412695%3Bchannel:185050786; clickinfo=pid=185050786&bid=1487412695; vars_info=; source_NR=NR\r\n\r\n"#015
Hi Jeffland,
The above is an event that we want to break down. So for each \r\n we require the following line on a new line. The alternative I could try is to set up a SEDCMD in props.conf and replace each \r\n with a comma. This I believe would also fix the auto field extraction.
Cheers
Steve
I have the feeling that your event text was somehow corrupted when you posted it. Could you post it as a text file, or as code? There are some "rn" in there, also one with backslashes, but I doubt this is what you wanted to post.
As for your line breaker, the places you define there will lead to an "event break", i.e. every time the regex matches your data there will be a new event. That's why I doubt you can achieve what you need with the line breaker. But I still haven't fully understood what you need your event to look like. Do you want Splunk to display a line break when it shows the events as returned from a search?
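To illustrate the point about event breaks, here is a very simplified Python model of LINE_BREAKER. It assumes only the basic rule that every match starts a new event and the text in the first capturing group is discarded. split_events and the two-event stream are made up for the illustration, and real Splunk applies further rules that are not modelled here.

```python
import re

def split_events(stream, line_breaker):
    """Simplified model: each regex match ends one event and starts the next."""
    events, pos = [], 0
    for m in re.finditer(line_breaker, stream):
        events.append(stream[pos:m.start(1)])  # text before the captured group
        pos = m.end(1)                         # captured group itself is dropped
    events.append(stream[pos:])
    return [e for e in events if e]

# Two real events separated by a real newline; the first contains literal \r\n text
stream = 'A request="GET / HTTP/1.1\\r\\nHost: x\\r\\n"\nB request="ok"\n'

default_events = split_events(stream, r"([\r\n]+)")
extended_events = split_events(stream, r"([\r\n]+|\\r\\n)")

print(len(default_events))   # prints: 2
print(len(extended_events))  # prints: 4
```

With the extended pattern the first event is cut into three separate events, which is the opposite of keeping it as one event with line breaks inside it.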
I'll have to post it as an "answer" as the comment box won't allow the volume of text.
I haven't fully understood what behavior you need. New events are supposed to begin just like they did until now, but inside of them you need linebreaks (i.e. there need to be new lines at the beginning of an event)?