Multiple LINE_BREAKER regex

cdstealer
Contributor

Hi,
I'll cut straight to the chase. I have a sourcetype that contains 2 log sources. Both are broken into events correctly using this props entry:

TIME_PREFIX = ^
TIME_FORMAT= %Y-%m-%dT%H:%M:%S%:z
SHOULD_LINEMERGE = false
BREAK_ONLY_BEFORE_DATE = true
LINE_BREAKER = ([\r\n]+)
TRUNCATE = 999999
TRANSFORMS-changeSourcetype1 = psm-set-sourcetype, asm-set-sourcetype

However, one of the sources contains a lot of visible EOL terminators, i.e. the literal characters \r\n written out in the text, for example source_NR=NR\r\n\r\n. We now need these visible EOL terminators to be parsed as actual EOLs.

I've tried various multi-pattern regexes in LINE_BREAKER to no avail. From what I've read it should be possible, but everything I try fails and stops line breaking from working at all.

A few things I've tried:

([\r\n]+)|([\\r\\n]+)
([\r\n]+)|\\r\\n
([\r\n]+)(\\r)(\\n)

The list goes on.

Any advice would be greatly appreciated.

Cheers
Steve
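
One possible reason the first attempt misbehaves is that a pattern like [\\r\\n]+ is a character class, not a sequence: it matches any run of backslashes and the ordinary letters r and n. The sketch below uses Python's re module as a stand-in for Splunk's PCRE engine (an assumption, although both treat this construct the same way), and the sample string is a shortened, made-up fragment in the style of the events in this thread.

```python
import re

# Log text as written: a backslash followed by 'r', then a backslash followed by 'n'.
# The host name "example" is a placeholder, not taken from the real data.
sample = r'severity="Error",request="GET / HTTP/1.1\r\nHost: example"'

# Character class: matches any run made up of backslashes, 'r' or 'n' characters.
char_class = re.compile(r'[\\r\\n]+')
# Literal sequence: matches only the four characters \ r \ n in that order.
literal_seq = re.compile(r'\\r\\n')

class_hits = char_class.findall(sample)
literal_hits = literal_seq.findall(sample)

print(class_hits)    # includes plain letters such as the "rr" in "Error"
print(literal_hits)  # only the visible \r\n terminator
```

So if the visible terminators are to be matched, the escaped sequence \\r\\n (outside square brackets) is the safer form.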

cdstealer
Contributor

This is the raw event:

2015-04-21T12:55:25+01:00 <hostname> ASM: unit_hostname="<hostname>",management_ip_address="<IP>",http_class_name="/Common/pl_sports_com_L1_prod",web_application_name="/Common/pl_sports_com_L1_prod",policy_name="/Common/pl_sports_com_L1_prod",policy_apply_date="2015-04-20 22:59:53",violations="Web scraping detected",support_id="16995741371944062892",request_status="blocked",response_code="0",ip_client="185.17.184.228",route_domain="0",method="GET",protocol="HTTPS",query_string="action=event&ev_id=7447953&version=1",x_forwarded_for_header_value="N/A",sig_ids="",sig_names="",date_time="2015-04-21 12:55:25",severity="Error",attack_type="Web Scraping",geo_location="NL",ip_address_intelligence="N/A",username="N/A",session_id="a27f9feb0b622a04",src_port="40567",dest_port="443",dest_ip="<IP>",sub_violations="",virus_name="N/A",uri="/bir_xml",request="GET /bir_xml?action=event&ev_id=7447953&version=1 HTTP/1.1\r\nHost: <URL>\r\nCookie: TS0158e29b=0148840b44c2771c7edfa9b3305f349c56fc28fecb0c11fc5c4b963f7e860ace7b26c42578; TS0197b840=0148840b4416795c008c4e4b7adc1e097f180a2c1e661ae566acbeafbd41be90e94db247b64d197bb6324d0ffd8ba54611a9e1ce03; sitePreference=DESKTOP\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nAccept-Language: en-us,en;q=0.5\r\nConnection: keep-alive\r\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:16.0) Gecko/20100101 Firefox/16.0\r\nAccept-Encoding: gzip, deflate\r\n\r\n"#015

This is how I thought the LINE_BREAKER would have changed it:

2015-04-21T12:55:25+01:00 <hostname> ASM: unit_hostname="<hostname>",management_ip_address="<IP>",http_class_name="/Common/pl_sports_com_L1_prod",web_application_name="/Common/pl_sports_com_L1_prod",policy_name="/Common/pl_sports_com_L1_prod",policy_apply_date="2015-04-20 22:59:53",violations="Web scraping detected",support_id="16995741371944062892",request_status="blocked",response_code="0",ip_client="185.17.184.228",route_domain="0",method="GET",protocol="HTTPS",query_string="action=event&ev_id=7447953&version=1",x_forwarded_for_header_value="N/A",sig_ids="",sig_names="",date_time="2015-04-21 12:55:25",severity="Error",attack_type="Web Scraping",geo_location="NL",ip_address_intelligence="N/A",username="N/A",session_id="a27f9feb0b622a04",src_port="40567",dest_port="443",dest_ip="<IP>",sub_violations="",virus_name="N/A",uri="/bir_xml",request="GET /bir_xml?action=event&ev_id=7447953&version=1 HTTP/1.1\r\n
Host: <URL>\r\n
Cookie: TS0158e29b=0148840b44c2771c7edfa9b3305f349c56fc28fecb0c11fc5c4b963f7e860ace7b26c42578; TS0197b840=0148840b4416795c008c4e4b7adc1e097f180a2c1e661ae566acbeafbd41be90e94db247b64d197bb6324d0ffd8ba54611a9e1ce03; sitePreference=DESKTOP\r\n
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n
Accept-Language: en-us,en;q=0.5\r\n
Connection: keep-alive\r\n
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:16.0) Gecko/20100101 Firefox/16.0\r\n
Accept-Encoding: gzip, deflate\r\n\r\n"#015

But I'm starting to lean towards using transforms to replace the \r\n so that the whole event is standardised?

jeffland
SplunkTrust

Hm. I'd also suggest replacing those \r\n with an actual linebreak. Have a look here and see if it works for you.

cdstealer
Contributor

Nice 🙂 Thanks jeffland, very much appreciated.

cdstealer
Contributor

Just for completeness 🙂

My props stanza is:

[f5]
TIME_PREFIX = ^
TIME_FORMAT= %Y-%m-%dT%H:%M:%S%:z
BREAK_ONLY_BEFORE_DATE = True
LINE_BREAKER = ([\r\n\$])
TRUNCATE = 999999
TRANSFORMS-changeSourcetype1 = psm-set-sourcetype, asm-set-sourcetype
SEDCMD-newline = s/\\r\\n/,/g
SEDCMD-eventend = s/#015//g

So now all the fields are correctly extracted and the annoying #015 is removed. Plus the other source is untouched.
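
For anyone who wants to check the two SEDCMD substitutions outside Splunk, here is a rough Python sketch of what they do to a single event. The function name apply_sedcmds is made up for illustration, the sample is a shortened version of the event posted in this thread, and Python's re module is only an approximation of how Splunk applies sed-style replacements at index time.

```python
import re

def apply_sedcmds(raw_event):
    """Mimic the two SEDCMD settings from the [f5] stanza on one event."""
    # SEDCMD-newline = s/\\r\\n/,/g  -> replace each visible \r\n with a comma
    event = re.sub(r'\\r\\n', ',', raw_event)
    # SEDCMD-eventend = s/#015//g    -> strip the trailing #015 marker
    event = re.sub(r'#015', '', event)
    return event

# Shortened sample; the raw string keeps the backslashes as literal characters.
raw = r'uri="/bet/ru",request="GET /bet/ru HTTP/1.1\r\nConnection: keep-alive\r\n\r\n"#015'
cleaned = apply_sedcmds(raw)
print(cleaned)
```

Note that the doubled \r\n\r\n at the end of the request becomes two commas, so the request value ends in ",,".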

cdstealer
Contributor

2015-04-21T10:51:26+01:00 <> ASM: unit_hostname="<>",management_ip_address="<>",http_class_name="/Common/pl_restricted_L0_prod",web_application_name="/Common/pl_restricted_L0_prod",policy_name="/Common/pl_restricted_L0_prod",policy_apply_date="2015-04-20 21:44:42",violations="Attack signature detected",support_id="16995741371937106148",request_status="blocked",response_code="0",ip_client="46.201.133.82",route_domain="0",method="GET",protocol="HTTP",query_string="",x_forwarded_for_header_value="N/A",sig_ids="300000002",sig_names="parimatchru",date_time="2015-04-21 10:51:25",severity="Error",attack_type="Abuse of Functionality",geo_location="UA",ip_address_intelligence="N/A",username="N/A",session_id="8f08ae0f2fbd5d82",src_port="55263",dest_port="80",dest_ip="<>",sub_violations="",virus_name="N/A",uri="/bet/ru",request="GET /bet/ru HTTP/1.1\r\nHost: sports.whgaming.com\r\nConnection: keep-alive\r\nAccept: image/webp,/;q=0.8\r\nUser-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1717.129 Amigo/32.0.1717.129 MRCHROME SOC Safari/537.36\r\nReferer: http://start.parimatchru.com/bonusnew/?btag=a_3615b_234c_231947&id=231947\r\nAccept-Encoding: gzip,deflate,sdch\r\nAccept-Language: ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4\r\nCookie: banner_click=aleshasavin,NA,NA,NA,admap:159955966FE625989E443CA9CEA4BE36CCBBFCB%3Bsource:[var1]%3Bzone:1487412695%3Bchannel:185050786; clickinfo=pid=185050786&bid=1487412695; vars_info=; source_NR=NR\r\n\r\n"#015

cdstealer
Contributor

Hi Jeffland,
The above is an event that we want to break down. After each \r\n, we need the text that follows to start on a new line. The alternative I could try is to set up a SEDCMD in transforms and replace each \r\n with a ,. This I believe would also fix the auto field extraction.

Cheers
Steve

jeffland
SplunkTrust

I have the feeling that your event text was somehow corrupted when you posted it. Could you post it as a text file, or as code? There are some "rn" in there, also one with backslashes, but I doubt this is what you wanted to post.
As for your line breaker, the places you define there will lead to an "event break", i.e. every time the regex matches your data, a new event starts. That's why I doubt you can achieve what you need with the line breaker. But I still haven't fully understood what you need your event to look like. Do you want Splunk to display a line break when it shows the events as returned from a search?
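
To make the "event break" point concrete, here is a simplified Python model of line breaking. It assumes the commonly documented behaviour that the text matched by the first capture group is discarded and an event boundary is placed there; break_events and the tiny two-line stream are made up for illustration, and real Splunk line breaking has more rules than this.

```python
import re

def break_events(stream, line_breaker):
    """Split a stream into events wherever the first capture group matches.

    Simplified model: the patterns used here must always populate group 1.
    """
    pattern = re.compile(line_breaker)
    events = []
    start = 0
    for match in pattern.finditer(stream):
        events.append(stream[start:match.start(1)])
        start = match.end(1)
    events.append(stream[start:])
    # Drop empty leftovers such as the text after the final newline.
    return [event for event in events if event]

# Two source events separated by real newlines; the first one also contains
# visible \r\n sequences (literal backslashes), as in the F5 ASM log.
stream = 'ts1 request="GET / HTTP/1.1\\r\\nHost: a\\r\\n"\nts2 other source line\n'

default_events = break_events(stream, r'([\r\n]+)')
split_events = break_events(stream, r'([\r\n]+|\\r\\n)')

print(len(default_events))
print(split_events)
```

With the second pattern the single ASM event is cut into several separate events, each starting after a visible \r\n, rather than staying one event with line breaks inside it.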

cdstealer
Contributor

I'll have to post it as an "answer" as the comment box won't allow the volume of text.

jeffland
SplunkTrust

I haven't fully understood what behavior you need. New events are supposed to begin just like they did until now, but inside of them you need linebreaks (i.e. there need to be new lines at the beginning of an event)?
