What does an empty line represent in a regular expression?

royimad
Builder

I am looking for a regular expression in Splunk that searches a file and returns the text starting with a given word (e.g. Total) and ending at the next empty line (i.e. the end of the paragraph).

A random text chosen from the web:

A Memorandum of Understanding was signed by Total and MOGE on July 9, 1992. In addition to the construction of offshore gas facilities by the partners, a separate company in which PTT-EP, MOGE, and other affiliates of Total and Unocal are investors (the Moattama Gas Transportation Company - MGTC) built a 346-kilometer subsea pipeline to bring the gas to landfall in Myanmar, and a 63-kilometer onshore pipeline, with control and metering units, to carry the gas to the border with Thailand, which purchases most of the field's output under a long-term sales and purchase agreement.

Construction was carried out between fall 1995 and mid-1998, with gas production beginning in July 1998. The total investment outlay was approximately US$1 billion. Further capital expenditure will be required during the field's lifetime to drill additional wells and install compressors. The export production threshold of 525 million cubic feet per day was reached in early 2001.

In this case, the regular expression would return the following:

"Total and MOGE on July 9, 1992. In addition to the construction of offshore gas facilities by the partners, a separate company in which PTT-EP, MOGE, and other affiliates of Total and Unocal are investors (the Moattama Gas Transportation Company - MGTC) built a 346-kilometer subsea pipeline to bring the gas to landfall in Myanmar, and a 63-kilometer onshore pipeline, with control and metering units, to carry the gas to the border with Thailand, which purchases most of the field's output under a long-term sales and purchase agreement"

To make it easier: my text never contains 7 consecutive spaces, so you can look for a new line that begins with 7 consecutive spaces.

sowings
Splunk Employee

I would probably go with "(?ms)(?<capture>Total.*)\n^\n"

I haven't tested it, but the principle is: (?ms) -- use both multiline and single line mode together. This allows . to match any character (including a newline), while allowing ^ and $ to reference beginning and ending of a line (as demarcated with newline characters).

Next, start capturing with the word Total until you find a newline followed by a newline which is itself at the beginning of a line.

The tester at http://gskinner.com/RegExr/ suggests that (?ms)(?<capture>Total.*)^ might be enough.
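
One thing worth checking when testing this: with single line mode on, a greedy .* can run past the first blank line and stop at the last one in the text, while a lazy .*? stops at the first. Below is a minimal sketch in Python's re module, not in Splunk itself. The shortened sample text and the variable names are assumptions made for illustration, and Python spells the named group (?P<capture>...) rather than (?<capture>...).

```python
import re

# Shortened stand-in for the sample text (an assumption for illustration):
# three paragraphs separated by blank lines.
text = (
    "A Memorandum was signed by Total and MOGE on July 9, 1992.\n"
    "\n"
    "Construction was carried out between fall 1995 and mid-1998.\n"
    "\n"
    "Production began in July 1998.\n"
)

# (?s) lets . match newlines; (?m) lets ^ match at the start of every line.
# Greedy .* backtracks from the end, so it stops at the LAST blank line.
greedy = re.search(r"(?ms)(?P<capture>Total.*)\n^\n", text)

# Lazy .*? expands one character at a time, so it stops at the FIRST blank line.
lazy = re.search(r"(?ms)(?P<capture>Total.*?)\n^\n", text)

print(repr(greedy.group("capture")))
print(repr(lazy.group("capture")))
```

In this sketch the lazy version captures only the first paragraph from Total onward, which is what the question asks for. Whether the same holds in a given Splunk search depends on how the event text is broken into lines, so treat it as something to verify rather than a guarantee.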

rturk
Builder

Hi Royimad,

Have you tried http://www.pythonregex.com/? As Splunk is based on Python, I find this site really useful for testing regular expressions.

Using this, the following regex gives you what you listed above:

A Memorandum of Understanding was signed by (?P<blah>.*)\n\n

The dot (.) won't match newline characters, so bounding the search with two \n's will ensure it breaks on a blank line.
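
To illustrate the point about the dot, here is a small Python sketch of the regex above. The shortened sample strings and the variable names are assumptions for illustration only. It also shows a limitation: because . does not cross newlines by default, the pattern only matches when the paragraph sits on a single line.

```python
import re

pattern = r"A Memorandum of Understanding was signed by (?P<blah>.*)\n\n"

# Paragraph on a single line, followed by a blank line (shortened sample).
one_line = (
    "A Memorandum of Understanding was signed by Total and MOGE on July 9, 1992.\n"
    "\n"
    "Construction was carried out between fall 1995 and mid-1998.\n"
)

# Same paragraph, but hard-wrapped across two lines before the blank line.
wrapped = (
    "A Memorandum of Understanding was signed by Total\n"
    "and MOGE on July 9, 1992.\n"
    "\n"
    "Construction was carried out between fall 1995 and mid-1998.\n"
)

match_one_line = re.search(pattern, one_line)
match_wrapped = re.search(pattern, wrapped)

# The single-line paragraph is captured up to the blank line.
print(match_one_line.group("blah"))

# The wrapped paragraph does not match: .* stops at the first newline,
# and that newline is not followed by a second one.
print(match_wrapped)
```

If the paragraphs in the file can span several lines, the single line mode flag (?s) discussed elsewhere in this thread would be needed.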

Hope this helps 🙂

royimad
Builder

Do you think there is a reverse regular expression that captures text starting at the end of a line and ending at a given character?

Thanks for your help

royimad
Builder

Thanks, sowings: (?ms)(?<capture>Total.*)^ captures my text.
How about getting the total number of lines, then?
