Splunk Search

Regex across two lines

Branden
Builder

I have the following output:

DEV#:    0    DEVICE NAME: vpath0    TYPE: 2107900    POLICY: Optimized
SERIAL: 123bac
=======================================================================
Path#         Adapter/Hard Disk       State    Mode      Select  Errors
    0            fscsi0/hidsk22       Open     NORMAL    123456       0
    1            fscsi0/hidsk29       Open     NORMAL    456789       0

I would like to extract four fields from this: the "path" numbers (in this case 0 and 1), which should be named path0 and path1, and the "select" values (in this case 123456 and 456789), which should be named select0 and select1.

I can't figure out how to get a regex to span the two lines and create the field extractions for me.

My ultimate goal is to be able to compare the two select fields against a common vpath (DEVICE NAME).

Is this possible?

Thanks!


Lowell
Super Champion

You may have some luck with the multikv command.

You could do something like this:

sourcetype=your_source_type | rex "^DEV#:\s+(?<dev_no>\d+)\s+DEVICE NAME:\s+(?<device_name>\S+)\s+TYPE:\s+(?<type>\d+)\s+POLICY:\s+(?<policy>\S+)" | rex "SERIAL:\s+(?<serial>\S+)" | multikv | stats list(Select), list(Disk), sum(Errors) by device_name, serial

The stats operation is pretty bogus at this point; it's mostly just demoing which fields you have after the multikv command.

Some of the column names are less than ideal, but you can always rename them if you really need to.
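To check what the two rex commands pull out before multikv runs, here is a minimal Python sketch run against the sample event from the question. It is only an offline check, not Splunk: the patterns are copied from the search above, with the named groups rewritten in the (?P<name>...) form that Python's re module uses, and the function name extract_top_level is made up for illustration.

```python
import re

# Sample event copied from the question.
SAMPLE = (
    "DEV#:    0    DEVICE NAME: vpath0    TYPE: 2107900    POLICY: Optimized\n"
    "SERIAL: 123bac\n"
    + "=" * 71 + "\n"
    "Path#         Adapter/Hard Disk       State    Mode      Select  Errors\n"
    "    0            fscsi0/hidsk22       Open     NORMAL    123456       0\n"
    "    1            fscsi0/hidsk29       Open     NORMAL    456789       0\n"
)

# The two rex patterns from the search, using Python's (?P<name>...) spelling.
HEADER_RE = re.compile(
    r"^DEV#:\s+(?P<dev_no>\d+)\s+DEVICE NAME:\s+(?P<device_name>\S+)"
    r"\s+TYPE:\s+(?P<type>\d+)\s+POLICY:\s+(?P<policy>\S+)"
)
SERIAL_RE = re.compile(r"SERIAL:\s+(?P<serial>\S+)")


def extract_top_level(raw):
    """Return the event-level fields, roughly as the two rex commands would."""
    fields = {}
    for pattern in (HEADER_RE, SERIAL_RE):
        match = pattern.search(raw)
        if match:
            fields.update(match.groupdict())
    return fields


print(extract_top_level(SAMPLE))
```

Because the pattern is not compiled with re.MULTILINE, the ^ anchors to the very start of the event, which is where the DEV# line sits.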

Basically, the multikv search command looks for a header line (the 4th line in your example) and then looks for fixed-width rows beneath it. In your case you have two rows of data, and each row will be transformed into its own event. This is why it's important to extract the top-level fields (like dev_no, device_name, serial, ...) before using the multikv command: after you call multikv, everything but the individual "row" is removed from your raw event, but all the fields you already extracted are kept.
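A rough Python sketch of the behavior described above, assuming whitespace-separated columns. The function multikv_like and the hard-coded column list (taken from the field names this answer reports) are hypothetical; the real multikv command does its own table detection and has options this ignores. It illustrates the two key points: each table row becomes its own record, and the top-level fields extracted beforehand are copied onto every row.

```python
import re

SAMPLE = (
    "DEV#:    0    DEVICE NAME: vpath0    TYPE: 2107900    POLICY: Optimized\n"
    "SERIAL: 123bac\n"
    + "=" * 71 + "\n"
    "Path#         Adapter/Hard Disk       State    Mode      Select  Errors\n"
    "    0            fscsi0/hidsk22       Open     NORMAL    123456       0\n"
    "    1            fscsi0/hidsk29       Open     NORMAL    456789       0\n"
)

# Hypothetical column names, matching the field names reported for this table.
COLUMNS = ["Path_", "Disk", "State", "Mode", "Select", "Errors"]


def multikv_like(raw):
    """Split one multi-line event into one record per table row."""
    # Top-level fields first, as the rex commands do before multikv.
    top = {}
    for pattern in (r"DEVICE NAME:\s+(?P<device_name>\S+)", r"SERIAL:\s+(?P<serial>\S+)"):
        match = re.search(pattern, raw)
        if match:
            top.update(match.groupdict())

    lines = raw.splitlines()
    # The header row is the line starting with "Path#" (the 4th line here).
    header_idx = next(i for i, line in enumerate(lines) if line.startswith("Path#"))

    events = []
    for line in lines[header_idx + 1:]:
        values = line.split()
        if len(values) != len(COLUMNS):
            continue  # skip blank or malformed lines
        event = dict(top)  # top-level fields survive on each new event
        event.update(zip(COLUMNS, values))
        events.append(event)
    return events


for event in multikv_like(SAMPLE):
    print(event)
```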

So if you look at the fields that exist after the multikv command, you'll see that the "Path#" column gets named "Path_" (because "#" is not valid in a field name, it's replaced with a "_"). The next column is called just "Disk"; it looks like the "Adapter/Hard" prefix is simply dropped, probably because of the space in the header. Like I said, it's kind of a kludgey command. The remaining columns (State, Mode, Select, and Errors) are straightforward, and their fields are named exactly as the column names appear in the text.
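The "Path#" to "Path_" renaming can be mimicked with a one-line substitution. This assumes, for illustration only, that every character outside letters, digits, and underscore is replaced; the rule is not taken from Splunk's documentation, clean_field_name is a made-up name, and it does not explain the "Disk" case.

```python
import re


def clean_field_name(name):
    """Hypothetical cleanup rule: replace each character outside [A-Za-z0-9_] with "_"."""
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


for column in ["Path#", "State", "Select"]:
    print(column, "->", clean_field_name(column))
```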

So essentially you are now looking at multiple events. Instead of having "path0" and "path1" as you originally described, you will now have a single field called "Path_"; the first event will have the value "0" and the second will have the value "1". How you combine these back together depends entirely on what you are trying to do with your data. You can recombine your events using stats or transaction, but without a specific example of how you would like to interact with your fields, it's hard to give a usable example. If you don't want to deal with your fields individually like this, then perhaps the multi-line regex approach is best for you.
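To make the regrouping concrete, here is a small Python sketch that folds the per-row events back into one record per device_name, with the path0/select0 and path1/select1 fields the question asked for. In Splunk you would do this with stats or transaction as described above; the recombine function and the trimmed event dictionaries are illustrative only.

```python
from collections import defaultdict

# Per-row events as multikv would leave them (values from the question, trimmed).
EVENTS = [
    {"device_name": "vpath0", "Path_": "0", "Select": "123456"},
    {"device_name": "vpath0", "Path_": "1", "Select": "456789"},
]


def recombine(events):
    """Fold row events back into one record per device with pathN/selectN fields."""
    merged = defaultdict(dict)
    for event in events:
        n = event["Path_"]
        record = merged[event["device_name"]]
        record[f"path{n}"] = int(n)
        record[f"select{n}"] = int(event["Select"])
    return dict(merged)


print(recombine(EVENTS))
# {'vpath0': {'path0': 0, 'select0': 123456, 'path1': 1, 'select1': 456789}}
```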

If you're still struggling to figure out how all of this works, you may find it helpful to rebuild the search I've shown above one search command at a time while looking at one event at a time. (Sometimes just breaking the problem into its smallest parts will help you see what's going on.) If you're very new to Splunk, the whole thing can seem like voodoo (I've been there). I suggest just taking it one step at a time; eventually it will all make sense.


Lowell
Super Champion

Yeah, that sounds doable. Best of luck! Glad I could help.

Branden
Builder

I get it now. 🙂 I really appreciate you taking the time to explain it. My next goal is to take the two "select" values and determine the difference between them. If the difference is greater than 20%, I need an alert. Sounds reasonable, right? I'm going to try to tackle that one on my own for now. Sounds like a good challenge. 🙂
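One way to define the 20% check, sketched in Python: measure the gap between the two select values relative to the larger one. That choice of denominator is an assumption (the smaller value or the average would be equally defensible), and select_skew and needs_alert are hypothetical names.

```python
def select_skew(select0, select1):
    """Relative difference between two path counters, measured against the larger one."""
    larger = max(select0, select1)
    if larger == 0:
        return 0.0  # both paths idle: treat as balanced
    return abs(select0 - select1) / larger


def needs_alert(select0, select1, threshold=0.20):
    """True when the two paths differ by more than the threshold (20% by default)."""
    return select_skew(select0, select1) > threshold


print(round(select_skew(123456, 456789), 3))  # 0.73
print(needs_alert(123456, 456789))  # True
```

With the values from the question the paths differ by about 73%, so this pair would trigger the alert.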

Lowell
Super Champion

I've added some additional explanation; hopefully this will help.

Branden
Builder

Okay, I'll just be brutally honest here: Your example worked great. But I don't understand WHY it worked. 🙂
I mean it extracted the Select field... But I don't see how/where you extracted it in your code. I'll have to look at it more closely.
One more question: using that, how do I pick out/separate the two different "selects"? Suppose I want to take the difference of the two values or something like that... what are they called?
Thank you so much, you both have been great. 🙂

Lowell
Super Champion

Yeah, 'multikv' can be intimidating at first. I generally stay away from it as much as possible myself, but there are times when it is the most direct option, and your example is the classic use case for multikv. I've updated my answer to include an example search; hope it helps.

Branden
Builder

I looked at multikv, but even after reading the documentation I don't understand how to apply it in this case. From what I read, it seems to assume the fields have already been defined.

Brian_Osburn
Builder

Try this regex:

\s+(?P<path0>[\d]+)\s+\S+\s+\S+\s+\S+\s+(?P<select0>[\d]+)\s+[\d]+\n\s+(?P<path1>[\d]+)\s+\S+\s+\S+\s+\S+\s+(?P<select1>[\d]+)\s+[\d]+
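A quick offline check of the regex above, using Python's re module (which accepts the same (?P<name>...) group syntax) against the sample output from the question. This only shows that the pattern matches when both rows sit in one string separated by a plain \n; it says nothing about how Splunk breaks the data into events.

```python
import re

SAMPLE = (
    "DEV#:    0    DEVICE NAME: vpath0    TYPE: 2107900    POLICY: Optimized\n"
    "SERIAL: 123bac\n"
    + "=" * 71 + "\n"
    "Path#         Adapter/Hard Disk       State    Mode      Select  Errors\n"
    "    0            fscsi0/hidsk22       Open     NORMAL    123456       0\n"
    "    1            fscsi0/hidsk29       Open     NORMAL    456789       0\n"
)

# The regex from this answer, split across two string literals for readability.
PATHS_RE = re.compile(
    r"\s+(?P<path0>[\d]+)\s+\S+\s+\S+\s+\S+\s+(?P<select0>[\d]+)\s+[\d]+\n"
    r"\s+(?P<path1>[\d]+)\s+\S+\s+\S+\s+\S+\s+(?P<select1>[\d]+)\s+[\d]+"
)

match = PATHS_RE.search(SAMPLE)
print(match.groupdict() if match else "no match")
```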

Note: I'm assuming that there won't be more than 2 paths specified.

We could also capture more information if necessary...

Brian

Branden
Builder

Thank you for the response. Strange that it works on your end but not on mine. I tried a few variants too, with no success, including extracting the fields from only one of the two lines. Still doesn't work... strange.

Brian_Osburn
Builder

Hrrm. It worked on my machine. Granted, I'm using the same test files I used for your other answer.

I am treating the output as one big string, using the \n (newline) as a delimiter for the lines.
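The thread never pins down why the regex matched on one machine and not the other. Two plausible, unconfirmed causes: if Splunk indexed each line as its own event, no per-event regex can see both rows at once; and if the lines end in \r\n (or carry trailing spaces), the literal \n in the pattern never matches. The Python sketch below only illustrates the second case; the tolerant variant with [ \t]*\r?\n is a suggested tweak, not something tested in Splunk.

```python
import re

SAMPLE = (
    "DEV#:    0    DEVICE NAME: vpath0    TYPE: 2107900    POLICY: Optimized\n"
    "SERIAL: 123bac\n"
    + "=" * 71 + "\n"
    "Path#         Adapter/Hard Disk       State    Mode      Select  Errors\n"
    "    0            fscsi0/hidsk22       Open     NORMAL    123456       0\n"
    "    1            fscsi0/hidsk29       Open     NORMAL    456789       0\n"
)

# Original pattern: a digit must be followed immediately by \n.
STRICT = re.compile(
    r"\s+(?P<path0>[\d]+)\s+\S+\s+\S+\s+\S+\s+(?P<select0>[\d]+)\s+[\d]+\n"
    r"\s+(?P<path1>[\d]+)\s+\S+\s+\S+\s+\S+\s+(?P<select1>[\d]+)\s+[\d]+"
)
# Variant that also tolerates trailing spaces/tabs and Windows-style \r\n endings.
TOLERANT = re.compile(
    r"\s+(?P<path0>[\d]+)\s+\S+\s+\S+\s+\S+\s+(?P<select0>[\d]+)\s+[\d]+[ \t]*\r?\n"
    r"\s+(?P<path1>[\d]+)\s+\S+\s+\S+\s+\S+\s+(?P<select1>[\d]+)\s+[\d]+"
)

crlf_sample = SAMPLE.replace("\n", "\r\n")  # same text with \r\n line endings
for name, pattern in [("strict", STRICT), ("tolerant", TOLERANT)]:
    match = pattern.search(crlf_sample)
    print(name, match.groupdict() if match else "no match")
```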

Lowell's answer is more elegant than my regex-happy approach.

Branden
Builder

Hmm... Didn't seem to work. It didn't error, but I don't see the fields extracted.
So are you basically treating the last two lines of the output as one big string?
(And, no, there will never be more than 2 paths specified. Not unless we redo our entire SAN configuration. :-))
