Splunk Search

How to extract these fields from my sample data?

anoopambli
Communicator

I have raw data like this,

09:00:06 08/01/2016 good    TSMONW46PRDV    [TSMONW46PRDV][AP] Disk Space   Disk/File System/[C]/percent full=45.745, Disk/File System/[E]/percent full=34.595  

I want to extract fields from this so that I get a result like this:

[C]/percent full=45.745
[E]/percent full=34.595

Which option is better suited for this, eval or regex? Any help is really appreciated.

1 Solution

skoelpin
SplunkTrust

Here's the regex for the C percent full. It extracts only the number, so the result will look like this:

C_Full = 45.745

(?P<C_Full>(?<=C\]\/percent\sfull\=)\d{2}\.\d+)

Here's the regex for the E percent full

(?P<E_Full>(?<=E\]\/percent\sfull\=)\d{2}\.\d+)
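For anyone who wants to sanity-check these two patterns outside Splunk, here is a minimal Python sketch run against the sample event from the question. It assumes Python's `re` module handles these particular patterns (named groups, fixed-width lookbehinds) the same way Splunk's rex does; the `EVENT` constant and the `extract` helper are made up for the illustration.

```python
import re

# Sample event copied from the question.
EVENT = (
    "09:00:06 08/01/2016 good    TSMONW46PRDV    [TSMONW46PRDV][AP] Disk Space   "
    "Disk/File System/[C]/percent full=45.745, "
    "Disk/File System/[E]/percent full=34.595  "
)

# The two patterns from the answer, unchanged apart from Python raw-string quoting.
C_FULL = re.compile(r"(?P<C_Full>(?<=C\]\/percent\sfull\=)\d{2}\.\d+)")
E_FULL = re.compile(r"(?P<E_Full>(?<=E\]\/percent\sfull\=)\d{2}\.\d+)")


def extract(pattern, text):
    """Return the named groups of the first match, or an empty dict."""
    match = pattern.search(text)
    return match.groupdict() if match else {}


# Merge both extractions into one set of fields.
fields = {**extract(C_FULL, EVENT), **extract(E_FULL, EVENT)}
print(fields)  # {'C_Full': '45.745', 'E_Full': '34.595'}
```

Each pattern captures only the number, because the lookbehind checks the text before it without consuming it.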

sundareshr
Legend

Like this

... | rex max_match=0 "File System\/(?<drive>[^,]+)" | mvexpand drive | ...
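To see what this returns for the sample event, here is a rough Python sketch of the same idea. It is only an approximation of the SPL: `finditer` stands in for `max_match=0` (collect every match), and the list of rows stands in for `mvexpand` (one result per value). Python needs `(?P<drive>...)` rather than `(?<drive>...)`, and the `host` key in each row is just illustrative.

```python
import re

# Sample event copied from the question.
EVENT = (
    "09:00:06 08/01/2016 good    TSMONW46PRDV    [TSMONW46PRDV][AP] Disk Space   "
    "Disk/File System/[C]/percent full=45.745, "
    "Disk/File System/[E]/percent full=34.595  "
)

# Same pattern as the rex above, with Python's named-group syntax.
DRIVE = re.compile(r"File System/(?P<drive>[^,]+)")

# Collect every match instead of only the first one.
# The last match runs to the end of the event, so this sketch strips
# the trailing whitespace.
drives = [m.group("drive").strip() for m in DRIVE.finditer(EVENT)]

# One row per extracted value.
rows = [{"host": "TSMONW46PRDV", "drive": d} for d in drives]

for row in rows:
    print(row)
```

This works for any number of drives, since nothing in the pattern names a specific drive letter.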

sloshburch
Splunk Employee

Good call pulling out just the value!

Masa
Splunk Employee

One note: Splunk rarely needs regex lookbehind or lookahead, and both are more expensive in resource usage. So if you do not need them, it is better to avoid them.

C_Full = 45.745

(?P<C_Full>(?<=C\]\/percent\sfull\=)\d{2}\.\d+)

Assuming the number could be bigger than 100 🙂, it could be:

"C\]/percent\s+full=(?P<C_Full>\d{2,}\.\d+)" 

So,

| rex  "C\]/percent\s+full=(?P<C_Full>\d{2,}\.\d+)" 
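Here is a small Python sketch of this capture-group version, assuming the pattern behaves the same in Python's `re` as in rex. The 100.000 and 5.250 inputs are hypothetical values, not taken from the question, and `c_full` is a helper invented for the example.

```python
import re

# Capture-group version: the prefix is matched normally and only the
# number lands in the named group, so no lookbehind is needed.
C_FULL = re.compile(r"C\]/percent\s+full=(?P<C_Full>\d{2,}\.\d+)")


def c_full(text):
    """Return the captured C drive value, or None when nothing matches."""
    match = C_FULL.search(text)
    return match.group("C_Full") if match else None


print(c_full("Disk/File System/[C]/percent full=45.745"))   # 45.745
print(c_full("Disk/File System/[C]/percent full=100.000"))  # 100.000
print(c_full("Disk/File System/[C]/percent full=5.250"))    # None
```

The last line shows one thing to watch for: `\d{2,}` requires at least two digits before the decimal point, so a value below 10 would not be extracted. Using `\d+` instead would cover that case.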

skoelpin
SplunkTrust

I'm not sure where you read this, but this is not true. Lookaheads/lookbehinds can be used if needed with little impact on search performance. Obviously there are exceptions to this rule, such as indexing a massive amount of data in a short period of time, so it could potentially be an issue in some circumstances, but in this case I doubt it. I actually posted a question about this last year (link below).

So in my limited experience, deploying one lookahead was unnoticeable.

https://answers.splunk.com/answers/294477/will-lookaheadslookbehinds-hurt-search-performance.html

Also, why create a regular expression to account for disk usage greater than 100%? It's not needed.

skoelpin
SplunkTrust

I just tested this by creating a regular expression with a lookbehind and running a search in verbose mode. I then inspected the job, and it took 44.368 seconds. After I modified that extraction by removing the lookbehind, the exact same search took 44.329 seconds, so the lookbehind was 39 ms slower, which is insignificant.
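The 39 ms figure is simply 44.368 s minus 44.329 s. The same kind of comparison can be tried on the regex alone with a small Python micro-benchmark like the sketch below. This measures Python's `re` engine, not Splunk, so the timings say nothing definite about search or indexing performance; the `LINE` constant, the run count, and the `value` helper are arbitrary choices for the illustration.

```python
import re
import timeit

# Shortened version of the sample event (the disk portion only).
LINE = (
    "Disk/File System/[C]/percent full=45.745, "
    "Disk/File System/[E]/percent full=34.595"
)

# Same extraction written with and without a lookbehind.
WITH_LOOKBEHIND = re.compile(r"(?P<C_Full>(?<=C\]/percent\sfull=)\d{2}\.\d+)")
WITHOUT_LOOKBEHIND = re.compile(r"C\]/percent\sfull=(?P<C_Full>\d{2}\.\d+)")


def value(pattern):
    """Extract the C_Full value from LINE with the given pattern."""
    return pattern.search(LINE).group("C_Full")


runs = 20_000  # arbitrary repetition count
t_with = timeit.timeit(lambda: value(WITH_LOOKBEHIND), number=runs)
t_without = timeit.timeit(lambda: value(WITHOUT_LOOKBEHIND), number=runs)

print(f"with lookbehind:    {t_with:.4f} s for {runs} runs")
print(f"without lookbehind: {t_without:.4f} s for {runs} runs")
```

Both patterns extract the same value, so any difference in the printed times comes from how the engine evaluates them.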

Masa
Splunk Employee

Thanks skoelpin for the info.

I was talking about general regex cost: if there is no need for lookahead/lookbehind, it is better to leave them out. Yes, scalability is my concern, for example indexing performance with lookahead/lookbehind on events of 1 MB each.
My question is more: why suggest lookahead/lookbehind when you do not need them?

I'm fine with using lookahead/lookbehind for this specific Splunk answer. That's why I up-voted it before I added my comment. My comment is just a suggestion. If you think it's wrong, that's fine with me, too.

skoelpin
SplunkTrust

Wow those are big events!!

Mine are 1-2 KB each, so I can see how lookbehinds could potentially be an issue for you.

anoopambli
Communicator

I am able to get C_drive and its value correctly in a field. But how do I also get E_drive with the same rex command? I may have a couple of other servers that have more drives, so how do I dynamically get the drive info with a regex?

skoelpin
SplunkTrust

You could create one field with many values, or you could create many fields with one value each. It's a matter of preference.

One field with many values would look like the example below, where Drive is your field. The advantage is that you only have one field. The disadvantage is that it can be difficult to isolate one drive when querying, such as when using ... | stats count by

Drive = [C]/percent full=45.745
Drive = [E]/percent full=34.595

Many fields with one value each would look like this:

C_Drive = 45.745
E_Drive = 34.595

The advantage of this is that it's super easy to manipulate the fields in your searches. So if you only wanted to see the drive space on a single host, your search would look like this:

index=foo hostname=anoopambli C_Drive="*" OR E_Drive="*"

So depending on what route you want to go, I can help build your regular expression.
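As a rough illustration of the two layouts, here is a Python sketch that pulls every drive out of the sample event dynamically and builds both shapes. The `PAIR` pattern and the `one_field` / `many_fields` names are made up for this example, and the pattern assumes drive names always appear in square brackets as they do in the sample.

```python
import re

# Sample event copied from the question.
EVENT = (
    "09:00:06 08/01/2016 good    TSMONW46PRDV    [TSMONW46PRDV][AP] Disk Space   "
    "Disk/File System/[C]/percent full=45.745, "
    "Disk/File System/[E]/percent full=34.595  "
)

# Capture the drive letter and its value together, for however many drives appear.
PAIR = re.compile(
    r"File System/\[(?P<drive>[^\]]+)\]/percent\s+full=(?P<pct>\d+(?:\.\d+)?)"
)
pairs = [(m.group("drive"), m.group("pct")) for m in PAIR.finditer(EVENT)]

# Layout 1: a single field named Drive that holds one value per drive.
one_field = {"Drive": [f"[{drive}]/percent full={pct}" for drive, pct in pairs]}

# Layout 2: one field per drive, with the field name built from the drive letter.
many_fields = {f"{drive}_Drive": pct for drive, pct in pairs}

print(one_field)
print(many_fields)
```

Because the drive letter is captured rather than hard-coded, a server with more drives would simply produce more values in the first layout or more fields in the second.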

skoelpin
SplunkTrust

@anoopambli, did this help you? If so, could you accept the answer?

skoelpin
SplunkTrust

Also, if you want to get really good with regular expressions, you should check out www.regex101.com and play around. Once you get familiar with lookaheads and lookbehinds, it's pretty straightforward.
