Splunk Search

Regex from source

thinksplunk
Engager

If I need to extract "num" from source=c:/documents/app/test1/test12/controlnum34/12.log and tag it as a field, how do I go about doing it? Thanks.

thinksplunk
Engager

I need to extract two fields from the string below:
"source=/app/cups-drink/test/iron13-machine5a-43machine.log"
The first field name is "item" and value is "cups"
The second field name is "system" and value is "43machine"

dwaddle
SplunkTrust

This really isn't an answer, but more of a comment that applies to all of these great solutions. An approach using the rex command will work great. But, if you try to put this into a configuration file as a permanent field extraction ( props.conf or transforms.conf ) and want to use it in a base search, you will probably not get the result you're looking for. The reason for this is when you do a search for something like

sourcetype=mysourcetype myfieldfromsource=123

Splunk will look for the token "123" within the raw text of the event. It will not look in the source field.

If you want to extract a regular expression from source and have it searchable as a field name in a base search, then you will need to make it an indexed field. Indexed fields are not recommended for a variety of very good reasons, not the least of which is that they must be defined in advance and are very inflexible. But if this is what you need to solve your problem, it is available to you.

lukejadamec
Super Champion

If regex was that easy, then I would have answered.:)

kristian_kolb
Ultra Champion

... | rex field=source "^/[^/]+/(?<animal>[a-zA-Z]+)"

Which means, from the start of the string in the field called source, find a single slash, followed by one or more non-slash characters, followed by a single slash - then take all (but at least one) uppercase or lowercase letters you find, and put them in the field 'animal'.

As you'll find, the field will only contain 'dog' in this scenario, as the dash between 'dog' and 'focus' is not a letter.
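To see that behaviour outside Splunk, here is a minimal Python sketch of the same pattern. Two assumptions: Python's re module writes named groups as (?P<name>...) where rex uses (?<name>...), and the sample path is made up for illustration, since the original sample is not quoted in this post.

```python
import re

# Same pattern as the rex above, using Python's (?P<name>...) group syntax.
pattern = re.compile(r"^/[^/]+/(?P<animal>[a-zA-Z]+)")

# Hypothetical source path, invented for this example.
sample = "/pets/dog-focus/app.log"

match = pattern.search(sample)
animal = match.group("animal") if match else None
print(animal)  # dog
```

The capture stops at the dash because [a-zA-Z]+ only accepts letters.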

You can probably benefit from reading up on regular expressions if you want to make more dynamic extractions.

/K

lukejadamec
Super Champion

How about:

Search | rex field=source "control(?<NUM>num)34/12\.log$"

kristian_kolb
Ultra Champion

faster 🙂

... | eval num="num" | ...

thinksplunk
Engager

I am trying to extract the word "NUM" from source=c:/documents/app/test1/test12/controlNUM34/12.log.

kristian_kolb
Ultra Champion

You can do field extractions dynamically in the search with the rex command;

your_base_search | rex field=source "your regex with a capture group here"

To capture "34" and put it in a field called num:

your_base_search | rex field=source "(?<num>\d+)/[^/]+$"

which is to be read as, capture one or more digits (and call them num) that are followed by one slash, which is followed by one or more non-slash characters, followed by the end-of-line.
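As a quick check of that reading, the pattern can be tried against the source path from the question with Python's re module. This is only a sketch: Python spells the named group (?P<num>...) where rex uses (?<num>...).

```python
import re

# The rex pattern above, rewritten with Python's named-group syntax.
pattern = re.compile(r"(?P<num>\d+)/[^/]+$")

# Source path taken from the original question.
source = "c:/documents/app/test1/test12/controlnum34/12.log"

match = pattern.search(source)
num = match.group("num") if match else None
print(num)  # 34
```

The digits in test1 and test12 are skipped because the text after their slash still contains another slash before the end of the string.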

Once you're happy with your regex field extraction, you should probably make it 'permanent' by adding the extraction rule to props.conf as an EXTRACT.

See more here:

http://docs.splunk.com/Documentation/Splunk/5.0.4/Knowledge/Addfieldsatsearchtime
http://docs.splunk.com/Documentation/Splunk/5.0.4/Knowledge/Createandmaintainsearch-timefieldextract...
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Rex

/K

kristian_kolb
Ultra Champion

Given your question here and in other posts, I suggest that you read up on regex in general.

e.g. http://www.regular-expressions.info
http://gskinner.com/RegExr/

In this case (one of) the answer(s) is;

rex field=source "/app/(?<item>[a-z]+)([^/]+/){2}.+-(?<system>[^-]+)\.log$"

Which is: find '/app/', then take any a-z characters and call them item. Then jump over any non-slash characters followed by a slash, twice. Then skip through any characters until you reach a dash followed by a set of non-dash characters and .log at the end of the string. Call these non-dash characters system. The literal dash before the group matters: without it, the greedy .+ would leave only the last character for system.
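A pattern like this is easy to check against the sample path with Python's re module. The sketch below compares two variants, one with a literal dash before the system group and one without, because the greedy .+ decides how much text is left for that group. Python's (?P<name>...) stands in for rex's (?<name>...), and the variable names are only illustrative.

```python
import re

# Sample source path from the question.
source = "/app/cups-drink/test/iron13-machine5a-43machine.log"

# With a literal dash, system is pinned to the last dash-separated piece.
with_dash = re.compile(
    r"/app/(?P<item>[a-z]+)([^/]+/){2}.+-(?P<system>[^-]+)\.log$"
)
# Without the dash, the greedy .+ keeps everything it can.
without_dash = re.compile(
    r"/app/(?P<item>[a-z]+)([^/]+/){2}.+(?P<system>[^-]+)\.log$"
)

m1 = with_dash.search(source)
m2 = without_dash.search(source)
print(m1.group("item"), m1.group("system"))  # cups 43machine
print(m2.group("system"))                    # e
```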

/K

kristian_kolb
Ultra Champion

I'm guessing that you want to extract XXX in the following scenario, where XXX is a string that follows 'control' and 'yy' is one or more digits. Not the literal string 'num', right?

/controlXXXyy/zzz.log

In that case;

rex field=source "/control(?<XXX>[a-zA-Z]+)\d+/[^/]+$"
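A small Python sketch of the same pattern, run against both spellings of the source path used in this thread (Python writes the named group as (?P<XXX>...) where rex uses (?<XXX>...)):

```python
import re

# The rex pattern above, with Python's named-group syntax.
pattern = re.compile(r"/control(?P<XXX>[a-zA-Z]+)\d+/[^/]+$")

# Both variants of the source path that appear in the thread.
sources = [
    "c:/documents/app/test1/test12/controlnum34/12.log",
    "c:/documents/app/test1/test12/controlNUM34/12.log",
]

results = []
for source in sources:
    match = pattern.search(source)
    results.append(match.group("XXX") if match else None)

print(results)  # ['num', 'NUM']
```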

rturk
Builder

Hi Thinksplunk - can you give a few more samples? Are you trying to extract:

source=c:/documents/app/test1/test12/control*num*34/12.log

or:

source=c:/documents/app/test1/test12/controlnum*34*/12.log?
