Field extraction from source field

ppatkar
Path Finder

My Splunk source field has the format below:

source=/default/folder/20190403/file_PARADOX_7747_txt

I am trying to pick only the file name out of the source for some analysis, but I cannot get rid of the unwanted process ID appended at the end. That is, I only need PARADOX from the example above.

Below is the closest I have got so far; however, I am unable to separate the process ID from the file name:

rex field=source "(?<logdir>[\w\W/]+)/file_(?<filename>[^.]+)_txt"
  • logdir : /default/folder/20190403/
  • filename : PARADOX_7747

Ideally, I would like the output below:

  • logdir : /default/folder/
  • date : 20190403
  • processid : 7747
  • filename : PARADOX
  • extension : txt

Any help is appreciated. Thank you.

ragedsparrow
Contributor

If you only want the filename, I think the suggestions from @FrankVl or @vnravikumar would be a good approach. If you want it all parsed out:

 | rex field=source "(?<logdir>\/[\W\w]+\/[\W\w]+\/)(?<date>[^\/]+)\/file_(?<filename>[^\_]+)\_(?<processid>[^\_]+)\_(?<extension>.+)"

Here is what I used to test it:

| makeresults 
 | eval source = "/default/folder/20190403/file_PARADOX_7747_txt" 
 | rex field=source "(?<logdir>\/[\W\w]+\/[\W\w]+\/)(?<date>[^\/]+)\/file_(?<filename>[^\_]+)\_(?<processid>[^\_]+)\_(?<extension>.+)"

ppatkar
Path Finder

Thanks @FrankVl, @vnravikumar & @ragedsparrow for all your help.

Unfortunately, the file name in my source can contain multiple words, but it is always followed by the process ID, as below:

source=/default/folder/20190403/file_PARADOX_7747_txt
source=/default/folder/20190402/file_AMR_CA_1234_txt
source=/default/folder/20190402/file_EMEA_IRE_DUB_8964_txt

If there is a way to grab the file name between "file_" and the first digit ([0-9]), it would help.

ragedsparrow
Contributor

I think this would work:

| rex field=source "(?<logdir>\/[\W\w]+\/[\W\w]+\/)(?<date>[^\/]+)\/file_(?<filename>[^\d]+)\_(?<processid>\d+)\_(?<extension>.+)"

I tested it here:

| makeresults 
  | eval source="/default/folder/20190402/file_EMEA_IRE_DUB_8964_txt"
  | rex field=source "(?<logdir>\/[\W\w]+\/[\W\w]+\/)(?<date>[^\/]+)\/file_(?<filename>[^\d]+)\_(?<processid>\d+)\_(?<extension>.+)"
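
To check the pattern against all three sample paths from the follow-up in one search, something along these lines may help. It is only a test harness: the comma-separated list and `mvexpand` are just a convenient way to build three test events, and the `rex` is the same one as above.

```spl
| makeresults
| eval source = split("/default/folder/20190403/file_PARADOX_7747_txt,/default/folder/20190402/file_AMR_CA_1234_txt,/default/folder/20190402/file_EMEA_IRE_DUB_8964_txt", ",")
| mvexpand source
| rex field=source "(?<logdir>\/[\W\w]+\/[\W\w]+\/)(?<date>[^\/]+)\/file_(?<filename>[^\d]+)\_(?<processid>\d+)\_(?<extension>.+)"
| table source logdir date filename processid extension
```

With these inputs, filename should come out as PARADOX, AMR_CA and EMEA_IRE_DUB, and processid as 7747, 1234 and 8964. Note that because filename is matched with `[^\d]+`, a file name that itself contains a digit would not match this pattern.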

ppatkar
Path Finder

Works like a charm! Thank you.

vnravikumar
Champion

Hi, try this:

| makeresults 
 | eval source = "/default/folder/20190402/file_EMEA_IRE_DUB_8964_txt"
 | rex field=source "file\_(?P<name>.+)_\d+"

vnravikumar
Champion

Hi, give this a try:

| makeresults 
| eval source = "/default/folder/20190403/file_PARADOX_7747_txt" 
| eval filename = mvindex(split(source,"_"),1)

Or, to guard against a directory name that contains an underscore:

| makeresults 
| eval source = "/default/folder/20190403/file_PARADOX_7747_txt" 
| rex field=source "\/(?P<filename>file.+)" 
| eval filename = mvindex(split(filename,"_"),1)
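
Keep in mind that `mvindex(..., 1)` returns only the first token after `file`, so a multi-word name such as AMR_CA would come back as AMR. For the process ID and extension, counting from the end of the split is less sensitive to how many words the name has. A minimal sketch, assuming the path always ends in `_<processid>_<extension>` (the `parts` field name is made up for illustration):

```spl
| makeresults
| eval source = "/default/folder/20190402/file_EMEA_IRE_DUB_8964_txt"
| eval parts = split(source, "_")
| eval extension = mvindex(parts, -1)
| eval processid = mvindex(parts, -2)
| table source processid extension
```

For this sample, processid should be 8964 and extension txt. Underscores in directory names do not matter here, since only the last two tokens are used.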

[New] Try this:

| makeresults 
 | eval source = "/default/folder/20190402/file_AMR_CA_1234_txt" 
 | rex field=source "file\_(?P<name>.+)_\d+"

FrankVl
Ultra Champion

You were pretty close. I guess this should work, unless the filename can also contain _ or other variations in the format cause it to break in some cases:

| rex field=source "(?<logdir>[\w\W/]+)/file_(?<filename>[^_]+)_(?<processid>[^_]+)_txt"
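
If the file name can itself contain underscores (or digits), one possible variation is to let a greedy `.+` capture the name and rely on backtracking to stop at the last `_<digits>_` in the path. This is a sketch, not part of the answer above; it assumes the path always ends in `_<processid>_<extension>`, that the process ID is purely numeric, and that the extension contains no underscore.

```spl
| makeresults
| eval source = "/default/folder/20190402/file_EMEA_IRE_DUB_8964_txt"
| rex field=source "(?<logdir>.+/)(?<date>[^/]+)/file_(?<filename>.+)_(?<processid>\d+)_(?<extension>[^_]+)"
| table logdir date filename processid extension
```

For this sample, the fields should come out as logdir=/default/folder/, date=20190402, filename=EMEA_IRE_DUB, processid=8964 and extension=txt. Unlike the pattern above, logdir here stops before the date directory, which matches the output asked for in the question.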