Splunk Search

How to extract file names from an unformatted URL string?

splunker9999
Path Finder

Hi, we need to extract the file name from a URL, but the URLs in our log files come in different formats, and some of them contain one or more spaces, as shown below.

Can someone please help us with the extraction? The formats are:

1. It has a space after /log/ (here the file name should be data.csv)
    /home/data/var/log/ data.csv

2. It doesn't have a space after /log/
    /home/data/var/log/data.csv

3. It has a space after /log/ but the file extension is different (here the file name is data2015067987.dat)
    /home/data/var/log/ data2015067987.dat
    /home/data/var/log/  201608587data.csv

4. It has multiple spaces after /log/ (here the file name is data data2 data3)
    /home/data/var/log/ data  data2 data3 

Thanks


woodcock
Esteemed Legend

Like this:

| rex field=URL mode=sed "s%/\s+%/%"
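
For anyone who wants to check what that sed expression does outside Splunk, here is a minimal Python sketch. It assumes the field holds one of the sample paths from the question, and the function name `collapse_slash_space` is made up for illustration. Like the sed expression (which has no `g` flag), it rewrites only the first slash that is followed by whitespace, and it normalizes the path rather than extracting the file name.

```python
import re

def collapse_slash_space(url: str) -> str:
    # Mirrors s%/\s+%/% : replace the first "/" + whitespace run with a bare "/".
    # count=1 corresponds to the missing "g" flag in the sed expression.
    return re.sub(r"/\s+", "/", url, count=1)

# Sample values taken from the question.
samples = [
    "/home/data/var/log/ data.csv",
    "/home/data/var/log/data.csv",
    "/home/data/var/log/  201608587data.csv",
]

for s in samples:
    print(repr(collapse_slash_space(s)))
```

After this step the path has no gap after the last slash, so the file name would still need to be pulled out by a separate extraction.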

gokadroid
Motivator

If it is still required, this regex might help to cover all the cases:

your base search
| rex field=_raw "\/((?<prefix>[^\s\/]+)\/)*(?<fileName>.*)"
| table prefix, fileName
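
To try the same pattern outside Splunk, the sketch below ports it to Python. Two things are assumptions here: the helper name `split_path` is hypothetical, and the named groups are written as `(?P<name>...)` because Python's `re` module does not accept the `(?<name>...)` spelling that rex uses. The slashes are left unescaped since Python does not need `\/`.

```python
import re

# Same structure as the rex pattern: a leading "/", any number of
# "segment/" pieces without whitespace, then everything that is left.
PATTERN = re.compile(r"/((?P<prefix>[^\s/]+)/)*(?P<fileName>.*)")

def split_path(raw: str):
    """Return (prefix, fileName) for the first match, or None if no match."""
    m = PATTERN.search(raw)
    if m is None:
        return None
    return m.group("prefix"), m.group("fileName")

for raw in [
    "/home/data/var/log/ data.csv",
    "/home/data/var/log/data.csv",
    "/home/data/var/log/ data  data2 data3 ",
]:
    print(split_path(raw))
```

Note that `fileName` keeps any whitespace that follows the last slash, because `.*` takes everything up to the end of the line.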


splunker9999
Path Finder

This works partially, but it extracts everything after the last segment. Can we re-extract from fileName?
It should return only the file name at the start of the value below. I would like to run a regex again on fileName so that it works backwards from the end, excludes the last 8 space-separated values, and returns "201608587data.csv" as the file name.

201608587data.csv b s o r user ssh 0 *


gokadroid
Motivator

I thought the query was supposed to catch that, given that you specifically wanted to capture multiple spaces, as described in the original question:

It has multiple spaces after/log (here file name is data data2 data3)
  /home/data/var/log/ data  data2 data3 

So is it fair to say that if the string immediately after the last slash contains a dot (signifying a file extension), the capture should stop there, and otherwise it should continue, as in the quoted requirement of capturing data data2 data3?
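
The rule proposed in this reply can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input is taken to be the text after the last slash, surrounding whitespace is trimmed (the thread does not say whether it should be), and the function name `file_name_from_tail` is hypothetical.

```python
def file_name_from_tail(tail: str) -> str:
    # Assumption: leading and trailing whitespace is not part of the name.
    tail = tail.strip()
    if not tail:
        return ""
    # First whitespace-delimited token after the last slash.
    first_token = tail.split(None, 1)[0]
    # A dot in that token is treated as a file extension: stop there.
    if "." in first_token:
        return first_token
    # Otherwise keep the whole remainder, inner spaces included.
    return tail

print(file_name_from_tail("201608587data.csv b s o r user ssh 0 *"))
print(file_name_from_tail(" data  data2 data3 "))
```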


splunker9999
Path Finder

Yes, that's true; for a few URIs we have multiple spaces after the last segment.

Your search for the file name captures everything from the last segment to the end, which is good. But can we run another regex on fileName, something like the one below?

rex field=fileName "<regex that works from the end and excludes the last 8 space-separated values>"

Thanks
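
One way to read the request above is "drop the last 8 whitespace-separated values and keep what is in front of them". The Python sketch below follows that reading. The count of 8 comes from the post, but the assumption that every value carries exactly 8 trailing fields is mine, and the name `drop_trailing_fields` is hypothetical. The pattern uses only common PCRE constructs, so it may carry over to rex with the group written as `(?<name>...)`, though that has not been tested here.

```python
import re

# Lazy capture of the name, followed by exactly 8 "whitespace + token"
# pairs and optional trailing whitespace up to the end of the value.
PATTERN = re.compile(r"^\s*(?P<name>.*?)(?:\s+\S+){8}\s*$")

def drop_trailing_fields(file_name: str):
    """Return the text before the last 8 fields, or None if there are fewer."""
    m = PATTERN.match(file_name)
    return m.group("name") if m else None

print(drop_trailing_fields("201608587data.csv b s o r user ssh 0 *"))
print(drop_trailing_fields("data.csv"))
```

Values that do not have 8 trailing fields produce no match, so they would need to be handled separately.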


somesoni2
Revered Legend

Give this a try

Updated

your base search | rex field=URL "^.+\/\s*(?<filename>[\w\s\.-:_]+)$"

Run anywhere sample

| gentimes start=-1 | eval URL="/home/data/var/log/ data.csv#/home/data/var/log/data.csv#/home/data/var/log/ data2015067987.dat#/home/data/var/log/  201608587data.csv#/home/data/var/log/ data  data2 data3" | table URL | makemv URL delim="#" | mvexpand URL | rex field=URL "^.+\/\s*(?<filename>[\w\s\.-:_]+)$"

somesoni2
Revered Legend

Try the updated answer.


splunker9999
Path Finder

This one worked for many URIs but not for a few; the filename field returns no value for the URIs below.

For example, no filename value is returned for these:

/home/data/var/data/input/splunk/data_ip_2012-07-21-14-15-06.dat
/home/data/var/data/input/data_user_2012-07-21-141506.done
/home/data/var/data/inpu/data_user_2012-07-21-141506.dat

Thanks
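
A possible explanation for the missing values: inside the character class `[\w\s\.-:_]`, the sequence `\.-:` is read as a range from `.` to `:`, so a literal hyphen is not part of the class, and all three failing file names contain hyphens. The Python sketch below reproduces this. The `ADJUSTED` pattern, which moves the hyphen to the end of the class, is a hypothetical variant and not something posted in this thread; the named groups use Python's `(?P<name>...)` spelling.

```python
import re

# Pattern from the answer above, ported to Python group syntax.
ORIGINAL = re.compile(r"^.+/\s*(?P<filename>[\w\s\.-:_]+)$")
# Hypothetical variant: "-" placed last in the class so it is a literal hyphen.
ADJUSTED = re.compile(r"^.+/\s*(?P<filename>[\w\s.:-]+)$")

def extract(pattern, url):
    """Return the captured filename, or None when the pattern does not match."""
    m = pattern.match(url)
    return m.group("filename") if m else None

url = "/home/data/var/data/input/splunk/data_ip_2012-07-21-14-15-06.dat"
print(extract(ORIGINAL, url))   # no match because of the hyphens
print(extract(ADJUSTED, url))
```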


somesoni2
Revered Legend

Give this a try.

your base search | rex field=URL "^.+\/\s*(?<filename>.*)$"
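
For reference, here is how that pattern behaves in Python on values quoted in this thread. This is a sketch only: the helper name `last_segment` is hypothetical, the group is written in Python's `(?P<name>...)` form, and the second test value is assembled from the thread's samples as an assumption about what a URL field with trailing data could look like. Since the real contents of the URL field are not shown, this cannot confirm what any particular event will return.

```python
import re

# Greedy ".+" runs to the last "/", "\s*" skips the gap, ".*" takes the rest.
PATTERN = re.compile(r"^.+/\s*(?P<filename>.*)$")

def last_segment(url: str):
    """Return everything after the last slash (minus leading whitespace)."""
    m = PATTERN.match(url)
    return m.group("filename") if m else None

print(last_segment("/home/data/var/data/inpu/data_user_2012-07-21-141506.dat"))
print(last_segment("/home/data/var/log/  201608587data.csv b s o r user ssh 0 *"))
```

Because `.*` has no stopping condition, any extra values that follow the file name in the field are captured along with it.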

splunker9999
Path Finder

This didn't work; it just removed the leading "/" from the URL and returned everything else as the file name.
