Splunk Search

Searching a URL for a file name that may contain spaces

Mkaz
New Member

I have a record in a log that looks like the following:

Wed Oct 26 10:41:14 2016 0 10.40.112.27 437434 /dirlevel1/dirlevel2/dirlevel3/dirlevel4/chr 2610 4109.txt b s o r aaa_aaaaaaa ssh 0 *

The record is delimited by spaces. I'm trying to pull the file name from the path provided: /dirlevel1/dirlevel2/dirlevel3/dirlevel4/chr 2610 4109.txt

The issue I'm running into is that the file name may contain one or more spaces. The following code works, but when it hits a space within the file name, the rest of the name spills over into the next field values. It would be fine if the search could work from right to left: start at the "b" (the 8th field from the right) and take everything to its left back to the nearest "/". I'm not sure how to do that, though. Any suggestions?

Code used is:

index="ti_is_st" sourcetype="xfer_log" URI=* Status=* IP_Address=* File_Size=* Service_Account=*| rex field=URI "\/(([^\s\/]+\/)*)(?<fileName>[\S]+)" |search fileName="*" Service_Account="*"|table _time IP_Address Service_Account fileName File_Size Status  |replace o with "Download Successful" i with "Upload Successful" j with "Upload Errored" k with "Upload Aborted"  p with "Download Errored" q with "Download Aborted" in Status
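You can see the problem outside Splunk by running the same pattern through Python's re module. This is a sketch: Python spells named groups (?P<name>...), and the pattern is applied to the raw event because the thread doesn't show how the URI field is extracted. The [\S]+ class cannot cross a space, so the capture stops at "chr":

```python
import re

# Sample event from the question (one line of the xfer_log sourcetype).
RECORD = ("Wed Oct 26 10:41:14 2016 0 10.40.112.27 437434 "
          "/dirlevel1/dirlevel2/dirlevel3/dirlevel4/chr 2610 4109.txt "
          "b s o r aaa_aaaaaaa ssh 0 *")

# The rex from the question, rewritten with Python's named-group syntax.
ORIGINAL = re.compile(r"\/(([^\s\/]+\/)*)(?P<fileName>[\S]+)")

match = ORIGINAL.search(RECORD)
# [\S]+ stops at the first whitespace, so only the first word of the name is captured.
print(match.group("fileName"))  # chr
```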

Thanks

1 Solution

somesoni2
SplunkTrust

Give this a try

index="ti_is_st" sourcetype="xfer_log" URI=* Status=* IP_Address=* File_Size=* Service_Account=*| rex field=_raw "^(\S+\s+){8}\/(([^\s\/]+\/)+)(?<fileName>.+)(\s+\S+){8}$" |search fileName="*" Service_Account="*"|table _time IP_Address Service_Account fileName File_Size Status  |replace o with "Download Successful" i with "Upload Successful" j with "Upload Errored" k with "Upload Aborted"  p with "Download Errored" q with "Download Aborted" in Status
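The pattern anchors on both ends. It expects exactly eight space-separated fields before the path and eight after it, so the greedy (?<fileName>.+) backtracks until only those eight trailing fields remain. Below is a minimal Python check of that behaviour, assuming the same field counts. The second and third records are hypothetical: each combines the question's prefix and suffix with a name mentioned later in the thread.

```python
import re

# Accepted answer's pattern, translated to Python's named-group syntax.
ANCHORED = re.compile(r"^(\S+\s+){8}\/(([^\s\/]+\/)+)(?P<fileName>.+)(\s+\S+){8}$")

PREFIX = "Wed Oct 26 10:41:14 2016 0 10.40.112.27 437434 "
records = [
    # Sample event from the question.
    PREFIX + "/dirlevel1/dirlevel2/dirlevel3/dirlevel4/chr 2610 4109.txt b s o r aaa_aaaaaaa ssh 0 *",
    # Hypothetical: mainframe-style name with no extension.
    PREFIX + "/dirlevel1/aaaaa.aaaaaa.aaaaaaa.aaaaa b s o r aaa_aaaaaaa ssh 0 *",
    # Hypothetical: multi-space name taken from a later reply.
    PREFIX + "/dirlevel1/NNN NNNNN aaaaa-Aaaa of Aaaaaa Aaaaaa NNNN.xls b s o r AAAAAA ssh 0 *",
]

# Everything between the last "/" and the eight trailing fields is the name.
names = [ANCHORED.match(r).group("fileName") for r in records]
for name in names:
    print(name)
```

This relies on the leading and trailing fields never containing spaces themselves. A space in a directory name would also break the ([^\s\/]+\/)+ part.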

jrballesteros05
Communicator

Hello, if you only need the file name, I would do it in one of two ways:

  1. If the file name is part of the "source" metadata, you can extract it in props.conf and create a new field:

    EXTRACT-filename=\S+\/(?P<filename>.*?)\.txt in source

  2. If the file name is not part of the "source" metadata, you can use rex in the search:

    index="ti_is_st" sourcetype="xfer_log" URI=* Status=* IP_Address=* File_Size=* Service_Account=*| rex field=_raw "\S+\/(?P<fileName>.*?)\.txt" |search fileName="*" Service_Account="*"|table _time IP_Address Service_Account fileName File_Size Status |replace o with "Download Successful" i with "Upload Successful" j with "Upload Errored" k with "Upload Aborted" p with "Download Errored" q with "Download Aborted" in Status

In both cases, the key is the regex you use. I tried "\S+\/(.*?)\.txt" on regex101.com and it worked for me.
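Here is a small Python sketch of what this pattern captures on the sample event. It uses Python's re syntax, not Splunk's, with a named group and an escaped dot. The lazy .*? stops at ".txt", so the extension is not part of the capture. An event with a different extension (the hypothetical .xls variant below) does not match at all:

```python
import re

RECORD = ("Wed Oct 26 10:41:14 2016 0 10.40.112.27 437434 "
          "/dirlevel1/dirlevel2/dirlevel3/dirlevel4/chr 2610 4109.txt "
          "b s o r aaa_aaaaaaa ssh 0 *")

# The suggested regex: everything after the last "/" up to the first ".txt".
TXT_ONLY = re.compile(r"\S+\/(?P<fileName>.*?)\.txt")

m = TXT_ONLY.search(RECORD)
print(m.group("fileName"))  # chr 2610 4109

# Hypothetical .xls variant of the same event: no ".txt", so no match.
XLS = RECORD.replace("4109.txt", "4109.xls")
print(TXT_ONLY.search(XLS))  # None
```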

I hope this helps.

Mkaz
New Member

Thanks, Jr... My apologies for not stating this earlier, but the file names can have many different extensions, such as .txt, .xls, .xfr, etc. There are also mainframe files that may be named like aaaaa.aaaaaa.aaaaaaa.aaaaa.

jrballesteros05
Communicator

Ok, you can use this regex.

   \S+\/(?P<filename>.*?)\..*?\s+
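A Python check of this pattern suggests one caveat. It was run against the sample event and a hypothetical mainframe-style name built from the question's record. Because .*?\. stops at the first dot, the extension is dropped and a dotted mainframe name is cut down to its first segment:

```python
import re

# Extension-agnostic pattern suggested above (Python named-group syntax).
ANY_EXT = re.compile(r"\S+\/(?P<filename>.*?)\..*?\s+")

PREFIX = "Wed Oct 26 10:41:14 2016 0 10.40.112.27 437434 "
TAIL = " b s o r aaa_aaaaaaa ssh 0 *"

txt_rec = PREFIX + "/dirlevel1/dirlevel2/dirlevel3/dirlevel4/chr 2610 4109.txt" + TAIL
# Hypothetical mainframe-style name, using the question's prefix and suffix.
mf_rec = PREFIX + "/dirlevel1/aaaaa.aaaaaa.aaaaaaa.aaaaa" + TAIL

# The lazy group ends at the first "." after the last "/".
txt_name = ANY_EXT.search(txt_rec).group("filename")
mf_name = ANY_EXT.search(mf_rec).group("filename")
print(txt_name)  # chr 2610 4109
print(mf_name)   # aaaaa
```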

Mkaz
New Member

Thanks... I was able to get the file name working with the query below. I'm also trying to pull the status code out of the record, which I'm still having issues with.

I tried the basic (\s+\S+){5,6}$ in Regex101 and it seemed to match properly, but what I have isn't assigning the correct code. It's pulling part of the NNNN in the file name. We're also pulling the file size from the record, and that seems to be out of alignment now.

index="ti_is_st" sourcetype="xfer_log" | rex field=_raw "^(\S+\s+){8}\/(([^\s\/]+\/)+)(?<fileName>.+)(\s+\S+){8}$" |rex field=_raw "(\s+\S+){5,6}$(?<status>.+(i|j|k|o|p|q))\s"|search "$field2$" "$field3$" |table _time ip_address Service_Account fileName file_size status |replace o with "Download Successful" i with "Upload Successful" j with "Upload Errored" k with "Upload Aborted" p with "Download Errored" q with "Download Aborted" in status

Record:

Wed Oct 26 10:41:14 2016 0 10.40.112.27 437434 /dirlevel1/dirlevel2/dirlevel3/dirlevel4/chr 2610 4109.txt b s o r aaa_aaaaaaa ssh 0 *
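One likely reason the status rex misbehaves is that the $ (end of string) sits before the status group. Nothing can follow it, so the pattern can never match, and the status value shown must come from somewhere else. The Python sketch below demonstrates this. It also shows an alternative that counts from the right, assuming the status code is always the third of the eight fields after the file name (the "o" in "b s o r ..."):

```python
import re

RECORD = ("Wed Oct 26 10:41:14 2016 0 10.40.112.27 437434 "
          "/dirlevel1/dirlevel2/dirlevel3/dirlevel4/chr 2610 4109.txt "
          "b s o r aaa_aaaaaaa ssh 0 *")

# Status rex from the post above: "$" appears before the status group,
# so the group would have to match after the end of the string.
BROKEN = re.compile(r"(\s+\S+){5,6}$(?P<status>.+(i|j|k|o|p|q))\s")
print(BROKEN.search(RECORD))  # None

# Alternative (assumption: exactly five fields follow the status code).
STATUS_FROM_RIGHT = re.compile(r"\s(?P<status>\S+)(?:\s+\S+){5}\s*$")
print(STATUS_FROM_RIGHT.search(RECORD).group("status"))  # o
```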

jrballesteros05
Communicator

Hello, I don't understand the question very well (maybe it's my English :D), but I think you want to extract this:

/dirlevel1/dirlevel2/dirlevel3/dirlevel4/chr 2610 4109.txt

Am I right?

I think your problem is the regex. If you give me more information, I might be able to help you.

Mkaz
New Member

Thanks for your response...

Well... yes, basically I'm just trying to pull the complete file name, including any spaces. The "/dirlevel1/dirlevel2/dirlevel3/dirlevel4/" part is not required.

somesoni2
SplunkTrust

Give this a try

index="ti_is_st" sourcetype="xfer_log" URI=* Status=* IP_Address=* File_Size=* Service_Account=*| rex field=_raw "^(\S+\s+){8}\/(([^\s\/]+\/)+)(?<fileName>.+)(\s+\S+){8}$" |search fileName="*" Service_Account="*"|table _time IP_Address Service_Account fileName File_Size Status  |replace o with "Download Successful" i with "Upload Successful" j with "Upload Errored" k with "Upload Aborted"  p with "Download Errored" q with "Download Aborted" in Status

Mkaz
New Member

Thanks for your response...

This sits in a query statement and is throwing an error: Encountered the following error while trying to update: In handler 'views': Error parsing XML on line 37: Premature end of data in tag form line 1

somesoni2
SplunkTrust

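Are you updating the query in the dashboard through the Edit -> Source (XML) option?
If yes, then use this. In the XML source, the < and > around the group name have to be escaped as &lt; and &gt;: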

index="ti_is_st" sourcetype="xfer_log" URI=* Status=* IP_Address=* File_Size=* Service_Account=*| rex field=_raw "^(\S+\s+){8}\/(([^\s\/]+\/)+)(?&lt;fileName&gt;.+)(\s+\S+){8}$" |search fileName="*" Service_Account="*"|table _time IP_Address Service_Account fileName File_Size Status  |replace o with "Download Successful" i with "Upload Successful" j with "Upload Errored" k with "Upload Aborted"  p with "Download Errored" q with "Download Aborted" in Status
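The escaping matters because, in the dashboard's XML source, the < in (?<fileName>...) is read as the start of a tag. The sketch below uses Python's standard-library XML parser, not Splunk's, so the error text differs. It shows that the raw query fails to parse. The escaped version parses and gives back the original search string, and so does the same text wrapped in a standard XML CDATA section:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Shortened search; only the rex with the named group matters here.
SPL = r'''index="ti_is_st" sourcetype="xfer_log" | rex field=_raw "^(\S+\s+){8}\/(([^\s\/]+\/)+)(?<fileName>.+)(\s+\S+){8}$"'''

def parses(xml_text: str) -> bool:
    """Return True if xml_text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

raw_xml = "<query>" + SPL + "</query>"                  # "<fileName>" opens a stray tag
escaped_xml = "<query>" + escape(SPL) + "</query>"      # < and > become &lt; and &gt;
cdata_xml = "<query><![CDATA[" + SPL + "]]></query>"    # text taken literally

print(parses(raw_xml))      # False
print(parses(escaped_xml))  # True
print(parses(cdata_xml))    # True
print(ET.fromstring(escaped_xml).text == SPL)  # True
```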

Mkaz
New Member

Great, thanks... Tried it and it's close. It's pulling the file name correctly, but not the status field. In the record below, it's pulling a status of "of" instead of "o", which would be converted to Download Successful. It's happening on all records that have a space in the file name; this one happens to have several spaces.

Record:
NNN NNNNN aaaaa-Aaaa of Aaaaaa Aaaaaa NNNN.xls b s o r AAAAAA ssh 0 *

somesoni2
SplunkTrust

How are you extracting the Status field? I don't see it being extracted in the query itself, so it's probably coming from a saved field extraction. Check the regular expression there to see why Status is wrong for your sample event.
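One explanation fits the "of" value. Assume (hypothetically) that the saved extraction splits on spaces and takes Status as the 12th field from the left. Then every extra space in the file name shifts the later fields one position to the right, while counting from the right stays stable. The sketch below uses hypothetical records built from the thread's samples:

```python
# Hypothetical records: the question's prefix plus names/suffix from the thread.
PREFIX = "Wed Oct 26 10:41:14 2016 0 10.40.112.27 437434 "
TAIL = " b s o r AAAAAA ssh 0 *"

no_space = PREFIX + "/dirlevel1/NNNN.xls" + TAIL
with_spaces = PREFIX + "/dirlevel1/NNN NNNNN aaaaa-Aaaa of Aaaaaa Aaaaaa NNNN.xls" + TAIL

def status_from_left(event: str) -> str:
    """Assumed saved-extraction behaviour: 12th space-separated field."""
    return event.split()[11]

def status_from_right(event: str) -> str:
    """Third of the eight trailing fields, i.e. sixth from the end."""
    return event.split()[-6]

print(status_from_left(no_space), status_from_left(with_spaces))    # o of
print(status_from_right(no_space), status_from_right(with_spaces))  # o o
```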

Mkaz
New Member

This is the full query we're using. I tried the basic (\s+\S+){5,6}$ in Regex101 and it seemed to match properly, but what I have isn't assigning the correct code. It's pulling part of the NNNN in the file name. We're also pulling the file size from the record, and that seems to be out of alignment now.

index="ti_is_st" sourcetype="xfer_log" | rex field=_raw "^(\S+\s+){8}\/(([^\s\/]+\/)+)(?<fileName>.+)(\s+\S+){8}$" |rex field=_raw "(\s+\S+){5,6}$(?<status>.+(i|j|k|o|p|q))\s"|search "$field2$" "$field3$" |table _time ip_address Service_Account fileName file_size status |replace o with "Download Successful" i with "Upload Successful" j with "Upload Errored" k with "Upload Aborted" p with "Download Errored" q with "Download Aborted" in status

Record:

Wed Oct 26 10:41:14 2016 0 10.40.112.27 437434 /dirlevel1/dirlevel2/dirlevel3/dirlevel4/chr 2610 4109.txt b s o r aaa_aaaaaaa ssh 0 *
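A possible single-pass approach is to extend the accepted answer's pattern so it also captures the file size (the 8th field) and the status code (the third of the eight trailing fields), rather than relying on the saved Status extraction. It is sketched in Python below and untested in Splunk. In rex the groups would be written (?<file_size>...) and so on. The second event is hypothetical:

```python
import re

# Accepted answer's anchored pattern, extended with file size and status groups.
XFER = re.compile(
    r"^(?:\S+\s+){7}(?P<file_size>\S+)\s+"   # 7 leading fields, then file size
    r"\/(?:[^\s\/]+\/)+(?P<fileName>.+)"      # directories, then the file name
    r"\s+\S+\s+\S+\s+(?P<status>\S+)"         # transfer type, flag, status code
    r"(?:\s+\S+){5}$"                         # the remaining 5 trailing fields
)

# Status codes as mapped by the replace command used in the thread.
STATUS_TEXT = {
    "o": "Download Successful", "i": "Upload Successful",
    "j": "Upload Errored", "k": "Upload Aborted",
    "p": "Download Errored", "q": "Download Aborted",
}

PREFIX = "Wed Oct 26 10:41:14 2016 0 10.40.112.27 437434 "
events = [
    PREFIX + "/dirlevel1/dirlevel2/dirlevel3/dirlevel4/chr 2610 4109.txt b s o r aaa_aaaaaaa ssh 0 *",
    # Hypothetical: the multi-space name from the thread with the question's prefix.
    PREFIX + "/dirlevel1/NNN NNNNN aaaaa-Aaaa of Aaaaaa Aaaaaa NNNN.xls b s o r AAAAAA ssh 0 *",
]

rows = []
for event in events:
    m = XFER.match(event)
    rows.append((m["file_size"], m["fileName"], STATUS_TEXT.get(m["status"], m["status"])))
    print(rows[-1])
```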
