Hello Everyone!
Currently the result of my query is below:
Input:
id URL
101 https://......-28.../../..../..../..../12304
102 https://......-28.../../..../..../..../34569
https://......-02.../../..../..../..../8976
https://......-28.../..../..../741256
103 https://......-06.../..../..../..../5678
https://......-04.../../..../..../..../158930
I would like to have the output as below:
Output:
id URL fieldA fieldB
101 https://......-28.../../..../..../..../12304 28 12304
102 https://......-28.../../..../..../..../34569 28 34569
102 https://......-02.../../..../..../..../8976 02 8976
102 https://......-28.../..../..../741256 28 741256
103 https://......-06.../..../..../..../5678 06 5678
103 https://......-04.../../..../..../..../158930 04 158930
I have tried with rex and mvcombine , makemv but not able to achieve the result. I am not sure whether I am using them correctly.
Can you please help me to get the output?
The problem here is that mvexpand doesn't work with empty fields so the trick is to add a random string to the field (mvappend, mvcombine), then remove it if it is not the only value in the field (mvfilter), then replace it with an empty string after the mvexpand
... your search
| eval temp="randomstringwithoutwhitespaces"
| eval URL=mvappend(temp,URL)
| mvcombine delim=" " URL
| makemv tokenizer="(\S+)" URL
| eval URL=if(mvcount(URL)>1,mvfilter(URL!="randomstringwithoutwhitespaces"),URL)
| mvexpand URL
| eval URL=if(URL="randomstringwithoutwhitespaces","",URL)
| rex field=URL "\/[^\d]*(-|)(?<fieldA>\d{2}).*\/(.*csv$|(?<fieldB>\d+$))"
... your search
| eval URL=split(URL,"\n")
| mvexpand URL
| rex field=URL "-(?<fieldA>\d{2}).*\/(?<fieldB>\d+$)"
Hello @ITWhisperer ,
I am sorry, the URL field turns out to be a string field where we have multiple links separated by space. I tried the split command with mvexpand and it is not working .
The rex command worked perfectly! If we are able to split the URL to separate links in the same field then I think we can extract the fieldA and fieldB.
Can you please let me how to split the split the string to new rows?
It is a "\n" between links in the string.
Try this to split the URL string into multiple events
... your search
| makemv tokenizer="(\S+)" URL
| mvexpand URL
| rex field=URL "-(?<fieldA>\d{2}).*\/(?<fieldB>\d+$)"
Hello @ITWhisperer ,
The above code works perfectly:) But I have certain rows where URL is "" and with the above code all the rows with URL as "" is removed.
Can you please let me know how extract new fields retaining all the existing rows?
Hello @ITWhisperer
Can you also please help me to extract 'fieldA' in case the URL field is available in the below format?
URL fieldA
/opt/splunk-monitor/.../..../28-.../......csv 28
/opt/splunk-monitor/.../..../08-.../......csv 08
/opt/splunk-monitor/.../..../....-02/......csv 02
/opt/splunk-monitor/.../..../....-06/......csv 06
| rex max_match=0 field=URL "\/[^\d]*(-|)(?<fieldA>\d{2}).*\/(.*csv$|(?<fieldB>\d+$))"
Hello @ITWhisperer ,
Thanks a lot.The extraction works very well:) But as I mentioned above, I have certain rows where URL is "" and with the below code all the rows with URL as "" is removed.
Can you please let me know how extract new fields retaining all the existing rows?
Code:
... your search | makemv tokenizer="(\S+)" URL | mvexpand URL | rex field=URL "-(?<fieldA>\d{2}).*\/(?<fieldB>\d+$)"
The problem here is that mvexpand doesn't work with empty fields so the trick is to add a random string to the field (mvappend, mvcombine), then remove it if it is not the only value in the field (mvfilter), then replace it with an empty string after the mvexpand
... your search
| eval temp="randomstringwithoutwhitespaces"
| eval URL=mvappend(temp,URL)
| mvcombine delim=" " URL
| makemv tokenizer="(\S+)" URL
| eval URL=if(mvcount(URL)>1,mvfilter(URL!="randomstringwithoutwhitespaces"),URL)
| mvexpand URL
| eval URL=if(URL="randomstringwithoutwhitespaces","",URL)
| rex field=URL "\/[^\d]*(-|)(?<fieldA>\d{2}).*\/(.*csv$|(?<fieldB>\d+$))"
Hello @ITWhisperer ,
Thanks a lot:) This worked like magic:)
Please share your existing query.
For field extractions using rex, we need more details about the data from which the data will be extracted. That is to say the "..." may be hiding important characters that may determine how the regex must be written.
Hello @richgalloway ,
I am using the below query for rex now:
| rex max_match=0 field=URL"\/(?<fieldB>[^\/]*)(https:|\n|$)"
I hope this helps in modifing it to get fieldA as well.