Splunk Search

Search query to remove the first occurrence of a word and replace the second occurrence with a comma

Kitteh
Path Finder

How do I use regex or replace to remove the first occurrence of a word and replace the second and later occurrences with a comma?

For example, the raw data is:
ubuntu CRON[2907]: pam_unix(cron:session): session opened for user root by (uid=0) ubuntu CRON[2907]: pam_unix(cron:session): session closed for user root

I want it to be:
CRON[2907]: pam_unix(cron:session): session opened for user root by (uid=0),CRON[2907]: pam_unix(cron:session): session closed for user root

cpetterborg
SplunkTrust

If the beginning string repeats only once, this will work:

| makeresults 
| eval _raw="ubuntu CRON[2907]: pam_unix(cron:session): session opened for user root by (uid=0) ubuntu CRON[2907]: pam_unix(cron:session): session closed for user root by (uid=0)" 
| rex mode=sed "s/^(\S+)(.*?)\s(\1)/\2, /"

The process for multiple occurrences is more complex. Is the data in that case similar to the example that you provided? If not, can you provide an example? Is there a maximum number of occurrences?
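
For more than one repeat, one possible approach is to capture the leading word into a field and reuse it in two eval replace() calls. This is only a sketch: the field name first_word and the three-part sample string are made up for illustration, and it assumes the leading word contains no regex metacharacters (otherwise it would need escaping first). Because replace() substitutes every match, the number of repeats should not matter.

```spl
| makeresults
| eval _raw="ubuntu CRON[2907]: session opened for user root ubuntu CRON[2907]: session closed for user root ubuntu CRON[2908]: session opened for user root"
| rex field=_raw "^(?<first_word>\S+)\s"
| eval _raw=replace(_raw, "^".first_word."\s", "")
| eval _raw=replace(_raw, "\s".first_word."\s", ",")
| table _raw
```

The first replace() drops the leading word and its trailing space; the second turns each later occurrence, together with the spaces around it, into a single comma.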

inventsekar
SplunkTrust

You can run rex twice: the first time to replace the first ubuntu with a blank, the second time to replace the second ubuntu with a comma.

(If the string "ubuntu" is not known beforehand, please share more details, such as where it appears in the event, so that the rex can be updated.)
(rex mode=sed cannot be tested on the regex101 website; I tested it directly in Splunk and it works fine.)

|makeresults
 | eval _raw = "ubuntu CRON[2907]: pam_unix(cron:session): session opened for user root by (uid=0) ubuntu CRON[2907]: pam_unix(cron:session): session closed for user root"
 | rex mode=sed field=_raw "s#(^ubuntu\s)##"
 | rex mode=sed field=_raw "s#ubuntu#,#"
 | table _raw
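
If the host name is known and may repeat more than once, a variant of the same idea is to add the sed g flag to the second rex and include the surrounding whitespace in the match, so no stray spaces are left around the comma. This is a sketch on the same sample string, not a tested drop-in for production data:

```spl
| makeresults
| eval _raw = "ubuntu CRON[2907]: pam_unix(cron:session): session opened for user root by (uid=0) ubuntu CRON[2907]: pam_unix(cron:session): session closed for user root"
| rex mode=sed field=_raw "s#^ubuntu\s##"
| rex mode=sed field=_raw "s#\subuntu\s#,#g"
| table _raw
```

The first rex still removes only the leading ubuntu; the g flag on the second makes it replace every remaining occurrence rather than just the next one.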

inventsekar
SplunkTrust

Hi @cpetterborg, great rex command... great learning!

For other rex beginners, let me explain it:
"s/^(\S+)(.*?)\s(\1)/\2, /"
^(\S+) --- captures the first word as "\1"
(.*?) --- lazily captures the rest of the line as "\2", up to the second ubuntu match
\s(\1) --- matches a space followed by the word ubuntu again
Everything before the middle "/" is the match pattern; everything after it is the replacement.
\2, --- in the replacement, drop "\1", write the "\2" match, and then a comma ",". That's it.
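
One detail worth noting in that expression: "\2" starts with the space that followed the first word, and the replacement adds another space after the comma, so the result is not exactly the comma-joined string asked for in the question. A possible variant, offered as a sketch rather than as the accepted answer, consumes those spaces in the match instead:

```spl
| makeresults
| eval _raw="ubuntu CRON[2907]: pam_unix(cron:session): session opened for user root by (uid=0) ubuntu CRON[2907]: pam_unix(cron:session): session closed for user root by (uid=0)"
| rex mode=sed "s/^(\S+)\s(.*?)\s\1\s/\2,/"
```

Here the space after the first word and the spaces on both sides of the repeated word are part of the match but outside "\2", so only the text between them plus a comma is written back.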

cpetterborg
SplunkTrust

Thank you. I saw your original post in email. I'm glad you figured it all out. Congratulations! 🙂 I've upvoted your comment for the fine explanation!
