Solved: Search query to replace first occurrence word with blank but second occurrence to replace with comma

Kitteh · ‎10-08-2017

How do I use regex or replace to remove the first occurrence of a word and replace each later occurrence with a comma?

For example, the raw data is:
ubuntu CRON[2907]: pam_unix(cron:session): session opened for user root by (uid=0) ubuntu CRON[2907]: pam_unix(cron:session): session closed for user root

I want it to be:
CRON[2907]: pam_unix(cron:session): session opened for user root by (uid=0),CRON[2907]: pam_unix(cron:session): session closed for user root

cpetterborg · ‎10-08-2017

If the beginning string occurs only one more time after the start of the event, this will work:

| makeresults 
| eval _raw="ubuntu CRON[2907]: pam_unix(cron:session): session opened for user root by (uid=0) ubuntu CRON[2907]: pam_unix(cron:session): session closed for user root by (uid=0)" 
| rex mode=sed "s/^(\S+)(.*?)\s(\1)/\2, /"

The process for multiple occurrences is more complex. Is the data in that case similar to the example that you provided? If not, can you provide an example? Is there a maximum number of occurrences?
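For the multiple-occurrence case, here is a minimal Python sketch of one possible approach, useful for checking the logic outside Splunk. It is not SPL and not the method from this answer. The function name `strip_first_join_rest` is hypothetical, the third event in the sample is made up for illustration, and the sketch assumes the repeated word is always the first whitespace-delimited token.

```python
import re

def strip_first_join_rest(line: str) -> str:
    """Drop the leading word and turn every later occurrence of it into a comma."""
    first, sep, rest = line.partition(" ")
    if not sep:
        return line  # a single word: nothing to strip or join
    # re.escape guards against regex metacharacters in the leading word.
    # The surrounding \s+ absorbs the spaces so the comma sits tight.
    pattern = r"\s+" + re.escape(first) + r"\s+"
    return re.sub(pattern, ",", rest)

# First two events come from the question; the third is a made-up extra occurrence.
raw = (
    "ubuntu CRON[2907]: pam_unix(cron:session): session opened for user root by (uid=0) "
    "ubuntu CRON[2907]: pam_unix(cron:session): session closed for user root "
    "ubuntu CRON[2910]: pam_unix(cron:session): session opened for user root by (uid=0)"
)
print(strip_first_join_rest(raw))
```

Because the leading word is read from the data rather than hard-coded, the same idea applies when the host name is not known in advance.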

inventsekar · ‎10-08-2017

You can run rex twice: the first pass replaces the first ubuntu with nothing, and the second pass replaces the next ubuntu with a comma.

(If the string "ubuntu" is not known beforehand, please share more details, such as where it appears in the event, so that the rex can be adjusted.)
(rex mode=sed cannot be tested on the regex101 website. I tested it directly in Splunk and it works fine; please check the screenshot.)

| makeresults
| eval _raw = "ubuntu CRON[2907]: pam_unix(cron:session): session opened for user root by (uid=0) ubuntu CRON[2907]: pam_unix(cron:session): session closed for user root"
| rex mode=sed field=_raw "s#(^ubuntu\s)##"
| rex mode=sed field=_raw "s#ubuntu#,#"
| table _raw
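Since sed-mode expressions are awkward to try outside Splunk, the two passes can be approximated with Python's `re.sub`. This is only a sketch: it assumes Python's regex engine behaves like Splunk's for these simple patterns, and the `tight` variant is an assumption about the spacing the question asks for, not part of the answer above.

```python
import re

raw = (
    "ubuntu CRON[2907]: pam_unix(cron:session): session opened for user root by (uid=0) "
    "ubuntu CRON[2907]: pam_unix(cron:session): session closed for user root"
)

# Pass 1: remove the leading "ubuntu" and the single whitespace character after it.
step1 = re.sub(r"^ubuntu\s", "", raw, count=1)

# Pass 2: replace only the next "ubuntu" (count=1 mirrors sed without the g flag).
step2 = re.sub(r"ubuntu", ",", step1, count=1)

# Variant (assumption): also consume the spaces around the word,
# so the comma sits directly between the two events.
tight = re.sub(r"\subuntu\s", ",", step1, count=1)

print(repr(step2))
print(repr(tight))
```

The plain second pass leaves a space on each side of the comma, while the variant matches the spacing shown in the question's desired output. In sed mode the equivalent change would be to match the surrounding whitespace in the second expression.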

inventsekar · ‎10-08-2017

Hi @cpetterborg, great rex command... Great learning !

For other rex beginners, let me explain it:
"s/^(\S+)(.*?)\s(\1)/\2, /"
^(\S+) --- captures the first word as "\1"
(.*?) --- captures the rest of the line as "\2", up to the second ubuntu match
\s(\1) --- matches "a space and the word ubuntu"
Everything before the middle "/" is the matching part; everything after it is the replacement part.
\2, --- in the replacement, leave out the "\1", write the "\2" match, and then a comma ",". That's it.
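To see the capture groups concretely, the sed expression from cpetterborg's answer can be mirrored in Python. This is a sketch that assumes Python's `re` module handles this pattern the same way Splunk does. The `tight` pattern is a hypothetical variant, not from the thread, that also consumes the spaces around the repeated word.

```python
import re

raw = (
    "ubuntu CRON[2907]: pam_unix(cron:session): session opened for user root by (uid=0) "
    "ubuntu CRON[2907]: pam_unix(cron:session): session closed for user root"
)

# Same pattern as the sed expression: first word, lazy middle, space, first word again.
pattern = re.compile(r"^(\S+)(.*?)\s(\1)")
m = pattern.match(raw)
print(repr(m.group(1)))  # the first word
print(repr(m.group(2)))  # text between the two occurrences, with its leading space

# Replacement "\2, " keeps group 2 and appends a comma and a space.
result = pattern.sub(r"\2, ", raw, count=1)
print(repr(result))

# Hypothetical variant: match the separating spaces outside group 2 and after
# the second occurrence, so no stray spaces remain around the comma.
tight = re.sub(r"^(\S+)\s(.*?)\s\1\s", r"\2,", raw, count=1)
print(repr(tight))
```

Printing with `repr` makes the whitespace visible: group 2 begins with a space, so the original replacement keeps a leading space and leaves two spaces after the comma, while the variant yields the spacing shown in the question.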

cpetterborg · ‎10-09-2017

Thank you. I saw your original post in email. I'm glad you figured it all out. Congratulations! 🙂 I've upvoted your comment for the fine explanation!
