Splunk Search

How can I extract multiple file names from an event and add them as a separate field using the rex command?

Renunaren
Loves-to-Learn Everything

Hi Team,

We have a raw event whose message field contains multiple file names. We want to extract those file names and add them as a separate field. Below is a sample event for reference.

{"timestamp": "2023-06-13T09:35:27.498033Z", "level": "INFO", "filename": "splunk_sample_csv.py", "funcName": "main", "lineno": 38, "message": "Dataframe row : {\"_c0\":{\"0\"😕"{\",\"1\"😕\\\"Timestamp\\\": \\\"2023\\/06\\/13 11:22:45\\\"\",\"2\"😕\\\"status\\\": \\\"files arrived\\\"\",\"3\"😕\\\"files\\\": [\",\"4\"😕\\\"PAKS_FACT_DWH2_D20220221.ok\\\"\",\"5\":\\\\"PAKS_UBER_DWH2_D20220221.ok\\\"\",\"6\":\\\\"HHE_SIT_check_file1.txt.ok\\\"\",\"7\":\\\\"HHE_SIT_check_file2.txt.ok\\\"\",\"8\":\\\\"HHE_SIT_check_file3.txt.ok\\\"\",\"9\":\\\\"PAKS_FACT_DWH2_D20220412.ok\\\"\",\"10\":\\\\"PAKS_FACT_DWH2_D20220420.ok\\\"\",\"11\":\\\\"PAKS_FACT_DWH2_D20211223.ok\\\"\",\"12\":\\\\"PAKS_FACT_DWH2_D20211224.ok\\\"\",\"13\":\" ]\",\"23\"😕"}\"}} ", "process": 32633, "processName": "MainProcess"}

Below is the SPL command we tried for this purpose.

index= app_events_dwh2_de_int | rex max_match=0 "\\\\\\\\\\\\\"files\\\\\\\\\\\\\":\s*\\\\\\\\\\\\\"(?<File_Arrived>[^\\\]+)"

Please help us on this.

 


ITWhisperer
SplunkTrust

Please repost your raw event in a code block </> so that it doesn't get corrupted by formatting 


Renunaren
Loves-to-Learn Everything

Hi ITWhisperer,

Thanks for your response. As you requested, the raw event is below.

{"timestamp": "2023-06-13T09:35:27.498033Z", "level": "INFO", "filename": "splunk_sample_csv.py", "funcName": "main", "lineno": 38, "message": "Dataframe row : {\"_c0\":{\"0\"😕"{\",\"1\"😕\\\"Timestamp\\\": \\\"2023\\/06\\/13 11:22:45\\\"\",\"2\"😕\\\"status\\\": \\\"files arrived\\\"\",\"3\"😕\\\"files\\\": [\",\"4\"😕\\\"PAKS_FACT_DWH2_D20220221.ok\\\"\",\"5\":\\\\"PAKS_UBER_DWH2_D20220221.ok\\\"\",\"6\":\\\\"HHE_SIT_check_file1.txt.ok\\\"\",\"7\":\\\\"HHE_SIT_check_file2.txt.ok\\\"\",\"8\":\\\\"HHE_SIT_check_file3.txt.ok\\\"\",\"9\":\\\\"PAKS_FACT_DWH2_D20220412.ok\\\"\",\"10\":\\\\"PAKS_FACT_DWH2_D20220420.ok\\\"\",\"11\":\\\\"PAKS_FACT_DWH2_D20211223.ok\\\"\",\"12\":\\\\"PAKS_FACT_DWH2_D20211224.ok\\\"\",\"13\":\\\\"PAKS_FACT_DWH2_D20211225.ok\\\"\",\"14\":\\\\"NOSPKP2P_DLY_NOK_D230708.ok\\\"\",\"15\":\\\\"DUMMY_DLY_NOK_D230613.ok\\\"\",\"16\":\\\\"DUMMY_TEST_DLY_NOK_D230613.ok\\\"\",\"17\":\\\\"TLX2DB.PROVD.DREAM_12.ok\\\"\",\"18\":\\\\"TLX2DB.PROVD.DREAM_152.ok\\\"\",\"19\":\\\\"TLX2DB.PROVD.DREAM_2023-04-19-04.04.32.679000.csv.ok\\\"\",\"20\":\\\\"TLX2DB.PROVD.DREAM_2023-04-20-05.09.39.679000.csv.ok\\\"\",\"21\":\\\\"TLX2DB.PROVD.DREAM_2023-04-18-05.09.39.679000.csv.ok\\\"\",\"22\":\" ]\",\"23\"😕"}\"}} ", "process": 32633, "processName": "MainProcess"}

I tried to extract the file names, such as PAKS_FACT_DWH2_D20220221.ok, PAKS_UBER_DWH2_D20220221.ok, HHE_SIT_check_file1.txt.ok, HHE_SIT_check_file2.txt.ok and HHE_SIT_check_file3.txt.ok, and add them as a separate field using the query below:

index= app_events_dwh2_de_int | rex max_match=0 "\\\\\\\\\\\\\"files\\\\\\\\\\\\\":\s*\\\\\\\\\\\\\"(?<File_Arrived>[^\\\]+)"

but this didn't work. Please help us with this issue.


ITWhisperer
SplunkTrust

Because the event was not put in a code block </> as requested, it got corrupted by the forum formatting.

Please use the code block button (</>) in the editor toolbar to insert your example event.

Renunaren
Loves-to-Learn Everything

Hi ITWhisperer,

Thanks for your response. Please look into the sample event below.

{"timestamp": "2023-06-13T09:35:27.498033Z", "level": "INFO", "filename": "splunk_sample_csv.py", "funcName": "main", "lineno": 38, "message": "Dataframe row : {\"_c0\":{\"0\":\"{\",\"1\":\" \\\"Timestamp\\\": \\\"2023\\/06\\/13 11:22:45\\\"\",\"2\":\" \\\"status\\\": \\\"files arrived\\\"\",\"3\":\" \\\"files\\\": [\",\"4\":\" \\\"PAKS_FACT_DWH2_D20220221.ok\\\"\",\"5\":\" \\\"PAKS_UBER_DWH2_D20220221.ok\\\"\",\"6\":\" \\\"HHE_SIT_check_file1.txt.ok\\\"\",\"7\":\" \\\"HHE_SIT_check_file2.txt.ok\\\"\",\"8\":\" \\\"HHE_SIT_check_file3.txt.ok\\\"\",\"9\":\" \\\"PAKS_FACT_DWH2_D20220412.ok\\\"\",\"10\":\" \\\"PAKS_FACT_DWH2_D20220420.ok\\\"\",\"11\":\" \\\"PAKS_FACT_DWH2_D20211223.ok\\\"\",\"12\":\" \\\"PAKS_FACT_DWH2_D20211224.ok\\\"\",\"13\":\" \\\"PAKS_FACT_DWH2_D20211225.ok\\\"\",\"14\":\" \\\"NOSPKP2P_DLY_NOK_D230708.ok\\\"\",\"15\":\" \\\"DUMMY_DLY_NOK_D230613.ok\\\"\",\"16\":\" \\\"DUMMY_TEST_DLY_NOK_D230613.ok\\\"\",\"17\":\" \\\"TLX2DB.PROVD.DREAM_12.ok\\\"\",\"18\":\" \\\"TLX2DB.PROVD.DREAM_152.ok\\\"\",\"19\":\" \\\"TLX2DB.PROVD.DREAM_2023-04-19-04.04.32.679000.csv.ok\\\"\",\"20\":\" \\\"TLX2DB.PROVD.DREAM_2023-04-20-05.09.39.679000.csv.ok\\\"\",\"21\":\" \\\"TLX2DB.PROVD.DREAM_2023-04-18-05.09.39.679000.csv.ok\\\"\",\"22\":\" ]\",\"23\":\"}\"}} ", "process": 32633, "processName": "MainProcess"}

Please look at the event above and help us extract the file names, as described earlier, using the rex command.


ITWhisperer
SplunkTrust

First extract the list, then extract each file name from it:

| rex "(?:\"files[\\\\]+\": \[)(?<fileslist>[^\s:]+[^\]]+)"
| rex field=fileslist max_match=0 "(?:[^\s:]+[^\s]+\s[\"\\\]+)(?<files>[^\\\]+)"
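
If it helps to check the two-step idea outside Splunk, here is a rough Python sketch of the same logic. It is not part of the SPL above and makes a few assumptions: the patterns are written as the regex engine would likely see them after SPL removes one level of backslash escaping from the quoted string, Python's re module needs (?P<name>...) instead of (?<name>...), and RAW, LIST_RE, FILE_RE and extract_files are illustrative names only. RAW is a shortened copy of the sample event with three file names.

```python
import re

# Shortened copy of the "message" part of the sample event, with the nested
# JSON escaping left exactly as it appears in the raw text.
RAW = (
    r'"message": "Dataframe row : {\"_c0\":{\"2\":\" \\\"status\\\": \\\"files arrived\\\"\",'
    r'\"3\":\" \\\"files\\\": [\",'
    r'\"4\":\" \\\"PAKS_FACT_DWH2_D20220221.ok\\\"\",'
    r'\"5\":\" \\\"PAKS_UBER_DWH2_D20220221.ok\\\"\",'
    r'\"6\":\" \\\"HHE_SIT_check_file1.txt.ok\\\"\",'
    r'\"7\":\" ]\",\"8\":\"}\"}} "'
)

# Step 1: capture everything between '"files\\\": [' and the closing ']'.
LIST_RE = re.compile(r'(?:"files[\\]+": \[)(?P<fileslist>[^\s:]+[^\]]+)')

# Step 2: inside that list, skip each row key and the escaped opening quote,
# then capture the file name up to the next backslash.
FILE_RE = re.compile(r'(?:[^\s:]+[^\s]+\s["\\]+)(?P<files>[^\\]+)')


def extract_files(raw):
    """Return every file name found in the 'files' list of a raw event."""
    list_match = LIST_RE.search(raw)
    if list_match is None:
        return []
    # findall returns the single capture group for each match, like max_match=0.
    return FILE_RE.findall(list_match.group("fileslist"))


if __name__ == "__main__":
    for name in extract_files(RAW):
        print(name)
```

With this shortened sample the sketch prints the three file names, one per line. Note that "files arrived" in the status value is not picked up in step 1, because the pattern requires backslashes directly after "files".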