Solved: Regex help on multiline fields and mvexpand

koshyk · ‎02-12-2017

My data sample is as below

C12345  my1Surname, my1First        Role Access (Group1) - I
                            Role Access (Group2) - II
                            HR Department1
                            Email Contractors
P98765  Þórarinsson, Guðmundur        Role Access (Group1) - I
                            IT Department1
                            Email Permanent                               
P12567  my3Surname, my3First        Role Access (Group2) - II
                            IT Department1
                            AWS Access Role2
                            Email Permanent

The first column (is Surname and Firstname), then there is a tab and then it is "multiline" roles of the user
I'm using below regex. The regex splits correctly, but the "\n" of the roles are missed, so unable to split it afterwards

| rex "^(?<employeeID>.*)\t(?<user>.*)\t(?<roles>(?s).*)"

I'm looking for final output into a key-value format possibly to have a relational-value (or json hierarchy). eg C12345 has 4 roles

C12345 = Role Access (Group1) - I
C12345 = Role Access (Group2) - II
C12345 = HR Department1
C12345 = Email Contractors

PS: The data is encoded in iso-8859-1 (European languages)

martin_mueller · ‎02-12-2017

This chops up your provided data into each field, format the final output as you need:

| makeresults | eval _raw = " C12345  my1Surname, my1First        Role Access (Group1) - I
                             Role Access (Group2) - II
                             HR Department1
                             Email Contractors
 P98765  Þórarinsson, Guðmundur        Role Access (Group1) - I
                             IT Department1
                             Email Permanent                               
 P12567  my3Surname, my3First        Role Access (Group2) - II
                             IT Department1
                             AWS Access Role2
                             Email Permanent"
| rex max_match=0 "(?m)^(?<entry>\s{0,3}\S[^\r\n]+([\r\n]+\s{4,}[^\r\n]+)*)" 
| mvexpand entry 
| rex field=entry "(?s)^\s*(?<employeeid>\S+)\s+(?<user>.*?)\s\s+(?<roles>.*)" 
| rex max_match=0 field=roles "(?<roles>.*?)(\s\s+|\s*$)"
| table employeeid user roles

Output:

employeeid        user                      roles   
C12345            my1Surname, my1First      Role Access (Group1) - I
                                            Role Access (Group2) - II
                                            HR Department1
                                            Email Contractors
P98765            Þórarinsson, Guðmundur    Role Access (Group1) - I
                                            IT Department1
                                            Email Permanent
P12567            my3Surname, my3First      Role Access (Group2) - II
                                            IT Department1
                                            AWS Access Role2
                                            Email Permanent

View solution in original post

martin_mueller · ‎02-12-2017

This chops up your provided data into each field, format the final output as you need:

| makeresults | eval _raw = " C12345  my1Surname, my1First        Role Access (Group1) - I
                             Role Access (Group2) - II
                             HR Department1
                             Email Contractors
 P98765  Þórarinsson, Guðmundur        Role Access (Group1) - I
                             IT Department1
                             Email Permanent                               
 P12567  my3Surname, my3First        Role Access (Group2) - II
                             IT Department1
                             AWS Access Role2
                             Email Permanent"
| rex max_match=0 "(?m)^(?<entry>\s{0,3}\S[^\r\n]+([\r\n]+\s{4,}[^\r\n]+)*)" 
| mvexpand entry 
| rex field=entry "(?s)^\s*(?<employeeid>\S+)\s+(?<user>.*?)\s\s+(?<roles>.*)" 
| rex max_match=0 field=roles "(?<roles>.*?)(\s\s+|\s*$)"
| table employeeid user roles

Output:

employeeid        user                      roles   
C12345            my1Surname, my1First      Role Access (Group1) - I
                                            Role Access (Group2) - II
                                            HR Department1
                                            Email Contractors
P98765            Þórarinsson, Guðmundur    Role Access (Group1) - I
                                            IT Department1
                                            Email Permanent
P12567            my3Surname, my3First      Role Access (Group2) - II
                                            IT Department1
                                            AWS Access Role2
                                            Email Permanent

koshyk · ‎02-12-2017

Thank you perfect.
The missing thing in my search was:

| rex max_match=0 field=roles "(?<roles>.*?)(\s\s+|\s*$)"

So my whole search is now

<raw search>| rex "^(?<employeeID>.*)\t(?<user>.*)\t(?<roles>(?s).*)"| rex max_match=0 field=roles "(?<roles>.*?)(\s\s+|\s*$)"| mvexpand roles | table employeeID,user,roles

Regex help on multiline fields and mvexpand

ICYMI - Check out the latest releases of Splunk Edge Processor

Introducing the 2024 SplunkTrust!

Introducing the 2024 Splunk MVPs!