I have a non-standardized field in one of the logs that we pull, and I am building an inline rex string to extract it. The string below extracts every value except one that it should also match. That value contains an ampersand ("&").
regex: \[Site:\s(?<site>\w\s[[-\s\w+\s]+|[-\s\w+\s\w+\s]+]-[\s\w+|\s\w.{1,3}])\]
Data not extracted with this regex: D - External Subnets - AT&T
I have tried the following:
\[Site:\s(?<site>\w\s[[-\s\w+\s]+|[-\s\w+\s\w+\s]+]-[\s\w+|\s\w+&\w])\]
\[Site:\s(?<site>\w\s[[-\s\w+\s]+|[-\s\w+\s\w+\s]+]-[\s\w+|\s\w+"&"\w])\]
\[Site:\s(?<site>\w\s[[-\s\w+\s]+|[-\s\w+\s\w+\s]+]-[\s\w+|\s\w+\&\w])\]
\[Site:\s(?<site>\w\s[[-\s\w+\s]+|[-\s\w+\s\w+\s]+]-[\s\w+|\s\w+\\&\w])\]
\[Site:\s(?<site>\w\s[[-\s\w+\s]+|[-\s\w+\s\w+\s]+]-[\s\w+|\s\w+\\\&\w])\] - I tried this after researching some Perl coding suggestions.
I believe this is because the ampersand is used to repeat the previously matched pattern. I'm not sure how to escape the ampersand so that it is read as a literal value. I also haven't been able to find a reference for a specific character sequence to use in place of the ampersand when searching for it.
Thanks for any help you can offer.
This should do the job. It will catch everything between "[Site: " and "]".
"\[Site:\s*(?P<site>.*)\]"
Thank you.
When I use that, it also pulls in the adjacent field, which is the IP, so I end up with far too many unique values.
Sample data with adjacent field:
[Site: V - A - VLAN 213 - Full] [XXX.XXX.XXX.XXX]
Field values being pulled now:
V - A - VLAN 213 - Full] [xxx.xxx.xxx.xxx
Making the quantifier lazy (non-greedy) should fix that, so the match stops at the first closing bracket instead of the last one.
"\[Site:\s*(?P<site>.*?)\]"
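To see the difference between the two quantifiers outside of Splunk, here is a minimal sketch in Python. It assumes Python's re module behaves like Splunk's rex for these two patterns (both accept the (?P<site>...) named-group syntax); the sample line is the one posted above.

```python
import re

# Sample event from the thread: the Site field is followed by an IP field.
line = "[Site: V - A - VLAN 213 - Full] [XXX.XXX.XXX.XXX]"

# Greedy: ".*" consumes as much as it can, so the match ends at the LAST "]".
greedy = re.search(r"\[Site:\s*(?P<site>.*)\]", line).group("site")

# Lazy: ".*?" consumes as little as it can, so the match ends at the FIRST "]".
lazy = re.search(r"\[Site:\s*(?P<site>.*?)\]", line).group("site")

print(greedy)  # V - A - VLAN 213 - Full] [XXX.XXX.XXX.XXX
print(lazy)    # V - A - VLAN 213 - Full
```

The only change is the trailing "?" after ".*", which switches the quantifier from matching as much as possible to matching as little as possible.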
That worked perfectly, thanks. I'll look up the trailing "?" to see why it solved the problem. Thank you very much for your help.
Can you supply some sample data? If not, what terminates the Site field?
The closing square bracket is the termination of the value in the log.
Here are a couple of examples. As I said, the field doesn't have a standardized naming convention, so I did my best with the regex above, which catches everything except the value that includes the ampersand.
Sample data that I need to extract:
[Site: V - A - VLAN 213 - Full]
[Site: D - External Subnets - AT&T]
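Since the closing square bracket terminates the value, one possible approach is a negated character class that matches everything up to the first "]". The sketch below uses Python's re module as a stand-in for rex, which is an assumption; the name SITE_RE is only illustrative. Note that the ampersand has no special meaning in this regex syntax and needs no escaping. The earlier attempts likely failed for a different reason: square brackets define character classes rather than groups.

```python
import re

# "[^\]]+" matches one or more characters that are not a closing bracket,
# so the capture stops at the first "]" regardless of what the value contains.
SITE_RE = re.compile(r"\[Site:\s*(?P<site>[^\]]+)\]")

# Sample events taken from the thread.
samples = [
    "[Site: V - A - VLAN 213 - Full] [XXX.XXX.XXX.XXX]",
    "[Site: D - External Subnets - AT&T]",
]

sites = [SITE_RE.search(s).group("site") for s in samples]

for site in sites:
    print(site)
# V - A - VLAN 213 - Full
# D - External Subnets - AT&T
```

In rex the same pattern would be written as "\[Site:\s*(?P<site>[^\]]+)\]". It does not rely on the naming convention of the field at all, only on the closing bracket.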