Splunk Search

Why does a search on an extracted field work for some dates but not for others?

gnshah12345
Observer

I created an extracted field called remote_user. For some dates my search returns the correct field value, but the same search returns the wrong value for other dates. I checked the events, and the extracted field is malformed on the problem dates. The remote_user value should look like "CompanyName John_doe". On the dates where the search works, remote_user shows "CompanyName John_doe"; on the dates where it doesn't, the field shows only "CompanyName". How can the same extracted field behave differently on different dates? Any suggestions?


seemanshu
Path Finder

Hi @gnshah12345,

You can use the following regular expression to extract the remote_user field:

\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3}\s\-\s(?<remote_user>.+)\[

Kindly upvote if you found this helpful.
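A quick way to see what this pattern actually captures is to run it in Python against the sample event the poster shares later in the thread. Two assumptions apply: Python's `re` writes named groups as `(?P<name>...)`, whereas Splunk's PCRE-based `rex` accepts `(?<name>...)`; and the tightened `BOUNDED` variant and the `BRACKETED` event are hypothetical additions for comparison, not part of the answer above. The greedy `.+` keeps the space before `[` in the value and, if an event ever contains a second `[`, runs on to the last one, while a class such as `[^\[]+?` stops at the first.

```python
import re

# Sample event posted later in this thread (copied verbatim).
SAMPLE = (
    'May 3 11:26:01 linux_1 request-instance SoftCert 10.10.20.30 - '
    'Brew Bar John Doe_123456_UE [03/May/2023:11:25:55.509 -0400] '
    '"GET /rest/BROk305031.xml?ink=202305031525554263206 HTTP/1.1" 404 196 '
    '36580 1 25135 brew.bar.com /rest 749 '
    '"OU=123456+CN= Brew Bar John Doe,OU=ny,O=Brew Bar Joint,C=us" '
    'cc045c0a-e9a9-11ed-a6e5-0050568916c1 "x509: TLSV12: 30" '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0"'
)

# The answer's regex, with the named group rewritten in Python syntax.
GREEDY = re.compile(
    r"\d{0,3}\.\d{0,3}\.\d{0,3}\.\d{0,3}\s\-\s(?P<remote_user>.+)\["
)

# Hypothetical tightened variant: 1-3 digits per octet, the capture stops at
# the first "[", and the whitespace before it is left out of the value.
BOUNDED = re.compile(
    r"\d{1,3}(?:\.\d{1,3}){3}\s-\s(?P<remote_user>[^\[]+?)\s*\["
)

# Hypothetical event whose user agent contains a second "[".
BRACKETED = (
    '10.10.20.30 - Brew Bar John Doe_123456_UE '
    '[03/May/2023:11:25:55.509 -0400] "GET /x HTTP/1.1" 404 196 "Mozilla/4.0 [en]"'
)


def capture(pattern, event):
    """Return the remote_user capture, or None when the pattern does not match."""
    match = pattern.search(event)
    return match.group("remote_user") if match else None


for label, event in (("sample", SAMPLE), ("bracketed", BRACKETED)):
    print(label, repr(capture(GREEDY, event)), repr(capture(BOUNDED, event)))
```

On the sample event both patterns return the user string (the greedy one with a trailing space). On the bracketed event only the bounded pattern stays within the user field.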

 


seemanshu
Path Finder

Hi @gnshah12345,

If the field extraction uses a regex you wrote yourself, please share it along with some sample data. That will help find the actual cause.

Thanks!

 


gnshah12345
Observer

I used a regular expression for the field extraction.


gnshah12345
Observer

Below is a sample event. The extracted field is the text just before the bracketed timestamp, "Brew Bar John Doe_123456_UE".

May 3 11:26:01 linux_1 request-instance SoftCert 10.10.20.30 - Brew Bar John Doe_123456_UE [03/May/2023:11:25:55.509 -0400] "GET /rest/BROk305031.xml?ink=202305031525554263206 HTTP/1.1" 404 196 36580 1 25135 brew.bar.com /rest 749 "OU=123456+CN= Brew Bar John Doe,OU=ny,O=Brew Bar Joint,C=us" cc045c0a-e9a9-11ed-a6e5-0050568916c1 "x509: TLSV12: 30" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0"


yuanliu
SplunkTrust

The question doesn't seem to be related to dates, unless you can show two different raw events: one for which your regex works as desired and one for which it doesn't. Also, without seeing your regex, there is no way to diagnose the problem.
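To illustrate why the date itself is unlikely to matter: a regex only ever sees the raw text of the event. The sketch below uses a hypothetical extraction pattern (the poster's actual regex was never posted) and two made-up events that share the same timestamp and differ only by an extra space inside the user name. The same pattern returns the full value for one event and only "CompanyName" for the other, which mirrors the symptom described in the question.

```python
import re

# Hypothetical extraction pattern (NOT the poster's regex, which was never
# shared): after " - ", capture one word, optionally followed by ONE space
# and a second word.
PATTERN = re.compile(r"\s-\s(?P<remote_user>\w+(?: \w+)?)")

# Two made-up events with identical timestamps; event_b has a double space
# between the two parts of the user name.
event_a = '10.10.20.30 - CompanyName John_doe [03/May/2023:11:25:55.509 -0400] "GET / HTTP/1.1"'
event_b = '10.10.20.30 - CompanyName  John_doe [03/May/2023:11:25:55.509 -0400] "GET / HTTP/1.1"'


def remote_user(event):
    """Return the remote_user capture, or None if the pattern does not match."""
    match = PATTERN.search(event)
    return match.group("remote_user") if match else None


print(remote_user(event_a))  # CompanyName John_doe
print(remote_user(event_b))  # CompanyName
```

Comparing a working event and a failing event side by side, character by character, is usually the fastest way to find which difference in the raw text trips up the pattern.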

But ultimately, what is the significance of the string preceding the bracketed date, namely "Brew Bar John Doe_123456_UE"? According to your description, the value you want is "Brew Bar John Doe". If that is accurate, it is the value of the CN attribute in the embedded LDAP distinguished name (DN), except that the DN joins OU and CN with "+" rather than a comma and has some inconvenient spacing ("CN= Brew ..."). Both are easy to fix.
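The normalization described here is small enough to check outside Splunk. The Python sketch below applies the same two substitutions as the `rex mode=sed` expressions in the SPL that follows (`s/\+/,/g` and `s/= */=/g`) to the DN from the sample event, then does a naive comma/equals split. The split is an illustrative simplification (it ignores escaped commas, which RFC 4514 DNs allow) and is not how Splunk's `extract` works internally.

```python
import re

# DN copied from the sample event in this thread.
DN = "OU=123456+CN= Brew Bar John Doe,OU=ny,O=Brew Bar Joint,C=us"

# Same two substitutions as the SPL's rex mode=sed expressions:
#   s/\+/,/g   -> the "+" between OU and CN becomes a comma
#   s/= */=/g  -> spaces right after "=" are dropped
normalized = re.sub(r"= *", "=", DN.replace("+", ","))

# Naive split into key/value pairs. Simplification: assumes no escaped
# commas or "=" inside values, which real DNs may contain.
attrs = {}
for part in normalized.split(","):
    key, value = part.split("=", 1)
    attrs.setdefault(key, []).append(value)

print(normalized)   # OU=123456,CN=Brew Bar John Doe,OU=ny,O=Brew Bar Joint,C=us
print(attrs["CN"])  # ['Brew Bar John Doe']
```

Keeping the values as lists matters here because OU appears twice in this DN.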

Instead of reinventing the regex, I suggest using Splunk's built-in extractions where they apply; they are more robust. In your case, part of each event is in NCSA/Apache access-log format, and Splunk ships with the access-request and access-extractions transforms for exactly that. For example,

 

| rex mode=sed "s/\+/,/g s/= */=/g" ``` normalize quirks in the data: "+" becomes "," and spaces after "=" are dropped ```
| extract access-request ``` the built-in transform does the robust parsing ```

 

This will give you

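field      | value
-----------|------------------------------------------------
C          | us
CN         | Brew Bar John Doe
O          | Brew Bar Joint
OU         | 123456
file       | BROk305031.xml
ink        | 202305031525554263206
method     | GET
root       | rest
uri        | /rest/BROk305031.xml?ink=202305031525554263206
uri_domain |
uri_path   | /rest/BROk305031.xml
uri_query  | ink=202305031525554263206
version    | HTTP/1.1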
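The request-related fields in this result can be reproduced by hand from the quoted request line, which is a handy cross-check on what the built-in transform should return. The rough Python sketch below copies the field names from the result, but the splitting logic is an approximation (for example, taking `root` as the first path segment and `file` as the last), not the definition of Splunk's access-request transform.

```python
import posixpath
from urllib.parse import parse_qs, urlsplit

# Quoted request line from the sample event.
REQUEST = "GET /rest/BROk305031.xml?ink=202305031525554263206 HTTP/1.1"

method, uri, version = REQUEST.split(" ")
parts = urlsplit(uri)

# Field names copied from the result; the derivations are approximations.
fields = {
    "method": method,
    "uri": uri,
    "uri_domain": parts.netloc,  # empty for a relative URI like this one
    "uri_path": parts.path,
    "uri_query": parts.query,
    "version": version,
    "root": parts.path.lstrip("/").split("/")[0],
    "file": posixpath.basename(parts.path),
}

# Each query-string parameter becomes its own field (here: ink).
for key, values in parse_qs(parts.query).items():
    fields[key] = values[0]

for name in sorted(fields):
    print(f"{name:10} {fields[name]}")
```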
Alternatively, you can use

 

| rex mode=sed "s/\+/,/g s/= */=/g"
| extract access-extractions

 

This gives you

field | value
------|------------------------
C     | us
CN    | Brew Bar John Doe
O     | Brew Bar Joint
OU    | 123456
ink   | 202305031525554263206
 