Hello
I have a field extraction to extract email address from a wso2 log and rename it as user.
So this log:
2016-07-11 20:38:30,633 priority sampledata-not_real-1111-simple-90 mydata.platform.stuff.yea.morestuff field=handler method=value scopeValue=email_address=myemail@smile.com|something:stuff=me&app=hello_stuff id=""
I have set to extract:
scopeValue=email_address=(?P<user>[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})
When I run:
index=* user=myemail@smile.com earliest=-48h@h sourcetype="wso2:am:runtime" "scopeValue=email_address=" | stats count as "UserCountUsingField"
I get the above log, with the email in the user field.
When I run this search I do not get that log:
index=* user=myemail@smile.com earliest=-48h@h sourcetype="wso2:am:runtime" | stats count as "UserCountUsingField"
Any idea why that wouldn't be working?
Thanks for the help!
This sounds very similar to https://answers.splunk.com/answers/102528/field-discovery-extraction-works-but-extracted-field-value...
I wonder if you need a fields.conf on your search head with:
[user]
INDEXED_VALUE = false
to solve this issue. There might be a more efficient way by adjusting tokenization per the other answer, but perhaps this will work? The unfortunate thing is that this impacts all fields called user, not just the one in your particular sourcetype (since this setting applies when the search is built, before any data has been read).
That regex is a little bit of overkill if all you want is the user. You could try something like this in the sourcetype stanza in props.conf:
EXTRACT-email_user = email_address=(?<user>[^|]+)
Some explanation:
This regex is looking for the string "email_address=" and then the capture group contains a negated character class which says "all characters until a pipe".
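As a quick check outside Splunk, the same idea can be sketched in Python against the sample event from the question. This is only an illustration: the names LOG, PATTERN and extract_user are made up here, and Python's re module needs the named group spelled (?P<user>...) where the props.conf line above uses (?<user>...).

```python
import re

# Sample event copied from the question.
LOG = (
    "2016-07-11 20:38:30,633 priority sampledata-not_real-1111-simple-90 "
    "mydata.platform.stuff.yea.morestuff field=handler method=value "
    "scopeValue=email_address=myemail@smile.com|something:stuff=me&app=hello_stuff "
    'id=""'
)

# Literal "email_address=" followed by a negated character class:
# capture every character up to, but not including, the next pipe.
PATTERN = re.compile(r"email_address=(?P<user>[^|]+)")


def extract_user(event):
    """Return the captured user value, or None when the event does not match."""
    match = PATTERN.search(event)
    return match.group("user") if match else None


if __name__ == "__main__":
    print(extract_user(LOG))
```

Note that when an event has no pipe after the address, [^|]+ keeps going to the end of the line, so this pattern relies on the pipe always being there.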
HTH,
Dave
I added this and reapplied the configs. Still don't get this record when searching:
index=* user=myemail@smile.com earliest=-48h@h sourcetype="wso2:am:runtime" | stats count as "UserCountUsingField"
If you search without the stats part, do you see the "user" field in your field list?
This record is not returned at all when searching UNLESS you use "scopeValue". This returns the log I am looking for:
index=* user=myemail@smile.com earliest=-48h@h sourcetype="wso2:am:runtime" "scopeValue"
This returns nothing:
index=* user=myemail@smile.com earliest=-48h@h sourcetype="wso2:am:runtime"
Right, but in that case specifying a user seems incidental. As I asked before, if you search without the stats part, do you see the "user" field in your field list?
If this log is not returned, then there won't be a user field.
What I'm getting at is: does the field extraction work? If you look at events that should have this field extracted, is the field showing up?
It appears that the extraction is only partly working. For some addresses it works and for others it does not, but I have not found WHY, as the addresses it works on have the same format as the ones it fails on.
I don't know if the square brackets are a problem of the post.
I tested your regex on
https://regex101.com/
just a little bit modified:
scopeValue\=email_address\=(?P<user>[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+)
and I have the result you want:
myemail@smile.com
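The same test can be reproduced locally with a small Python sketch instead of the website. It runs both the regex from the question and the modified one, with the capture group named user as in the question, against the sample event. The names LOG, ORIGINAL, MODIFIED and extract are illustrative only, and Python's regex engine is not the one Splunk uses, so a match here only shows the pattern itself is sound.

```python
import re

# Sample event copied from the question.
LOG = (
    "2016-07-11 20:38:30,633 priority sampledata-not_real-1111-simple-90 "
    "mydata.platform.stuff.yea.morestuff field=handler method=value "
    "scopeValue=email_address=myemail@smile.com|something:stuff=me&app=hello_stuff "
    'id=""'
)

# Regex from the question: requires a dot plus a TLD of two or more letters.
ORIGINAL = re.compile(
    r"scopeValue=email_address="
    r"(?P<user>[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})"
)

# Modified regex: escaped equals signs and no explicit TLD requirement.
MODIFIED = re.compile(
    r"scopeValue\=email_address\="
    r"(?P<user>[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+)"
)


def extract(pattern, event):
    """Return the 'user' group for the first match, or None when nothing matches."""
    match = pattern.search(event)
    return match.group("user") if match else None


if __name__ == "__main__":
    for name, pattern in (("original", ORIGINAL), ("modified", MODIFIED)):
        print(name, "->", extract(pattern, LOG))
```

Both patterns capture the same address from this event, which suggests the regex itself is fine and the missing results come from somewhere else in the search.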
Bye.
Giuseppe
I added this and reapplied the configs. Still don't get this record when searching:
index=* user=myemail@smile.com earliest=-48h@h sourcetype="wso2:am:runtime" | stats count as "UserCountUsingField"
Try using double quotes:
index=* user="myemail@smile.com" earliest=-48h@h sourcetype="wso2:am:runtime" | stats count as "UserCountUsingField"
Bye.
Giuseppe
Unfortunately, I get the same results.
NOTE: I had to use brackets instead of the proper <> for the field name in the regex because of formatting on this page.
How did you set to extract the email address? Check the permission for the field extraction.
Permissions are Global: readable by all and writable to admin and power