Symptom:
Our authentication datamodel is showing user=Unknown for events that have a username defined in the log.
Example:
2020-02-07 09:31:11,161 - xxx.xxx.xxx.xxxx - INFO [net.shibboleth.idp.authn.duo.impl.ValidateDuoWebResponse:200] - Profile Action ValidateDuoWebResponse: Duo authentication succeeded for 'user1'
2020-02-07 09:31:10,527 - xxx.xxx.xxx.xxx - INFO [net.shibboleth.idp.authn.impl.ValidateUsernamePasswordAgainstLDAP:152] - Profile Action ValidateUsernamePasswordAgainstLDAP: Login by 'user2' succeeded
The fields look ok, except user=unknown (respectively, below):
action = success | app = shibboleth | src_user = user1 | tag = authentication tag = success | user = unknown
action = success | app = shibboleth | src_user = user2 | tag = authentication tag = success | user = unknown
I thought that adding a field alias in the props.conf for this app would do the trick...
But it still seems to display user=unknown for the datamodel...
Here is an eval expression for the datamodel definition:
src_user=if(isnull(src_user) OR src_user="","unknown",src_user), user=if(isnull(user) OR user="","unknown",user)
Here's a funny interaction...
With the search:
index="shib" host="myhost" signature="INFO [net.shibboleth.idp.authn.duo.impl.ValidateDuoWebResponse"
| where isnull(user)
I get what you would expect... a number of problematic logs where for some reason the user field is not populated..
After some time, I did the search again, and it says No Results Found.
user
100 Values, 99.139% of events
When I ran this search initially, it found ~25% of events with user value....
Is this because it is performing the field extraction at search-time?!
Everything you have defined in that TA is a searchtime extraction.
As a general rule, you want fields to be searchtime extractions (they are more flexible and fast)
In other words, unless you have a specific need to use index time extractions you are best to avoid them.
If your search is now returning good data for the search fields, I wonder if your data model will also include them correctly populated too?
Doesn't look like it...
| from datamodel Authentication
| search app=shibboleth user=unknown
2/10/20
4:18:13.170 PM
2020-02-10 16:18:13,170 - XXX.XXX.XXX.XXX - INFO [net.shibboleth.idp.authn.impl.ValidateUsernamePasswordAgainstLDAP:152] - Profile Action ValidateUsernamePasswordAgainstLDAP: Login by 'user1' succeeded
action = success successapp = shibboleth dest = unknown signature = INFO [net.shibboleth.idp.authn.impl.ValidateUsernamePasswordAgainstLDAPsource = /var/log/shibboleth-idp/idp-process.log sourcetype = idp-process src = unknown src_user = unknown
index="shib"
| where isnull(user)
| search "net.shibboleth.idp.authn.impl.ValidateUsernamePasswordAgainstLDAP:152"
And there it is...
a similar search:
index="shib" "net.shibboleth.idp.authn.impl.ValidateUsernamePasswordAgainstLDAP:152"
user
100 Values, 100% of events
The datamodel uses src_user and user for different purposes.
"src_user" is used to refer to the user who initiated the action, where there is an escalation or modification
"user" is the account on which something was modified in escalated events, or where there is no escalation it should be the initiator.
https://docs.splunk.com/Documentation/CIM/4.14.0/User/Authentication
It looks like your shibboleth integration is populating the wrong field, which its why its missing in the datamodel.
Ideally, you want to change the field extraction so that shibboleth populates user (not src_user)
@nickhillscpl
Thanks for the explanation!
I have this defined in our props:
[idp-process]
EVAL-app = "shibboleth"
...
EXTRACT-user,vendor_action = Login by '(?P<user>\S+)'\s+(?P<vendor_action>.+)
EXTRACT-signature,vendor_action,user = net\.shibboleth\.idp\.authn\..*\] \- (?<signature>.*?)\:.*?\s+(?P<vendor_action>\S+) for '(?P<user>\S+)'
EXTRACT-signature = \.\d{1,3}\s\-\s(?P<signature>[^:]+)\:
EXTRACT-signature,user,vendor_action = net\.shibboleth\.idp\.authn\..*\] \- (?<signature>.*?)\:.*(?<user>No username)\s.*(?<vendor_action>UsernamePasswordContext)
Which seems to address the two use-cases that I have there... seems to return nothing.
2020-02-10 10:13:22,181 - xxx.xxx.xxx.xxx - INFO [net.shibboleth.idp.authn.duo.impl.ValidateDuoWebResponse:200] - Profile Action ValidateDuoWebResponse: Duo authentication succeeded for 'user1'
2020-02-10 10:13:21,414 - xxx.xxx.xxx.xxx - INFO [net.shibboleth.idp.authn.impl.ValidateUsernamePasswordAgainstLDAP:152] - Profile Action ValidateUsernamePasswordAgainstLDAP: Login by 'user1' succeeded
I get neither field user or src_user.
I agree, the extractions and regex looks (and tests) good.
Are you looking at the props.conf on your searchhead?
Do the fields extract correctly if you search your logs?
They did not appear to extract properly when I search the log directly.
e.g.
index=shib sourcetype=idp-process
Shows events with no user, this is likely why the DataModel shows user=unknown!
This TA is deployed on the Indexer, SH, and Enterprise Security SH
Is the TA, and its Knowledge objects shared Globally?
It would appear so..
TA-Shib -- global, everyone
index=shib -- global, everyone
Oh, so you are not using this: https://splunkbase.splunk.com/app/4389/ ??
Can you paste the whole props.conf and transforms.conf? (redacted if necessary)
We are not using the Shib splunkbase app..
I'm not sure why not.
Here are the props.conf and transforms.conf
Props.conf
[idp-audit]
CHARSET=UTF-8
MAX_TIMESTAMP_LOOKAHEAD=18
#REPORT-idp_audit=REPORT-idp_audit
SHOULD_LINEMERGE=false
disabled=false
TIME_FORMAT=%Y%m%dT%H%M%SZ
TIME_PREFIX=|
TZ=UTC
REPORT-idp-audit = REPORT-idp-audit
##From shib docs field name is 'a'
FIELDALIAS-src_ip = SSOclientIP AS src_ip
FIELDALIAS-src = SSOclientIP AS src
##field name is 'u'
FIELDALIAS-user = SSOprincipalName AS user
FIELDALIAS-url = SSOrelyingPartyID AS url
FIELDALIAS-SSOrelayingPartyID = SSOrelyingPartyID AS SSOrelayingPartyID
[idp-process]
EVAL-app = "shibboleth"
EVAL-action = case(vendor_action="succeeded" OR vendor_action="true","success", vendor_action="failed" OR vendor_action="false","failure")
FIELDALIAS-dest = SSOrelayingPartyID AS dest
EXTRACT-log_level,process,pid = ^(?:[^ \n]* ){3}(?P<log_level>\w+)\s+\[(?P<process>[^\]:]+)\:(?<pid>\d+)\]
EXTRACT-src_ip = \] \- (?P<src_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
EXTRACT-user,vendor_action = Login by '(?P<user>\S+)'\s+(?P<vendor_action>.+)
EXTRACT-signature,vendor_action,user = net\.shibboleth\.idp\.authn\..*\] \- (?<signature>.*?)\:.*?\s+(?P<vendor_action>\S+) for '(?P<user>\S+)'
EXTRACT-signature = \.\d{1,3}\s\-\s(?P<signature>[^:]+)\:
EXTRACT-signature,user,vendor_action = net\.shibboleth\.idp\.authn\..*\] \- (?<signature>.*?)\:.*(?<user>No username)\s.*(?<vendor_action>UsernamePasswordContext)
REPORT-idp-process-fields = REPORT-idp-process-fields
##From shib docs field name is 'a'
FIELDALIAS-src_ip = SSOclientIP AS src_ip
FIELDALIAS-src = SSOclientIP AS src
##field name is 'u'
FIELDALIAS-user = SSOprincipalName AS user
FIELDALIAS-url = SSOrelyingPartyID AS url
FIELDALIAS-SSOrelayingPartyID = SSOrelyingPartyID AS SSOrelayingPartyID
[idp-warning]
EXTRACT-log_level,process,pid = ^(?:[^ \n]* ){3}(?P<log_level>\w+)\s+\[(?P<process>[^\]:]+)\:(?<pid>\d+)\]
EXTRACT-src_ip = \] \- (?P<src_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
EXTRACT-user,vendor_action = Login by '(?P<user>\S+)'\s+(?P<vendor_action>.+)
EXTRACT-signature = \.\d{1,3}\s\-\s(?P<signature>[^:]+)\:
Transforms.conf:
[REPORT-idp-audit]
DELIMS = "|"
FIELDS = "SSOclientIP","SSOauditEventTime","SSOrequestBinding","SSOrequestId","SSOrelyingPartyID","SSOmessageProfileId","SSOassertingPartyId","SSOresponseBinding","SSOresponseId","SSOprincipalName","SSOauthNMethod","SSOreleasedAttributeID","SSOnameIdentifier","SSOassertionID"
[REPORT-idp-process-fields]
DELIMS = "|"
FIELDS = "event_headers","SSOrequestBinding","SSOrequestId","SSOrelayingPartyID","SSOmessageProfileId","SSOassertingPartyId","SSOresponseBinding","SSOresponseId","SSOprincipalName","SSOauthNMethod","SSOreleasedAttributeID","SSOnameIdentifier","SSOassertionID"
And apparently /local/props.conf
[idp-audit]
MAX_TIMESTAMP_LOOKAHEAD=28
SHOULD_LINEMERGE=false
disabled=false
TIME_FORMAT=%Y%m%dT%H%M%SZ
TIME_PREFIX=\|
TZ=America/New_York
TRUNCATE=99999
FIELDALIAS-src_ip = Client_IP AS src_ip
FIELDALIAS-src = Client_IP AS src
FIELDALIAS-user = Username AS user
[idp-process]
LINE_BREAKER = ([\n\r]+)\d{4}-\d{1,2}-\d{1,2}\s\d+:\d+:\d+
TIME_FORMAT=%Y-%m-%d %H:%M:%S,%3Q
TZ=America/New_York
MAX_TIMESTAMP_LOOKAHEAD=28
SHOULD_LINEMERGE=false
disabled=false
TRUNCATE=99999
EXTRACT-src_ip = \|(?P<src_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\|
EXTRACT-message = (\]\s-\s)(?P<message>[a-zA-Z0-9:?\s.\(\):\/=%&]+)([\n\r]+)
EXTRACT-reason= --(?P<potential_reason>[a-zA-Z0-9:?\s\(\):\/=%&]+\S)\s+at
EXTRACT-examples = \s+at\s+(?P<example_of__reason>[a-zA-Z0-9:?\s\(\):\/=%&.]+)\s
[idp-warning]
LINE_BREAKER = ([\n\r]+)\d{4}-\d{1,2}-\d{1,2}\s\d+:\d+:\d+
TIME_FORMAT=%Y-%m-%d %H:%M:%S,%3Q
TZ=America/New_York
MAX_TIMESTAMP_LOOKAHEAD=28
SHOULD_LINEMERGE=false
disabled=false
TRUNCATE=99999
EXTRACT-src_ip = \|(?P<src_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\|
EXTRACT-message = (\]\s-\s)(?P<message>[a-zA-Z0-9:?\s.\(\):\/=%&]+)([\n\r]+)
EXTRACT-reason= --(?P<potential_reason>[a-zA-Z0-9:?\s\(\):\/=%&]+\S)\s+at
EXTRACT-examples = \s+at\s+(?P<example_of_reason>[a-zA-Z0-9:?\s\(\):\/=%&.]+)\s
And local/transforms.conf
[REPORT-idp-audit]
DELIMS = "|"
FIELDS = "Timestamp","Client_IP","Inbound_binding","Inbound_message_ID","Service_provider_name","Profile_ID","Identity_provider_name","Outbound_binding","Outbound_message_ID","Username","AuthenticationContext","Attributes","NameID_value"
#FIELDS = "timestamp","SSOclientIP","SSOauditEventTime","SSOrequestBinding","SSOrequestId","SSOrelyingPartyID","SSOmessageProfileId","SSOassertingPartyId","SSOresponseBinding","SSOresponseId","SSOprincipalName","SSOauthNMethod","SSOrelease
dAttributeID","SSOnameIdentifier","SSOassertionID"
[REPORT-idp-process-fields]
DELIMS = "|"
FIELDS = event_headers,SSOclientIP,SSOrequestBinding,SSOrequestId,SSOrelayingPartyID,SSOmessageProfileId,SSOassertingPartyId,SSOresponseBinding,SSOresponseId,SSOprincipalName,SSOauthNMethod,SSOreleasedAttributeID,SSOnameIdentifier,SSOasser
tionID,field15,field16,url,uri_path,field19,signature,vendor_action,http_user_agent,field23
I cant see anything wroing with your config - moreover, i cant see any definition for src_user!!
My thought process therefore, is where is src_user comming from?
Have a look in your field extractions/aliases/calc.fields for anything which monkeying with src_user which could apply to thet source type.
It looks like src_user is not defined in the sourcetype=idp-process
Rather, it is coming through from the datamodel definition:
src_user=if(isnull(src_user) OR src_user="","unknown",src_user)
The implementer here also created a separate TA: TA-shib_inputs
Which has additional props/transforms.conf:
**default/props.conf**
[idp-audit]
CHARSET=UTF-8
MAX_TIMESTAMP_LOOKAHEAD=18
#REPORT-idp_audit=REPORT-idp_audit
SHOULD_LINEMERGE=false
disabled=false
TIME_FORMAT=%Y%m%dT%H%M%SZ
TIME_PREFIX=\|
TZ=UTC
REPORT-idp-audit = REPORT-idp-audit
##From shib docs field name is 'a'
FIELDALIAS-src_ip = SSOclientIP AS src_ip
FIELDALIAS-src = SSOclientIP AS src
##field name is 'u'
FIELDALIAS-user = SSOprincipalName AS user
FIELDALIAS-url = SSOrelyingPartyID AS url
FIELDALIAS-SSOrelayingPartyID = SSOrelyingPartyID AS SSOrelayingPartyID
[idp-process]
EVAL-app = "shibboleth"
EVAL-action = case(vendor_action="succeeded","success", vendor_action="failed","failure")
EXTRACT-log_level,process,pid = ^(?:[^ \n]* ){3}(?P\w+)\s+\[(?P[^\]:]+)\:(?\d+)\]
EXTRACT-src_ip = \] \- (?P\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
EXTRACT-user,vendor_action = Login by '(?P\S+)'\s+(?P.+)
EXTRACT-signature = \.\d{1,3}\s\-\s(?P[^:]+)\:
REPORT-idp-process-fields = REPORT-idp-process-fields
##From shib docs field name is 'a'
FIELDALIAS-src_ip = SSOclientIP AS src_ip
FIELDALIAS-src = SSOclientIP AS src
##field name is 'u'
FIELDALIAS-user = SSOprincipalName AS user
FIELDALIAS-url = SSOrelyingPartyID AS url
FIELDALIAS-SSOrelayingPartyID = SSOrelyingPartyID AS SSOrelayingPartyID
**default/transforms.conf:**
[REPORT-idp-audit]
DELIMS = "|"
FIELDS = "SSOclientIP","SSOauditEventTime","SSOrequestBinding","SSOrequestId","SSOrelyingPartyID","SSOmess
ageProfileId","SSOassertingPartyId","SSOresponseBinding","SSOresponseId","SSOprincipalName","SSOauthNMetho
d","SSOreleasedAttributeID","SSOnameIdentifier","SSOassertionID"
[REPORT-idp-process-fields]
DELIMS = "|"
FIELDS = "event_headers","SSOrequestBinding","SSOrequestId","SSOrelayingPartyID","SSOmessageProfileId","SS
OassertingPartyId","SSOresponseBinding","SSOresponseId","SSOprincipalName","SSOauthNMethod","SSOreleasedAt
tributeID","SSOnameIdentifier","SSOassertionID"
**local/props.conf**
[idp-audit]
MAX_TIMESTAMP_LOOKAHEAD=28
SHOULD_LINEMERGE=false
disabled=false
TIME_FORMAT=%Y%m%dT%H%M%SZ
TIME_PREFIX=\|
TZ=EST
TRUNCATE=99999
FIELDALIAS-src_ip = Client_IP AS src_ip
FIELDALIAS-src = Client_IP AS src
FIELDALIAS-user = Username AS user
[idp-process]
LINE_BREAKER = \d{4}-\d{1,2}-\d{1,2}\s\d+:\d+:\d+
TIME_FORMAT=%Y-%m-%d\s%H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD=28
SHOULD_LINEMERGE=false
disabled=false
TZ=EST
TRUNCATE=99999
EXTRACT-loglevel = \d{3}\s\-\s(?P\w+)\s
EXTRACT-logger = \[(?P.*):\d+\]
EXTRACT-logline = \[.*:(?P\d+)\]
EXTRACT-message = (\]\s-\s)(?P[a-zA-Z0-9:?\s.\(\):\/=%&]+)(\s'|([\n\r]+)\d+)
EXTRACT-key_value = '(?P[a-zA-Z0-9]+)'
EXTRACT-reason= --(?P[a-zA-Z0-9:?\s\(\):\/=%&]+\S)\s+at
EXTRACT-examples = \s+at\s+(?P[a-zA-Z0-9:?\s\(\):\/=%&.]+)\s
[idp-warning]
LINE_BREAKER = \d{4}-\d{1,2}-\d{1,2}\s\d+:\d+:\d+
TIME_FORMAT=%Y-%m-%d\s%H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD=28
SHOULD_LINEMERGE=false
disabled=false
TZ=EST
TRUNCATE=99999
EXTRACT-loglevel = \d{3}\s\-\s(?P\w+)\s
EXTRACT-logger = \[(?P.*):\d+\]
EXTRACT-logline = \[.*:(?P\d+)\
EXTRACT-message = (\]\s-\s)(?P[a-zA-Z0-9:?\s.\(\):\/=%&]+)(\s'|([\n\r]+)\d+)
EXTRACT-key_value = '(?P[a-zA-Z0-9]+)'
EXTRACT-reason= --(?P[a-zA-Z0-9:?\s\(\):\/=%&]+\S)\s+at
EXTRACT-examples = \s+at\s+(?P[a-zA-Z0-9:?\s\(\):\/=%&.]+)\s
**local/transforms.conf**
[REPORT-idp-audit]
DELIMS = "|"
FIELDS = "Client_IP","Timestamp","Inbound_binding","Inbound_message_ID","Service_provider_name","Profile_I
D","Identity_provider_name","Outbound_binding","Outbound_message_ID","Username","AuthenticationContext","A
ttributes","NameID_value"
#FIELDS = "SSOclientIP","SSOauditEventTime","SSOrequestBinding","SSOrequestId","SSOrelyingPartyID","SSOmes
sageProfileId","SSOassertingPartyId","SSOresponseBinding","SSOresponseId","SSOprincipalName","SSOauthNMeth
od","SSOreleasedAttributeID","SSOnameIdentifier","SSOassertionID"
#[REPORT-idp-process-fields]
#DELIMS = "|"
#FIELDS = "event_headers","SSOrequestBinding","SSOrequestId","SSOrelayingPartyID","SSOmessageProfileId","S
SOassertingPartyId","SSOresponseBinding","SSOresponseId","SSOprincipalName","SSOauthNMethod","SSOreleasedA
ttributeID","SSOnameIdentifier","SSOassertionID"
It looks like src_user is not defined in the sourcetype=idp-process Rather, it is coming through from the datamodel definition:
No, its the other way round - Datamodels take thier fields from extractions, a datamodel can't add fields to a search.
Back in your original question you said that the DM returns:
action = success | app = shibboleth | src_user = user1 | tag = authentication tag = success | user = unknown
action = success | app = shibboleth | src_user = user2 | tag = authentication tag = success | user = unknown
So its getting src_user from somewhere, just not from this TA as far as I can see.