I wanted to add an update to this after some troubleshooting. When our crash occurred, a user in the admin group got a message that he did not have the cloud gateway role, though he should have had it. As soon as he tried to add a device, we began getting the 404 errors. This was after two users had successfully added devices to cloud gateway.
To get our search head back online, I renamed the script to 1azureScripted and restarted Splunk. When I went back into the SAML configuration, I noticed that the "Script Functions" and "Script Secure Arguments" blocks were empty, although both showed up in the authentication.conf file. I updated the file and the script function to the original values, which got things working again.
One thing that has me puzzled is that I had to update the azurescripted.py file to make it proxy aware, and in the process I messed up some of the space/tab combos, which resulted in an error. That means the script never could have executed successfully anyway.
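For anyone else editing the script by hand, here is a rough sketch of how you could catch that kind of space/tab mix-up before restarting. It only asks Python to compile the source without running it. The helper name `find_indentation_error` is my own for illustration and is not part of Splunk or the Azure script.

```python
def find_indentation_error(source, filename="azurescripted.py"):
    """Compile the source without executing it.

    Returns a short description of the first syntax problem found,
    or None if the source compiles cleanly. The function name and the
    default filename are illustrative assumptions.
    """
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:
        # TabError and IndentationError are both subclasses of SyntaxError,
        # so mixed tabs and spaces are reported here as well.
        return f"{type(exc).__name__} at line {exc.lineno}: {exc.msg}"
    return None


if __name__ == "__main__":
    # One block indented with a tab, the next line with spaces.
    broken = "def f():\n\tif True:\n        return 1\n"
    print(find_indentation_error(broken))
```

Running `python -m py_compile azurescripted.py` from the command line does a similar check on the file itself.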
It seems like we are able to generate tokens as long as something is entered in the script field. If a user has an error during device registration or Splunk is restarted, you will begin to see the 404/500 errors.