Parse JSON nested inside a Windows Event

tjreynol
Engager

Hello, I am looking for a way to parse the JSON data in the "Message" body of a set of Windows events. Ideally, my team would only have to enter a search term for the sourcetype, and the fields would be extracted and formatted appropriately. However, I would settle for creating a set of saved searches/reports and instructing my team to use those.

Here is an example record:

09/19/2017 11:42:20 AM
LogName=PowerShell-Endpoint-IMS-APISession
SourceName=PowerShell-Endpoint-IMS-APISession-Source
EventCode=1000
EventType=4
Type=Information
ComputerName=SOME_MACHINE.some.domain.tld
TaskCategory=None
OpCode=Info
RecordNumber=2275
Keywords=Classic
Message={
    "Message":  "User, jdoe, is already Lync-enabled.",
    "CorrelationId":  "38d97480-08a0-4e81-971c-8ab3f68747bc",
    "SessionInfo":  {
                        "SessionConfigurationName":  "IMS-APISession",
                        "SessionConnectionString":  "http://some_machine:5985/wsman?PSVersion=5.1.14393.1715",
                        "RunspaceID":  "044d7c40-1de2-4c20-ad74-3745c3d99ac3",
                        "ProcessID":  2412,
                        "ClientIP":  "169.68.128.128",
                        "SessionUser":  "DOMAIN\\sessionuser",
                        "RunAsUser":  "DOMAIN\\runasuser"
                    },
    "CmdInvocationInfo":  {
                              "InvocationName":  "Enable-CCILyncUser",
                              "BoundParameters":  {
                                                      "Username":  "jdoe"
                                                  },
                              "UnboundArguments":  [

                                                   ],
                              "ScriptLineNumber":  0,
                              "OffsetInLine":  0,
                              "HistoryId":  5,
                              "ScriptName":  "",
                              "Line":  "",
                              "PositionMessage":  "",
                              "PSScriptRoot":  "",
                              "PSCommandPath":  null,
                              "PipelineLength":  2,
                              "PipelinePosition":  1,
                              "ExpectingInput":  false,
                              "CommandOrigin":  0,
                              "DisplayScriptPosition":  null
                          },
    "LogInvocationInfo":  {
                              "InvocationName":  "Add-EndpointLogEntry",
                              "ScriptLineNumber":  294,
                              "OffsetInLine":  25,
                              "HistoryId":  5,
                              "ScriptName":  "C:\\some_path\\Functions\\Lync.ps1",
                              "Line":  "                        Add-EndpointLogEntry -WriteDebug -Message \"User, $Username, is already Lync-enabled.\"\r\n",
                              "PositionMessage":  "At C:\\some_path\\Functions\\Lync.ps1:294 char:25\r\n+ ...             Add-EndpointLogEntry -WriteDebug -Message \"User, $Usernam ...\r\n+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~",
                              "PSScriptRoot":  "C:\\some_path\\Functions",
                              "PSCommandPath":  "C:\\some_path\\Functions\\Lync.ps1",
                              "PipelineLength":  1,
                              "PipelinePosition":  1,
                              "ExpectingInput":  false,
                              "CommandOrigin":  1,
                              "DisplayScriptPosition":  null
                          }
}

As you can see, this is a standard Windows event, but the Message body is all JSON. Automatic field discovery pulls out many of these fields on its own, but the values typically include the quotes and commas that are part of the JSON syntax (e.g. ClientIP = "169.68.128.128",).

I am able to create search-time field extractions using regex, but as I understand it they are only visible in Smart/Verbose mode, which also triggers automatic field discovery. That means I get duplicate values, one formatted correctly and one incorrectly. If I reuse the ClientIP field name, both values show up under ClientIP. That is just as confusing as using a different field name, because I then have an incorrectly formatted ClientIP and a correctly formatted ClientIPAddress.

So as I see it, I need to do one of two things: either run search-time field extractions while preventing automatic field discovery from displaying the fields I have custom extractions for, or get automatic field discovery to parse the nested JSON properly. (Or just work out how to manipulate the data in a search and save the searches, though again that is not ideal.)

I would also be interested in a solution that involves index-time field extractions, though I understand those are recommended only as a last resort because of the performance impact. That said, I doubt this system would generate enough logs for the impact to be noticeable.

Please note that I do not have Splunk admin access, but I do have admin access to the machine the forwarder is on and can modify the .conf files if needed. Also, I'm a bit of a noob with Splunk. All I've really done is take the Power User course, and I have been given access to Splunk accordingly. So apologies if I am missing something basic here.

Thanks for your time,

EDIT

Shortly after posting this last Friday I came up with the following search that provides the correct fields in the correct formats:

sourcetype="WinEventLog:PowerShell-Endpoint-IMS-APISession" |
fields host, source, sourcetype, LogName, SourceName, EventCode, EventType, Type, ComputerName, TaskCategory, OpCode, RecordNumber, Keywords, Message |
spath input=Message output=EventMessage path=Message |
spath input=Message output=CorrelationId path=CorrelationId |
spath input=Message output=SessionConfigurationName path=SessionInfo.SessionConfigurationName |
spath input=Message output=SessionConnectionString path=SessionInfo.SessionConnectionString |
spath input=Message output=RunspaceID path=SessionInfo.RunspaceID |
spath input=Message output=ProcessID path=SessionInfo.ProcessID |
spath input=Message output=ClientIP path=SessionInfo.ClientIP |
spath input=Message output=SessionUser path=SessionInfo.SessionUser |
spath input=Message output=RunAsUser path=SessionInfo.RunAsUser |
spath input=Message output=CmdInvocationName path=CmdInvocationInfo.InvocationName |
spath input=Message output=CmdBoundParameters path=CmdInvocationInfo.BoundParameters |
spath input=Message output=CmdUnboundArguments path=CmdInvocationInfo.UnboundArguments{} |
spath input=Message output=CmdScriptLineNumber path=CmdInvocationInfo.ScriptLineNumber |
spath input=Message output=CmdScriptInLineOffset path=CmdInvocationInfo.OffsetInLine |
spath input=Message output=CmdHistoryId path=CmdInvocationInfo.HistoryId |
spath input=Message output=CmdScriptName path=CmdInvocationInfo.ScriptName |
spath input=Message output=CmdLine path=CmdInvocationInfo.Line |
spath input=Message output=CmdPositionMessage path=CmdInvocationInfo.PositionMessage |
spath input=Message output=CmdPSScriptRoot path=CmdInvocationInfo.PSScriptRoot |
spath input=Message output=CmdPSCommandPath path=CmdInvocationInfo.PSCommandPath |
spath input=Message output=CmdPipelineLength path=CmdInvocationInfo.PipelineLength |
spath input=Message output=CmdPipelinePosition path=CmdInvocationInfo.PipelinePosition |
spath input=Message output=CmdExpectingInput path=CmdInvocationInfo.ExpectingInput |
spath input=Message output=CmdCommandOrigin path=CmdInvocationInfo.CommandOrigin |
spath input=Message output=CmdDisplayScriptPosition path=CmdInvocationInfo.DisplayScriptPosition |
spath input=Message output=LogInvocationName path=LogInvocationInfo.InvocationName |
spath input=Message output=LogScriptLineNumber path=LogInvocationInfo.ScriptLineNumber |
spath input=Message output=LogScriptInLineOffset path=LogInvocationInfo.OffsetInLine |
spath input=Message output=LogHistoryId path=LogInvocationInfo.HistoryId |
spath input=Message output=LogScriptName path=LogInvocationInfo.ScriptName |
spath input=Message output=LogLine path=LogInvocationInfo.Line |
spath input=Message output=LogPositionMessage path=LogInvocationInfo.PositionMessage |
spath input=Message output=LogPSScriptRoot path=LogInvocationInfo.PSScriptRoot |
spath input=Message output=LogPSCommandPath path=LogInvocationInfo.PSCommandPath |
spath input=Message output=LogPipelineLength path=LogInvocationInfo.PipelineLength |
spath input=Message output=LogPipelinePosition path=LogInvocationInfo.PipelinePosition |
spath input=Message output=LogExpectingInput path=LogInvocationInfo.ExpectingInput |
spath input=Message output=LogCommandOrigin path=LogInvocationInfo.CommandOrigin |
spath input=Message output=LogDisplayScriptPosition path=LogInvocationInfo.DisplayScriptPosition |
fields - Message

I can now save this as a report and tell my team members to run it, then narrow the results by clicking a field name of interest and the value they want, which adds that criterion to the search above. However, I still feel this is a bit kludgy. The query is already quite long, and narrowing it down to specific field/value pairs means adding more lines to an already too-long query string. (It also means that for every new criterion added, you must scroll down a full page to see the results, since the query string takes up a full page on screen.)

Now, if that is the only way to do this without index-time field extractions, then I guess we will live with it. However, what I was hoping for is a way to add search-time field extractions so that my team can go to the search bar and type sourcetype="WinEventLog:PowerShell-Endpoint-IMS-APISession" and nothing more. This would give them a list of results with well-formatted fields, at which point they could easily and intuitively add further field/value criteria to the query without all the extra clicks (e.g. by adding ClientIP=169.68.128.128 to the search string).
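
One thing that might get part of the way there without admin access is a search macro, assuming the role in use is allowed to create one. The sketch below assumes a macro named ims_api_session (a hypothetical name) whose definition is the full search shown earlier, from the sourcetype term down to "fields - Message". Team members would then type only the macro name in backticks and pipe it to their own filter:

```spl
`ims_api_session`
| search ClientIP="169.68.128.128"
```

The filter has to come after the macro, since the cleaned-up fields only exist once the spath steps inside it have run.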

livehybrid
Builder

Hi!
If you run your search and pipe it into the spath command, does ClientIP appear in the field list on the left?

index=main sourcetype=yourSourceType | spath
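
As a rough sketch of where that could lead (not tested against your data): with input=Message and no path argument, spath should extract every key in the JSON body in one pass, naming nested keys with dotted paths such as SessionInfo.ClientIP. A wildcard rename can then shorten the names. The MessageJson field name and the Cmd/Log prefixes are just illustrative choices, and the first rename is there because the JSON has its own top-level Message key that would otherwise share a name with the input field:

```spl
sourcetype="WinEventLog:PowerShell-Endpoint-IMS-APISession"
| fields host, source, sourcetype, LogName, SourceName, EventCode, EventType, Type, ComputerName, TaskCategory, OpCode, RecordNumber, Keywords, Message
| rename Message AS MessageJson
| spath input=MessageJson
| rename Message AS EventMessage, "SessionInfo.*" AS *, "CmdInvocationInfo.*" AS Cmd*, "LogInvocationInfo.*" AS Log*
| fields - MessageJson
```

The resulting names follow the JSON keys, so they will not all match a hand-written list (for example CmdOffsetInLine rather than CmdScriptInLineOffset), and the nested BoundParameters object would come through as CmdBoundParameters.Username.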

Sukisen1981
Champion

'If I use the same ClientIP field name, those two values both show up under ClientIP which is just as confusing as using a different name for the field as I will then have a incorrectly formatted ClientIP and a correctly formatted ClientIPAddress.'

What happens if you use something like
| eval ClientIP=XXX   or   | rex field=ClientIP (your regex)
and then also add a filter, for example
| eval ClientIP=XXX   or   | rex field=ClientIP (your regex) | where ClientIP!="YYY"

where YYY is something that distinguishes the raw ClientIP from your extracted ClientIP?
