Splunk Search

Help with regex for Windows event log

jbandautrgv
Engager

I'm trying to parse data out of an event log in XML format. Below are two example events that come from the same event log (same sourcetype):

<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
  <System>
    <Provider Name='Microsoft-Windows-Base-Filtering-Engine-Connections' Guid='{121D3DA8-BAF1-4DCB-929F-2D4C9A47F7AB}'/>
    <EventID>2000</EventID>
    <Version>0</Version>
    <Level>4</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000000</Keywords>
    <TimeCreated SystemTime='2020-03-23T14:23:44.982049900Z'/>
    <EventRecordID>1238530</EventRecordID>
    <Correlation/>
    <Execution ProcessID='1252' ThreadID='11720'/>
    <Channel>microsoft-windows-base-filtering-engine-connections/operational</Channel>
    <Computer>servername.fqdn</Computer>
    <Security UserID='S-1-5-19'/>
  </System>
  <EventData>
    <Data Name='ConnectionId'>13228601961099160992</Data>
    <Data Name='MachineAuthenticationMethod'>4</Data>
    <Data Name='RemoteMachineAccount'>machine.fqdn</Data>
    <Data Name='UserAuthenticationMethod'>2</Data>
    <Data Name='RemoteUserAcount'>domain\user</Data>
    <Data Name='RemoteIPAddress'>ipv6address</Data>
    <Data Name='LocalIPAddress'>ipv6address</Data>
    <Data Name='TechnologyProviderKey'>{1BEBC969-61A5-4732-A177-847A0817862A}</Data>
    <Data Name='IPsecTrafficMode'>1</Data>
    <Data Name='DHGroup'>0</Data>
    <Data Name='StartTime'>2020-03-23T14:23:44.969Z</Data>
  </EventData>
</Event>

<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
  <System>
    <Provider Name='Microsoft-Windows-Base-Filtering-Engine-Connections' Guid='{121D3DA8-BAF1-4DCB-929F-2D4C9A47F7AB}'/>
    <EventID>2001</EventID>
    <Version>0</Version>
    <Level>4</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000000</Keywords>
    <TimeCreated SystemTime='2020-03-24T02:53:43.017501900Z'/>
    <EventRecordID>1284675</EventRecordID>
    <Correlation/>
    <Execution ProcessID='1252' ThreadID='7796'/>
    <Channel>microsoft-windows-base-filtering-engine-connections/operational</Channel>
    <Computer>servername.fqdn</Computer>
    <Security UserID='S-1-5-19'/>
  </System>
  <EventData>
    <Data Name='ConnectionId'>13228601961099183464</Data>
    <Data Name='MachineAuthenticationMethod'>4</Data>
    <Data Name='RemoteMachineAccount'>clientname.fqdn</Data>
    <Data Name='UserAuthenticationMethod'>2</Data>
    <Data Name='RemoteUserAcount'>domain\user</Data>
    <Data Name='RemoteIPAddress'>ipv6addr</Data>
    <Data Name='LocalIPAddress'>ipv6addr</Data>
    <Data Name='TechnologyProviderKey'>{1BEBC969-61A5-4732-A177-847A0817862A}</Data>
    <Data Name='IPsecTrafficMode'>1</Data>
    <Data Name='BytesTransferredInbound'>34256</Data>
    <Data Name='BytesTransferredOutbound'>30672</Data>
    <Data Name='BytesTransferredTotal'>64928</Data>
    <Data Name='StartTime'>2020-03-24T02:33:00.492Z</Data>
    <Data Name='CloseTime'>2020-03-24T02:53:43.017Z</Data>
  </EventData>
</Event>

I have this in my props.conf

[directaccess:connections]
NO_BINARY_CHECK = 1
TIME_FORMAT = %a %b %d %H:%M:%S %T %Y
pulldown_type = 1
REPORT-xmlkv = xmlkv-alternative

and this in my transforms.conf

[xmlkv-alternative]
REGEX = <([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>
FORMAT = $1::$2

(found on another Splunk Answers post)

I'm really not sure how it works, but it is enough to extract the first section, so I end up with Computer, Channel, Data, EventID, EventRecordID, Level, Opcode and Task fields. Data just seems to contain the first of the "Data Name" values.
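For anyone else unsure how that transform behaves, here is a rough Python sketch of the same regex run against a trimmed-down copy of the first event. Group 1 is the tag name and group 2 is the text between the opening and closing tag, so every Data element comes out under the key "Data". The first-value-wins step is an assumption meant to imitate a field that is not multivalued; it is not Splunk's actual code, and the names sample, pairs and fields are made up for the illustration.

```python
import re

# Same pattern as the [xmlkv-alternative] REGEX: group 1 is the tag name
# (up to the first space or '>'), group 2 is the text before the closing tag.
XMLKV = re.compile(r"<([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>")

# Trimmed-down copy of the first event, on a single line.
sample = (
    "<System><EventID>2000</EventID><Level>4</Level></System>"
    "<EventData>"
    "<Data Name='ConnectionId'>13228601961099160992</Data>"
    "<Data Name='DHGroup'>0</Data>"
    "</EventData>"
)

# Every (tag, text) pair the regex finds, in document order.
pairs = XMLKV.findall(sample)

# Assumption: keep only the first value seen per key, to imitate a
# single-valued field.
fields = {}
for key, value in pairs:
    fields.setdefault(key, value)

print(pairs)
print(fields)
```

Both Data elements match, but both are keyed as "Data", which would explain why only the first value shows up in that field.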

The props.conf and transforms.conf seemed good enough to extract the top part contained inside "System", but not "EventData". For the bottom "EventData" part, I tried manual field extractions, first letting Splunk pick one for me and then trying to create the rest. I ended up with something like this:

^(?:[^=\n]*=){12}'\w+'>(?P[^<]+)

^(?:[^=\n]*=){15}'\w+'>(?P[^<]+)

Those were for the fields, but relying on a count of characters (I think that's what it's doing?) didn't always work, because some fields were the same length and gave me weird results.
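A note on those generated patterns: the {12} does not appear to count characters. (?:[^=\n]*=){12} skips forward past the 12th "=" sign on the line, so the result depends on how many attributes come before the field rather than on the field's name. Here is a small Python sketch of that, using two shortened one-line events and a count of 3 instead of 12. The group name value, the helper nth_value and the stand-in xmlns='ns' are hypothetical, chosen only for this illustration.

```python
import re

def nth_value(event, n):
    """Return the Data value that follows the n-th '=' on the line, or None."""
    # Skip n chunks that each end in '=', then expect 'Name'> and the value.
    pattern = r"^(?:[^=\n]*=){%d}'\w+'>(?P<value>[^<]+)" % n
    match = re.search(pattern, event)
    return match.group("value") if match else None

# Shortened stand-ins for the 2000 and 2001 events: the third '=' belongs
# to a different Data element in each of them.
event_2000 = (
    "<Event xmlns='ns'><EventData>"
    "<Data Name='IPsecTrafficMode'>1</Data>"
    "<Data Name='DHGroup'>0</Data>"
    "<Data Name='StartTime'>2020-03-23T14:23:44.969Z</Data>"
    "</EventData></Event>"
)
event_2001 = (
    "<Event xmlns='ns'><EventData>"
    "<Data Name='IPsecTrafficMode'>1</Data>"
    "<Data Name='BytesTransferredInbound'>34256</Data>"
    "<Data Name='BytesTransferredOutbound'>30672</Data>"
    "</EventData></Event>"
)

open_value = nth_value(event_2000, 3)   # value of DHGroup
close_value = nth_value(event_2001, 3)  # value of BytesTransferredInbound
print(open_value, close_value)
```

The same position lands on DHGroup in one event and BytesTransferredInbound in the other, which is the kind of mix-up a purely positional pattern can produce when the two event types carry different Data elements.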

At this point I'm OK with manually typing the field names, but I don't know how to build a proper regex to extract the bottom part inside the "EventData" section. I was trying to do something like this (but it obviously didn't work):

^(?:[^=\n]*=)ConnectionID'\w+'>(?P[^<]+)

Unfortunately regex is my Achilles heel, so I appreciate any help I can get with this.


darrenfuller
Contributor

Hi jbandautrgv,

The easiest way to get fields extracted from XML is to set KV_MODE = xml in your props.conf.

If you are determined to use props / transforms, I believe this works:

props:

[directaccess:connections]
NO_BINARY_CHECK = 1
TIME_FORMAT = %a %b %d %H:%M:%S %T %Y
pulldown_type = 1
REPORT-xmlkv = xmlkv-alternative
REPORT-xmlkv2 = xmlkv-alternative2

transforms:

[xmlkv-alternative]
REGEX = <([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>
FORMAT = $1::$2

[xmlkv-alternative2]
REGEX = <Data\sName='([^']+)'>([^<]+)<\/Data>
FORMAT = $1::$2

This adds a second extraction that matches the Data elements, using each Name attribute as the field name and the element text as the value.
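To see what the second regex picks up, here is a minimal Python sketch run against a shortened EventData block from the question. It only mirrors the regex itself, not how Splunk applies a transform; sample and data_fields are names made up for the example.

```python
import re

# Same pattern as the [xmlkv-alternative2] REGEX: group 1 is the Name
# attribute, group 2 is the element text.
DATA_KV = re.compile(r"<Data\sName='([^']+)'>([^<]+)<\/Data>")

# Shortened EventData block from the first event, on a single line.
sample = (
    r"<EventData>"
    r"<Data Name='ConnectionId'>13228601961099160992</Data>"
    r"<Data Name='RemoteUserAcount'>domain\user</Data>"
    r"<Data Name='DHGroup'>0</Data>"
    r"</EventData>"
)

# One field per Data element, named after its Name attribute.
data_fields = dict(DATA_KV.findall(sample))
print(data_fields)
```

Because the value group is [^<]+, a Data element with an empty value would not match and so would produce no field.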

Hope this helps
