help with regex for windows event log

jbandautrgv
Engager

I'm trying to parse data out of an event log in XML format. Below are two sample events that come from the same event log (same sourcetype):

<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
  <System><Provider Name='Microsoft-Windows-Base-Filtering-Engine-Connections' Guid='{121D3DA8-BAF1-4DCB-929F-2D4C9A47F7AB}'/>
    <EventID>2000</EventID>
    <Version>0</Version>
    <Level>4</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000000</Keywords>
    <TimeCreated SystemTime='2020-03-23T14:23:44.982049900Z'/>
    <EventRecordID>1238530</EventRecordID>
    <Correlation/>
    <Execution ProcessID='1252' ThreadID='11720'/>
    <Channel>microsoft-windows-base-filtering-engine-connections/operational</Channel>
    <Computer>servername.fqdn</Computer>
    <Security UserID='S-1-5-19'/>
  </System>
  <EventData>
    <Data Name='ConnectionId'>13228601961099160992</Data>
    <Data Name='MachineAuthenticationMethod'>4</Data>
    <Data Name='RemoteMachineAccount'>machine.fqdn</Data>
    <Data Name='UserAuthenticationMethod'>2</Data>
    <Data Name='RemoteUserAcount'>domain\user</Data>
    <Data Name='RemoteIPAddress'>ipv6address</Data>
    <Data Name='LocalIPAddress'>ipv6address</Data>
    <Data Name='TechnologyProviderKey'>{1BEBC969-61A5-4732-A177-847A0817862A}</Data>
    <Data Name='IPsecTrafficMode'>1</Data>
    <Data Name='DHGroup'>0</Data>
    <Data Name='StartTime'>2020-03-23T14:23:44.969Z</Data>
  </EventData>
</Event>

<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
  <System>
    <Provider Name='Microsoft-Windows-Base-Filtering-Engine-Connections' Guid='{121D3DA8-BAF1-4DCB-929F-2D4C9A47F7AB}'/>
    <EventID>2001</EventID>
    <Version>0</Version>
    <Level>4</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000000</Keywords>
    <TimeCreated SystemTime='2020-03-24T02:53:43.017501900Z'/>
    <EventRecordID>1284675</EventRecordID>
    <Correlation/>
    <Execution ProcessID='1252' ThreadID='7796'/>
    <Channel>microsoft-windows-base-filtering-engine-connections/operational</Channel>
    <Computer>servername.fqdn</Computer>
    <Security UserID='S-1-5-19'/>
  </System>
  <EventData>
    <Data Name='ConnectionId'>13228601961099183464</Data>
    <Data Name='MachineAuthenticationMethod'>4</Data>
    <Data Name='RemoteMachineAccount'>clientname.fqdn</Data>
    <Data Name='UserAuthenticationMethod'>2</Data>
    <Data Name='RemoteUserAcount'>domain\user</Data>
    <Data Name='RemoteIPAddress'>ipv6addr</Data>
    <Data Name='LocalIPAddress'>ipv6addr</Data>
    <Data Name='TechnologyProviderKey'>{1BEBC969-61A5-4732-A177-847A0817862A}</Data>
    <Data Name='IPsecTrafficMode'>1</Data>
    <Data Name='BytesTransferredInbound'>34256</Data>
    <Data Name='BytesTransferredOutbound'>30672</Data>
    <Data Name='BytesTransferredTotal'>64928</Data>
    <Data Name='StartTime'>2020-03-24T02:33:00.492Z</Data>
    <Data Name='CloseTime'>2020-03-24T02:53:43.017Z</Data>
  </EventData>
</Event>

I have this in my props.conf

[directaccess:connections]
NO_BINARY_CHECK = 1
TIME_FORMAT = %a %b %d %H:%M:%S %T %Y
pulldown_type = 1
REPORT-xmlkv = xmlkv-alternative

and this in my transforms.conf

[xmlkv-alternative]
REGEX = <([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>
FORMAT = $1::$2

(found on another Splunk Answers post)

I'm really not sure how it works, but it is enough to extract the first section, so I end up with Computer, Channel, Data, EventID, EventRecordID, Level, Opcode and Task fields. The Data field only seems to contain the value of the first "Data Name" element.

So the props.conf and transforms.conf are good enough to extract the top part inside "System", but not "EventData". For the bottom "EventData" part, I tried manual field extractions, first letting Splunk generate one for me and then trying to create the rest myself. I ended up with something like this:

^(?:[^=\n]*=){12}'\w+'>(?P[^<]+)

^(?:[^=\n]*=){15}'\w+'>(?P[^<]+)

That worked for some of the fields, but relying on a count of characters (I think that's what it's doing?) didn't always work, because some fields were the same length and gave me weird results.

At this point I'm OK with typing the field names manually, but I don't know how to build a proper regex to extract the bottom part inside the "EventData" section. I was trying to do something like this (but this obviously didn't work):

^(?:[^=\n]*=)ConnectionID'\w+'>(?P[^<]+)

Unfortunately regex is my Achilles heel, so I appreciate any help I can get with this.


darrenfuller
Contributor

Hi jbandautrgv,

The easiest way to get fields extracted from XML is to set KV_MODE = xml in your props.conf.

If you are determined to use props/transforms, I believe this works:

props:

[directaccess:connections]
NO_BINARY_CHECK = 1
TIME_FORMAT = %a %b %d %H:%M:%S %T %Y
pulldown_type = 1
REPORT-xmlkv = xmlkv-alternative
REPORT-xmlkv2 = xmlkv-alternative2

transforms:

[xmlkv-alternative]
REGEX = <([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>
FORMAT = $1::$2

[xmlkv-alternative2]
REGEX = <Data\sName='([^']+)'>([^<]+)<\/Data>
FORMAT = $1::$2

This adds a second extraction that matches the Data Name elements, using the value of the Name attribute as the field name.
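
If you want to sanity-check the two patterns outside Splunk first, here is a rough sketch using Python's re module as a stand-in for Splunk's PCRE engine (an assumption: the two engines agree for these simple patterns, but this is not how Splunk itself applies transforms). The sample is a trimmed copy of the second event from the question, and the "first value wins" dict only imitates the behaviour reported in the question, where Data held just the first value.

```python
import re

# Trimmed copy of the second event from the question (raw string so the
# backslash in domain\user is kept literally).
SAMPLE = r"""<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
  <System>
    <EventID>2001</EventID>
    <Computer>servername.fqdn</Computer>
  </System>
  <EventData>
    <Data Name='ConnectionId'>13228601961099183464</Data>
    <Data Name='RemoteUserAcount'>domain\user</Data>
    <Data Name='BytesTransferredTotal'>64928</Data>
  </EventData>
</Event>"""

# Same patterns as the [xmlkv-alternative] and [xmlkv-alternative2] stanzas.
TAG_RE = re.compile(r"<([^\s\>]*)[^\>]*\>([^<]*)\<\/\1\>")
DATA_RE = re.compile(r"<Data\sName='([^']+)'>([^<]+)<\/Data>")

# The generic pattern names each pair after the tag, so every
# <Data Name='...'> element comes back under the same name, "Data".
tag_pairs = TAG_RE.findall(SAMPLE)

# The second pattern names each pair after the Name attribute instead.
data_pairs = DATA_RE.findall(SAMPLE)

# Assumption for illustration: keep only the first value per field name,
# imitating what the question observed for the "Data" field.
fields = {}
for name, value in tag_pairs + data_pairs:
    fields.setdefault(name, value)

for name, value in fields.items():
    print(f"{name} = {value}")
```

With the sample above, the generic pattern returns three pairs that are all named Data, which is why only one of them is visible as a field, while the second pattern returns ConnectionId, RemoteUserAcount and BytesTransferredTotal as separate names.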

Hope this helps
