How to normalize MAC address format?

jeff
Contributor

We have different log sources that may format the MAC address as:

 af:af:af:af:af:af  
 af-af-af-af-af-af  
 af.af.af.af.af.af  
 afafafafafaf  

In order to search for a MAC address across these sources, I added

[mac]
# matches a valid media access control (ethernet) address 
# Extracts: mac address in format af:af:af:af:af:af
REGEX = \b([0-9A-Fa-f]{2})[:\- \.]?([0-9A-Fa-f]{2})[:\- \.]?([0-9A-Fa-f]{2})[:\- \.]?([0-9A-Fa-f]{2})[:\- \.]?([0-9A-Fa-f]{2})[:\- \.]?([0-9A-Fa-f]{2})\b
FORMAT = mac::$1:$2:$3:$4:$5:$6
MV_ADD = true

to my transforms.conf so that the format would be normalized. However, for each log entry with a valid MAC address, the extracted value is the literal string "$1:$2:$3:$4:$5:$6". If I set FORMAT to any individual match ($1 through $6), the correct value is returned. Bug? Feature? An id10t / PEBKAC issue?
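
As a point of comparison outside Splunk, this minimal Python sketch applies the same REGEX and joins the six captured groups with colons, which is the result FORMAT = mac::$1:$2:$3:$4:$5:$6 is meant to produce. It only illustrates the intended output; it is not how Splunk evaluates transforms, and normalize_mac is a hypothetical helper name.

```python
import re

# Same pattern as the [mac] stanza: six hex pairs, each optionally
# followed by ':', '-', ' ' or '.'.
MAC_RE = re.compile(
    r"\b([0-9A-Fa-f]{2})[:\- \.]?([0-9A-Fa-f]{2})[:\- \.]?([0-9A-Fa-f]{2})"
    r"[:\- \.]?([0-9A-Fa-f]{2})[:\- \.]?([0-9A-Fa-f]{2})[:\- \.]?([0-9A-Fa-f]{2})\b"
)


def normalize_mac(text):
    """Return every MAC found in text, rewritten as aa:bb:cc:dd:ee:ff."""
    # Equivalent of FORMAT = mac::$1:$2:$3:$4:$5:$6, applied to each match.
    return [":".join(m.groups()) for m in MAC_RE.finditer(text)]


for sample in ["af:af:af:af:af:af", "af-af-af-af-af-af", "af.af.af.af.af.af", "afafafafafaf"]:
    print(normalize_mac(sample))  # each prints ['af:af:af:af:af:af']
```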

jotne
Builder

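Search-time normalization of MAC addresses. This converts any MAC format to XX:XX:XX:XX:XX:XX. Every non-hex character is stripped from the mac field first, so the field should contain only the address.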

One liner:

| rex mode=sed field=mac "s/[^0-9a-fA-F]//g s/(..)(..)(..)(..)(..)(..)/\1:\2:\3:\4:\5:\6/g y/abcdef/ABCDEF/"

Other versions:

| rex mode=sed field=mac "s/[^0-9a-fA-F]//g"
| eval mac=upper(replace(mac, "(..)(..)(..)(..)(..)(..)", "\1:\2:\3:\4:\5:\6"))

| rex mode=sed field=mac "s/[^0-9a-fA-F]//g s/(..)(..)(..)(..)(..)(..)/\1:\2:\3:\4:\5:\6/g"
| eval mac=upper(mac)
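
The rex/eval pipelines above can be mirrored in Python to check what they do with a given value. This is a sketch that assumes the mac field holds only the address: every non-hex character is removed first, so the separator style (or its absence) stops mattering. normalize_mac is a hypothetical name.

```python
import re


def normalize_mac(mac):
    """Mirror of the rex/eval pipeline: strip non-hex, insert colons, uppercase."""
    # s/[^0-9a-fA-F]//g : drop every separator (and anything else non-hex).
    digits = re.sub(r"[^0-9a-fA-F]", "", mac)
    # s/(..)(..)(..)(..)(..)(..)/\1:\2:\3:\4:\5:\6/ : regroup into pairs.
    colons = re.sub(r"(..)(..)(..)(..)(..)(..)", r"\1:\2:\3:\4:\5:\6", digits)
    # upper(...) or y/abcdef/ABCDEF/ : uppercase the hex letters.
    return colons.upper()


print(normalize_mac("00-16-3e-23-e8-09"))  # 00:16:3E:23:E8:09
print(normalize_mac("0016.3e23.e809"))     # 00:16:3E:23:E8:09
```

Because the separators are discarded rather than matched, this approach also copes with layouts such as 0016.3e23.e809; the flip side is that any other hex digits in the field would be swept into the result.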

cmeo
Contributor

Found the answer, but not where I was expecting. It seems (correct me if I'm wrong) that SEDCMD does not work on events forwarded from another system. I moved my SEDCMD from the index host, where it was NOT working, to the forwarder, and presto! It came good.

This is not documented as far as I can see, unless it's buried somewhere in the discussion of the pipeline. I'm quite happy to run the substitution over the whole output of getmac.exe, since I throw away most of it and keep only the MAC address; the field extraction on the index server DOES work.

This would be nice to have in the Windows app, so you can align MAC addresses in a multi-platform shop. Also, if Windows had utilities like sed built in, none of this would be necessary, since you could do it all in the script, like on a real OS.

Anyway, the complete solution if anyone is interested:

On the Windows host: input and rewrite

-- create $SPLUNK_HOME\bin\scripts\getmac.cmd

@echo off
getmac.exe /nh

-- add an input to etc\system\local\inputs.conf

[script://C:\Program Files\Splunk\bin\scripts\getmac.cmd]
disabled = false
interval = 60
sourcetype = getmac
index = my_index

-- add SEDCMD to etc\system\local\props.conf

[getmac]
# added by forwarder config
TRANSFORMS-routing = route_bering
SEDCMD-fixmac = s/-/:/g

On the indexer: field extraction in props.conf

[getmac]
EXTRACT-macaddr = (?P<mac_addr>\w{2}:\w{2}:\w{2}:\w{2}:\w{2}:\w{2})

Result: maximum joy

19/08/2011 11:44:06.000 00:16:3E:23:E8:09 \Device\Tcpip_{60AE15D3:02BD:4E10:8D86:D1FECF394DAB}

host=barents.remora.com.au sourcetype=getmac source=C:\Program Files\Splunk\bin\scripts\getmac.cmd mac_addr=00:16:3E:23:E8:09
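
To see what the SEDCMD and EXTRACT settings above do to a getmac line, here is a small Python sketch. The raw input line is an assumption, reconstructed from the sample event by turning its colons back into hyphens; str.replace stands in for s/-/:/g, and re.search stands in for the EXTRACT regex (Python accepts the same (?P<name>...) group syntax).

```python
import re

# Hypothetical raw line from "getmac.exe /nh", reconstructed from the
# sample event above by turning the colons back into hyphens.
RAW = r"00-16-3E-23-E8-09   \Device\Tcpip_{60AE15D3-02BD-4E10-8D86-D1FECF394DAB}"

# SEDCMD s/-/:/g : replaces every hyphen in the event, not just those in the MAC.
event = RAW.replace("-", ":")

# EXTRACT-macaddr : first run of six colon-separated two-character groups.
m = re.search(r"(?P<mac_addr>\w{2}:\w{2}:\w{2}:\w{2}:\w{2}:\w{2})", event)

print(event)
print(m.group("mac_addr"))  # 00:16:3E:23:E8:09
```

Note that s/-/:/g rewrites every hyphen in the event, which is why the interface GUID in the sample event also ends up colon-separated.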

cmeo
Contributor

Yet another variation on the theme. The solution above only works on a heavy forwarder. Now that I've deployed some universal forwarders, I find you CAN'T use SEDCMD there, evidently because the machinery to make it go isn't there.
HOWEVER, it now DOES work on the indexer. Can someone from Splunk please explain these interactions?

cmeo
Contributor

Looks like the forum software has eaten the backslash characters in the above post. Sigh.

cmeo
Contributor

Is there no simpler way to do this? Grappling with the same problem...

I tried this in props.conf but it didn't do anything:

SEDCMD-fixmac=s/-/:/g

This is an index-time setting, which is fine... if it worked.

Input is from Windows, which formats MAC addresses as aa-bb-cc-dd-ee-ff.

Jason
Motivator

This can be solved with a scripted lookup, which you can apply to entire sources or sourcetypes. (Be sure to extract the MAC as messy_mac, or change that to the name of your non-normalized MAC field.)

bin/normalizemac.py

# MAC address normalization script (Splunk scripted lookup)
# Turns a MAC in any of the following formats:
#      000000000000
#      00 00 00 00 00 00
#      00.00.00.00.00.00
#      00-00-00-00-00-00
#
# into 00:00:00:00:00:00 format (lowercase).
#

import csv
import re
import sys

# Six hex pairs, each optionally separated by '-', '.', ':' or whitespace.
MAC_RE = re.compile(
    r"([0-9a-fA-F]{2})[-.:\s]?([0-9a-fA-F]{2})[-.:\s]?([0-9a-fA-F]{2})"
    r"[-.:\s]?([0-9a-fA-F]{2})[-.:\s]?([0-9a-fA-F]{2})[-.:\s]?([0-9a-fA-F]{2})"
)

# Splunk writes the lookup fields (input, mac) to stdin as CSV, header first,
# and reads the same columns back from stdout.
reader = csv.DictReader(sys.stdin)
writer = csv.DictWriter(sys.stdout, fieldnames=reader.fieldnames, extrasaction="ignore")
writer.writeheader()

for row in reader:
    messy = row.get("input") or ""
    m = MAC_RE.search(messy)
    # Leave mac empty when the input contains no MAC-like string,
    # rather than crashing on m.group() when m is None.
    row["mac"] = ":".join(m.groups()).lower() if m else ""
    writer.writerow(row)

local/transforms.conf

[normalizemac]
external_cmd = normalizemac.py input mac
fields_list = input,mac

local/props.conf

[datastreamname]
LOOKUP-normalizemac = normalizemac input AS messy_mac OUTPUTNEW mac
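
Splunk runs a scripted lookup by writing the lookup fields as CSV to the script's stdin and reading CSV back from stdout, so the normalization can be exercised without Splunk by feeding it an in-memory CSV. The sketch below is a test harness, not part of the lookup: run_lookup is a hypothetical helper, the request shape (header input,mac with mac left empty) is an assumption, and returning an empty mac for input with no MAC-like text is a choice made here.

```python
import csv
import io
import re

# Same pattern as normalizemac.py: six hex pairs with optional separators.
MAC_RE = re.compile(
    r"([0-9a-fA-F]{2})[-.:\s]?([0-9a-fA-F]{2})[-.:\s]?([0-9a-fA-F]{2})"
    r"[-.:\s]?([0-9a-fA-F]{2})[-.:\s]?([0-9a-fA-F]{2})[-.:\s]?([0-9a-fA-F]{2})"
)


def run_lookup(csv_in):
    """Apply the lookup's normalization to CSV text and return the CSV it would emit."""
    reader = csv.DictReader(io.StringIO(csv_in))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        m = MAC_RE.search(row["input"] or "")
        # Empty mac when nothing MAC-like is found (assumption for this sketch).
        row["mac"] = ":".join(m.groups()).lower() if m else ""
        writer.writerow(row)
    return out.getvalue()


# Assumed request shape: header "input,mac", with mac left empty for the script to fill.
request = "input,mac\n00-16-3E-23-E8-09,\n00 16 3e 23 e8 09,\nnot-a-mac,\n"
print(run_lookup(request))
```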

gkanapathy
Splunk Employee

I have found that while the FORMAT you used above works for index-time extractions, it does not work for search-time extractions. Possibly a "bug".

I've searched for things like this by instead creating a macro in macros.conf, e.g.:

[mac_addr(6)]
args = one,two,three,four,five,six
definition = "$one$-$two$-$three$-$four$-$five$-$six$" OR "$one$:$two$:$three$:$four$:$five$:$six$" OR "$one$.$two$.$three$.$four$.$five$.$six$"

and searching with it like:

sourcetype=mysourcetype `mac_addr(ab,cd,ef,01,23,45)`
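
The macro is plain text substitution: each $arg$ is replaced by the corresponding argument, giving one quoted phrase per separator style joined with OR. This Python sketch (expand_mac_addr is a hypothetical helper) builds the same string, which makes it easy to see what actually gets searched.

```python
def expand_mac_addr(one, two, three, four, five, six):
    """Build the search string that mac_addr(...) expands to (hypothetical helper)."""
    octets = [one, two, three, four, five, six]
    # One quoted phrase per separator, OR'ed together, in the macro's order.
    return " OR ".join('"%s"' % sep.join(octets) for sep in ("-", ":", "."))


print(expand_mac_addr("ab", "cd", "ef", "01", "23", "45"))
# "ab-cd-ef-01-23-45" OR "ab:cd:ef:01:23:45" OR "ab.cd.ef.01.23.45"
```

Neither the macro nor this sketch covers the unseparated afafafafafaf form from the question; in the sketch, adding an empty string to the separator tuple would add that fourth phrase.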

jeff
Contributor

Yeah, that works... though I'd like it better if it worked using the method in my example above. I think you're right. I'll put it in as an enhancement request as soon as we become paying customers.

gkanapathy
Splunk Employee

Actually, it's not really a bug; it's expected behavior, due to limitations in how extracted fields are searched for in the index. It might be a useful enhancement request, though.
