VMware ESXi vmkernel error search

HCadmins
Communicator

Hi Splunkers.

A year ago we had a hardware issue that disabled our operation for 24 hours. The VMware vmkernel error looked like this:

2015-11-09T21:55:08.687Z cpu28:37026)MCE: 222: cpu28: bank7: status=0x8c00004000010090: (VAL=1, OVFLW=0, UC=0, EN=0, PCC=0, S=0, AR=0), ECC=no, Addr:0x1425a5200 (valid), Misc:0x42ef6f0000 (valid)

Now that we have Splunk, I am trying to set up a search that specifically tracks these errors. I want the date/time, the CPU, and the keyword "MCE".

I borrowed and modified a search from the VMware app that looks like this:

sourcetype=vmware:esxlog:vmkernel *  * * * * * * | head 10000 | rex field=sourcetype "^vmware:esxlog:(?<sublogger>.+)$" | rex field=Message "^(?:[^ \n]* ){7}(?P<CPU>[^\)]+)\)(?P<CPU_Message>.+)" | eval Time=_time | convert ctime(Time) | table Time, host, CPU, CPU_Message | Rename host as Host, CPU_Message as "Message (if any)"

My first question is, what are all those *'s for? I know that an * is a wildcard, but in the VMware app, what do the multiple *'s do?

For the two rex fields, I used the field extractor and extracted the cpu28:37026 part from the above log, but I also want the MCE: part.

My search mostly works. I am getting the time, host, that CPU field, and then a message that doesn't usually contain the MCE errors (or anything useful). How do I make it show the time, host, and CPU, and then either an MCE or MCA error, but only if an MCE or MCA error exists?

Thanks in advance!

1 Solution

Richfez
SplunkTrust

You are right about those stars: I don't know why the extras are in there, but I know they're not needed. Also, the head 20 defeats the purpose, since it only shows you the 20 most recent events. So here's what I'd suggest:

First, try the simple search

sourcetype=vmware:esxlog:vmkernel (MCE OR MCA OR Error)

That should return the events with MCE, MCA or the word Error in them. That might be precisely what you need, but it might also return a few spurious "error" lines.
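To see why the plain keyword search can be noisy, here is a rough sketch in Python (not SPL). It treats the keyword search as a case-insensitive whole-word match, which is only an approximation: Splunk's real term matching follows its own segmentation rules. The second and third log lines are made up for illustration.

```python
import re

# Rough stand-in for the keyword search (MCE OR MCA OR Error):
# case-insensitive, whole-word match. This is an approximation only.
KEYWORDS = re.compile(r"\b(?:MCE|MCA|Error)\b", re.IGNORECASE)

LINES = [
    # Start of the real line quoted in the question.
    "2015-11-09T21:55:08.687Z cpu28:37026)MCE: 222: cpu28: bank7: status=0x8c00004000010090",
    # Hypothetical lines, invented for this example.
    "2015-11-09T21:56:00.000Z cpu3:1234)ExampleModule: 100: retry succeeded, no error",
    "2015-11-09T21:57:00.000Z cpu3:1234)ExampleModule: 101: heartbeat ok",
]

def keyword_hits(lines):
    """Return the lines that contain any of the keywords as a whole word."""
    return [line for line in lines if KEYWORDS.search(line)]

hits = keyword_hits(LINES)
for line in hits:
    print(line)
```

The made-up "no error" line is returned alongside the real MCE line, which is the kind of spurious hit to expect from matching on the bare word.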

Another option is to extract the error keyword into a new field, and then only display the events that have it. Here's one way to do it:

sourcetype=vmware:esxlog:vmkernel | rex "cpu\d+:\d+\)(?<errcode>MCE|MCA)" | search errcode=*

That looks for the literal string "cpu" followed by some digits (cpu\d+, as in cpu28), a colon and more digits (:\d+, like :37026), a closing parenthesis, and then either the string MCE or MCA. You could add Error as a third option by making that last piece (?<errcode>MCE|MCA|Error). (The pipe INSIDE the regex is a regex "or".) The last piece, | search errcode=*, keeps only the events where errcode is set to something. With this search, you should get back the events where MCE or MCA (or Error, assuming you added it) comes right after the cpuNN:NNNN) prefix, but nothing else. These are, if I'm right, the events of interest to you.
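If you want to sanity-check the pattern outside Splunk, a minimal Python sketch like the one below can help. It uses the same regex against the line quoted in the question. Note that Python's re module spells named groups (?P<name>...) rather than (?<name>...). The function name and the second, non-MCE line are hypothetical.

```python
import re

# The vmkernel line quoted in the question.
LINE = (
    "2015-11-09T21:55:08.687Z cpu28:37026)MCE: 222: cpu28: bank7: "
    "status=0x8c00004000010090: (VAL=1, OVFLW=0, UC=0, EN=0, PCC=0, S=0, AR=0), "
    "ECC=no, Addr:0x1425a5200 (valid), Misc:0x42ef6f0000 (valid)"
)

# Hypothetical line without an MCE/MCA keyword, invented for this example.
OTHER = "2015-11-09T21:56:00.000Z cpu3:1234)ExampleModule: 100: retry succeeded, no error"

# Same pattern as the rex; only the named-group spelling differs.
ERRCODE_RE = re.compile(r"cpu\d+:\d+\)(?P<errcode>MCE|MCA)")

def extract_errcode(line):
    """Return 'MCE' or 'MCA' if it directly follows the cpuNN:NNNN) prefix, else None."""
    match = ERRCODE_RE.search(line)
    return match.group("errcode") if match else None

print(extract_errcode(LINE))
print(extract_errcode(OTHER))
```

Because the keyword has to sit right after the closing parenthesis, the "no error" line does not match.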

Once you have the right (hopefully few) events displaying - just the ones with the error - you can start adding the rest of your search back in one piece at a time - they seem reasonably straightforward and a little thinking on them will probably be all you need to understand what they do. But if one's still not obvious to you, ask away!

Richfez
SplunkTrust

By the way, assuming all the rest of the search is needed (I don't think the two "rex" statements you have are, but there's no way I can know that for sure), your whole search would be

sourcetype=vmware:esxlog:vmkernel | rex "cpu\d+:\d+\)(?<errcode>MCE|MCA)" | search errcode=* | rex field=sourcetype "^vmware:esxlog:(?<sublogger>.+)$" | rex field=Message "^(?:[^ \n]* ){7}(?P<CPU>[^\)]+)\)(?P<CPU_Message>.+)" | eval Time=_time | convert ctime(Time) | table Time, host, CPU, CPU_Message | Rename host as Host, CPU_Message as "Message" 

Richfez
SplunkTrust

Oh, and if you don't need the "sublogger" field or the CPU and CPU_Message fields for this, you can skip all that. Without the second rex, CPU and CPU_Message won't be populated (unless they are extracted some other way), so table the errcode field instead:

sourcetype=vmware:esxlog:vmkernel | rex "cpu\d+:\d+\)(?<errcode>MCE|MCA)" | search errcode=* | eval Time=_time | convert ctime(Time) | table Time, host, errcode | rename host as Host
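The original question asked for the date/time, the CPU, and the keyword together. As a sketch of how one pattern could capture all three, here is a Python version tested against the start of the line from the question. The group names (time, cpu, errcode) and the function name are made up for this example, and it assumes the timestamp sits directly before the cpu token, as it does in the quoted line; events in your index may carry extra header fields.

```python
import re

# Start of the vmkernel line quoted in the question.
LINE = "2015-11-09T21:55:08.687Z cpu28:37026)MCE: 222: cpu28: bank7: status=0x8c00004000010090"

# One pattern, three named groups: ISO timestamp, cpuNN:NNNN, and the keyword.
MCE_RE = re.compile(
    r"(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z) "
    r"(?P<cpu>cpu\d+:\d+)\)"
    r"(?P<errcode>MCE|MCA)"
)

def parse_mce(line):
    """Return a dict with time, cpu and errcode, or None if the line does not match."""
    match = MCE_RE.search(line)
    return match.groupdict() if match else None

print(parse_mce(LINE))
```

The same idea carries over to rex by naming an extra capture group for the CPU part, though that variant is untested here.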

HCadmins
Communicator

Thank you so much for your thoughtful and complete answer!


HCadmins
Communicator

Okay, I solved the first part of my problem. Just needed to add | where Message NOT null.

Here's my current search string

sourcetype=vmware:esxlog:vmkernel * * * * * * * | head 20 | rex field=sourcetype "^vmware:esxlog:(?<sublogger>.+)$" | rex field=Message "^(?:[^ \n]* ){7}(?P<CPU>[^\)]+)\)(?P<CPU_Message>.+)" | eval Time=_time | convert ctime(Time) | table Time, host, CPU, CPU_Message | Rename host as Host, CPU_Message as "Message" | where Message NOT null

Now, I only want to display something if there are certain keywords in the Message field, like "MCE", "MCA" and "Error". I am not sure how to do that.
