How to parse tabular data from OpenVZ's /proc/user_beancounters

Steven_McGrath · ‎06-02-2010

I'm sure someone has figured out how to handle this data. What I am trying to do is index and extract all of the data in the table below. The data is a dump of /proc/user_beancounters and is one of the main places to get performance information for OpenVZ containers. I have a grand idea for generating dashboards with the information once everything is properly extracted.

The information is a touch tricky: one of the most important pieces of info (called the uid in the table, really the veid) specifies which container, or virtual machine, the stats are for. That value isn't on every row, though. It appears only on rows that start a different container from the line before.
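To make the layout concrete, here is a minimal Python sketch of the carry-forward idea described above, handy for checking the logic outside Splunk. The sample text is invented and only assumed to follow the usual user_beancounters layout; the names SAMPLE, ROW, COLUMNS and parse_beancounters are hypothetical.

```python
import re

# Hypothetical sample in the usual /proc/user_beancounters layout (values invented).
SAMPLE = """\
Version: 2.5
       uid  resource           held    maxheld    barrier      limit    failcnt
      101:  kmemsize         803866    1246758   14372700   14790164          0
            lockedpages           0          0        256        256          0
      102:  kmemsize         500000     600000   14372700   14790164          2
            lockedpages           1          4        256        256          0
"""

# Optional "veid:" prefix, then a resource name and exactly five integer columns.
ROW = re.compile(
    r"^\s*(?:(\d+):)?\s*(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$"
)
COLUMNS = ("held", "maxheld", "barrier", "limit", "failcnt")


def parse_beancounters(text):
    """Return {veid: {resource: {column: int}}}, carrying the veid forward."""
    result = {}
    veid = None
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # skips the Version line and the column header
        if m.group(1) is not None:
            veid = int(m.group(1))  # a new container block starts on this row
        if veid is None:
            continue  # data row seen before any veid; nothing to attach it to
        values = [int(v) for v in m.groups()[2:]]
        result.setdefault(veid, {})[m.group(2)] = dict(zip(COLUMNS, values))
    return result


stats = parse_beancounters(SAMPLE)
```

Rows without a veid inherit the most recent one, so `stats[102]["lockedpages"]` ends up under container 102 even though that row carries no id of its own.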

I'm not a regex wizard, but I know a lot of OpenVZ admins who would love to have this data for alerting, dashboard metrics, etc.

Note that this is a dump listing approximately 50-60 containers. Container 0 is the hardware node and can be excluded. http://pastebin.com/SgPrtL87

gkanapathy · ‎06-02-2010

Two ways, I would say.

The first is nontraditional for Splunk, but probably the neatest way to handle this data. In props.conf:

[openvzbeancounters]
LINE_BREAKER = ([\r\n]+)
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ^\s*\d+\:
EXTRACT-veid = ^\s*(?<veid>\d+)\:
SEDCMD-fields = s/(?-m)((?:^\s*\d+\:\s*|[\r\n]+\s*))(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)/\1 \2_held=\3 \2_maxheld=\4 \2_barrier=\5 \2_limit=\6 \2_failcnt=\7/g
KV_MODE=auto

Your data will be transformed at input time into a format better suited to Splunk. You'll get one event per veid, with each stat under its own field name (for example kmemsize_held), so you can run searches like:

sourcetype=openvzbeancounters veid!=0 | stats avg(kmemsize_held), avg(kmemsize_maxheld)
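As a rough way to preview what the sed-style rewrite does to one event, here is a Python approximation using re.sub. It is a sketch, not Splunk's own behavior: the EVENT text is invented, Python has no `(?-m)` switch (its `^` already matches only at the start of the string by default), and this version of the pattern assumes whitespace may follow the veid colon and does not consume whitespace after the last column, so the newline stays available as the prefix of the next row.

```python
import re

# Hypothetical single event (one container) as it might look after line merging.
EVENT = (
    "      101:  kmemsize         803866    1246758   14372700   14790164          0\n"
    "            lockedpages           0          0        256        256          0"
)

# Group 1: row prefix (veid at event start, or newline plus indentation).
# Group 2: counter name. Groups 3-7: held, maxheld, barrier, limit, failcnt.
PATTERN = re.compile(
    r"((?:^\s*\d+:\s*|[\r\n]+\s*))(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)"
)
REPLACEMENT = r"\1 \2_held=\3 \2_maxheld=\4 \2_barrier=\5 \2_limit=\6 \2_failcnt=\7"

rewritten = PATTERN.sub(REPLACEMENT, EVENT)

# Pull the key=value pairs back out, roughly as automatic KV extraction would.
pairs = dict(re.findall(r"(\w+)=(\d+)", rewritten))
print(rewritten)
```

Each row keeps its own line, and every number gets a name built from the counter and the column, which is what makes fields such as kmemsize_held searchable.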

Alternatively, if for some reason you need the original file format stored in Splunk, props.conf:

[openvzbeancounters]
LINE_BREAKER = ([\r\n]+)
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ^\s*\d+\:
EXTRACT-veid = ^\s*(?<veid>\d+)\:
REPORT-fields = ext_held,ext_maxheld,ext_barrier,ext_limit,ext_failcnt

Then, in transforms.conf:

# Each transform uses the counter name as the field name (_KEY_1)
# and one column as the value (_VAL_1). MV_ADD appends instead of overwriting.

# 1st value after the counter name
[ext_held]
REGEX = (?-m)(?:^\s*\d+\:\s*|[\r\n]+\s*)(?<_KEY_1>\S+)\s+(?<_VAL_1>\S+)
MV_ADD = true

# 2nd value
[ext_maxheld]
REGEX = (?-m)(?:^\s*\d+\:\s*|[\r\n]+\s*)(?<_KEY_1>\S+)\s+\S+\s+(?<_VAL_1>\S+)
MV_ADD = true

# 3rd value
[ext_barrier]
REGEX = (?-m)(?:^\s*\d+\:\s*|[\r\n]+\s*)(?<_KEY_1>\S+)\s+(?:\S+\s+){2}(?<_VAL_1>\S+)
MV_ADD = true

# 4th value
[ext_limit]
REGEX = (?-m)(?:^\s*\d+\:\s*|[\r\n]+\s*)(?<_KEY_1>\S+)\s+(?:\S+\s+){3}(?<_VAL_1>\S+)
MV_ADD = true

# 5th value
[ext_failcnt]
REGEX = (?-m)(?:^\s*\d+\:\s*|[\r\n]+\s*)(?<_KEY_1>\S+)\s+(?:\S+\s+){4}(?<_VAL_1>\S+)
MV_ADD = true

Then, when you search or report on your data, you'll have to use mvindex to get at the individual fields for each counter:

sourcetype=openvzbeancounters veid=200 | eval kmemsize_held=mvindex(kmemsize,0)

or

sourcetype=openvzbeancounters veid=200 | timechart sum(eval(mvindex(kmemsize,4))) as kmemsize_failcnt_total

The index number corresponds to the order of the transforms listed in the REPORT-fields clause, starting from zero: 0 is held, 1 is maxheld, 2 is barrier, 3 is limit, and 4 is failcnt.
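The ordering can be checked offline with a small Python sketch that mimics the five transforms: one regex per column, each appending its capture to a list keyed by the counter name. This only models the ordering, under the assumption that every transform contributes exactly one value per counter and that whitespace may follow the veid colon; EVENT, PREFIX, COLUMNS and multivalue_fields are hypothetical names, and the data is invented.

```python
import re

# Hypothetical single event (one container) in the original file format.
EVENT = (
    "      101:  kmemsize         803866    1246758   14372700   14790164          0\n"
    "            lockedpages           0          0        256        256          0"
)

# Same order as the transforms named in REPORT-fields.
COLUMNS = ["held", "maxheld", "barrier", "limit", "failcnt"]

# Row prefix: veid at the start of the event, or a newline plus indentation.
PREFIX = r"(?:^\s*\d+:\s*|[\r\n]+\s*)"


def multivalue_fields(event):
    """Return {counter: [held, maxheld, barrier, limit, failcnt]} as strings."""
    fields = {}
    for skip in range(len(COLUMNS)):
        # Capture the counter name, skip `skip` columns, capture the next one.
        pattern = re.compile(PREFIX + r"(\S+)\s+(?:\S+\s+){%d}(\S+)" % skip)
        for name, value in pattern.findall(event):
            fields.setdefault(name, []).append(value)
    return fields


fields = multivalue_fields(EVENT)
kmemsize_failcnt = fields["kmemsize"][COLUMNS.index("failcnt")]
```

Here `fields["kmemsize"][0]` plays the role of mvindex(kmemsize,0), and index 4 picks out the failcnt column.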
