Getting Data In

How to split differently ordered values into subfields on forwarder input?

jhumkey
Path Finder

I keep seeing hints that I can do what I need, but the examples always stop short or aren't "quite right".

I'm receiving concatenated Values from a PLC through Kepware (IDF for Splunk DataForwarder).

And need to split that tag Value into independent (sub)fields.

And (of course) different Tags position their (sub)fields in different orders.

(There are too many Tags at too many sites, to standardize now.)

The simplified example:

Two input lines from the same PLC . . .

9/22/15 1:34:38.861 PM  2015-09-22 18:34:38.861 +0000 Tag="BubbaPlant.PrintStation.PrintResponseTag" Value="00000728299444223335/XPP00000275435/LAN/019//230/0" Quality="good" PLC="001"
PLC = 001 host = bubba.whocares.com source = tcp:5002 sourcetype = generic_single_line

9/22/15 1:34:37.611 PM  2015-09-22 18:34:37.611 +0000 Tag="BubbaPlant.PrintStataion.PrintSubmissionTag" "Value="XPP00000275435/x/x/x/0/230/S1" Quality="good" PLC="001"
PLC = 001 host = bubba.whocares.com source = tcp:5002 sourcetype = generic_single_line

What I "wish" they looked like, was this . . .

9/22/15 1:34:38.861 PM  2015-09-22 18:34:38.861 +0000 Tag="BubbaPlant.PrintStation.PrintResponseTag" BC20="00000728299444223335" BC14="XPP00000275435" LANE="019" Other1="" Other2="230" Other3="0" Quality="good" PLC="001"
PLC = 001 host = bubba.whocares.com source = tcp:5002 sourcetype = generic_single_line

9/22/15 1:34:37.611 PM  2015-09-22 18:34:37.611 +0000 Tag="BubbaPlant.PrintStataion.PrintSubmissionTag" BC14="XPP00000275435" Other1="x" Other2="x" Other3="x" Other4="0" Other5="230" Other6="S1" Quality="good" PLC="001"
PLC = 001 host = bubba.whocares.com source = tcp:5002 sourcetype = generic_single_line

With the original "Value" split apart into independent (sub)fields, and each SHOWN ON THE INDEXED LINE that I see during searches.

I could then search for BC14="*275435*" and get ONLY the lines that have 275435 in the 14-character barcode, and NOT the lines that happen to have it as a substring of the 20-character barcode.

I can do this on the search line . . . (autoformatting is butchering the next line)

source="tcp:5002" 00000728299444223335 OR XPP00000275435 | rex field=Value "(?<aaa>.*)/(?<bbb>.*)/(?<ccc>.*)/(?<ddd>.*)/(?<eee>.*)/(?<fff>.*)/(?<ggg>.*)"

That works, but it just puts "aaa"/"bbb"... into the "Interesting Fields" accumulation on the left. It doesn't put them back on the individual indexed lines.
And it can't 'act differently' for each unique field order per line.
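For example, a per-tag variant of that rex, with the capture groups named the way I actually want them (the Tag constraint and the field names here are only illustrative, built from the sample lines above), might look like this:

source="tcp:5002" Tag="BubbaPlant.PrintStation.PrintResponseTag"
| rex field=Value "^(?<BC20>[^/]*)/(?<BC14>[^/]*)/LAN/(?<LANE>[^/]*)/(?<Other1>[^/]*)/(?<Other2>[^/]*)/(?<Other3>[^/]*)$"

That names the pieces and keeps the pattern specific to one field order, but it is still a search-time extraction and still doesn't put the values back on the indexed line.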

1. Do I do this "split on input" in a fields.conf configuration?
2. How do I specify that, though both use "Value" as their source field, the lines from "PrintSubmissionTag" should split into DIFFERENT fields than the Value from "PrintResponseTag"? (I assume some sort of section header in fields.conf.)
3. I see the TOKENIZER example from http://docs.splunk.com/Documentation/Splunk/6.3.0/admin/Fieldsconf that might achieve the split, but how do I then "name" the fields (BC20, BC14, Other1, ...)? And will they persist in the log-indexed line, or just lump as a group in Interesting Fields on the left (where I don't need them)?
4. Or is it props.conf (with FIELDALIAS) that I need to do this in, instead of fields.conf? (A sketch follows this list.)
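(Sketch for question 4: if a search-time extraction is acceptable, props.conf with EXTRACT might look roughly like the lines below. This is untested; the stanza name comes from the sourcetype in the samples, the field names are the ones I wish for above, and because EXTRACT is a search-time setting it would not change the indexed _raw line.)

# props.conf -- untested search-time sketch
[generic_single_line]
EXTRACT-print_response = \.PrintResponseTag"\s+Value="(?<BC20>[^/]*)/(?<BC14>[^/]*)/LAN/(?<LANE>[^/]*)/(?<Other1>[^/]*)/(?<Other2>[^/]*)/(?<Other3>[^/]*)"
EXTRACT-print_submission = \.PrintSubmissionTag"\s+"?Value="(?<BC14>[^/]*)/(?<Other1>[^/]*)/(?<Other2>[^/]*)/(?<Other3>[^/]*)/(?<Other4>[^/]*)/(?<Other5>[^/]*)/(?<Other6>[^/]*)"

Each regex anchors on its own tag name, so an event only gets the field names that belong to its tag's field order; the "? allows for the stray quote in front of the second Value in the feed.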

I do NOT want . . . a Chart, or a Table, or an aggregated list . . . I need the individual lines to stay as Log Lines . . . just split the single Value="" on input into its component (sub)fields. (Differently depending on the source tag name.)

Any pointers?

Thanks in advance.

1 Solution

somesoni2
SplunkTrust

To update the _raw data, you would need rex with the sed option. See this run-anywhere sample, based on your sample data.

| gentimes start=-1 | eval temp="2015-09-22 18:34:38.861 +0000 Tag=\"BubbaPlant.PrintStation.PrintResponseTag\" Value=\"00000728299444223335/XPP00000275435/LAN/019//230/0\" Quality=\"good\" PLC=\"001\"##2015-09-22 18:34:37.611 +0000 Tag=\"BubbaPlant.PrintStataion.PrintSubmissionTag\" \"Value=\"XPP00000275435/x/x/x/0/230/S1\" Quality=\"good\" PLC=\"001\"" | table temp | makemv temp delim="##" | mvexpand temp | eval _raw=temp 
| rex mode=sed "s/(\.PrintResponseTag\") Value=\"([^\/]*)\/([^\/]*)\/LAN\/([^\/]*)\/([^\/]*)\/([^\/]*)\/([^\/]*)\"(.*)/\1 BC20=\"\2\" BC14=\"\3\" LANE=\"\4\" Other1=\"\5\" Other2=\"\6\" Other3=\"\7\" \8/g"
| rex mode=sed "s/(\.PrintSubmissionTag\") \"Value=\"([^\/]*)\/([^\/]*)\/([^\/]*)\/([^\/]*)\/([^\/]*)\/([^\/]*)\/([^\/]*)\"(.*)/\1 BC14=\"\2\" Other1=\"\3\" Other2=\"\4\" Other3=\"\5\" Other4=\"\6\" Other5=\"\7\" Other6=\"\8\" \9/g"


jhumkey
Path Finder

There's an extra \" in front of the 2nd Value. But after fixing that . . . it does split up my lines "in search". I have no clue what the "gentimes" line does yet. (Or how explicitly specifying one static example . . . will help me with the generic/different cases later on.) I'd like to get all this back into pre-processing so it's stored as the split fields. (I'm trying to idiot-proof it . . . if I leave all these long quoted regexes on the line for the end user . . . they'll surely accidentally edit it and screw something up.)
At least . . . I see my changes ON the line now. That's a substantial step in the right direction. Thanks for your time and answer.


jhumkey
Path Finder

For anyone following . . . I'm now guessing the 'eval temp' part with the static string was there to give the poster static data to test with (since they don't have access to the live feed data I have). And I was confused by the \1 inclusion, but that's matching the Tag field with the first set of parentheses (so that first match on the explicit Tag is what makes that rex statement and its sed "specific" to THAT tag, and you then have to reinsert the found Tag). I was able to add a third section (for yet a different line, where the original PLC devs ordered the Value in yet another order). So that's all good. I still need to figure out if this can be placed somewhere (props/fields/input/... in a .conf file) to do all this at input time, not at search time: #1 so the end users can't mess it up, and #2 so those BC20-type fields then become "searchable" fields. Thanks again.
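(For the index-time half, a hedged sketch of what that could look like: SEDCMD in props.conf on the indexer or heavy forwarder, adapted from somesoni2's sed expressions and keyed to the sourcetype from my samples. Untested, and it only affects events indexed after the change.)

# props.conf on the indexer / heavy forwarder -- untested index-time sketch
[generic_single_line]
# rewrite _raw for PrintResponseTag events (same capture groups as the rex mode=sed above)
SEDCMD-split_response = s/(\.PrintResponseTag") Value="([^\/]*)\/([^\/]*)\/LAN\/([^\/]*)\/([^\/]*)\/([^\/]*)\/([^\/]*)"/\1 BC20="\2" BC14="\3" LANE="\4" Other1="\5" Other2="\6" Other3="\7"/
# rewrite _raw for PrintSubmissionTag events (note the stray quote before Value in the feed)
SEDCMD-split_submission = s/(\.PrintSubmissionTag") "Value="([^\/]*)\/([^\/]*)\/([^\/]*)\/([^\/]*)\/([^\/]*)\/([^\/]*)\/([^\/]*)"/\1 BC14="\2" Other1="\3" Other2="\4" Other3="\5" Other4="\6" Other5="\7" Other6="\8"/

Once _raw contains BC20="..." style pairs, Splunk's automatic key/value extraction should pick them up as searchable fields at search time without extra configuration.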
