output from scripted input best practice question

mfrost8 · ‎06-04-2014

So I am writing a little Python script that I intend to run as a scripted input. The script will collect information about which ports are assigned in Apache, based on config file entries.

I was originally going to write something with ps-style tabular output such as

PORT     TYPE     COMMENT
123      apache   "important app 1"
456      apache   "important app 2"

and so on. That would mean that I'd need to use 'multikv' to pick values out, I guess.

However, since this is intended to be a scripted input for Splunk and not really a generic tool that anyone other than Splunk would use, I was thinking it's perhaps kind of silly to create the output this way rather than making the python script something more Splunk-friendly such as

port=123 type=apache comment="important app 1"
port=456 type=apache comment="important app 2"

I suspect I'd still need to use 'multikv' though, since all of that is still going to look like a single event to Splunk.

In any case, my question is ultimately: what would be considered a best practice when I am writing a scripted input that is just for Splunk and I completely control the output format? I realize the answer is kind of "Splunk can handle either format", but I'm not sure whether one or the other is ultimately better, either in terms of Splunk performance or in terms of creating simpler searches.

Thanks

lguinn2 · ‎06-04-2014

Here is what I would do:

1 - Follow your second format; it looks great. I would also add a timestamp at the beginning of each event:

04-Jun-2014 20:18:23 PST port=456 type=apache comment="important app 2"

Remember to include the year AND the timezone in your timestamp!

2 - when indexing the data, set SHOULD_LINEMERGE=false in props.conf

This tells Splunk that each line is a separate event. If you end up with multi-line events, you can still tell Splunk how to parse them. You should set up props.conf so that no one has to use multikv to easily use your data.
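The two steps above could be sketched on the script side roughly as follows. This is only a minimal illustration, not a definitive implementation: the `format_event` helper, the hard-coded sample entries, and the choice to swap embedded double quotes for single quotes are all assumptions made for the example, and a real script would read the ports from the Apache config files instead. It uses a numeric UTC offset (`%z`) rather than a zone abbreviation so that the timezone is unambiguous.

```python
import sys
from datetime import datetime, timezone


def format_event(port, kind, comment, when):
    """Return one key=value event line, prefixed with a timestamp.

    `when` must be a timezone-aware datetime so that %z yields an offset.
    """
    # Day-month-year plus time plus numeric offset, e.g. 04-Jun-2014 20:18:23 +0000
    stamp = when.strftime("%d-%b-%Y %H:%M:%S %z")
    # Assumption: replace embedded double quotes so the quoted value stays intact.
    safe_comment = comment.replace('"', "'")
    return f'{stamp} port={port} type={kind} comment="{safe_comment}"'


def main():
    # Hypothetical sample data; a real script would parse the Apache config.
    entries = [
        (123, "apache", "important app 1"),
        (456, "apache", "important app 2"),
    ]
    now = datetime.now(timezone.utc)
    for port, kind, comment in entries:
        # One line per entry, so each line can become its own event.
        sys.stdout.write(format_event(port, kind, comment, now) + "\n")


if __name__ == "__main__":
    main()
```

With SHOULD_LINEMERGE=false set for the sourcetype, each printed line should be indexed as its own event. The timestamp layout used here would probably also need to be described in props.conf (for example with TIME_FORMAT) if Splunk does not recognize it automatically.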

mfrost8 · ‎06-04-2014

Thanks. I guess I'm confused, though. In the case of 'ps' as a scripted input, Splunk provides the timestamp itself, doesn't it? (As opposed to a regular log file, where I would need the timestamp for sure.) I also thought Splunk treated the output of a scripted input as just one big event, with a single timestamp on that event. If it represents multiple rows of fields (as with 'ps'), then you have to tell Splunk to break those lines up with multikv, right?

Or maybe I'm wrong about all of that...
