Custom script input - How to let Splunk handle files through a custom script that will stream converted data to be indexed

guilmxm
SplunkTrust

Hi,

I'm currently working on an application that handles files in a very specific format that Splunk cannot read directly; the data has to be converted by a third-party script (currently a Perl script).

I would like to adapt the current configuration so that Splunk watches the files (matched by pattern) and calls the third-party script, passing the file name as an argument.

To sum up, my goal is:

  • Splunk watches for any new or updated file (as for any standard files input)
  • When a new file is available or a file's CRC differs, Splunk calls the third-party script with the file name as argument
  • The third party script streams the converted data that Splunk will index

I already have a working third-party script that does the conversion, but I have not yet found the best way to proceed as required.

Thanks in advance for any help

Ayn
Legend

Did you look into using "unarchive_cmd" for this? It sounds like it could solve your situation, even though you're not strictly "unarchiving" anything, but the principle should still be the same - Splunk detects a change, invokes the script, then ingests the data that the script outputs.

http://docs.splunk.com/Documentation/Splunk/latest/admin/Propsconf

unarchive_cmd = <string>
* Only called if invalid_cause is set to "archive".
* This field is only valid on [source::<source>] stanzas.
* <string> specifies the shell command to run to extract an archived source.
* Must be a shell command that takes input on stdin and produces output on stdout.
* Use _auto for Splunk's automatic handling of archive files (tar, tar.gz, tgz, tbz, tbz2, zip)
* Defaults to empty.

guilmxm
SplunkTrust

For anyone interested in a similar case, here is how I got it to work the way I need.

A few links that helped me run a third-party script through the unarchive_cmd setting:

http://answers.splunk.com/answers/7729/how-to-invoke-unarchive_cmd
http://blogs.splunk.com/2011/07/19/the-naughty-bits-how-to-splunk-binary-logfiles/
http://answers.splunk.com/answers/10501/python-script-as-unarchive_cmd-in-propsconf

I also had to adapt my third-party script to read its data from stdin instead of taking the file name as an argument (e.g. cat | myscript).
Depending on your case and script, you may want the script to stream the converted data so that Splunk indexes it directly (the simplest option), or you may need it to generate CSV file(s) that Splunk then indexes (my case).
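
To illustrate the stdin adaptation, here is a minimal Python sketch of a filter-style converter (it is not the original Perl script). The pipe-delimited input format, the output layout, and the names convert_line, convert_stream and main are all hypothetical; only the read-stdin, write-stdout shape reflects what unarchive_cmd expects.

```python
#!/usr/bin/env python3
"""Hypothetical converter that reads raw records on stdin and writes events on stdout."""
import sys


def convert_line(line):
    """Convert one raw record into one event line.

    Assumption: records look like 'timestamp|host|value'. Lines that do not
    match are passed through unchanged rather than dropped.
    """
    text = line.rstrip("\n")
    parts = text.split("|")
    if len(parts) != 3:
        return text
    timestamp, host, value = parts
    return f"{timestamp} host={host} value={value}"


def convert_stream(lines):
    """Yield converted events, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield convert_line(line)


def main(src=sys.stdin, dst=sys.stdout):
    """Stream src to dst one record at a time, so large files are never held in memory."""
    for event in convert_stream(src):
        dst.write(event + "\n")
    dst.flush()


if __name__ == "__main__":
    main()
```

Because the script never opens a file by name, it can be tested outside Splunk by piping any sample file into it and checking what comes out on stdout.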

The configuration that worked as i need:

props.conf

  1. You need to declare a source stanza associated with your third-party script:

    [source::/pathtorawfiles/*.]
    invalid_cause = archive
    unarchive_cmd =
    sourcetype = mysourcetype
    NO_BINARY_CHECK = true

  2. In my case, the script generates several CSV files (standard CSV files with a header) that Splunk will index, so I declared a second stanza. (You don't need this if your script outputs the data directly.)

    [mydatasourcetype]

    FIELD_DELIMITER=,
    FIELD_QUOTE="
    HEADER_FIELD_LINE_NUMBER=1
    NO_BINARY_CHECK=1
    INDEXED_EXTRACTIONS=csv
    KV_MODE=none
    SHOULD_LINEMERGE=false
    pulldown_type=true

inputs.conf

  1. I declare a monitor for the raw data that needs to be converted by my third-party script:

    [monitor:///pathtorawfiles/*.]
    disabled = false
    index = myindex
    sourcetype = mysourcetype

  2. As my script generates CSV files, I just want Splunk to index them and then delete them automatically:

    [batch:///*.csv]
    disabled = false
    move_policy = sinkhole
    recursive = false
    index = myindex
    sourcetype = mydatasourcetype
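
For the CSV variant, the conversion script has to leave complete files with a header row in the directory the batch input watches. Below is a minimal Python sketch of that step, under stated assumptions: the column names in FIELDS, the pipe-delimited raw format, and the helper names parse_records and write_csv are hypothetical, and the temporary-file-then-rename trick assumes the batch stanza only matches *.csv and that both files live on the same filesystem.

```python
"""Hypothetical helper that writes converted records as a CSV file with a header."""
import csv
import os
import tempfile
from pathlib import Path

# Assumption: column names are placeholders for whatever your data contains.
FIELDS = ["timestamp", "host", "value"]


def parse_records(lines):
    """Yield one list of values per well-formed 'a|b|c' record; skip the rest."""
    for line in lines:
        parts = line.rstrip("\n").split("|")
        if len(parts) == len(FIELDS):
            yield parts


def write_csv(rows, drop_dir, name):
    """Write rows to drop_dir/name and return the final path.

    The data is first written to a '.tmp' file, which a '*.csv' pattern does
    not match, and only renamed once complete, so a half-written file is
    never visible under its final name.
    """
    drop = Path(drop_dir)
    drop.mkdir(parents=True, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=str(drop), suffix=".tmp")
    with os.fdopen(fd, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(FIELDS)  # header row, read via HEADER_FIELD_LINE_NUMBER=1
        writer.writerows(rows)
    final = drop / name
    os.replace(tmp_path, final)
    return final
```

A script invoked on stdin could call this as write_csv(parse_records(sys.stdin), drop_dir, name) after importing sys, with drop_dir pointing at the directory covered by the batch stanza.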

And that's it, it works like a charm 🙂

This of course has to be adapted to your requirements.

guilmxm
SplunkTrust

Ayn,

Thank you very much for your clever suggestion; it did exactly the job I needed 🙂

guilmxm
SplunkTrust

Nice idea, I'll check and let you know, thanks.
