Collecting data via a python script, later putting it into Splunk

tleyden
Explorer

Some of our customers are running into memory issues, and we need to give them a script that collects several pieces of data:

  • Netstats for a particular pid (sudo netstat -apeen | grep -i app_name)

  • Application server stats, available as JSON from our application server's REST endpoint

  • Overall memory stats (e.g., top output) for a particular pid

and probably a few others.

It feels like a perfect job for Splunk! But it also feels a bit heavyweight to tell customers to install and configure a Splunk forwarder. So I'm planning to take a "middle ground" approach:

  1. Ship them a Python script that they run themselves, with few or no 3rd party dependencies (single script, possibly even bundled as an exe)

  2. The Python script will collect the outputs mentioned above and put them in a directory structure

  3. The customer runs the script to collect data, zips up the directory, and ships it back to us

  4. We somehow get the data into our own Splunk server to analyze it. (unzip, load somehow)
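
To make steps 1 to 3 concrete, here is a rough, standard-library-only sketch of what I have in mind. Everything in it is an assumption for illustration: the function names, the category directory names, the `stats_url` and `app_name` parameters, and the `top -b -n 1 -p` flags (batch mode on Linux procps). It also does the `grep -i` filtering in Python and leaves out `sudo`, so the customer would need to run the script with whatever privileges netstat requires.

```python
import shutil
import subprocess
import time
import urllib.request
from pathlib import Path


def write_sample(out_dir, category, text, timestamp=None):
    """Write one raw sample to <out_dir>/<category>/<timestamp>.txt."""
    timestamp = timestamp or time.strftime("%Y%m%dT%H%M%S")
    target = Path(out_dir) / category
    target.mkdir(parents=True, exist_ok=True)
    path = target / f"{timestamp}.txt"
    path.write_text(text)
    return path


def filter_lines(text, needle):
    """Keep only lines containing needle, case-insensitively (like grep -i)."""
    needle = needle.lower()
    return "".join(
        line for line in text.splitlines(keepends=True) if needle in line.lower()
    )


def run_command(args):
    """Return a command's stdout; failures become an error line, not a crash."""
    try:
        result = subprocess.run(args, capture_output=True, text=True, timeout=30)
        return result.stdout
    except (OSError, subprocess.SubprocessError) as exc:
        return f"error running {args}: {exc}\n"


def fetch_text(url):
    """Return the raw body served at url, or an error line."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError as exc:
        return f"error fetching {url}: {exc}\n"


def collect_once(out_dir, pid, app_name, stats_url):
    """Take one sample of each kind (hypothetical parameters)."""
    netstat = filter_lines(run_command(["netstat", "-apeen"]), app_name)
    write_sample(out_dir, "netstat", netstat)
    write_sample(out_dir, "top", run_command(["top", "-b", "-n", "1", "-p", str(pid)]))
    write_sample(out_dir, "sync-gateway", fetch_text(stats_url))


def bundle(out_dir):
    """Zip the collected directory next to itself and return the archive path."""
    return shutil.make_archive(str(out_dir), "zip", root_dir=str(out_dir))
```

The customer-facing entry point would then just call `collect_once(...)` in a loop with a sleep between samples and finish with `bundle(...)`.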

Here are my questions:

  • For #2 above, what is the best directory/file structure to use? Something like this?

    /netstat/
        timestamp1.txt (contains raw netstat output, anything else needed?)
        timestamp2.txt

    /sync-gateway/
        timestamp1.txt (contains raw JSON, ditto)
        timestamp2.txt

    /top/
        timestamp1.txt (contains raw top output, ditto)
        timestamp2.txt

  • For #4 above, what's the easiest way to get this data into Splunk?

Also, any general guidelines on the approach would be very helpful.


MuS
Legend

Hi tleyden,

Basically, there is nothing to recommend for #2: it is your script, so structure the directories however you prefer. Provide the content as JSON, CSV, or key=value pairs; Splunk can handle those without trouble.
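
As a minimal sketch of the key=value option, the script could flatten each JSON stats record into a single timestamped line. The helper names and the sample field names (`mem`, `rss`, `conns`) are made up for illustration, and nested keys are joined with underscores on the assumption that this keeps the field names simple for Splunk's automatic key=value extraction. Values that themselves contain double quotes would need escaping, which this sketch does not do.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining key names with '_'."""
    items = {}
    for key, value in record.items():
        name = f"{prefix}_{key}" if prefix else str(key)
        if isinstance(value, dict):
            items.update(flatten(value, name))
        else:
            items[name] = value
    return items


def to_kv_line(timestamp, record):
    """Render one event as: <timestamp> key="value" key="value" ..."""
    pairs = " ".join(
        f'{key}="{value}"' for key, value in sorted(flatten(record).items())
    )
    return f"{timestamp} {pairs}"


print(to_kv_line("2024-01-01T00:00:00Z", {"mem": {"rss": 1024}, "conns": 7}))
# prints: 2024-01-01T00:00:00Z conns="7" mem_rss="1024"
```

Leading each line with a timestamp gives Splunk something to use as the event time instead of relying on the file name.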

Regarding #4:
Set up a monitor in inputs.conf for some directory (http://docs.splunk.com/Documentation/Splunk/latest/Data/Configureyourinputs) and put the zips inside that directory. Splunk will unpack them and index the data.

Hope this helps ...

cheers, MuS

tleyden
Explorer

Thanks, that is helpful.

"it is your script so do the directory structure like you prefer. Provide the content as JSON or CSV or Key=Value pairs - Splunk can handle those without trouble."

Since I have three different types of information (netstat, sync-gateway, top), how can I "tag" these files so that in Splunk I can say things like "show me all the netstat readings, but ignore the other stuff"?
