Solved: Collecting data via a python script, later putting it into Splunk

tleyden · ‎06-11-2015

We have some customers who are running into memory issues, and we need to provide them with a script to collect several pieces of data:

Netstat output for a particular pid (sudo netstat -apeen | grep -i app_name)
Application server stats, available as JSON from our application server's REST endpoint
Overall memory stats (e.g., top output) for a particular pid

and probably a few others.

It feels like a perfect job for Splunk! But it also feels a bit heavyweight to tell customers to install and configure a Splunk forwarder. So I'm planning to take a "middle ground" approach:

1. Ship them a Python script that they would run, with few or no 3rd party dependencies (single script, possibly even bundled as an exe)
2. The Python script will collect the outputs mentioned above and put them in a directory structure
3. The customer runs the script to collect data, then zips up the directory and ships it back to us
4. We somehow get the data into our own Splunk server to analyze it (unzip, load somehow)
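To make the collection steps above concrete, here is a minimal sketch using only the Python 3 standard library. It is one possible shape for such a script, not a finished tool. The function names (`run_command`, `fetch_text`, `write_sample`, `collect_once`) and the timestamp format are my own choices, and the pid, URL and Linux command-line flags in the wiring comment are placeholders.

```python
import subprocess
import time
import urllib.request
from pathlib import Path


def run_command(argv):
    """Return the command's stdout, or a one-line error marker if it cannot run."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"collector_error={exc!r}\n"
    return done.stdout


def fetch_text(url):
    """Return the body of an HTTP GET as text (used for the JSON stats endpoint)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def write_sample(base_dir, kind, text, stamp):
    """Write one reading to <base_dir>/<kind>/<stamp>.txt and return its path."""
    target = Path(base_dir) / kind / f"{stamp}.txt"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target


def collect_once(base_dir, collectors):
    """Call every collector once and store its output under a shared UTC timestamp."""
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    return [write_sample(base_dir, kind, collect(), stamp)
            for kind, collect in collectors.items()]


# Example wiring (hypothetical pid and URL, Linux command-line flags assumed):
# collectors = {
#     "netstat": lambda: run_command(["netstat", "-apeen"]),
#     "sync-gateway": lambda: fetch_text("http://localhost:8080/stats"),
#     "top": lambda: run_command(["top", "-b", "-n", "1", "-p", "1234"]),
# }
# collect_once("collected", collectors)
```

Passing the collectors in as a dict keeps the directory names and the commands in one place. The grep step from the netstat command could be done in Python by filtering the returned lines.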

Here are my questions:

For #2 above, what is the best directory/file structure to use? Something like this?

/netstat/
    timestamp1.txt (contains raw netstat output, anything else needed?)
    timestamp2.txt
/sync-gateway/
    timestamp1.txt (contains raw JSON, ditto)
    timestamp2.txt
/top/
    timestamp1.txt (contains raw top output, ditto)
    timestamp2.txt

For #4 above, what's the easiest way to get this data into Splunk?

Also, any general guidelines on the approach would be very helpful.

MuS · ‎06-11-2015

Hi tleyden,

Basically, there is nothing to recommend for #2; it is your script so do the directory structure like you prefer. Provide the content as JSON or CSV or Key=Value pairs - Splunk can handle those without trouble.
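As a rough illustration of the two text formats mentioned here, the sketch below renders one parsed reading either as a key=value line or as a single-line JSON object. The helper names (`kv_line`, `json_line`) and the field names in the example are made up for illustration. How you parse the raw netstat or top output into fields is up to your script.

```python
import json


def kv_line(fields):
    """Render a dict as one key=value line, quoting values that contain spaces."""
    parts = []
    for key, value in fields.items():
        text = str(value).replace('"', "'")  # keep the quoting unambiguous
        needs_quotes = text == "" or any(ch.isspace() for ch in text)
        parts.append(f'{key}="{text}"' if needs_quotes else f"{key}={text}")
    return " ".join(parts)


def json_line(fields):
    """Render a dict as a single-line JSON object with stable key order."""
    return json.dumps(fields, sort_keys=True)


# Hypothetical reading, one event per line:
# kv_line({"kind": "netstat", "state": "ESTABLISHED", "program": "sync gateway"})
# json_line({"kind": "top", "pid": 1234})
```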

Regarding #4:
Set up a monitor input in inputs.conf for some directory (http://docs.splunk.com/Documentation/Splunk/latest/Data/Configureyourinputs) and put the zips inside that directory. Splunk will unpack them and index the data.
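The zip-and-drop step could look roughly like the sketch below, again standard library only. `bundle` would run on the customer side and `drop_into_monitored_dir` on your side. Both names are hypothetical, and the monitored directory is whatever path your monitor input points at.

```python
import shutil
from pathlib import Path


def bundle(data_dir, archive_stem):
    """Zip the contents of data_dir into <archive_stem>.zip and return its path."""
    return Path(shutil.make_archive(str(archive_stem), "zip", root_dir=str(data_dir)))


def drop_into_monitored_dir(archive, monitored_dir):
    """Copy an archive into the directory that the monitor input watches."""
    destination = Path(monitored_dir)
    destination.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(str(archive), str(destination)))


# Example with placeholder paths:
# archive = bundle("collected", "customer-a")            # creates customer-a.zip
# drop_into_monitored_dir(archive, "/path/to/monitored/dir")
```

Keep the archive stem outside the data directory, otherwise the zip would end up trying to include itself.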

Hope this helps ...

cheers, MuS

tleyden · ‎06-12-2015

Thanks, that is helpful.

"it is your script so do the directory structure like you prefer. Provide the content as JSON or CSV or Key=Value pairs - Splunk can handle those without trouble."

Since I have three different types of information (netstat, sync-gateway, top), how can I "tag" these files so that they show up in Splunk in a way that lets me say things like "show me all the netstat readings, but ignore the other stuff"?

MuS · ‎06-12-2015

The almighty docs can help: http://docs.splunk.com/Documentation/Splunk/6.2.3/Knowledge/Abouttagsandaliases 🙂
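Separately from Splunk-side tags, one option on the script side is to stamp every reading with its type before it is shipped, so the type travels with the data. The sketch below is only an assumption about how that could be done. It walks the directory layout proposed earlier in the thread and emits one JSON line per file, with a `kind` field taken from the parent directory name. The function name `tag_samples` and the field names `kind`, `timestamp` and `raw` are invented for illustration.

```python
import json
from pathlib import Path


def tag_samples(data_dir):
    """Yield one JSON line per <kind>/<timestamp>.txt file under data_dir."""
    for path in sorted(Path(data_dir).glob("*/*.txt")):
        record = {
            "kind": path.parent.name,    # e.g. netstat, sync-gateway, top
            "timestamp": path.stem,      # file name without the .txt suffix
            "raw": path.read_text(encoding="utf-8"),
        }
        yield json.dumps(record)


# Example with a placeholder directory:
# for line in tag_samples("collected"):
#     print(line)
```

If the events are indexed with their JSON fields extracted, a search could then filter on that field to show only the netstat readings. Assigning a distinct sourcetype per kind of data on the Splunk side is another common route, which this sketch does not cover.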

