Best way to invoke python script that generates .csv files for lookup

beaumaris
Communicator

I have a python script that retrieves data from an external source and stores it in several .csv files. I have added the necessary information to transforms.conf and savedsearches.conf to use the lookup function in the search to find the data mappings. The .csv files are stored in the apps//lookups directory. This is working as expected. I plan to run the python program once per hour to refresh the data in the .csv files but I'm looking for the recommended way to do this.

Questions:
- What is the best way to run the script on a schedule?
- Is there a specific entry I should make in savedsearches.conf? Should the script be placed in the apps//bin directory?
- Is it advisable to use inputs.conf, send the tables to stdout and have Splunk index them directly? (I really only want one copy of the data; it is not time-based.)
- When performing the lookups, does Splunk cache the .csv data?
- If the .csv file is updated on the fly, does Splunk know to refresh its internal representation?
- Is there a reduction in efficiency if the lookup tables grow very large? I expect 10K-20K rows.

hazekamp
Builder

beaumaris,

I would recommend setting up a scripted input for this in inputs.conf like so:

## inputs.conf
[script://$SPLUNK_HOME/etc/apps/<your_app_here>/bin/<your_script>.py]
disabled = false
## once per week on Wednesday at midnight; using cron such that the script doesn't execute at startup
## (for an hourly refresh, the cron schedule would be: 0 * * * *)
interval = 0 0 * * 3

For schedules, you can use an interval specified as the number of seconds between executions, or a cron schedule. I think the approach you are using, generating a .csv and using it as a lookup within Splunk, is the correct one.

I don't believe Splunk caches the .csv data, so the contents will be read from disk per invocation. Updates to the .csv should take immediate effect in Splunk. 10k-20k rows should not be a problem.

There are considerations for distributed environments, as the list will by default be replicated down to the indexers. If you invoke the lookup explicitly via "| lookup" rather than through props.conf, you can add the csv to the distsearch.conf replication blacklist and use "| lookup local=true", which will make the lookup local to your search server.
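Since the .csv is read from disk on each invocation, it may be worth having the script replace the file in one step so a search never sees a half-written table. Below is a minimal sketch of that idea, assuming Python 3; the function name `write_lookup_csv` and the field names in the usage note are hypothetical and not part of Splunk or of the original script. The sketch prints nothing, on the assumption that you don't want the scripted input to index any output.

```python
import csv
import os
import tempfile


def write_lookup_csv(path, fieldnames, rows):
    """Write rows to a temp file next to `path`, then rename it over `path`.

    `rows` is an iterable of dicts keyed by `fieldnames`.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        # Swap the finished file into place in a single step.
        os.replace(tmp_path, path)
    except Exception:
        # Clean up the partial temp file and leave the existing lookup untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A script in the app's bin directory could then call, for example, `write_lookup_csv(os.path.join(lookups_dir, "example_lookup.csv"), ["host_id", "owner"], rows)`, where `lookups_dir`, the file name, and the columns are placeholders for your own values. Note that `tempfile.mkstemp` creates the file readable only by the owning user, which is fine as long as the script runs as the same user as Splunk.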

See also:

http://www.splunk.com/base/Documentation/latest/Admin/Inputsconf

http://www.splunk.com/base/Documentation/latest/Admin/Distsearchconf
