
Best way to invoke python script that generates .csv files for lookup

beaumaris
Communicator

I have a Python script that retrieves data from an external source and stores it in several .csv files. I have added the necessary entries to transforms.conf and savedsearches.conf so that searches can use the lookup command to find the data mappings. The .csv files are stored in the apps//lookups directory, and this is working as expected. I plan to run the Python program once per hour to refresh the data in the .csv files, but I'm looking for the recommended way to do this.

Questions:
- What is the best way to run the script on a schedule?
- Is there a specific entry I should make in savedsearches.conf? Should the script be placed in the apps//bin directory?
- Is it advisable to use inputs.conf, send the tables to stdout and have Splunk index them directly? (I really only want one copy of the data; it is not time-based.)
- When performing the lookups, does Splunk cache the .csv data?
- If the .csv file is updated on the fly, does Splunk know to refresh its internal representation?
- Is there a reduction in efficiency if the lookup tables grow very large? I expect 10K-20K rows.


hazekamp
Builder

beaumaris,

I would recommend setting up a scripted input for this in inputs.conf, like so:

## inputs.conf
[script://$SPLUNK_HOME/etc/apps/<your_app_here>/bin/<your_script>.py]
disabled = false
## once per week on Wednesday at 00:00; a cron schedule (unlike a seconds interval) keeps the script from also running at Splunk start-up
interval = 0 0 * * 3
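A minimal sketch of what the script in bin/ might look like, assuming Python 3, the usual app layout with bin/ and lookups/ side by side, a hypothetical `fetch_rows()` standing in for your external source, and a hypothetical `host_owner.csv` lookup file. It writes each table to a temporary file in lookups/ and then swaps it into place with `os.replace()`, so a search that reads the lookup mid-refresh should see either the old file or the new one rather than a partial one. It deliberately prints nothing: a scripted input indexes whatever the script writes to stdout, and you only want the copy in the .csv.

```python
#!/usr/bin/env python3
"""Hypothetical refresh script for $SPLUNK_HOME/etc/apps/<your_app_here>/bin/.

Writes lookup tables into the app's lookups/ directory and prints nothing,
because a scripted input indexes whatever the script writes to stdout.
"""
import csv
import os
import tempfile

# <app>/bin/<script>.py -> <app>/lookups (assumed app layout)
APP_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
LOOKUP_DIR = os.path.join(APP_DIR, "lookups")


def fetch_rows():
    """Hypothetical stand-in for the external data source."""
    return [
        {"host": "web01", "owner": "team-a"},
        {"host": "db01", "owner": "team-b"},
    ]


def write_lookup_atomically(path, fieldnames, rows):
    """Write rows to a temp file beside `path`, then swap it into place.

    The temp file lives in the same directory, so on a POSIX filesystem
    os.replace() is an atomic rename: readers see the old or new file.
    """
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        os.chmod(tmp_path, 0o644)  # mkstemp creates files readable by owner only
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # don't leave a stray .tmp file in lookups/
        raise


def main():
    os.makedirs(LOOKUP_DIR, exist_ok=True)
    write_lookup_atomically(
        os.path.join(LOOKUP_DIR, "host_owner.csv"),  # hypothetical lookup file
        ["host", "owner"],
        fetch_rows(),
    )


if __name__ == "__main__":
    main()
```

For the hourly refresh you describe, a cron schedule such as `interval = 0 * * * *` (top of every hour) or `interval = 3600` (every 3600 seconds) should work in the stanza above.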

For schedules, interval can be either the number of seconds between executions or a cron schedule. I think the approach you are using, generating a .csv and using it as a lookup within Splunk, is the correct one. I don't believe Splunk caches the .csv data, so its contents should be read from disk on each invocation, and updates to the .csv should take effect immediately. 10K-20K rows should not be a problem.

There are considerations for distributed environments, because by default the lookup file is replicated down to the indexers. If you use the lookup only via "| lookup" rather than as an automatic lookup in props.conf, you can add the .csv to the replication blacklist in distsearch.conf and use "| lookup local=true", which runs the lookup locally on your search head.
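Because the lookup file is read from disk and changes should take effect right away, a bad refresh would also show up in searches right away. One hedged option is to check a newly generated file before it replaces the live one. The sketch below assumes Python 3; the function name, the key column and the 50,000-row ceiling are all hypothetical placeholders, not Splunk settings or limits.

```python
import csv


def validate_lookup(path, required_fields, key_field, max_rows=50000):
    """Return a list of problems in a newly generated lookup CSV.

    Hypothetical pre-flight check, meant to run on the new file before it
    replaces the live lookup. An empty list means no problems were found.
    """
    needed = [key_field] + [f for f in required_fields if f != key_field]
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []  # None for a completely empty file
        missing = [f for f in needed if f not in header]
        if missing:
            return ["missing columns: " + ", ".join(missing)]

        problems = []
        seen = set()
        count = 0
        for row in reader:
            count += 1
            key = row[key_field]
            # Repeated keys usually mean the source returned duplicates.
            if key in seen:
                problems.append(f"duplicate key: {key!r}")
            seen.add(key)

    if count == 0:
        problems.append("no data rows")
    elif count > max_rows:
        problems.append(f"unexpectedly large: {count} rows")
    return problems
```

If the refresh writes to a temporary file first, the check can run on that file, and the swap can be skipped (keeping the previous lookup in place) whenever the returned list is non-empty.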

See also:

http://www.splunk.com/base/Documentation/latest/Admin/Inputsconf

http://www.splunk.com/base/Documentation/latest/Admin/Distsearchconf
