I am new to Splunk (6.3) and am interested in knowing a few things in addition to the original question:
A. Assuming I can connect to a locally residing MySQL database (5.7) and extract rows from it, is it more efficient to:
1. Have Splunk operate directly on the results of queries against the database, OR
2. Have Splunk operate on the results of the query stored as a CSV file on the Splunk server?
B. How do I estimate (ahead of time) the size of the index that will be created using either method?
The best way to query a database (local or remote) is with Splunk DB Connect (v2). DB Connect v2 handles connection pooling, caching, and so on. It can import a table block by block, so you can test your plan before you load the whole database. (You can also use the database as a lookup if you don't want to index it.)
Check to see if this works for you:
https://splunkbase.splunk.com/app/2686/
The CSV route becomes cumbersome unless you can automate all of it: producing the CSV, exporting it from MySQL, and importing it into Splunk. Also consider the maximum size limits for a CSV; I'm not sure how big your datasets are.
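If you do go the CSV route, the export step is the easiest part to script. Here is a minimal sketch in Python, assuming you have some DB-API 2.0 driver for MySQL installed and a cursor open on your database. The function name, query, and file path are hypothetical placeholders, not anything Splunk or MySQL provides:

```python
import csv


def export_query_to_csv(cursor, query, csv_path, batch_size=1000):
    """Run a query on a DB-API 2.0 cursor and stream the rows into a CSV file.

    `cursor` is assumed to come from whatever MySQL driver you use;
    nothing here is specific to one driver.
    Returns the number of data rows written (header excluded).
    """
    cursor.execute(query)
    # cursor.description holds one tuple per column; item 0 is the column name.
    header = [col[0] for col in cursor.description]
    rows_written = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        # Fetch in batches so a large table is never held in memory at once.
        while True:
            batch = cursor.fetchmany(batch_size)
            if not batch:
                break
            writer.writerows(batch)
            rows_written += len(batch)
    return rows_written
```

You could run something like this from cron and point a Splunk file monitor input at the output directory, but you would still have to handle rotation and duplicate rows yourself, which is the cumbersome part.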
Here is the method for estimating size (you'll need a sample dataset):
http://docs.splunk.com/Documentation/Splunk/6.5.0/Capacity/Estimateyourstoragerequirements
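The idea is to index a sample, measure how much disk it takes, and scale that up. A rough sketch of the arithmetic in Python follows; the function name is my own, and it assumes your sample is representative of the full dataset, which may not hold if the data varies a lot:

```python
def estimate_index_size(sample_raw_bytes, sample_index_bytes, full_raw_bytes):
    """Extrapolate on-disk index size from a measured sample.

    sample_raw_bytes:   size of the sample data before indexing
    sample_index_bytes: disk space the sample's index took after indexing
                        (compressed raw data plus index files)
    full_raw_bytes:     size of the full dataset you plan to index
    """
    if sample_raw_bytes <= 0:
        raise ValueError("sample_raw_bytes must be positive")
    # Ratio of on-disk size to raw size, as measured on the sample.
    ratio = sample_index_bytes / sample_raw_bytes
    return ratio * full_raw_bytes
```

For example, if a 100 MB sample took 50 MB on disk (made-up numbers), `estimate_index_size(100, 50, 20000)` gives 10000, meaning a 20,000 MB dataset would need roughly 10,000 MB. The ratio should be the same whether the rows arrive through DB Connect or a CSV, as long as the resulting events look the same.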
So many factors go into efficiency in your scenario that I would try both the DB Connect app and the CSV method and see which you find easier.