That's a good reduction with the stats values. I assume the distinct host count is significantly less than the 1.9m rows, so although you will have the same number of QIDs per host, the number of lookup invocations will be reduced - it would be interesting to compare the job inspector details between the two searches.

As for KV store replication - a KV store on the search head is not replicated as a KV store on the indexers; instead a CSV is transferred to the indexers, so any acceleration in the KV store is lost and you are simply using CSV lookups on the indexers. I am not sure how a 250MB CSV on an indexer will be handled - if it works the same way as on the SH, it exceeds the max_memtable_bytes value discussed in limits.conf https://docs.splunk.com/Documentation/Splunk/9.2.0/Admin/Limitsconf#.5Blookup.5D so I imagine it will then be "indexed" (as in a file system index) on disk. If you are not running in Splunk Cloud you may want to try a local CSV lookup and play around with the limits.conf settings for your app.
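If you do experiment with that, the relevant setting sits in the [lookup] stanza of limits.conf. A minimal sketch - the app path and the value here are purely illustrative and would need sizing for your environment:

# etc/apps/<your_app>/local/limits.conf
[lookup]
# CSV lookup files larger than this are indexed on disk instead of held in memory
max_memtable_bytes = 262144000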
If you have the time, you might also want to experiment with the eval lookup() function and a split CSV. For example, you could split the CSV into 10 * 25MB files to stay under the existing threshold and partition the QIDs into each lookup. Then you could do some weird SPL like

| eval p=partition_logic(QID)
| eval output_json=case(
    p=1, lookup("qid_partition_1.csv", json_object("QID", QID), json_array("field1", "field2")),
    p=2, lookup("qid_partition_2.csv", json_object("QID", QID), json_array("field1", "field2")),
    p=3, lookup("qid_partition_3.csv", json_object("QID", QID), json_array("field1", "field2")),
    p=4, lookup("qid_partition_4.csv", json_object("QID", QID), json_array("field1", "field2")),
    p=5, lookup("qid_partition_5.csv", json_object("QID", QID), json_array("field1", "field2")),
    p=6, lookup("qid_partition_6.csv", json_object("QID", QID), json_array("field1", "field2")),
    p=7, lookup("qid_partition_7.csv", json_object("QID", QID), json_array("field1", "field2")),
    p=8, lookup("qid_partition_8.csv", json_object("QID", QID), json_array("field1", "field2")),
    p=9, lookup("qid_partition_9.csv", json_object("QID", QID), json_array("field1", "field2")),
    p=10, lookup("qid_partition_10.csv", json_object("QID", QID), json_array("field1", "field2")))
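partition_logic() above is just a placeholder for whatever partitioning scheme you choose. One possible sketch, assuming QID is numeric and the full lookup file is called qid_lookup.csv (both assumptions on my part) - a simple modulo split, generating one partition file per run:

| inputlookup qid_lookup.csv
| eval p=(tonumber(QID) % 10) + 1
| where p==1
| fields - p
| outputlookup qid_partition_1.csv

(repeat for p==2 through p==10), and then in the main search the same expression replaces partition_logic():

| eval p=(tonumber(QID) % 10) + 1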
Technically this could work, but whether you would see any improvements, I have no idea. Was that 50 seconds using a local lookup or a standard lookup? If it was a normal (remote) lookup, then try a local one so it does use the KV store on the search head.
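For reference, forcing the lookup onto the search head is just the local option on the lookup command - a sketch, where qid_kv_lookup stands in for whatever your KV store lookup definition is actually called:

| lookup local=true qid_kv_lookup QID OUTPUT field1 field2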