Hi!
Our customer needs to check data coming from 4-5 million unique SIMs and detect the SIMs that have not sent data recently.
What is the best approach? I can get the SIM catalogue with a scheduled dbxquery, but is it better to use a CSV lookup or the KV Store?
Thanks for suggestions!
Marco
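Whatever storage is chosen, the detection itself comes down to comparing each SIM's last-seen time with a cutoff. Here is a minimal Python sketch. It assumes epoch-second timestamps (like Splunk's `_time`) and uses a hypothetical `stale_sims` helper. A SIM that is in the catalogue but has never sent data (no timestamp) is treated as silent.

```python
from typing import Dict, List, Optional


def stale_sims(last_connect: Dict[str, Optional[int]], now: int, max_silence_s: int) -> List[str]:
    """Return the SIM IDs that have not connected within the last max_silence_s seconds.

    A SIM with no timestamp (listed in the catalogue but never seen) counts as silent.
    """
    cutoff = now - max_silence_s
    return sorted(sim for sim, ts in last_connect.items() if ts is None or ts < cutoff)


# Hypothetical sample: SIM ID -> last connection time in epoch seconds.
catalogue = {"SIM-A": 1_699_999_000, "SIM-B": 1_700_003_000, "SIM-C": 1_699_990_000, "SIM-D": None}
print(stale_sims(catalogue, now=1_700_003_600, max_silence_s=3600))  # ['SIM-A', 'SIM-C', 'SIM-D']
```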
For large datasets you're better off with the KV Store: CSV files are rewritten entirely on every update, while the KV Store allows targeted updates.
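To make the difference concrete, here is a toy Python model. It is not Splunk code, and the function names are hypothetical. Changing one SIM in a CSV means reading and rewriting every row, while a keyed store touches a single record.

```python
import csv
import io
from typing import Dict, Tuple


def update_csv_lookup(csv_text: str, sim: str, changes: Dict[str, str]) -> Tuple[str, int]:
    """Change one SIM's row in a CSV lookup; return the new file and the rows written."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames
    rows = list(reader)
    for row in rows:
        if row["SIM"] == sim:
            row.update(changes)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # every row goes back to disk, changed or not
    return out.getvalue(), len(rows)


def update_keyed_store(store: Dict[str, Dict[str, str]], key: str, changes: Dict[str, str]) -> int:
    """Change one record in a keyed store; return the records written."""
    store[key].update(changes)  # only this record is touched
    return 1


lookup_csv = "SIM,status\nA,WARN\nB,WARN\nC,WARN\n"
new_csv, written = update_csv_lookup(lookup_csv, "B", {"status": "OK"})
print(written)  # 3
```

With millions of SIMs, the CSV path scales with the size of the whole lookup, the keyed path with the number of changed SIMs.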
That rewrites the entire collection because you're telling Splunk "here's a search result, now write it into this lookup". The search language has no targeted insert/update/delete; for that you'll need to drop down to the REST API.
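For targeted writes outside SPL, here is a hedged Python sketch that only *builds* requests for the KV Store REST endpoint `storage/collections/data/<collection>/batch_save`, which accepts a JSON array of documents. The host, owner (`nobody`), app (`search`), bearer-token auth and the batch size of 1000 are all assumptions. Adjust them to your deployment and check the REST API reference before relying on this.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

# Hypothetical deployment details: none of these come from the thread.
BASE_URL = "https://splunk.example.com:8089"  # management port, assumed
OWNER = "nobody"
APP = "search"
COLLECTION = "SIM-cathalogue"  # collection name used later in this thread
BATCH_SIZE = 1000  # assumed per-request cap; check your server's KV Store limits


def batch_save_requests(docs: List[Dict[str, Any]], token: str) -> List[urllib.request.Request]:
    """Build (but do not send) POST requests to the collection's batch_save endpoint."""
    path = "/servicesNS/{}/{}/storage/collections/data/{}/batch_save".format(
        urllib.parse.quote(OWNER), urllib.parse.quote(APP), urllib.parse.quote(COLLECTION)
    )
    built = []
    for start in range(0, len(docs), BATCH_SIZE):
        chunk = docs[start:start + BATCH_SIZE]
        built.append(
            urllib.request.Request(
                BASE_URL + path,
                data=json.dumps(chunk).encode("utf-8"),
                headers={"Content-Type": "application/json", "Authorization": "Bearer " + token},
                method="POST",
            )
        )
    return built


# Only the SIMs that reported in the last window. A document whose _key matches an
# existing record is expected to update it; one without _key is inserted.
changed = [
    {"_key": "abc123", "SIM": "3331234567", "last_connect": 1700000000, "status": "OK"},
    {"SIM": "3339999999", "last_connect": 1700000100, "status": "OK"},
]
reqs = batch_save_requests(changed, token="YOUR-TOKEN")
# urllib.request.urlopen(reqs[0]) would send it; that needs a reachable server and TLS trust.
```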
From the search language, your only fallback is to load the entire collection, merge in the new results, and write the entire collection back out, hoping Splunk is smart enough not to actually rewrite unchanged entries:
base search | stats latest(_time) as last_connect latest(status) as status by SIM
| inputlookup append=t SIM-lookup
| stats first(_*) as _* first(*) as * by SIM
| outputlookup SIM-lookup
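Here is a small Python model of what that merge does. The function name is hypothetical, and the model assumes that stats `first()` skips rows where a field is null. Under that assumption, the fresh row wins for `last_connect` and `status`, while `_key` comes from the stored row. Every SIM in the collection still ends up in the result, which is why `outputlookup` writes the entire collection back.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def merge_first_by_sim(fresh: List[Row], existing: List[Row]) -> Dict[str, Row]:
    """Model `<fresh rows> | inputlookup append=t ... | stats first(*) as * by SIM`.

    Rows are scanned in stream order (fresh results first, then the stored
    lookup rows); for each SIM, a field keeps the first non-null value seen.
    """
    merged: Dict[str, Row] = {}
    for row in fresh + existing:
        acc = merged.setdefault(row["SIM"], {})
        for field, value in row.items():
            if value is not None and field not in acc:
                acc[field] = value
    return merged


fresh = [{"SIM": "A", "last_connect": 200, "status": "OK"}]
stored = [
    {"_key": "k1", "SIM": "A", "last_connect": 100, "status": "WARN"},
    {"_key": "k2", "SIM": "B", "last_connect": 50, "status": "WARN"},
]
result = merge_first_by_sim(fresh, stored)
# A keeps its stored _key but takes the fresh last_connect/status; B is carried over unchanged.
```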
Martin,
thanks for the clarification. I was confused by this example in the docs:
| inputlookup csvcoll_lookup | search _key=544948df3ec32d7a4c1d9755 | eval CustName="Marge Simpson" | eval CustCity="Springfield" | outputlookup csvcoll_lookup append=True
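The docs example reads as an in-place update, which suggests how a KV Store write with `append=True` treats `_key`. Here is a toy Python model under that assumption. It is not Splunk's implementation. Rows carrying a `_key` replace that record, rows without one are inserted, and everything else is left alone. If the stored record is replaced wholesale, that would also explain why the example reads the full record with `inputlookup` before changing two fields.

```python
import uuid
from typing import Any, Dict, List

Record = Dict[str, Any]


def output_append(collection: Dict[str, Record], rows: List[Record]) -> Dict[str, Record]:
    """Toy model of `outputlookup <kvstore_lookup> append=True`.

    Assumption: a row whose _key is present replaces the stored record with that
    key wholesale; a row without _key is inserted under a new key. Records not
    mentioned in `rows` are left untouched.
    """
    for row in rows:
        key = row.get("_key") or uuid.uuid4().hex  # hypothetical key generation
        collection[key] = {k: v for k, v in row.items() if k != "_key"}
    return collection


# Hypothetical sample contents; only the _key and the new values come from the docs example.
customers = {
    "544948df3ec32d7a4c1d9755": {"CustName": "old name", "CustCity": "old city"},
    "other-key": {"CustName": "other name", "CustCity": "other city"},
}
output_append(customers, [{"_key": "544948df3ec32d7a4c1d9755", "CustName": "Marge Simpson", "CustCity": "Springfield"}])
```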
I hope this will perform decently with a global SIM catalogue of 4.5 million SIMs and growing! This is for a big company managing auto insurance satellite data!
Marco
That might be a new feature 🙂
Martin,
I knew the KV Store was the right answer, but how do I make it work?
Here's a schema I wrote down, but the update part isn't 100% working. I ran some tests against the oidemo index from the oidemo app, using the mdn field as the SIM ID.
Here's my collection (collections.conf):
[SIM-cathalogue]
field.SIM = string
field.last_connect = time
field.status = string
accelerated_fields.SIMaccelerated = {"SIM":1, "last_connect":1,"status":1}
with the following lookup defined (transforms.conf):
[SIM-lookup]
collection = SIM-cathalogue
external_type = kvstore
fields_list = SIM,last_connect,status
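Because the collection and the lookup are configured in two separate stanzas, a quick consistency check can help. Here is a hedged Python sketch using the standard `configparser`. Splunk's .conf dialect is close to INI but not guaranteed identical. The sketch reports three things: fields listed in the lookup but not declared in the collection, accelerated fields that are not declared, and whether `_key` appears in `fields_list`. The last flag is worth a look because Splunk's own KV Store lookup examples appear to include `_key` there. Verify its exact effect in the transforms.conf spec.

```python
import configparser
import json
from typing import Dict, List, Union

COLLECTIONS_CONF = """
[SIM-cathalogue]
field.SIM = string
field.last_connect = time
field.status = string
accelerated_fields.SIMaccelerated = {"SIM":1, "last_connect":1,"status":1}
"""

TRANSFORMS_CONF = """
[SIM-lookup]
collection = SIM-cathalogue
external_type = kvstore
fields_list = SIM,last_connect,status
"""


def _parse(text: str) -> configparser.ConfigParser:
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep field-name case (SIM, not sim)
    parser.read_string(text)
    return parser


def check_lookup(collections_text: str, transforms_text: str, lookup: str) -> Dict[str, Union[List[str], bool]]:
    """Cross-check a KV Store lookup stanza against the collection it points to."""
    stanza = _parse(transforms_text)[lookup]
    coll = _parse(collections_text)[stanza["collection"]]
    declared = {k[len("field."):] for k in coll if k.startswith("field.")}
    accelerated = set()
    for k, v in coll.items():
        if k.startswith("accelerated_fields."):
            accelerated |= set(json.loads(v))  # JSON object: field -> sort order
    listed = [f.strip() for f in stanza["fields_list"].split(",")]
    return {
        "listed_but_not_declared": sorted(set(listed) - declared - {"_key"}),
        "accelerated_but_not_declared": sorted(accelerated - declared),
        "lists_key": "_key" in listed,
    }


print(check_lookup(COLLECTIONS_CONF, TRANSFORMS_CONF, "SIM-lookup"))
```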
Here are the steps I tried:
1) Create the master SIM repository:
index=oidemo mdn=*
| dedup mdn
| fields mdn, _time
| rename mdn as SIM
| eval status="WARN"
| eval last_connect=_time
| table SIM, last_connect, status
| outputlookup SIM-lookup
2) Every 5 minutes, update the KV Store with the SIMs (mdn) that sent data in the last 5 minutes:
index=oidemo mdn=*
| dedup mdn
| fields mdn, _time
| rename mdn as SIM
| lookup SIM-lookup SIM
| eval previous_connect=last_connect
| eval last_connect=_time
| eval oldstatus=status
| eval status="OK"
| table SIM, SIMKEY, _key, previous_connect, last_connect, status
| outputlookup SIM-lookup append=True
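The intended semantics of the two steps can be pinned down with a small in-memory Python model. The function names are hypothetical, and the model only captures the desired result, not how Splunk executes the searches. Step 1 seeds every SIM as `WARN`, and step 2 touches only the SIMs seen in the window, so a correct update should leave all other records exactly as they were.

```python
from typing import Any, Dict, List, Set

Event = Dict[str, Any]


def build_catalogue(events: List[Event]) -> Dict[str, Dict[str, Any]]:
    """Step 1: one record per SIM with status WARN.

    Events are assumed newest-first (Splunk's default order), so keeping the first
    event per mdn, like `dedup mdn`, keeps each SIM's latest _time.
    """
    catalogue: Dict[str, Dict[str, Any]] = {}
    for ev in events:
        sim = ev["mdn"]
        if sim not in catalogue:
            catalogue[sim] = {"last_connect": ev["_time"], "status": "WARN"}
    return catalogue


def apply_recent(catalogue: Dict[str, Dict[str, Any]], recent: List[Event]) -> Set[str]:
    """Step 2 as intended: update only SIMs seen in the window; return the touched SIMs."""
    touched: Set[str] = set()
    for ev in recent:  # also newest-first
        sim = ev["mdn"]
        if sim in touched:
            continue  # an older event for a SIM already updated (dedup semantics)
        record = catalogue.setdefault(sim, {"last_connect": None, "status": "WARN"})
        record["previous_connect"] = record["last_connect"]
        record["last_connect"] = ev["_time"]
        record["status"] = "OK"
        touched.add(sim)
    return touched


catalogue = build_catalogue([{"mdn": "A", "_time": 300}, {"mdn": "B", "_time": 200}, {"mdn": "A", "_time": 100}])
touched = apply_recent(catalogue, [{"mdn": "B", "_time": 400}])
# Only B changes; A keeps last_connect 300 and status WARN.
```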
The problem is that the second search overwrites the whole KV Store instead of updating only the changed entries.
Where's the error?!
Marco