I have a lookup with multiple columns (keys). Some combinations make a unique match, but I need an ambiguous search on a single key to return all matched items of a particular field. In a simplified form, the lookup is like this
QID | IP | Detected |
12345 | 127.0.0.1 | 2022-12-10 |
45678 | 127.0.0.1 | 2023-01-21 |
12345 | 127.0.0.2 | 2023-01-01 |
45678 | 127.0.0.2 | 2022-12-15 |
23456 | ... | ... |
QID and IP determines a unique Detected value; you can say the combination is a primary key. No problem with search by primary key. My requirement is to search by QID alone. For 12345, for example, I expect the return to be multivalued (2022-12-10, 2023-01-01).
If I hard code QID in an emulation, that's exactly what I get.
| makeresults
| eval QID=12345
| lookup mylookup QID
| table QID Detected
This will give me
QID | Detected |
12345 |
2022-12-10 |
But if use the same lookup in a search, e.g.,
index=myindex QID=12345
| stats count by QID ``` result is the same whether or not stats precedes lookup ```
| lookup mylookup QID
| table QID Detected
the result is blank
QID | Detected |
12345 |
|
The behavior can be more complex if the search returns more than one QID (e.g., QID IN (12345, 45678)). Sometimes one of them will get Detected populated, but not others.
How can I make sure multiple matches are all returned?
It looks like it's related to the size of the lookup, which exceeds max_memtable_bytes (default 25MB) and there is a memory issue, which can give somewhat random results.
When a lookup exceeds that value, Splunk uses an indexing mechanism to access the lookup, but it seems that does not work properly.
[lookup]
max_memtable_bytes = 52428800
It looks like it's related to the size of the lookup, which exceeds max_memtable_bytes (default 25MB) and there is a memory issue, which can give somewhat random results.
When a lookup exceeds that value, Splunk uses an indexing mechanism to access the lookup, but it seems that does not work properly.
[lookup]
max_memtable_bytes = 52428800
Thank you so much, @bowesmana! This had been bugging me for years. Given the unpredictability, I should have suspected a memory-related problem.
So, there seems to be two separate problems. One is that index mechanism (mylookup.csv_XXXXX.idx) is not performing as designed. The second problem is the lack of any indication in splunkd.log that suggests max_memtable_bytes might be exceeded.
As your makeresults example shows, you should get both results, so it points to something different about the true value of QID in the index=myindex case.
If you add the following after stats count for a bit of diagnostics
| eval len=len(QID)
| eval Q=12345
| lookup mylookup QID OUTPUT Detected as D_1
| lookup mylookup QID as Q OUTPUT Detected as D_2
| eval match=if(QID=Q, "true", "false")
| table count QID len Q match D_*
what do you get?
Thanks for the diagnostic sequence, @bowesmana. As I try to implement, I realized two things. First, I didn't record the real-world QID in the original problem statement, even if the statement was precise. Now I have doubts about the precision because if I filter down to a single QID, I haven't found one that can trigger a blank return.
Second, I notice that even with makeresults, I can construct a group of QIDs to trigger this behavior. (The behavior seems to be the same as the group of QIDs appear in an index search.) For example,
| makeresults
| fields - _time
| eval QID = split("257210,257212,257100", ",")
| mvexpand QID
| lookup mylookup QID output Detected
and
index=myindex QID IN ("257210","257212","257100")
| stats count by QID
| lookup mylookup QID output Detected
both return something like
QID | Detected |
257210 | |
257212 | |
257100 | 2022-12-10 2023-01-01 ... |
Within this group, which one returns null and which real values do not change no matter the order of QIDs. If I search individual ones, each of them has non-null return.
In short, I couldn't reproduce a difference between simulated QID and real-world QID. The difference seems to be whether a group of events contain different QIDs or just a single QID.
How can I diagnose this further?
Is it KV store or CSV - if CSV is mylookup a definition or the CSV directly. If the definition, remove the layer of abstraction and try CSV. If KV store, try writing to the CSV and trying CSV.
Try making it a wildcard(QID) in the definition and adding for a couple of the failing examples, the ** around the QID.
Is it always the case that one of the QID Detected outputs is MV, as in your example? If so, disable one of the entries in the lookup. If it's a definition, set it to return 1 max result to see if that then causes the others to populate.
Just stabbing in the dark here to see what gives...
This is CSV with definition. I didn't even know that you can use CSV directly. So I tried. It made no difference. I then redefined QID's match type to wildcard. It also made no difference; if I surround any test value with **, no value can return because the entries themselves do not have wildcard characters.
The third test, unfortunately also doesn't give me additional info. I use the following as a basic test
| inputlookup mylookup
| stats count by QID
| lookup mylookup QID
| where isnull(Detected)
Somewhat to my surprise, there are many null outputs with count of 1. (Also, because how many QIDs are in search events is known to make a difference in which QIDs return null, I cannot say that the above test is definitive. It only shows that it is possible for single-entries to return null.)
For the wildcard definition, it is the CSV version that needs the **, not the data itself, so the lookup should contain
*257210*
and the data has 257210 - just in case there was something odd.
Can you DM me your lookup - as I can't reproduce it