I need a summary index containing only unique value pairs, say "src_ip" and "page_visited", from web traffic.
While I can run hourly searches ending in ... | dedup src_ip, page_visited
- how do I avoid writing pairs into the summary index that already exist there?
Note: each hourly search generates about 25,000 unique pairs, so using a subsearch against the summary index to drop duplicates does not sound like a workable solution.
As you populate the summary index, you could also populate a lookup table and reference it at the beginning of the search for de-duplication.
index=blah | lookup my_table.csv id OUTPUT summarized | search NOT summarized=1 | eval summarized=1 | stats avg(this) by that, id, summarized | outputlookup append=true my_table.csv

Two details matter here: use `search NOT summarized=1` rather than `summarized!=1`, because `!=` excludes events that lack the field entirely, which are exactly the new, not-yet-summarized ones; and use `append=true` on outputlookup, otherwise each run overwrites the lookup with only the current batch and loses all previously recorded ids.
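Expanding that pattern into the asker's src_ip/page_visited scenario, a sketch of what the hourly search could look like. The index, sourcetype, lookup file, and summary index names are placeholders, not anything from the original thread:

```
index=web sourcetype=access_combined
| stats count by src_ip, page_visited
| lookup seen_pairs.csv src_ip, page_visited OUTPUT summarized
| search NOT summarized=1
| fields - summarized
| collect index=web_summary
| eval summarized=1
| outputlookup append=true seen_pairs.csv
```

Each run only writes pairs that are absent from seen_pairs.csv, then appends those pairs back to the lookup, so neither the summary index nor the lookup accumulates duplicates across runs.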
Could you elaborate on what you're doing and what kind of duplicates you want to avoid?
Hi Martin,
I need to build a summary index of unique combinations of IP + user_agent + username for users of the portal.
This summary index will be used as a lookup to detect in real time when a user suddenly logs in with an IP or user_agent he has never used before.
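The detection side could then reference the same lookup of known combinations: any login whose combination has no match is new. A sketch, with the index, sourcetype, field, and lookup names all assumed:

```
index=portal sourcetype=portal_auth
| lookup known_combos.csv src_ip, user_agent, username OUTPUT summarized AS known
| where isnull(known)
| table _time, username, src_ip, user_agent
```

Rows surviving the `where isnull(known)` filter are combinations never seen before, which is the alerting condition described above.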