Does anyone know of best practices around managing Summary Indexes in a consistent way?
Let’s say that some data occasionally arrives late (e.g. a forwarder was down). The scheduled search that populates the summary index will calculate stats without this data. Later on, the data arrives, but the stats already written to the summary index are now incorrect. There is fill_summary_index.py. However, if I run it with “-dedup true”, it will not re-calculate statistics that already exist. If I run it without dedup, it will not replace the existing statistics but add new ones alongside them. In other words, I’ll have two records for the same timestamp, such as “3/24/17 10:30:00.000 XYZ=5” AND “3/24/17 10:30:00.000 XYZ=10”. This makes it hard to know which entry is the correct one, and over time it fills the index with unnecessary data. Are there known ways to deal with such a scenario?
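To make the problem concrete, here is a minimal Python sketch of the three behaviors. This is not Splunk code: the (timestamp, XYZ) record layout and all function names are hypothetical, and the first two functions only model the behavior as I described it above, not how fill_summary_index.py is actually implemented.

```python
# Toy model of a summary index as a list of (timestamp, XYZ) records.
# All names here are hypothetical and for illustration only.

def backfill_append(index, new_records):
    """Models a backfill without dedup: new stats are added next to the old ones."""
    return index + new_records

def backfill_skip_existing(index, new_records):
    """Models a backfill with dedup: timestamps already present are left untouched."""
    existing = {ts for ts, _ in index}
    return index + [(ts, value) for ts, value in new_records if ts not in existing]

def backfill_replace(index, new_records):
    """The behavior I am looking for: recomputed stats replace the stale ones."""
    merged = dict(index)
    merged.update(dict(new_records))
    return sorted(merged.items())

stale = [("3/24/17 10:30:00.000", 5)]        # computed before the late data arrived
recomputed = [("3/24/17 10:30:00.000", 10)]  # computed after the late data arrived

print(backfill_append(stale, recomputed))         # two records for one timestamp
print(backfill_skip_existing(stale, recomputed))  # the stale value is kept
print(backfill_replace(stale, recomputed))        # only the corrected value remains
```

The third function is the "recalculate and replace" semantics I am asking about.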
What are some best practices for handling this consistently? Data can occasionally arrive late from different sources without me even knowing about it (e.g. someone stops or restarts a forwarder). If fill_summary_index.py re-calculated and replaced records instead of adding them, I could schedule the script to run over the weekend across the full time range and correct anything that was missed. Can I do this somehow?
How do Accelerated Reports deal with late data arrivals? Would they detect it, or should I manually trigger “Rebuild” for them from time to time? Is there any way to trigger the “Rebuild” automatically so that it always runs over the weekend?