How do I compare my raw data volume to the indexed data volume for a specific source type?
Can someone help with this query?
We have index clustering, a deployment server, and a distributed management console.
I want to make sure the same data is not indexed more than once (dual or triple indexing of the same data).
To find duplicate data, you could run | stats count by _raw, _time, host, source and look for any count greater than 1,
although I promise that will be a slow and painful process.
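As a rough sketch of that duplicate check, something like the search below could work. The names your_index and your_sourcetype are placeholders, not values from this thread, and the one-hour window is only an assumption to keep the search cheap. Hashing _raw with md5() is a substitution on my part so that stats groups on a short key instead of the full event text; copies and indexers are just illustrative field names.

```
index=your_index sourcetype=your_sourcetype earliest=-1h
| eval raw_hash=md5(_raw)
| stats count AS copies, values(splunk_server) AS indexers by raw_hash, _time, host, source
| where copies > 1
| sort - copies
```

Any rows that come back are events with identical text, timestamp, host, and source that were indexed more than once in the window. Start with a narrow time range and widen it only if nothing shows up.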
Indexed data volume is captured in index=_internal source=*/license_usage.log sourcetype=splunkd
and you can then narrow it to a specific sourcetype using the st= field.
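For example, a search along these lines should total the indexed volume for one sourcetype. This is a sketch, assuming the license manager's _internal index is searchable from where you run it; your_sourcetype is a placeholder, and the 7-day window and the GB rounding are arbitrary choices. In the type=Usage events, b is the byte count and st is the sourcetype.

```
index=_internal source=*/license_usage.log sourcetype=splunkd type=Usage st=your_sourcetype earliest=-7d@d latest=@d
| timechart span=1d sum(b) AS bytes
| eval GB=round(bytes/1024/1024/1024, 3)
```

You can then compare the daily GB figure against the size of the raw files you are sending for that sourcetype. An indexed volume that is roughly double or triple the raw volume would point to duplicate indexing.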
Where do you think you have duplication? Starting with the symptoms that motivated your question will help us be more surgical in what would otherwise be a very involved process.