Can anyone explain why the following two searches produce different results? It seems to me that the output should be the same, but maybe _internal accounts for kb usage a little differently?
source="/opt/ShoppingSite/work/logs/ShoppingSite.log"|eval record_length=len(_raw)|stats sum(record_length) as record_length | eval record_length=record_length/1024
158002.983398
index="_internal" source="*metrics.log" per_source_thruput series="/opt/shoppingsite/work/logs/shoppingsite.log" | stats sum(kb)
64048.911130
First of all, metrics.log contains only a sample of the top 10 items (for host/source/sourcetype...) in each sampling interval,
so it may not contain all the values.
On Splunk 4.2.*, a workaround is to use the license_usage.log file on the license master (values are in bytes):
index="_internal" source="*license_usage.log" s="/opt/shoppingsite/work/logs/shoppingsite.log"| stats sum(b) by s
see http://wiki.splunk.com/Community:TroubleshootingIndexedDataVolume
Second remark: the end-of-line characters may not be counted. You can check with the linecount field.
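To get a rough idea of how much the line endings could account for, something like the following might help. It is only a sketch: it assumes one byte per line ending (a plain \n), and the names chars, lines and kb_with_newlines are just illustrative. Also keep in mind that len() counts characters, not bytes, so multi-byte characters would still be under-counted.

```spl
source="/opt/ShoppingSite/work/logs/ShoppingSite.log"
| eval record_length=len(_raw)
| stats sum(record_length) as chars, sum(linecount) as lines
| eval kb_without_newlines=chars/1024, kb_with_newlines=(chars+lines)/1024
```

If the two kb values are close, line endings are unlikely to explain a gap as large as the one in the question.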
Edit:
Details are on this wiki page: http://wiki.splunk.com/Community:TroubleshootingIndexedDataVolume
Remark:
license_usage.log is available only on the Splunk license master instance. Every minute, the license master logs the indexed event volume based on the information the slaves send to it. Each slave maintains a table of how much it has indexed, in chunks of time. Typically that chunk is 1 minute, but it may grow if the slave cannot contact the master, because Splunk only resets the chunk when the table is sent to the master. The table holds (source, sourcetype, host) tuples. If it grows beyond 1000 entries, Splunk squashes the host/source keys. So, if you have more than 1000 different tuples, you will find no value for the h (host) and s (source) fields. Splunk never suppresses st (sourcetype) in the log.
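Since st is never suppressed, a per-sourcetype total should still work even when the h and s fields have been squashed. A minimal sketch, assuming the b and st fields described above (the names bytes and kb are just illustrative):

```spl
index="_internal" source="*license_usage.log"
| stats sum(b) as bytes by st
| eval kb=bytes/1024
```

This only helps if the sourcetype maps cleanly to the source you care about; if several sources share one sourcetype, the total covers all of them.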
Additionally, the first search covers data with a timestamp in a particular range, while the second covers data that was indexed during a particular time period. These are not necessarily the same.
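One way to see how much this matters is to bucket the raw events by the time they were indexed rather than by their timestamp. The sketch below assumes the default _indextime field is available on the events; index_hour and kb are illustrative names.

```spl
source="/opt/ShoppingSite/work/logs/ShoppingSite.log"
| eval record_length=len(_raw)
| eval index_hour=strftime(_indextime, "%Y-%m-%d %H")
| stats sum(record_length) as record_length by index_hour
| eval kb=record_length/1024
```

Comparing these hourly totals against the same hours in the _internal searches should show whether events were indexed outside the time range covered by their timestamps.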