Splunk Search

How to search which forwarder is sending the most data to an indexer?

MikeBertelsen
Communicator

How can I determine which forwarder is impacting the indexer the most?
I have an index taking up 53 gigs of space with an event count of 296 million.
There are multiple forwarders feeding into this index.
The forwarders with the most events have directories that are less than 2 gigs in size.
I am manually going server to server to try and determine 'what is using all the space?'

0 Karma

MikeBertelsen
Communicator

Turns out the individual that set up the monitoring on these twelve servers didn't exclude log rolling.
I'm going to clean up the monitoring, and if the issue persists I will seek assistance another day.
Thanks for the answers and support.
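
For anyone checking for the same problem: one way to spot rolled copies being indexed alongside the live file is to break the index's event count down by host and source. This is a minimal sketch, assuming the index is named your_index (a placeholder, not a name from this thread):

```
| tstats count where index=your_index by host, source
| sort - count
```

If sources with rotation suffixes (for example .1 or .gz) show up next to the live log file, the monitor stanza is probably picking up rolled files as well.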

Mike

0 Karma

MikeBertelsen
Communicator

To clarify, I'm interested in the data volume rather than the event count. Can a file path containing 2 gigs produce 10 gigs of disk usage on an index?

In this case 12 hosts are involved. Each apparently has files that total about 2 gigs. So at best I would expect that the index size on disk would be about 16 gigs on a daily basis.

Putting it another way, SoS reports that this index is consuming on average 30 gigs of data per day. Where are the other 14 gigs coming from?

0 Karma

s2_splunk
Splunk Employee

Mike,
It is technically possible, but it seems a bit out of the ordinary. Raw data usually compresses very well on disk; we frequently see compression rates beyond 75%.
The remaining disk space is used for the indexes and metadata that go along with the raw data. If you, for example, configured INDEXED_EXTRACTIONS = json/xml/etc. and you have very high cardinality in your source data, the size of the index files can quickly exceed the raw data size on disk.

In other words: we need a bit more detail on how your inputs are configured for these 12 hosts.
You can also look at the directory structure on the indexer to see if you have multiple large .tsidx files.
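
As an alternative to browsing the bucket directories by hand, the dbinspect command can compare raw data size with size on disk. A rough sketch, assuming the index is named your_index (a placeholder) and relying on the rawSize (bytes, before compression) and sizeOnDiskMB fields that dbinspect reports per bucket:

```
| dbinspect index=your_index
| stats sum(rawSize) AS raw_bytes, sum(sizeOnDiskMB) AS disk_mb
| eval raw_mb = round(raw_bytes/1024/1024, 0)
| eval disk_to_raw = round(disk_mb / (raw_bytes/1024/1024), 2)
```

A disk_to_raw value near or above 1 would suggest the index files are taking up more space than compression of the raw data saves.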

Hope this helps.
Stefan

0 Karma

bmacias84
Champion

If you are using 6.2 or higher, you can use the DMC (Distributed Management Console). Here is the raw search you may want:

index=_internal host=lyn-del-spl-101 source="*metrics.log" sourcetype=splunkd  group=per_host_thruput | timechart per_second(kb) as per_second sum(kb) as kb by series useother=false limit=15
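
If a ranked total is more useful than a timechart, the same metrics can be summed per host. This is only a sketch: it drops the host filter from the search above (add it back to limit the results to one indexer), and the total_kb and total_gb field names are made up for the example.

```
index=_internal source=*metrics.log group=per_host_thruput
| stats sum(kb) AS total_kb by series
| eval total_gb = round(total_kb/1024/1024, 2)
| sort - total_gb
```

One caveat: by default metrics.log records only the top 10 series per group in each sampling interval (the maxseries setting in limits.conf), so with 12 hosts the quietest ones may be underreported.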
0 Karma

Yasaswy
Contributor

Hi, the _internal index should have this information. You can try something like:
index=_internal metrics "group=tcpin_connections"

"sourceHost" would be the forwarder; you can extract other fields as your requirements dictate. For example:
index=_internal earliest=-15m metrics "group=tcpin_connections" | stats sum(tcp_Kprocessed) by sourceHost
or use tcp_eps instead (check the docs for additional options).
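
A variation that ranks forwarders by volume over a longer window might look like the sketch below. It assumes the tcpin_connections events carry the kb and hostname fields (hostname being the name the forwarder reports for itself) and falls back to sourceHost when hostname is missing; the forwarder and total_kb names are made up for the example.

```
index=_internal source=*metrics.log group=tcpin_connections earliest=-24h
| eval forwarder = coalesce(hostname, sourceHost)
| stats sum(kb) AS total_kb by forwarder
| sort - total_kb
```

This measures what each forwarder sent to the indexer across all indexes, so it will not isolate a single index if a forwarder feeds several.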

0 Karma

s2_splunk
Splunk Employee

You may also get what you need from something like this:
| metadata type=hosts index=

It will return a totalCount column per host. If your forwarders are the original source of your log events, the event count should accurately reflect what's coming from each forwarder.
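
A slightly fuller version of that search, sorted so the busiest hosts come first, could look like this. The index name your_index is a placeholder, and lastSeen is a made-up field name for the formatted timestamp:

```
| metadata type=hosts index=your_index
| sort - totalCount
| eval lastSeen = strftime(recentTime, "%Y-%m-%d %H:%M:%S")
| table host, totalCount, lastSeen
```

Keep in mind that totalCount is an event count, not a byte count, so a host with few but very large events can rank low here while still using a lot of disk.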

0 Karma