We're looking to use Splunk to index our application logs, but the application creates a separate log file for each process it spawns; so far this month it has created approximately 720,000 separate log files totalling 32 GB.
Ultimately we'd look to roll this out across multiple instances of the application, which means over 2 million application log files a month. We are going to filter the incoming data to reduce the amount indexed, but Splunk will still need to read every source to extract the bits we want.
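For context on the filtering we have in mind, something along these lines should work: Splunk's standard routing mechanism discards events via the nullQueue in props.conf/transforms.conf. This is only a sketch; the source path, stanza names, and the ERROR/WARN pattern are placeholders for whatever we'd actually keep:

```
# props.conf -- hypothetical source path for our app logs
[source::/var/log/myapp/*.log]
TRANSFORMS-filter = setnull, keepwanted

# transforms.conf
# First send everything to the nullQueue...
[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

# ...then route the events we care about back to the indexQueue.
# The regex here is just an example pattern.
[keepwanted]
REGEX = ERROR|WARN
DEST_KEY = queue
FORMAT = indexQueue
```

Note this filtering happens at parse time on the indexer (or heavy forwarder), so every file still has to be read and monitored; it reduces indexed volume, not the file-tracking overhead.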
To add to this, we have system logs (e.g. Apache, sendmail/postfix, FTP daemon) that all go into a syslog-ng instance; these are organised by server, then year, then month, with a file for each day. I have heard that we are better off writing all of these into one big file and rolling it daily, rather than our current file-per-day approach.
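If we went the single-file route, I assume the syslog-ng side would look roughly like this (a sketch; the source name s_network, the path, and the template are assumptions, and rotation would be handled externally, e.g. by logrotate):

```
# Hypothetical syslog-ng destination: one aggregate file instead of
# per-server/per-day files. The host is kept in each line via ${HOST}
# so per-server separation is preserved in the data rather than the path.
destination d_aggregate {
    file("/var/log/aggregate/all.log"
         template("${ISODATE} ${HOST} ${MSGHDR}${MSG}\n"));
};

log { source(s_network); destination(d_aggregate); };
```

Splunk would then monitor one rolling file per syslog-ng host instead of thousands of small daily files.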
I notice we can use ignoreOlderThan, which will be useful once the initial set of data is indexed. Has anyone had similar experience who can share ideas on how best to handle this sort of input optimally?
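In case it helps frame the question, here is roughly how I'd expect the monitor stanza to look in inputs.conf (the path, sourcetype, index, and 7d cutoff are all placeholders):

```
# inputs.conf -- hypothetical monitor input for the per-process app logs
[monitor:///var/log/myapp]
# Skip files whose modification time is older than 7 days,
# so the tailing processor stops tracking stale files.
ignoreOlderThan = 7d
sourcetype = myapp
index = main
```

My understanding is that ignoreOlderThan only stops Splunk tracking old files; it doesn't help with the volume of new files being created, which is the part I'm most unsure about.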
Cheers,
Mark