Getting Data In

Large apache log files

mq20123167
New Member

Hello!

I'm new to Splunk and just getting my head around it all.

Our company is already using Splunk and we are considering using it on an apache server to gather web statistics in a similar fashion to AWstats.

We have enabled log rotation on our server, so we keep one month's worth of rotated logs. My concern is that once the Apache server deletes logs older than one month, I assume we will no longer be able to search that old information through Splunk.

Ideally I would like 6-12 months worth of data. We have already racked up 645,000 events in a single month.

If we saved our logs somewhere else and got Splunk to review 6-12 months of data, we would be going over a few million events. Is Splunk the right tool for this job? Can it handle that number of events? Or is it mostly made for short-term log analysis?

1 Solution

Ayn
Legend

First, regarding your concern - your assumption is incorrect, because Splunk doesn't work directly on the source files. When you add a file/directory to be monitored by Splunk, the events are indexed - you could say they're copied into Splunk's index (database). Once that's done, it doesn't matter what happens to the source file. The events are in the index, and will remain there indefinitely (or at least for as long as you've told Splunk to keep events).
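To make that concrete, here's a minimal sketch of what such a setup could look like. The monitored path, the index name `apache_web`, and the retention period are examples I'm assuming, not something from the original post - retention is controlled per index with `frozenTimePeriodInSecs` in indexes.conf, and frozen data is deleted by default unless you configure an archive location:

```
# inputs.conf - monitor the Apache access logs, including rotated copies
[monitor:///var/log/apache2/access.log*]
sourcetype = access_combined
index = apache_web

# indexes.conf - keep events searchable for ~12 months (365 days in seconds)
[apache_web]
frozenTimePeriodInSecs = 31536000
```

With a stanza like this, log rotation on the server no longer matters: once an event has been indexed, deleting the source file has no effect on searches.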

There's really no practical limit to how many events Splunk can handle. Many use it for analysis of huge amounts of data spanning several years, and there are Splunk deployments out there indexing several terabytes of data each day. For that kind of deployment you obviously can't get by with a single modestly specced Splunk indexer, but you can scale your deployment easily by adding more indexers and other Splunk instances as you go.
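Once the data is indexed, AWStats-style web statistics become ordinary searches. A sketch, assuming the logs were indexed with the standard `access_combined` sourcetype (which extracts fields like `clientip` and `status` automatically):

```
sourcetype=access_combined
| timechart span=1d count AS hits dc(clientip) AS unique_visitors
```

A search like this over 6-12 months of data is routine; a few million events is well within what a single indexer handles comfortably.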


mq20123167
New Member

Thanks Ayn, appreciate your help with this.
