Getting Data In

How do I configure universal forwarders on roaming laptops so that Windows event logs are retained and forwarded once the laptop reconnects to our network?

jganger
Explorer

I've installed a few Universal Forwarders on Windows laptops that are not consistently connected to the network. One machine did seem to cache events and forward them when reconnected, but another did not. My hypothesis is that this is because the first machine was only ever placed into hibernation, not shutdown or restarted, so the in-memory queue was preserved, whereas the other was shutdown.

That said, I need to retain these logs regardless of connectivity. From my research, I believe two settings should achieve this: useACK = true in the output stanza, and persistentQueueSize = 100MB in each input stanza. This should cause all events to be written to disk until the indexer is available.
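In configuration terms, that plan would look roughly like this. A sketch only: the output group name, indexer address, and input stanza below are placeholders, and note that persistent queues only apply to certain input types.

```ini
# outputs.conf -- ask the indexer to acknowledge events before the UF discards them
[tcpout:primary_indexers]
server = splunk-idx.example.com:9997
useACK = true

# inputs.conf -- 100 MB on-disk buffer for a (network) input stanza
[tcp://:5140]
persistentQueueSize = 100MB
```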

Is this a reasonable approach? I understand that there's some network and disk overhead involved, but is there any reason why this wouldn't work the way I understand it? Thanks for any suggestions.

0 Karma
1 Solution

javiergn
SplunkTrust
SplunkTrust

Your approach looks correct to me with regard to the persistent queue, but something you need to ask yourself is: where are my logs being written to?

If your logs are written to disk, and therefore there's no immediate risk of losing them while the connection is unavailable, your UF will catch up automatically the next time the Splunk servers are reachable.

If your logs are not written to disk (syslog, for instance) and there's a chance of losing them, my preferred approach would still be to save them to disk somehow first and then point the UF at those files.

Caching and ACK is by no means a good approach. I use both in my deployments and I'm quite happy with it. But combining those two with log files makes your log collection more reliable.

Hope that helps.

Thanks,
J


jkat54
SplunkTrust
SplunkTrust

What input types are you using?

According to the documentation:
http://docs.splunk.com/Documentation/Splunk/6.2.0/Data/Usepersistentqueues

Persistent queues are not available for these input types:

 Monitor
 Batch
 File system change monitor
 splunktcp (input from Splunk forwarders)
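In other words, per that list a persistent queue can only be set on the remaining input types, such as network or scripted inputs. A sketch with illustrative ports, paths, and sizes:

```ini
# inputs.conf -- persistent queue on a network input (supported)
[udp://514]
queueSize = 1MB
persistentQueueSize = 100MB

# A monitor input, by contrast, cannot use a persistent queue:
[monitor://C:\logs\app.log]
sourcetype = app_log
```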
0 Karma

jganger
Explorer

Sorry, I forgot to mention that. These are Windows Event Log inputs.
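For reference, those inputs look roughly like this on the UF. The channel names are examples; start_from and current_only are shown with the values that allow reading older, pre-existing events:

```ini
# inputs.conf on the universal forwarder -- Windows Event Log inputs
[WinEventLog://Security]
disabled = 0
start_from = oldest
current_only = 0

[WinEventLog://Application]
disabled = 0
```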

0 Karma


javiergn
SplunkTrust
SplunkTrust

Hi Jganger, I can't see your last comment here. There are probably too many nested comments and the website can't cope with them.

In summary, protect your logs against tampering using tools outside of Splunk. If a user has local admin access to the machine, it's always going to be hard to prevent that, so restrict user rights first.

0 Karma

jganger
Explorer

These events are Windows Event Log inputs. I'm a little confused, though: you said in your response that my approach is correct, but also that it's "by no means a good approach"?

When you say "if your logs are written to disk", do you mean specifically if I enable the persistentQueue? Windows event logs are ultimately on disk, but the behavior I saw was that the UF did not attempt to catch up with old events. It began forwarding when the connection to the indexer was restored, but older events that occurred while it was offline were never forwarded.

0 Karma

javiergn
SplunkTrust
SplunkTrust

Hi, if you are reading Windows event logs, then the useACK flag should do the trick.

That's what I was using at my last place and it worked great. We had to increase the max event log size on Windows in case a laptop had no connectivity for weeks, but my domain admins did this via GPO.

jganger
Explorer

Actually, I guess I'm still a little unclear. If I only add useACK but don't specifically enable a persistentQueue, how long will it continue to enqueue events?

0 Karma

javiergn
SplunkTrust
SplunkTrust

Short answer, quoting the documentation:

"By default, forwarders and indexers have an in-memory input queue of 500KB. If the input stream runs at a faster rate than the forwarder or indexer can process, to a point where the queue is maxed out, undesired consequences occur."

All the details are in the documentation.

0 Karma

jganger
Explorer

Yes, I've read that page. However, the useACK documentation refers to a separate 7 MB (default) wait queue. It's totally unclear how this would interact with the persistentQueue feature.

0 Karma

javiergn
SplunkTrust
SplunkTrust

Sorry if I wasn't clear before. You don't need a persistent queue with event logs. Simply enable the useACK attribute.

Once you've done that, the default 500KB queue I was talking about before increases to 7MB (or whatever the default is nowadays). This is needed to prevent problems in case the ack takes longer than expected, but it's not persistent; it resides in memory.
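So the useACK-only setup described here amounts to a single outputs.conf change; a minimal sketch, with the group name and indexer address as placeholders:

```ini
# outputs.conf on the UF -- useACK alone, no persistent queue configured
[tcpout:primary_indexers]
server = splunk-idx.example.com:9997
useACK = true
```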

0 Karma

jganger
Explorer

So you're saying that, simply by setting the useACK flag, event logs will resume forwarding from the oldest pending event up to current once connectivity to the indexer is restored?

0 Karma

javiergn
SplunkTrust
SplunkTrust

useACK will ensure all your events arrive at your next hop.

The Splunk UF keeps track of which logs it has correctly read, in order to resume successfully after an error.

Combining both capabilities ensures reliable log collection.

Hope that makes sense

0 Karma

jganger
Explorer

Perfect. It sounds like useACK is the only setting I'll need to modify. If you want to reformat this as a top level answer with the explanation I'd be happy to mark it solved. Thanks for your time and patience on this issue.

0 Karma

jganger
Explorer

Wait, I just thought of another problem with this scenario. If the logs are deleted while the machine is offline and it's then rebooted, the in-memory queue is gone and the logs on disk are gone.

Would the persistentQueue prevent these logs from disappearing?

0 Karma

jganger
Explorer

javiergn: your latest comment came in my email but is not showing up on this page.

The issue with event logs being deleted is more of a security concern. If someone were to clear the event logs and then reboot, thus clearing the in-memory cache, Splunk would have no record. I suppose such an actor could also delete Splunk's disk cache, though, so I'm not sure it's worth going down this rabbit hole.

0 Karma

javiergn
SplunkTrust
SplunkTrust

But why would your event logs be deleted while your machine is offline? Event logs might be overwritten if the log is full, but that's why you need to configure a decent log size on Windows.

A persistent queue might help, but it's not bulletproof. What if the Splunk agent is down, or the queue itself is full?

I know you are trying to see all the angles, but I can tell you from experience that it is highly unlikely that Splunk will miss an event log. One of my previous companies was reading more than 100M event logs every day from 3,000 servers, desktops, and laptops, and after some initial tweaking of the load and some filtering it all worked great.

I've seen one of those little agents read 20M logs in one hour from a small Windows 7 workstation that went mad, and it barely used the queue or had a noticeable impact on CPU or memory.

0 Karma

jganger
Explorer

Cool. I'm thinking I'd prefer to use a larger persistentQueue and keep the event log size small. EventVwr can have difficulty working with event logs that get very large, so we've opted to keep them at 50MB. It seems like the persistentQueue would let us do that while providing a larger buffer at negligible cost in disk space.

0 Karma