Monitoring Splunk

Splunk directory monitor issues with aws s3 sync downloads (logs from Cisco Umbrella)

Alesk_andr
New Member

Hi,

I have DNS/proxy logs on AWS S3 that originate from the Cisco Umbrella platform.

Each day, I download new logs from AWS S3 into directories that Splunk monitors.

I would like to know how to force Splunk to index these new logs right after they are downloaded.

PS: I don't use the AWS add-on for Splunk.

0 Karma

FrankVl
Ultra Champion

What exactly is your issue? Because a Splunk Monitor input on a folder should do exactly that: ingest new files (or new lines added to existing files).

Can you describe in a bit more detail what these files look like? If the start of these files is the same each time, then you may indeed run into the issue that Splunk's CRC check thinks it is a file it has already ingested. This could be solved by setting (literally like this) crcSalt = <SOURCE> to include the path/filename in the CRC calculation, so that files with a new name are always considered new files, even if their first bytes (256 by default) are the same as those of previously ingested files.
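As a minimal sketch of that setting, assuming a monitored directory at the placeholder path /data/incoming (not taken from this thread), the stanza could look like this:

```ini
# inputs.conf (sketch; /data/incoming is a hypothetical path)
[monitor:///data/incoming]
disabled = false
# The literal string <SOURCE> makes Splunk add each file's full path to the CRC calculation
crcSalt = <SOURCE>
```

Note that with this setting a file that is renamed or moved after being indexed is treated as new and indexed again, so it suits directories where file names do not change afterwards.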

You can also check splunkd.log to see whether Splunk is ignoring files because of a CRC match.

0 Karma

Alesk_andr
New Member

Hi,

I have two types of logs from AWS S3, each in its own directory: dnslogs and proxylogs.
Each of these two directories is monitored with its own sourcetype.

When I start downloading new logs from AWS into these directories, Splunk adds only 2 events instead of thousands, and these events are incorrect because Splunk indexes them before the download has finished.

So, I stop Splunk, then I download the new logs. When the download is finished, I start Splunk again.

So, I would like to configure Splunk so that, when it starts, it scans my directories and adds the newly downloaded logs.

0 Karma

FrankVl
Ultra Champion

OK, that makes the issue a lot clearer, thanks. Perhaps add this info to your question, so others don't overlook it.

Can you also share your current inputs.conf?

0 Karma

Alesk_andr
New Member

[default]
host = debianSplunk
[batch://$SPLUNK_HOME/var/spool/splunk]
disabled = 1

I am a beginner with Splunk, so maybe I missed some steps.

For example, do I have to copy inputs.conf and server.conf from the default folder to the local folder?

0 Karma

FrankVl
Ultra Champion

Are you indeed dumping these AWS files into var/spool/splunk?

Have you tried with a monitor stanza instead of batch?

0 Karma

Alesk_andr
New Member

No, the AWS files are not in var/spool/splunk.

I do most of my configuration through the web interface, so I don't know how it affects the configuration files.

Maybe I should try a monitor stanza, because I am getting some warnings about stanzas.

Sorry for giving you so little information.

0 Karma

FrankVl
Ultra Champion

If you have configured your data inputs through the GUI, then look for the inputs.conf under etc/apps/search/local/

0 Karma

Alesk_andr
New Member

Thanks for this information. The contents of this inputs.conf match my directories.

[monitor:///home/ale/awscli-bundle/logs_test/dnslogs]
disabled = false
sourcetype = opendns:dnslogs
[monitor:///home/ale/awscli-bundle/logs_test/proxylogs]
disabled = false
sourcetype = opendns:proxylogs

So, do I now have to use crcSalt?

0 Karma

FrankVl
Ultra Champion

No, I'm not sure crcSalt will fix this.

I guess there is something about how you download these files that causes problems for Splunk's monitoring mechanism. Can you elaborate a bit on how you download the files from S3 to your Splunk box?
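If the cause is that Splunk reads a file while it is still being written (an assumption, not confirmed in this thread), one setting that may be worth testing is time_before_close, which controls how many seconds the file monitor waits for further modifications after reaching end-of-file before it closes the file. A sketch based on the dnslogs stanza posted above, with 30 as an arbitrary illustrative value:

```ini
[monitor:///home/ale/awscli-bundle/logs_test/dnslogs]
disabled = false
sourcetype = opendns:dnslogs
# Assumption: files are being read mid-download.
# The default is 3 seconds; 30 is an untested example value.
time_before_close = 30
```

Whether this helps depends on how the download tool writes its files, so treat it as something to try rather than a confirmed fix.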

You might also want to take a look at Splunk's add-on for AWS, which allows you to collect data directly from S3 rather than first downloading it with some script. https://splunkbase.splunk.com/app/1876/

0 Karma

Alesk_andr
New Member

Hello,

It is just an "aws s3 sync" of a directory hosted on S3; the logs come from the Cisco Umbrella platform.

I know about the AWS add-on for Splunk, but it doesn't work for Umbrella, or rather, it is not suitable for my use case.

0 Karma

FrankVl
Ultra Champion

Right, OK. Too bad you can't use the AWS add-on in your situation. I've seen it used successfully for Umbrella data collection.

I'm no expert on how exactly aws s3 sync works or why it causes the issues you see, with Splunk not properly reading the files it creates.

I would suggest either updating the information in the start post of this question (and having the title changed), or posting a new question (and having this one closed). Clearly describe, also in the title, that this is about Splunk file monitor issues with aws s3 sync downloads. That way, people who do have experience with it may notice it sooner and provide you with an answer.

0 Karma

Alesk_andr
New Member

Thanks a lot for your help!

0 Karma

sudosplunk
Motivator

Hi,

Did you try using crcSalt or initCrcLength in your inputs.conf?
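For illustration, here is a sketch of how initCrcLength could be set on one of the monitor stanzas posted above. By default Splunk checksums the first 256 bytes of a file to decide whether it has seen it before. The value 1024 is only an example and would need to be large enough to reach past any header that is identical across files:

```ini
[monitor:///home/ale/awscli-bundle/logs_test/proxylogs]
disabled = false
sourcetype = opendns:proxylogs
# Example value only; the default is 256 bytes
initCrcLength = 1024
```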

0 Karma

Alesk_andr
New Member

I haven't, because I don't know which values to use for these settings.

I read the link you sent me, but I didn't find a solution.

Do you have any suggestions, please?

0 Karma