Re: Splunk directory monitor issues with aws s3 sy...

Alesk_andr · ‎08-17-2018

Hi,

I have DNS/proxy logs on AWS S3, which originate from the Cisco Umbrella platform.

Each day, I download new logs from AWS S3 into directories that are monitored.

I would like to know how to force Splunk to index these new logs right after they are downloaded.

PS: I don't use the AWS add-on for Splunk.

FrankVl · ‎08-17-2018

What exactly is your issue? Because a Splunk Monitor input on a folder should do exactly that: ingest new files (or new lines added to existing files).

Can you describe in a bit more detail what these files look like? If the start of these files is the same each time, then you may indeed run into the issue that Splunk's CRC detection thinks it is a file it has already ingested. This could be solved by setting crcSalt = <SOURCE> (literally like this) to include the path/filename in the CRC calculation, so that files with a new name are always considered new files, even if their first bytes (256 by default) are the same as in previously ingested files.
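
For reference, a minimal sketch of where that setting would go. The directory path and sourcetype name below are hypothetical placeholders; only crcSalt = <SOURCE> is taken from the advice above.

```ini
# inputs.conf (sketch; the path and sourcetype are placeholders)
[monitor:///path/to/monitored/dir]
sourcetype = my_sourcetype
# The literal string <SOURCE> tells Splunk to mix the file's full path
# into the CRC, so a file with a new name is treated as a new file.
crcSalt = <SOURCE>
```

Keep in mind that with this setting a file that is later renamed or rotated is also treated as new and indexed again, so it suits directories where file names do not change once written.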

You can also check splunkd.log to see whether Splunk is ignoring files because of a CRC match.

Alesk_andr · ‎08-17-2018

Hi,

I have two types of logs in my directories: dnslogs and proxylogs, from AWS S3.
These two directories are monitored, each with its own sourcetype.

When I start downloading new logs from AWS into my directories, Splunk adds only 2 events instead of thousands, and these events are not valid because Splunk adds them before the download has finished.

So I stop Splunk, then download the new logs. When the download is finished, I start Splunk.

So I would like to configure Splunk to scan my directories when it starts and add the newly downloaded logs.

FrankVl · ‎08-17-2018

OK, that makes the issue a lot clearer, thanks. Perhaps add this info to your question, so others don't overlook it.

Can you also share your current inputs.conf?

Alesk_andr · ‎08-17-2018

[default]
host = debianSplunk

[batch://$SPLUNK_HOME/var/spool/splunk]
disabled = 1

I am a beginner with Splunk, so maybe I missed some steps.

For example, do I have to copy inputs.conf and server.conf from the default folder to the local folder?

FrankVl · ‎08-17-2018

Are you indeed dumping these AWS files into var/spool/splunk?

Have you tried with a monitor stanza instead of batch?
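
To make the difference concrete, here is a sketch of the two stanza types, using hypothetical directory paths. A batch input is meant for one-time loading and requires move_policy = sinkhole, which deletes each file once it has been indexed. A monitor input leaves the files in place and keeps watching the directory.

```ini
# One-shot and destructive: files are removed after indexing.
# (hypothetical path)
[batch:///path/to/dropbox]
move_policy = sinkhole

# Continuous and non-destructive: files stay where they are.
# (hypothetical path)
[monitor:///path/to/logs]
disabled = false
```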

Alesk_andr · ‎08-17-2018

No, the AWS files are not in var/spool/splunk.

I did a lot of the setup through the web interface, so I don't know its impact on the configuration files.

Maybe I should try with a monitor stanza, because I am getting some warnings about stanzas.

I am sorry for the little information I gave you.

FrankVl · ‎08-17-2018

If you have configured your data inputs through the GUI, then look for the inputs.conf under etc/apps/search/local/.

Alesk_andr · ‎08-17-2018

Thanks for this information, the contents of this inputs.conf match my directories.

[monitor:///home/ale/awscli-bundle/logs_test/dnslogs]
disabled = false
sourcetype = opendns:dnslogs

[monitor:///home/ale/awscli-bundle/logs_test/proxylogs]
disabled = false
sourcetype = opendns:proxylogs

So now, do I have to use crcSalt?

FrankVl · ‎08-17-2018

No, I'm not sure crcSalt will fix this.

I guess there is some issue with how you download these files that interferes with Splunk's monitoring mechanism. Can you elaborate a bit on how you download the files from S3 to your Splunk box?

You might also want to take a look at Splunk's add-on for AWS, which allows you to collect data directly from S3, rather than first downloading it with some script. https://splunkbase.splunk.com/app/1876/

Alesk_andr · ‎08-20-2018

Hello,

It is just an "aws s3 sync" on a directory hosted on S3, and the logs come from the Cisco Umbrella platform.

I know the AWS add-on for Splunk, but it doesn't work for Umbrella, or rather, it is not suitable for my use case.
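
One possible cause, offered as an assumption rather than a confirmed diagnosis: if the sync writes each object under a temporary name and renames it once the transfer completes, the monitor may be reading the temporary, half-written files. The sketch below restricts each input to completed files only. It assumes the finished logs end in .csv.gz, which should be checked against the actual file names in the directories first.

```ini
[monitor:///home/ale/awscli-bundle/logs_test/dnslogs]
disabled = false
sourcetype = opendns:dnslogs
# Assumption: finished files end in .csv.gz and in-progress files do not.
# whitelist is a regular expression matched against the full file path.
whitelist = \.csv\.gz$

[monitor:///home/ale/awscli-bundle/logs_test/proxylogs]
disabled = false
sourcetype = opendns:proxylogs
# Same assumption as above.
whitelist = \.csv\.gz$
```

Listing a directory while a sync is running should show whether temporary names are used at all. If they are not, this filter will not help.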

FrankVl · ‎08-20-2018

Right, OK. Too bad you can't use the AWS add-on in your situation. I've seen it used successfully for Umbrella data collection.

I'm no expert on how exactly aws s3 sync works or why it causes the issues you see with Splunk not properly reading the created files.

I would suggest you either update the information in the start post of this question (and have the title changed), or post a new question (and have this one closed). Either way, clearly describe (also in the title) that this is about Splunk file monitor issues with aws s3 sync downloads. That way, the people who do have experience with it may notice it sooner and provide you with an answer.

Alesk_andr · ‎08-20-2018

Thanks a lot for your help!

sudosplunk · ‎08-17-2018

Hi,

Did you try using crcSalt or initCrcLength in your inputs.conf?
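
As a rough illustration of where these settings go: both belong inside the monitor stanza for the affected directory. The stanza below reuses the dnslogs path and sourcetype posted elsewhere in this thread. The value 1024 is only an example, not a recommendation.

```ini
[monitor:///home/ale/awscli-bundle/logs_test/dnslogs]
sourcetype = opendns:dnslogs
# Number of bytes at the start of a file used for the "seen before?" check.
# The default is 256; a larger value can help when many files share
# an identical header. 1024 is an arbitrary example value.
initCrcLength = 1024
# Alternative: include the file path in the check instead.
# crcSalt = <SOURCE>
```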

Alesk_andr · ‎08-17-2018

I haven't, because I don't know which values to use for these settings.

I read the link that you sent me, but I didn't find a solution.

Do you have any suggestions, please?

Splunk directory monitor issues with aws s3 sync downloads (logs from Cisco Umbrella)
