Deployment Architecture

Issue Forwarding Cloud Custodian Logs Using a Splunk Universal Forwarder

gerrykahn
Explorer

I have a developer running Cloud Custodian scans in AWS and dropping the JSON results on a Linux box running a Splunk Universal Forwarder. The results go into a file hierarchy: Out/BU_name/ORG_name/TypeOfScan_name/Results

I installed a Splunk UF on the box and set it up to monitor the Out directory and all of its sub-directories.

The problem is that there are many BUs, each with several ORGs, all running 5 different types of scans, so I end up with several hundred files with exactly the same name spread across hundreds of sub-directories. To make matters worse, each scan reruns every 10 minutes and writes its output to the same location with the same file name; only the timestamp is updated.
I have tried many configurations and none have worked.

My latest attempted inputs.conf:
[monitor:///home/cloud-user/out/]
disabled = false
index = aws_scan
sourcetype = cloudcustodian
recursive = true
crcSalt = <SOURCE>
initCrcLength = 1048576

Has anyone faced a similar issue and found a solution?

1 Solution

masonmorales
Influencer

You can have Splunk recurse through directories by using "..." in the monitor stanza, e.g.:

[monitor:///home/cloud-user/out/.../nameoflogfile.log]
disabled = false
index = aws_scan
sourcetype = cloudcustodian
recursive = true

Make sure that whatever user splunkd is running as has permission to read those files. You might also want to look at how many file descriptors are in use and ensure that enough are configured to monitor all of those files.
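
If descriptor limits turn out to be the bottleneck, note that the tailing processor also has its own cap in limits.conf. A minimal sketch of raising it (the value 1000 is just an illustrative assumption; size it to the number of actively written files):

[inputproc]
# Maximum number of file descriptors the tailing processor keeps open
# to capture trailing data from rapidly written files (default: 100).
max_fd = 1000

The OS-level ulimit for the user running splunkd has to be at least as generous, or raising max_fd won't help.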


masonmorales
Influencer

BTW, why do you have initCrcLength set? Do the files have very long headers?


sloshburch
Splunk Employee

Would you elaborate on this:

I end up with several hundred files with exactly the same name in hundreds of sub-directories. And to make matters worse the scan reruns every 10 minutes and the output file goes in the same location and has the same name, just the time stamp is updated.

It could mean a few different things. Do the files show up in Splunk with the same value for source, or is just the filename part of the source the same? What specifically does "exactly the same name" mean?
Is the scan that runs every 10 minutes the process that produces the outputs in these locations? What happens to the files that were already there after the scan runs? Does the scan append to, replace, or roll the existing logs? It sounds like it replaces the file, in which case you've got a log file whose read cursor points to a position that no longer exists, because Splunk didn't realize the file is actually new (it assumes files are appended to).

Clarify those and we'll see where to go next.
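
For background on the "Splunk assumes the file is appended to" point: the forwarder fingerprints each file by hashing its first initCrcLength bytes (256 by default) and uses that checksum to decide whether it has seen the file before. A minimal sketch of the two knobs involved, using the poster's path (this illustrates the mechanism, not a recommended config):

[monitor:///home/cloud-user/out/]
# Mix the full path into the checksum, so same-named files in
# different directories are tracked as distinct sources even when
# their first bytes are identical.
crcSalt = <SOURCE>
# Hash the first 1 MB instead of the default 256 bytes, so files
# that share a long identical header still get distinct checksums.
initCrcLength = 1048576

Note that neither setting helps when a file at the same path is replaced wholesale: if the checksum still matches, the forwarder may resume from its stored offset and skip the new content.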



gerrykahn
Explorer

I tried what was suggested, but I still have two issues. The first is that I am getting each line of a JSON file as an individual log event. The second is that I am not getting most of the files; I suspect the UF thinks it has already indexed them. That is why I had been trying things like crcSalt = <SOURCE> and initCrcLength = 1048576. Is there anything else you would suggest I try?
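
For the first issue (each line indexed as a separate event), one possible starting point is to have the forwarder parse the files as structured JSON. This is a minimal props.conf sketch under the assumption that each output file is valid JSON; verify against the actual Cloud Custodian output before relying on it:

[cloudcustodian]
# Parse the file as JSON on the forwarder; structured parsing runs
# on the UF when INDEXED_EXTRACTIONS is set.
INDEXED_EXTRACTIONS = json
# Don't truncate large single-document events (0 = no limit).
TRUNCATE = 0

On the search head, setting KV_MODE = none for this sourcetype avoids extracting the same JSON fields a second time at search time. For the second issue, crcSalt = <SOURCE> (with the literal angle brackets) makes the checksum path-dependent, which is the usual fix when many same-named files start with identical bytes.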
