
Monitor File Count and Age within a folder without indexing

jreagan
New Member

I'm new to Splunk and I am trying to find the best way to use Splunk to monitor an FTP home folder. I do not care about the contents of the files and would prefer not to have their contents in Splunk, since they contain HIPAA data.

I need to monitor how many files are in the folder and how old they are, so I can be alerted if a file is left there for a long time, as well as if the number of files passes a threshold.


daverodgers
Explorer

Just to finish off, here is what I ended up doing. Our Splunk environment is Windows based, not Linux.

I created this script, which is a batch file that runs every 5 minutes as a Windows scheduled task. It runs on the Splunk server, but I use a network path in the batch file so I can count remote directories.

@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
SET count=0
REM Count every file in the remote directory
for %%o IN ("\\network\folder\location\*.*") DO (
      SET /A count=count + 1
)
REM Append a timestamp line, a site label and the count to the output file
net time \\%computername% | find "Current time" >> c:\count\countfiles.txt
echo dataareaid=UK >> c:\count\countfiles.txt
echo currentcount=%count% >> c:\count\countfiles.txt
ENDLOCAL

This outputs the timestamp (from net time) and the count result to the text file (countfiles.txt). I use the double arrow >> so the results are appended each time it runs.

I then created a new data input > file input in Splunk that indexes this countfiles.txt file.

Splunk automatically picked up the timestamp and created the correct event rows for me.

Because I put "currentcount=" in the batch file, Splunk identifies that as a custom field, so I can search on it.

When creating the input I created a new sourcetype called "filecount".

I monitor several directories, each with its own batch file and resulting count text file. I have set up a file input for each of these in Splunk and assigned them all this new sourcetype. This way I can search using sourcetype="filecount" and it returns all my file count results, which I plot on a single chart. In our case each directory relates to a different country.
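
For reference, the same file monitor can also be defined directly in inputs.conf instead of through the UI. A minimal sketch, assuming the countfiles.txt path and the "filecount" sourcetype from above:

[monitor://c:\count\countfiles.txt]
sourcetype = filecount
disabled = false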

The full search I use is:

sourcetype="filecount" | timechart max(currentcount) by dataareaid span=5m

This gives me exactly what I needed: a running count over time showing the maximum file count.

We have functions that process these files and move them on. If a function fails, the files aren't moved and the count rises as the files stay in the folders. This shows up clearly on our Splunk chart as a rising count line and alerts us to any issues with this process.
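
If you also want a triggered alert rather than just the chart, a scheduled saved search along these lines should work (the threshold of 100 is only an example value, not something from the original setup):

sourcetype="filecount" | stats latest(currentcount) as currentcount by dataareaid | where currentcount > 100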

The same could be used by anyone with an FTP server processing incoming files.

Hope that helps anyone wanting to do a similar thing.

dflodstrom
Builder

I think a scripted input would be useful for this task. Take the output of a command like $ls -l and index it. Extract the output into appropriate fields and Splunk it.
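
To make that concrete, here is a minimal sketch of what such a scripted input might look like in Python (the /home/ftp path and the field names are placeholders, not something from the original answer). It reads only file names and modification times, never file contents, and prints one key=value event per run for Splunk to index:

import os
import time

# Placeholder: the FTP home folder to watch
WATCH_DIR = "/home/ftp"

def main():
    now = time.time()
    # List regular files only; file contents are never read
    files = [os.path.join(WATCH_DIR, name)
             for name in os.listdir(WATCH_DIR)
             if os.path.isfile(os.path.join(WATCH_DIR, name))]
    # Age of each file in minutes, based on last-modified time
    ages = [(now - os.path.getmtime(path)) / 60 for path in files]
    oldest = max(ages) if ages else 0
    # Key=value output so Splunk can extract the fields automatically
    print("%s path=\"%s\" file_count=%d oldest_age_minutes=%d"
          % (time.strftime("%Y-%m-%d %H:%M:%S"), WATCH_DIR, len(files), int(oldest)))

if __name__ == "__main__":
    main()

Configure it as a scripted input on an interval (or run it from cron and monitor the output file, as described further down the thread), then alert on file_count or oldest_age_minutes.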

daverodgers
Explorer

Hi guys,

This is exactly what I need to do.

Did the suggested answer work?

If so, can someone be more specific on how to go about doing it?

Thanks


dflodstrom
Builder

Hopefully this will help. Create a script that does two things:

  1. Count the number of files in the folder (this part is a question for Google/Stack Exchange). Put the output from the command into a text file (make sure you're appending to the file).
  2. Determine the age of the files in the folder (another question for Google/Stack Exchange). Put the output from the command into a text file; either append to a new file for this info or append to the same file as before.

Create a cron job that executes this script on a given interval.
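
For example, assuming the Python sketch from the earlier answer is saved as /opt/scripts/filecount.py and its output should be appended to a file that Splunk will monitor (both paths are placeholders), the crontab entry for a 5-minute interval might look like:

*/5 * * * * /usr/bin/python3 /opt/scripts/filecount.py >> /var/log/filecount.log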

Your requirements might differ a little bit; maybe you want to examine several directories, or all of the child directories within a given directory. You can use your scripting skills to format the output in a way that makes field extraction happen automatically, or you can just use your Splunk field extraction skills on the default output.

Create a monitor input to read the file your script is making, make sure Splunk is extracting your fields, create an alert in Splunk and you're set! Bonus: syslog your output to another server if you don't want to install a universal forwarder on the server in question.
