Monitoring Splunk

Best way to monitor and index millions of files in Splunk

raja21
Explorer

Hi developers, I am trying to analyse some logs by extracting them into JSON format and feeding them to Splunk.
I have millions of these logs, each resulting in a JSON file of 4-5 KB.
How can I monitor these files effectively so that Splunk picks up each file?

Thanks.

ddrillic
Ultra Champion

A major issue can be the ulimits for open files. Please read the great post by @yannk, "How to tune ulimit on my server?"
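As a quick way to see the limit mentioned above, here is a minimal sketch that reads the open-file limit with Python's standard `resource` module (Unix only). It reports the limit of the Python process that runs it, which is an assumption to keep in mind: the limit that matters is the one applied to the Splunk process itself, and that may differ.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file-descriptor limits of this process."""
    # RLIMIT_NOFILE is the per-process cap on open file descriptors,
    # the same value that `ulimit -n` reports in a shell.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print(f"open files: soft={soft} hard={hard}")
```

A low soft limit here would be a hint to check the limits of the account that runs Splunk before pointing it at a directory with millions of files.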

FrankVl
Ultra Champion

I see 2 main options:

  1. Put a Universal Forwarder on the system that is storing these logs and create a monitor input for the respective folder.
  2. If you're using some kind of script to extract those logs, you could modify that script to send the JSON data by HTTP POST request to a Splunk Heavy Forwarder / Indexer set up as an HTTP Event Collector: http://docs.splunk.com/Documentation/Splunk/latest/Data/AboutHEC
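To make option 2 more concrete, here is a minimal sketch of how an extraction script might build the HTTP POST for one JSON file. The host name, the token, and the function name `build_hec_request` are placeholders made up for this illustration; the `/services/collector/event` path, the `Authorization: Splunk <token>` header, and the `_json` sourcetype come from the HTTP Event Collector documentation, and port 8088 is only the usual default.

```python
import json
import urllib.request

# Placeholders for illustration only: replace with your own host and token.
HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"


def build_hec_request(json_path, url=HEC_URL, token=HEC_TOKEN, sourcetype="_json"):
    """Build (but do not send) an HEC request carrying one JSON file as one event."""
    with open(json_path, encoding="utf-8") as fh:
        record = json.load(fh)
    # HEC expects the payload under the "event" key; metadata such as
    # "sourcetype" sits next to it in the same JSON object.
    body = json.dumps({"event": record, "sourcetype": sourcetype}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
    )


# To actually send it (needs a reachable HEC endpoint and a valid token):
# urllib.request.urlopen(build_hec_request("one_log.json"))
```

Building the request separately from sending it keeps the sketch testable without a running Splunk instance.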

I don't have experience with such huge numbers of files myself, but unless you get some specific recommendations here, I'd suggest just giving it a try (ideally in a test setup, of course) and seeing what issues you run into. Then you can always post back here to get help resolving those issues.

raja21
Explorer

Hi @FrankVl, I tried the HTTP Event Collector method and found it to be useful.

Now the issue is that I have to run a curl command for each file. I get millions of files to process every day, so would it be too much overhead to run curl that many times?

I also had the idea of merging all the JSON records into one file, separated by [EOF], sending that file to Splunk, and breaking events on [EOF].
But it is not getting ingested into Splunk, as [EOF] is not valid JSON.

Any other solutions?
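One possible variant of the merging idea, sketched here as an assumption rather than a confirmed fix: write each record as a single line of JSON (newline-delimited JSON), so the separator is a plain newline and every line stays valid JSON. The function name `merge_to_ndjson` is made up for this example, and whether Splunk then treats each line as one event depends on the line-breaking settings of the sourcetype, which are not shown here.

```python
import json
from pathlib import Path


def merge_to_ndjson(directory, out_path):
    """Merge every *.json file in `directory` into one file, one record per line.

    Returns the number of records written.
    """
    # List the inputs before opening the output, so an output file placed in
    # the same directory is never picked up as an input.
    paths = sorted(Path(directory).glob("*.json"))
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            record = json.loads(path.read_text(encoding="utf-8"))
            # json.dumps escapes newlines inside strings, so each record
            # is guaranteed to occupy exactly one line.
            out.write(json.dumps(record, separators=(",", ":")) + "\n")
            count += 1
    return count
```

Compared with an [EOF] marker, this keeps the merged file parseable line by line with any JSON tool.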

FrankVl
Ultra Champion

I don't think curl should add too much overhead, but you should be able to see for yourself whether it causes problematic load.
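If the per-file requests do turn out to be a problem, one way to cut the number of calls is to put many events into a single request. The sketch below relies on the HTTP Event Collector accepting several event objects stacked one after another in one request body; the function name `batched_hec_payloads` and the batch size of 1000 are arbitrary choices for illustration, not tuned recommendations.

```python
import json
from pathlib import Path


def batched_hec_payloads(directory, batch_size=1000, sourcetype="_json"):
    """Yield request bodies, each holding up to `batch_size` HEC events."""
    batch = []
    for path in sorted(Path(directory).glob("*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        batch.append(json.dumps({"event": record, "sourcetype": sourcetype}))
        if len(batch) >= batch_size:
            # Event objects are simply stacked; this is not a JSON array.
            yield "\n".join(batch).encode("utf-8")
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield "\n".join(batch).encode("utf-8")
```

Each yielded body would be sent as one POST to the collector endpoint, so a million files at 1000 per batch means about a thousand requests instead of a million.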

As for your other idea: I don't completely follow what you tried and what is failing.
