Getting Data In

Splunk not detecting local files recursively.

millarma
Path Finder

I have a couple hundred log files I pulled from client computers using PowerShell. I am experimenting with having Splunk index them. It was working prior to upgrading to 6.6.

Basically, if I monitor a file directly, it works, but Splunk is not recursing into sub-directories. I have never indexed these files before. On the Data Inputs screen it detects the files, but no events are parsed.

I think it has to do with the path of the log files. Because I was lazy, I copied recursively with a filter, resulting in a long path, e.g. C:\splunkdragonlogs\top25828\CCW03310\**CCW03310**\AppData\Roaming\Nuance\NaturallySpeaking12

I use a regex to define the host as the 'user', as you see bolded above.
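The host-from-path idea can be sketched with a quick regex test. This is an illustration only: the pattern below is my guess at the intent (drive letter, two directory levels with the second ending in digits, then capture the next segment as the host), not the exact regex from the config.

```python
import re

# Hypothetical pattern: "C:" + two directory levels (the second ending in
# digits, e.g. "top25828"), then capture the following segment as the host.
HOST_PATTERN = re.compile(r"\w+:\\\w+\\\w+\d+\\(\w+)")

sample = r"C:\splunkdragonlogs\top25828\CCW03310\CCW03310\AppData\Roaming"
match = HOST_PATTERN.match(sample)
print(match.group(1))  # -> CCW03310
```

Note that in a real inputs.conf each literal backslash in the path must itself be escaped in the regex, which is easy to get wrong.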

I have tried editing inputs.conf to set recurse = true, although that should be the default anyway.

Any thoughts on things to explore?

1 Solution

woodcock
Esteemed Legend

Start with ./splunk list inputstatus and ./splunk list monitor, but the problem is almost certainly that there are too many files to sort through. One quick way to test is to do ./splunk restart on your forwarder. Do most of the files start to catch up and then stop updating? Somewhere in the "thousands" of files, a forwarder will take so long sorting through and keeping track of everything that it cannot keep up with the actual task of forwarding. Usually the solution is simple: make sure that your housekeeping design is deleting or archiving files that are no longer going to change, so that they disappear from the places Splunk is monitoring. If that cannot be done (the files must stay in place), then you can use this trick (be sure to upvote):
https://answers.splunk.com/answers/309910/how-to-monitor-a-folder-for-newest-files-only-file.html

Also check inodes; your splunk user should probably be running with ulimit set to unlimited (or something quite large). A low limit can also cause an inability to handle large numbers of files and directories.
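The housekeeping advice above can be sketched in a few lines: move files whose modification time is older than some cutoff out of the monitored tree. This is a generic sketch under my own assumptions; the function name, directories, and age threshold are made up, not from this thread.

```python
import shutil
import time
from pathlib import Path

def archive_stale(monitor_dir, archive_dir, max_age_days=7):
    """Move files that have not changed in max_age_days out of the
    monitored tree, so Splunk's tailing processor has less to track."""
    cutoff = time.time() - max_age_days * 86400
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    moved = []
    for f in Path(monitor_dir).rglob("*"):
        if f.is_file() and f.stat().st_mtime < cutoff:
            dest = archive / f.name
            shutil.move(str(f), str(dest))
            moved.append(dest)
    return moved
```

Run on a schedule (cron, Task Scheduler), this keeps the monitored directory down to files that are still changing.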

View solution in original post


millarma
Path Finder

Thank you. I think this was the issue. There were thousands of extra directories that, while empty, would keep the TailingProcessor busy.


millarma
Path Finder

I am the OP. Please find my inputs.conf below. However, you should know that the files are now there.

I have done nothing in the meantime. Can you help me understand why? I would hazard a guess that they weren't done indexing the last time I looked. This makes me think that files do not become searchable until the entire data input has been indexed. Is that so?

How would one know if files were in the process of being indexed? Thank you all for your help.

[monitor://C:\splunkdragonlogs\top25sinceJune1]
disabled = false
host_regex = \w+:\\w+\\w+\\w+\d+\(\w+)
index = dgn
recurse = true

[monitor://C:\splunkdragonlogs\top25810*]
disabled = false
host_regex = \w+:\\w+\\w+\\w+\d+\(\w+)
index = dgn
sourcetype = dgn

[monitor://C:\splunkdragonlogs\top25828*]
disabled = false
host_regex = \w+:\\w+\\w+\\w+\d+\(\w+)
index = dgn
sourcetype = dgn

[monitor://C:\dgnlogs\top25828]
disabled = false
host_regex = \w+:\\w+\\w+\\w+\d+\(\w+)
index = dgn
sourcetype = dgn

[monitor://C:\dgnlogs\top25sinceJune1]
disabled = false
host_regex = \w+:\\w+\\w+\\w+\d+\(\w+)
index = dgn
sourcetype = dragonlog

[monitor://C:\dgnlogs\PathDragonLogs]
disabled = false
host_regex = \w+:\\w+\\w+\\w+\d+\(\w+)
index = dgn
sourcetype = dgn-clone
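One thing worth noting about the stanzas above: the host_regex as posted appears to have an unbalanced parenthesis — the backslash before the opening paren makes it a literal "(" rather than a capture group, leaving the closing ")" unmatched — so it likely never compiles. A quick Python check illustrates this, along with a corrected variant that is my guess at the intent, not a confirmed fix:

```python
import re

# The pattern as posted: "\(" is a literal paren, so the trailing ")"
# has no matching open group and the regex fails to compile.
posted = r"\w+:\\w+\\w+\\w+\d+\(\w+)"
try:
    re.compile(posted)
except re.error as e:
    print("posted pattern fails:", e)

# Guessed correction: escape each path backslash as \\ and open a real
# capture group around the host segment.
fixed = re.compile(r"\w+:\\\w+\\\w+\d+\\(\w+)")
print(fixed.match(r"C:\splunkdragonlogs\top25828\CCW03310\x").group(1))
# -> CCW03310
```

Splunk uses PCRE rather than Python's re, but the unbalanced parenthesis is an error in both, so if host_regex never matches, this is a likely place to look.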


woodcock
Esteemed Legend

Show us your inputs.conf. All of it.


tlam_splunk
Splunk Employee

1) Please check splunkd.log and see whether there is a TailingProcessor stanza for your folder when you start up Splunk, e.g.
TailingProcessor - Parsing configuration stanza: monitor://xxxx/xxx/xxx

2) Try adding the '...' and '*' wildcards to the monitor stanza and see if it helps.


mattymo
Splunk Employee

I'd start with ./splunk list inputstatus and check what your inputs are saying.

Or check out index=_internal source=*splunkd.log ERROR OR WARN

- MattyMo