
Can I lock Hunk out while doing an admin task, such as a data file tidy-up?

alexmc
Explorer

I have a small Hunk problem which results in the occasional failed search.

What I want is to have some way of flagging to Hunk that it should hold off from searching for a few minutes.

Every few minutes I fetch more data, and new small files appear in my HDFS directory tree. The tree looks like /data/TOPIC/2015/06/16/datafile.compressionformat.
Once an hour, a job looks at each of these directories and merges the many small files into one big file per directory. (This is generally a good idea because each file requires a separate map task, which slows the MapReduce job down. Also, many small files take up much more heap memory in the NameNode than a few large files.)
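To make the merge step concrete, here is a minimal sketch of how the hourly job might decide which directories need merging. The function name `plan_merges` and the `min_files` threshold are made up for illustration; the real job and the way it lists HDFS files are not shown in this post.

```python
import posixpath
from collections import defaultdict


def plan_merges(paths, min_files=2):
    """Group HDFS file paths by parent directory.

    Returns only the directories that hold at least `min_files` files,
    since a directory with a single file has nothing to merge.
    """
    by_dir = defaultdict(list)
    for path in paths:
        by_dir[posixpath.dirname(path)].append(path)
    return {d: sorted(files) for d, files in by_dir.items() if len(files) >= min_files}
```

The file listing itself would come from whatever HDFS client the merge job already uses.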

The problem is that if a Hunk search is running at that time, its MR job is told which files to read when the job starts, but by the time the map tasks get around to processing them, some of the files have disappeared because they have been merged into new, larger files.

So one possible solution is to get my file merge job to:
a) only start when no Hunk jobs are running, and
b) prevent any Hunk jobs from starting until it has finished.
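For part (a), one rough idea is to ask the cluster what is running before the merge starts. The sketch below assumes a YARN cluster and uses the ResourceManager REST endpoint `/ws/v1/cluster/apps?states=RUNNING`. The `SPLK_` job-name prefix is an assumption about how the Hunk jobs are named on a given cluster and should be checked against the real job list. It does nothing for part (b), and a search could still start just after the check.

```python
import json
from urllib.request import urlopen


def app_names(payload):
    """Extract application names from a ResourceManager 'apps' response."""
    # With no matching applications the API returns {"apps": null}.
    apps = (payload.get("apps") or {}).get("app") or []
    return [app.get("name", "") for app in apps]


def running_app_names(rm_url):
    """Fetch the names of RUNNING applications from the ResourceManager."""
    url = rm_url.rstrip("/") + "/ws/v1/cluster/apps?states=RUNNING"
    with urlopen(url) as resp:
        return app_names(json.load(resp))


def safe_to_merge(names, prefix="SPLK_"):
    """True if no running application name starts with the assumed prefix."""
    return not any(name.startswith(prefix) for name in names)
```

The merge job would call `safe_to_merge(running_app_names(...))` with its own ResourceManager address and skip or delay the merge when the result is false.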

Has anyone tried such a thing?

I am guessing the only way to do this would be to use queues in some way, perhaps by making my file merge take up a whole queue?

Any ideas?

Thanks


kschon_splunk
Splunk Employee

You can disable searches against an index or virtual index by adding the line "disabled = 1" to its stanza in indexes.conf. This won't cause searches to wait, but it will cause them to return immediately with no results.
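As a rough illustration, a merge job could flip that flag before and after it runs. The helper below is only a sketch: `set_disabled` is a made-up name, it edits the text of indexes.conf rather than calling any Splunk API, and whether Splunk needs a reload or restart to notice the change is not covered here.

```python
def set_disabled(conf_text, stanza, disabled):
    """Return conf_text with 'disabled = 0/1' set inside [stanza].

    Any existing 'disabled' line in that stanza is dropped and the new
    value is written directly under the stanza header. Other stanzas
    are left untouched.
    """
    setting = "disabled = " + ("1" if disabled else "0")
    out, in_stanza = [], False
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            in_stanza = stripped == "[" + stanza + "]"
            out.append(line)
            if in_stanza:
                out.append(setting)
            continue
        if in_stanza and stripped.split("=", 1)[0].strip() == "disabled":
            continue  # old value; the new one is already written
        out.append(line)
    return "\n".join(out) + "\n"
```

The merge job would apply this with `disabled=True` to the virtual index stanza before merging and with `disabled=False` afterwards. Searches started in between return no results rather than waiting.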
