Currently, we want to delete some events (that is, all events with a certain sourcetype in a defined range in 2016) from Splunk. Normally, deleting with ... | delete
works fine, and almost all events could be deleted successfully. However, for some single days the delete query hangs and leaves thousands of events undeleted. The index, sourcetype etc. are all the same, but the events just won't be deleted 😞
What we observe is that when we run the search index=myindex sourcetype=mytype earliest="03/27/2016:00:00:00" latest="03/28/2016:00:00:00" | delete
we get
INFO: 0 events successfully deleted
INFO: 0 events successfully deleted
INFO: 0 events successfully deleted
INFO: 0 events successfully deleted
...
to the bitter end. But index=myindex sourcetype=mytype earliest="03/27/2016:00:00:00" latest="03/28/2016:00:00:00" | stats count
immediately gives the result
INFO: Your timerange was substituted based on your search string
count
239343
Now we are totally distressed. Does anybody know how to get Splunk to delete these events?
P.S.
Unfortunately, cleaning the whole index is not an option.
Did you check the permissions of the files in the buckets?
Perhaps splunk ran as root for a while and was later corrected to run as the splunk user... so some files might still be owned by root?
If so, the easy solution is to stop splunk on the indexer(s) and run chown -Rf splunk. /opt/splunk,
assuming you don't keep your data in other places.
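A minimal sketch of that check-and-fix sequence, assuming the default install path /opt/splunk and that splunkd runs as a user named splunk (adjust both to your setup; the existence guard is only there so the snippet is safe to run anywhere):

```shell
SPLUNK_HOME=/opt/splunk   # assumption: default install path
SPLUNK_USER=splunk        # assumption: the user splunkd runs as

if [ -d "$SPLUNK_HOME" ]; then
  # Stop Splunk before touching bucket files.
  "$SPLUNK_HOME/bin/splunk" stop
  # Any root-owned leftovers from a run as root show up here.
  find "$SPLUNK_HOME" ! -user "$SPLUNK_USER" -print
  # Reset ownership recursively, then restart.
  chown -R "$SPLUNK_USER:$SPLUNK_USER" "$SPLUNK_HOME"
  "$SPLUNK_HOME/bin/splunk" start
else
  echo "No Splunk install found at $SPLUNK_HOME"
fi
```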
Thank you for your answer,
all the files have the correct user and permissions. To be absolutely sure, I copied the complete bucket to a backup, removed the bucket (after shutting Splunk down) and copied the backup back.
After the restart the events are in place again, but still not deletable, like before.
restarting splunk usually resolves it
Wow, between that and the fsck I really thought one of them would have solved the issue.
Hello,
As per delete command documentation (https://docs.splunk.com/Documentation/Splunk/6.6.0/SearchReference/Delete)
Note: The delete command does not work if your events contain a field named index aside from the default index field that is applied to all events. If your events do contain an additional index field, you can use eval before invoking delete, as in this example:
index=fbus_summary latest=1417356000 earliest=1417273200 | eval index = "fbus_summary" | delete
So try this query instead:
index=myindex sourcetype=mytype earliest="03/27/2016:00:00:00" latest="03/28/2016:00:00:00" | eval index = "myindex" | delete
Regards
Can you suggest another way to delete the index buckets using the delete query? I tried the ways you suggested above. Can you provide a solution for this?
Thank you, but this doesn't work either; the ... | delete
gives no error, it just reports that it deleted 0 events.
As mentioned, most events could be deleted, except these 239343, and all events have the same index field.
To be sure, I tested it as in your recommendation, and also ran a ` ... | stats count by index `. Everything is okay.
You're welcome. Could you please provide a sample log line and your Splunk version, so I can try to reproduce the issue?
Regards
We are running Splunk version 6.4.4 on 6 indexers and 4 search heads.
Here is an example event:
1459108799815, revisit_creationtime=1459108799815, cookie_value="01m401s5v5i2w9izrz", leadout_click_bokey="IrmmSwDt8tX2HXLBVRWhpA", leadout_shop_id="9701", leadout_type="OFFER", leadout_provider="EBYDE", leadout_click_position=2, leadout_affiliate="ipc-android", root_category_id="3626", category_id="26491", product_type="nonVaried", product_id="4019314", product_name="Shimano CN-HG95", manufacturer_name="Shimano", tracetime=1459105200, redirect_to="http://rover.ebay.com/rover/1/707-53477-19255-0/1?ff3=4&pub=5574635388&toolid=10001&campid=533777055...", leadouts=1, reloadblocked_leadouts=0, checkouts=0, loggedin_leadouts=0, loggedin_checkouts=0, page_template="GoToShop", analyze_begin=1459105200, analyze_end=1459108800, kpi_type=session_object_lo
Thanks, I ingested the event and was able to delete it normally, so I believe the issue is with the buckets.
You can run the following command to get the distribution of buckets across the indexers, with the corresponding filesystem path of each bucket. Then you can check the permissions and ownership of the buckets to see if something is wrong.
| dbinspect index=myindex
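As a sketch, you could table the most relevant dbinspect output fields per bucket (bucketId, path, state, eventCount and splunk_server are standard dbinspect fields; pick whichever you need):

```
| dbinspect index=myindex
| table bucketId, path, state, eventCount, splunk_server
```

The path column is what you would then check on the filesystem of each indexer.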
Regards
Hi aakwah,
all rights on the buckets seem to be okay. All files in the involved buckets are owned by the splunk user and have read and write permissions. Of course the directories are also executable. 🙂
Unfortunately, in Splunk 6.4.4 the dbinspect command doesn't have the corruptonly option yet 😞
Should I run splunk cmd fsck?
Best Regards
I have run ./splunk cmd splunkd fsck scan --all-buckets-one-index --index-name=myindex
and got "No issues found" many times. I think the buckets are okay.
Is Splunk's fsck comparable to the Linux fsck?
Best regards
Marco
Hello Marco,
I believe that Splunk's fsck handles the metadata of the buckets.
Did you run the fsck command on all 6 indexers?
I think it would be a good idea now to capture the splunkd.log events on the indexers and on the search head during the execution of the delete query; maybe there is a clear error message.
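A sketch of such a capture over the internal index (log_level and component are standard fields on splunkd events; the 15-minute window is an assumption, run it right after the delete attempt):

```
index=_internal sourcetype=splunkd log_level=ERROR earliest=-15m
| stats count by host, component
```

Any component that only shows up on the problematic indexers would be a good starting point.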
Regards
Dear aakwah,
I did run the fsck on all indexers.
I have also examined the log files, on the search head where I executed the query and also on the indexers, once directly in the files and also via Splunk with index=_internal ... nothing conspicuous 😞
I'm so distressed, I think there is nothing left but to delete the involved buckets ... 😞
Thank you and best regards
Marco
It is really weird. Final thoughts from my side:
- Try to run the delete query on the indexers directly
- Try to make the time range smaller, one hour for example, or try to delete a single event
- Finally, submit a case to Splunk support 🙂
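The two narrowing suggestions above could look like this (a sketch; the cookie_value is taken from the sample event earlier in the thread, any sufficiently unique field works):

```
index=myindex sourcetype=mytype earliest="03/27/2016:00:00:00" latest="03/27/2016:01:00:00" | delete

index=myindex sourcetype=mytype cookie_value="01m401s5v5i2w9izrz" | delete
```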
Regards
Thank you, I also tried to delete single events, but that doesn't work either.
I also copied the complete bucket to a backup, removed the bucket (after shutting Splunk down) and copied the backup back.
After the restart I can still retrieve the events with a query, but can't delete them, same as before.
best regards
Marco
is that an indexer cluster?
no, it is not a cluster