Getting Data In

Can someone possibly suggest a better way to handle snapshot data or tell me how to get more information on buckets?

jaredlaney
Contributor

We currently have some data that arrives in "snapshot" form. In other words, we get a snapshot of the data every day from a RESTful interface and upload it to Splunk.

To eliminate search issues for customers, we do a soft delete of the index (| delete, which doesn't really delete anything) before ingesting the new data. This has been problematic: we deploy our app often, sources that were soft deleted sometimes return, and a lot of delete directories build up.
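For context, the soft delete is just the delete command run over the whole snapshot index, along the lines of the sketch below (index name as in the example later in this post; delete requires the can_delete role and only hides events from search, it does not reclaim disk space):

index=test | delete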

First of all, any suggestions on how we could handle this would be appreciated. (We've tried lookups and KV store lookups but failed because they didn't offer the faceted search our customers wanted.)

One remedy we've suggested is to remove all of the buckets in the index before ingesting the new data instead of "| delete".

Is there a good way to list all of the buckets for an index? (| dbinspect, or something like index=test | eval bkt=_bkt | table bkt splunk_server?)

Then, remove them? (reference below)
https://answers.splunk.com/answers/133845/delete-corrupt-bucket-or-down-index-in-cluster.html
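Something like the following (a sketch built only on standard dbinspect output fields, using the example index) is the kind of listing we are after:

| dbinspect index=test | table bucketId splunk_server state path eventCount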

jbjerke_splunk
Splunk Employee

Hi jaredlaney

Ok, I got a new idea. It involves three steps:

1 - Create a scheduled search that identifies the latest (by index time) source for each sourcetype, adds a field called current, and saves this into a lookup. Run it every 5 minutes (or similar) and set the time range to the longest snapshot batch load interval (7 days?).

index=* | eval _time=_indextime | stats latest(source) AS source by sourcetype | eval current=1 | outputlookup createinapp=t snapshot.csv

This assumes your snapshot files have different filenames or different paths. If not, we need to find some other unique field.
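A scheduled search stanza along these lines could drive step 1 (a sketch; the stanza name is made up, and the cron schedule and 7-day window match the suggestion above):

# savedsearches.conf
[snapshot_current_source_tracker]
search = index=* | eval _time=_indextime | stats latest(source) AS source by sourcetype | eval current=1 | outputlookup createinapp=t snapshot.csv
cron_schedule = */5 * * * *
dispatch.earliest_time = -7d@d
dispatch.latest_time = now
enableSched = 1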

2 - Create a lookup definition, then configure an automatic lookup and attach it to all sourcetypes involved in these snapshots.

In props.conf or through the GUI:

[(?::){0}snapshot*]
LOOKUP-snapshot = snapshot source AS source OUTPUTNEW current AS current

The stanza above attaches the lookup to all sourcetypes that start with "snapshot".
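The LOOKUP- setting references a lookup definition named snapshot, which points at the CSV written in step 1. In transforms.conf (or through the GUI lookup definition page) it might be as simple as this sketch:

# transforms.conf
[snapshot]
filename = snapshot.csv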

3 - Add current=1 to the Restricted search terms under Settings->Access Controls->Roles->YOURUSERROLE
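On disk that restricted search term lives in authorize.conf as srchFilter. A sketch for a hypothetical role:

# authorize.conf ("snapshot_users" is a made-up role name)
[role_snapshot_users]
# Every search this role runs is silently ANDed with the filter below,
# so only events the automatic lookup tagged with current=1 come back.
srchFilter = current=1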

This way your users would only see the latest sources by indextime without knowing. I tried this in my own environment and the concept works.

Let me know if this works for you.

j


jaredlaney
Contributor

Thanks for your continued effort. I really appreciate it. I'll start working on this.


jbjerke_splunk
Splunk Employee

How about applying a restricted search term to the role of the users?
Settings->Access Controls->Roles->YOURUSERROLE

If you add something like this they will only see what has been indexed in the last day regardless of the event time:

_index_earliest=-1d@d

This will happen in the background so your users would never know.

j


jaredlaney
Contributor

@jbjerke - Again, the snapshot has data from the past few months, so putting in _index_earliest=-1d@d wouldn't work.

Is there an _index_latest(source) command I could run by role?


jaredlaney
Contributor

See example:

Example
index = test
source1 = (1/28, 5), (1/29, 6)
source2 = (1/30, 7), (1/29, 6), (1/29, 5.5), (1/28, 5), (1/28, 5.4)


jbjerke_splunk
Splunk Employee

Hi jared

I think you are misunderstanding; _index_earliest is not the same as earliest. By typing _index_earliest=@d you would show all data that was indexed today, regardless of event time, even if that data is many years old. The time picker would say "All time" but it would only show what has been indexed since this morning at 00:00.
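For example, a quick check like the one below (a sketch against the example index) would still return events whose _time is months old, as long as they were indexed since midnight today:

index=test _index_earliest=@d | eval event_time=strftime(_time, "%F %T"), indexed_at=strftime(_indextime, "%F %T") | table event_time indexed_at source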

j


jaredlaney
Contributor

@jbjerke - I'm sorry. You are correct. I was interpreting it incorrectly. I will try it out. Thanks for the tip.


jbjerke_splunk
Splunk Employee

No worries 🙂

If it works, please mark as answered.

j


jaredlaney
Contributor

@jbjerke - What would my restricted search terms be if I have multiple snapshot indexes and different intervals? (Daily vs. Hourly)

Do I have to create different roles for different sourcetypes or indexes? The method seems to break when I think about scaling. Thoughts?
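For illustration, with the _index_earliest approach a combined restricted search term for two hypothetical snapshot indexes (names made up) might end up like this, one clause per index and interval:

(index=snapshot_daily _index_earliest=-1d@d) OR (index=snapshot_hourly _index_earliest=-1h@h)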


jbjerke_splunk
Splunk Employee

My thoughts on this are that it won't work so well with multiple time frames like the ones you describe. I will think of another solution.


esix_splunk
Splunk Employee

When you run the delete command, it actually marks the data/buckets as unsearchable. So if you are seeing those results come back in searches, that sounds like a bit of a problem. They shouldn't be returned in search results; if they are, you might want to talk to support and see if there is a bug.

In terms of deleting the index, you can use dbinspect to find the buckets...

| dbinspect index=main | convert ctime(endEpoch) ctime(startEpoch) | table bucketId path startEpoch endEpoch

That will give you the location on disk and associated times with the data in the buckets. You could manually delete the buckets...

If you are in a clustered environment, you need to be careful about how you delete the indexes or buckets. Best practice would be to put the CM into maintenance mode. From there you need to clean the index on each indexer; you have to stop Splunk first:

splunk clean eventdata -index <index_name>

After that you can restart the indexers and take the CM out of maintenance mode.
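Put together, the sequence described above looks roughly like this (a sketch; the maintenance-mode commands run on the cluster manager and the rest on each indexer, and -f just skips the confirmation prompt):

# on the cluster manager
splunk enable maintenance-mode

# on each indexer
splunk stop
splunk clean eventdata -index <index_name> -f
splunk start

# back on the cluster manager, once the indexers are up
splunk disable maintenance-mode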


jaredlaney
Contributor

@esix - There is a bug in Splunk: SPL-100516. We haven't been able to get it fixed, so we're looking for alternate ways to do this. We're all Splunk Certified Architects, so we're aware of how to delete in a clustered environment. We just have a high SLA, and bringing Splunk into maintenance mode isn't a great option for us. We've also stopped all the indexers and run the "splunk clean eventdata" command.

We're kind of looking for a solution where we can freeze buckets without taking down the cluster.


jbjerke_splunk
Splunk Employee

Hi jaredlaney

Would it be possible to just tell your users to set the time picker to "Today"? That way they would only see the data imported in the last batch. This is much easier.

You could even set "Today" as the default time range when your users login to Splunk.
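One way to do that is through ui-prefs.conf for the Search app (a sketch; this sets the default time range to Today for users who haven't saved their own preference):

# ui-prefs.conf
[search]
dispatch.earliest_time = @d
dispatch.latest_time = now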

j


jaredlaney
Contributor

Our snapshots are also indexed by time, so setting the time picker to Today would cut off much of the snapshot, as the snapshots become intertwined over time.

Example
index = test
source1 = (1/28, 5), (1/29, 6)
source2 = (1/30, 7), (1/29, 6), (1/29, 5.5), (1/28, 5), (1/28, 5.4)


jaredlaney
Contributor

Hi jbjerke. We suggested this previously but our users didn't love it. Maybe we could try it again.
