Using the collect command in a scheduled search to add data to an index, why does each bucket only have 500MB of data with my current retention policy?

bruceclarke
Contributor

Hi all,

I have a search that runs about every 20 minutes to merge a bunch of information together and make it easily accessible in a separate index. I do this using the collect command that Splunk provides. The search looks something like:

{huge aggregation query that merges a bunch of logs together}
| table _time RequestId UrlHit RoundTripDur ServerDur BrowserDur ...
| collect index=requestIndex sourcetype=mySourceType addtime=true testmode=false

In indexes.conf, I have set maxDataSize for the requestIndex index to over 5GB. Essentially, I want each bucket to store a day's worth of data.

When I look at the actual breakdown, however, I'm seeing multiple buckets a day with only ~500MB of data each. This is well below the value I set in indexes.conf (again, 5GB). Does this have something to do with the collect command? Is there any way I can make requestIndex respect my setting?

Here is my stanza from indexes.conf. I have restarted Splunk to make sure this stanza took effect:

[requestIndex]
frozenTimePeriodInSecs = 63072000
maxDataSize = 50000
coldPath = F:\splunkIndex\CustomIndexes\requestIndex\colddb
maxWarmDBCount = 730

somesoni2
SplunkTrust

Can you confirm whether you're setting the following attributes in your indexes.conf for requestIndex? Also, please post the values you have set for them.

maxDataSize
maxHotIdleSecs
maxHotBuckets
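
For reference, a stanza that sets all three attributes might look like the sketch below. The values are illustrative assumptions, not recommendations, and should be checked against the indexes.conf spec for your Splunk version. Note that maxDataSize is expressed in MB, so a value of 50000 corresponds to roughly 50GB rather than 5GB.

```
[requestIndex]
# Hypothetical values for illustration only.
# maxDataSize is in MB; 5000 is roughly 5GB.
maxDataSize = 5000
# Number of hot buckets that may be open at once for this index.
maxHotBuckets = 3
# Seconds a hot bucket may sit without new data before it rolls to warm.
maxHotIdleSecs = 86400
```

Hot buckets can roll to warm for reasons other than reaching maxDataSize (for example, an indexer restart or one of the limits above), so a bucket smaller than maxDataSize is not necessarily a sign that the setting was ignored.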


bruceclarke
Contributor

I've edited the question above with my indexes.conf stanza. I set maxDataSize, but not maxHotIdleSecs or maxHotBuckets. For what it's worth, I have never had a problem with other indexes, which also don't have the latter two values set.


masonmorales
Influencer

Can you post the stanza for this index from your indexes.conf? Also, did you restart your indexer(s) after configuring maxDataSize?


bruceclarke
Contributor

I've edited the question above with my indexes.conf stanza. I have restarted the indexer since configuring maxDataSize.


gyslainlatsa
Motivator

Hi bruceclarke,

No, it has nothing to do with the collect command; it concerns the type of Splunk license you use. You are using the free version, which only allows you to index up to 500MB per day. If you want to index more data, change your Splunk license.

For more information about Splunk licenses, see below:

http://docs.splunk.com/Documentation/Splunk/6.3.2/Admin/MoreaboutSplunkFree

http://docs.splunk.com/Documentation/Splunk/6.3.2/Admin/TypesofSplunklicenses


bclarke5765
Explorer

Thanks, but this is definitely not the issue. I'm using the full license, which allows me 125GB per day. I have a separate index that is populated without using the collect command (i.e. its data is forwarded to and indexed by a Splunk indexer). That index has buckets of over 5GB each day.

For reference, these are the "buckets" that I'm referring to (warm, cold, frozen): https://wiki.splunk.com/Community:UnderstandingBuckets
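
One way to list those buckets and their sizes is the dbinspect command. The search below is a minimal sketch that assumes the index name from the question and shows only a few of the fields dbinspect returns:

```
| dbinspect index=requestIndex
| sort startEpoch
| eval start=strftime(startEpoch, "%Y-%m-%d %H:%M"), end=strftime(endEpoch, "%Y-%m-%d %H:%M")
| table bucketId state start end eventCount sizeOnDiskMB
```

Comparing the start and end times of consecutive buckets against sizeOnDiskMB shows how many buckets cover a given day and how large each one was when it rolled.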


shahid285
Path Finder

Hi bclarke5765,
I don't have a solution to your problem, as I am stuck at the very first stage of collecting data into an index. But I think you might be able to help me over this initial hurdle: I am trying to write an event to an index with a sourcetype using the collect command, but no data is actually inserted into the index.
The following is the link to my issue. Please have a look and help me with this.

https://answers.splunk.com/answers/736766/is-there-a-possibility-to-write-an-event-to-splunk.html

Thanks in advance,
Shahid
