I am looking for a way to do statistical sampling so I can get rapid insight from millions of events.
An example: we have login events from a specific sourcetype, with millions of logins per day. I want to look at how often users log in over a 7-day period (what % of users logged in once, what % twice, what % three times, etc.). Right now this is a very long query to run, even as a scheduled search.
< search string > | stats count as logins by userID | stats count as users by logins
This gives me a chart of how many users fall into each login-count bucket (1, 2, 3, 4, 5, etc.), but it takes a very long time to run. Any suggestions on how to take a statistical sample would be appreciated; one idea I've been toying with is sketched below.
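Specifically (and I'm not sure this is statistically sound), the thought is to sample users rather than raw events, since dropping individual events would undercount each user's logins. Assuming the md5(), substr(), and tonumber() eval functions are available in your version, hashing userID and keeping only userIDs whose hash starts with one particular hex digit should retain all events for roughly 1/16 of users:

< search string >
| where tonumber(substr(md5(userID), 1, 1), 16) = 0
| stats count as logins by userID
| stats count as users by logins
| eventstats sum(users) as total_users
| eval pct_of_users = round(100 * users / total_users, 2)

The percentages should be unbiased estimates, while the absolute user counts would need to be scaled up by 16. Is that a reasonable way to sample, or is there a better approach?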
<< EDIT >>
The suggestion has been to use summary indexing for this. How would you build a summary index to capture this kind of information? If, for example, we have 2 million unique users per day who log in about 5 million times, shrinking 5 million daily events down to 2 million daily summary events (stats count by userID) is the only approach I can think of that still lets me check for repeat visits on subsequent days.
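For concreteness, roughly what I mean (the summary index name summary_logins is just for illustration) is a daily scheduled search that rolls the raw logins up to one event per user per day:

< search string > earliest=-1d@d latest=@d
| stats count as daily_logins by userID
| collect index=summary_logins

and then a 7-day report that runs over the summary index instead of the raw events:

index=summary_logins earliest=-7d@d latest=@d
| stats sum(daily_logins) as logins by userID
| stats count as users by logins

(Presumably the same thing could be done by enabling summary indexing on the saved search itself rather than calling collect explicitly.)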
Even so, each run of the report still has to scan 14+ million events in the summary index (2 million per day times 7 days), which, it would seem, defeats the purpose of a summary index.
Any tips on how to optimize this kind of search?