About joshbeckett

joshbeckett · ‎05-14-2020

I have several questions about data architecture that are rooted in CIM data models and performance considerations.

Background: We have about 2 TB of new log data every day. Some sourcetypes get hundreds of millions of new events per day, one gets 1.1 billion new events per day, and quite a few get a few million new events per day.

From a data architecture standpoint, we generally drop the events from a given type of log generator into an index and sourcetype for that technology. For example, Windows events go into index = win sourcetype = win. These are not the real names, but you get the idea.

When evaluating the CIM data models, Windows events span a range of data models, depending on the event type. As an example, Windows events can potentially be part of the following CIM data models (list not complete):

- Alerts
- Application State
- Authentication
- Certificates
- Inventory
- etc.

Questions:

Given that we have massive data volumes and this could adversely affect the performance of any given search, wouldn't it be prudent to create a data architecture that sorts data into smaller piles by index and sourcetype, more closely mimicking the CIM data models?

Would changing our sourcetype for Windows events from sourcetype = win to sourcetype = win-authentication and sourcetype = win-application-state (et al.) have significant implications for performance, and potentially reduce the search target area of a given model from a really big 'pile' to a smaller, more specific 'pile' of event types?

Would such a data architecture give noticeably better performance than data model acceleration, or in addition to data model acceleration, or would it be a wash?

Does anyone else out there use data architecture designs at the index and sourcetype levels because of performance concerns? If so, can you give an example of your data architecture design and ballpark volumes of data? What other considerations led you to that design?

Are there any flaws in this line of thinking? Is it potentially too much work to manage when contrasted with potentially small performance gains? Are the performance gains worth the overhead of setting up and maintaining the data architecture?

joshbeckett · ‎04-03-2020

Per my original question: I have tried a couple of iterations of fillnull statements against the ev and dailyEv variables without success. I believe the issue may be related to streamstats and the fact that the _time field, which it requires, may be missing once events are no longer seen in myfeed.

joshbeckett · ‎04-01-2020

Thank you for your help. Certainly an interesting solution. I wasn't familiar with that command. Unfortunately, I am getting the same results as before. The final table and visualization do not show dates with zero data when the data goes to zero.

joshbeckett · ‎03-26-2020

I have some data that is being forwarded to another entity via our heavy forwarders, and I am trying to monitor that stream to ensure it doesn't fail or go too high or too low. The query below is a stepping stone toward some other graphing that I want to do, but I need to solve the issue where my charted data stops when the feed goes to zero (aka dies). To be clear, it is the source feed going to the HF on my side that has died, not the HF itself. I know this because there are multiple feeds and only one is down. The others are fine.

index=myindex sourcetype=mysourcetype group=per_sourcetype_thruput series=myfeed
| bin _time span=1d
| stats sum(ev) as dailyEv by _time sourcetype
| streamstats time_window=30d avg(dailyEv) as avgev stdev(dailyEv) as standardDev by sourcetype
| eval lowerBound=(avgev-(standardDev*2))
| eval upperBound=(avgev+(standardDev*2))
| eval isOutlier=if(dailyEv < lowerBound OR dailyEv > upperBound, 1, 0)
| table _time,dailyEv,lowerBound,upperBound,isOutlier

I am watching a rolling 30 days of data, but when the event count [sum(ev)] goes to zero on calendar day 22, the graph stops at calendar day 21, even though today is calendar day 26.

I have tried a couple of iterations of fillnull statements against the ev and dailyEv variables without success. I believe the issue may be related to streamstats and the fact that the _time field, which it requires, may be missing once events are no longer seen in myfeed.

Any thoughts on how to get the table to show zero values when myfeed dies so that I can potentially alert on isOutlier?
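For reference, one possible direction, offered as an untested sketch rather than a confirmed fix: stats only emits rows for days that actually contain events, so fillnull has nothing to fill on the dead days. timechart, by contrast, normally produces a row for every span in the search time range, leaving empty days null so that fillnull can turn them into zeros. The variant below assumes the search time range extends to today and that series=myfeed yields a single series, which is why the by sourcetype clauses are dropped. It has not been run against this data.

```spl
index=myindex sourcetype=mysourcetype group=per_sourcetype_thruput series=myfeed
| timechart span=1d sum(ev) as dailyEv
| fillnull value=0 dailyEv
| streamstats time_window=30d avg(dailyEv) as avgev stdev(dailyEv) as standardDev
| eval lowerBound=(avgev-(standardDev*2))
| eval upperBound=(avgev+(standardDev*2))
| eval isOutlier=if(dailyEv < lowerBound OR dailyEv > upperBound, 1, 0)
| table _time,dailyEv,lowerBound,upperBound,isOutlier
```

Note that the zero-filled days also feed the rolling average and standard deviation, so the bounds will shift the longer the feed stays down.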

Posts	4
Solutions	0
Karma Given	0
Karma Received	1
Member Since	‎06-05-2019

Online Status	Offline
Date Last Visited	‎06-17-2020 05:27 PM

Questions about data models, data architecture, an...

Dates with zero data don't populate with zeros

Re: Dates with zero data don't populate with zeros
