I have this query below .. I need to report on the last successful backup 'over' 24 hours.. which this does... however what I cant figure out how to do is report when all backups have failed for 30 days. (no successful backups ever). _time is the time the backup ran and the log information written to splunk.
For instance if I remove the 'where' statement on the query it brings back 2 rows (1 success, 1 failed) on those backups that had both. it will bring 1 row back where there has been no successful backup (all failed)
I cannot figure out how to code the where to handle both conditions
index="storage" sourcetype="rubrik:prod" ndc="MSSQL_DB*" status=Failure OR status=Success
| rex field=_raw "from [\'](?[^\']\w+-\w+)"
| rex field=_raw "backup for (?\w+-\w+)\."
| rex field=_raw "eventSeriesId=(?.*?)\ objectId="
| rex field=_raw "objectName=(?.*?)\ eventId"
| rex field=_raw "Microsoft SQL Server Database \'(?.+)\' from"
| search SERVER_NAME="VMPIT-G4FDB003"
| search DATABASE_NAME="FBI"
| search NOT [ | inputlookup Servers_Pending_Deletion.csv | fields SERVER_NAME ]
| table SERVER_NAME _time _raw status DATABASE_NAME| stats max(_time) as TopTime by SERVER_NAME, DATABASE_NAME,status | sort by SERVER_NAME,DATABASE_NAME, _time desc
| where ((TopTime <= relative_time(now(),"-24h") and TopTime > relative_time(now(),"-30d") and status="Success" ))
| eval lm_24_ago=strftime(relative_time(now()-14400,"-24h"),"%m-%d-%y %H:%M:%S")
| eval lm_report_date=strftime(now()-14400,"%m-%d-%y %H:%M:%S")
| eval lm_7d_ago=strftime(relative_time(now()-14400,"-7d"),"%m-%d-%y %H:%M:%S")
| eval lm_last_backup=strftime(TopTime-14400, "%m-%d-%y %H:%M:%S") | sort by lm_last_backup desc
| eval lm_ci=SERVER_NAME
| eval lm_database=DATABASE_NAME
| eval lm_status=status
| eval lm_rows=rct
| table lm*
The biggest problem was and
instead of AND
. Other things were inefficient. Try this:
... | search SERVER_NAME="VMPIT-G4FDB003" AND DATABASE_NAME="FBI" AND NOT [ | inputlookup Servers_Pending_Deletion.csv | fields SERVER_NAME ]
| eval Time = if(_time >= relative_time(now(),"-24h"), mvappend(Time, "Within_24_hours"), Time)
| eval Time = if(_time >= relative_time(now(),"-30d"), mvappend(Time, "Within_30_days"), Time)
| eval Time = coalesce(Time, "UNKNOWN")
| stats max(_time) AS _time BY SERVER_NAME DATABASE_NAME status Time
| eval {Time} = _time
| stats values(*) AS * BY SERVER_NAME DATABASE_NAME status
At this point, the logic should be easy.
P.S. Your subject line is terrible; put some real effort into it next time. I was tempted to respond with, Can't figure out your question
.
Thanks will do better next time
Did the answer work for you?
something like this...?
index="storage" sourcetype="rubrik:prod" ndc="MSSQL_DB*" status=Failure OR status=Success
| rex field=_raw "from [\'](?[^\']\w+-\w+)"
| rex field=_raw "backup for (?\w+-\w+)."
| rex field=_raw "eventSeriesId=(?.?)\ objectId="
| rex field=_raw "objectName=(?.?)\ eventId"
| rex field=_raw "Microsoft SQL Server Database \'(?.+)\' from"
| search SERVER_NAME="VMPIT-G4FDB003"
| search DATABASE_NAME="FBI"
| search NOT [ | inputlookup Servers_Pending_Deletion.csv | fields SERVER_NAME ]
| table SERVER_NAME _time _raw status DATABASE_NAME| stats max(_time) as TopTime by SERVER_NAME, DATABASE_NAME,status | sort by SERVER_NAME,DATABASE_NAME, _time desc
| where ((TopTime <= relative_time(now(),"-24h") and TopTime > relative_time(now(),"-30d") and status="Success" ))
| eval lm_24_ago=strftime(relative_time(now()-14400,"-24h"),"%m-%d-%y %H:%M:%S")
| eval lm_report_date=strftime(now()-14400,"%m-%d-%y %H:%M:%S")
| eval lm_7d_ago=strftime(relative_time(now()-14400,"-7d"),"%m-%d-%y %H:%M:%S")
| eval lm_last_backup=strftime(TopTime-14400, "%m-%d-%y %H:%M:%S") | sort by lm_last_backup desc
| eval lm_ci=SERVER_NAME
| eval lm_database=DATABASE_NAME
| eval lm_status=status
| eval lm_rows=rct
| table lm*|append [search index="storage" sourcetype="rubrik:prod" ndc="MSSQL_DB*" status=Failure OR status=Success
| rex field=_raw "from [\'](?[^\']\w+-\w+)"
| rex field=_raw "backup for (?\w+-\w+)."
| rex field=_raw "eventSeriesId=(?.?)\ objectId="
| rex field=_raw "objectName=(?.?)\ eventId"
| rex field=_raw "Microsoft SQL Server Database \'(?.+)\' from"
| search SERVER_NAME="VMPIT-G4FDB003"
| search DATABASE_NAME="FBI"
| search NOT [ | inputlookup Servers_Pending_Deletion.csv | fields SERVER_NAME ]
| table SERVER_NAME _time _raw status DATABASE_NAME| stats max(_time) as TopTime by SERVER_NAME, DATABASE_NAME,status | sort by SERVER_NAME,DATABASE_NAME, _time desc
| where ((TopTime <= relative_time(now(),"-24h") and TopTime > relative_time(now(),"-30d") and status="Success" ))
| stats count as countx|where countx=0|eval countx="no backups"]
That was very helpful. Thanks for taking some time with this.. it is doing what I want and I understand what you did.
How would I do this 'without' passing in server_name and database_name ... to do this for all nodes and servers we have.
hi @kjonesdba_lm
I am a bit confused now replace specific server or db names with * in the query?
That should search for ALL server and db names, also as woodcock mentioned change your syntax of and to AND
PS - woodock is a very knowledgeable and valuable member of the forum, he acts tough sometimes, but trust me , he is real good inside 🙂 🙂
@kjonesdba_lm
I have converted my comment into an answer, please accept the answer if it significantly helped resolve your query for the benefit of other forum members, who might run into a similar issue.
"report when all backups have failed for 30 days"
Isn't below condition returns the result you expect?
If below condition didn't return any events, I believe it meant all backups failed for last 30 days,
... | where ((TopTime <= relative_time(now(),"-24h") and TopTime > relative_time(now(),"-30d") and status="Success" ))
how do you alert on that "if the condition didn't return any events"? that's the issue.. we use splunk to create problem tickets.. one wont be created for no results which is what occurs if backups fail on a database for 30days straight.
I am stumped