Splunk Search

How do you purge a lookup table of values over 30 days?

JakeInfoSec
Explorer

So I have a search that runs hourly over a lookup table which I have created that includes IP, ticket number, date_added, date_last_seen. Every hour, I search for the IPs on this list across the logs to see the last time we have seen the IP. I currently use the below search to achieve this, which may not be the simplest way for me to do this.

sourcetype=blah| lookup suspicious_list.csv suspect_ip as src_ip OUTPUT Ticket_num date_added date_last_seen | search Ticket_num="*" | rename src_ip as suspect_ip | eval time=strftime(_time, "%H:%M:%S %m-%d-%y") | eval date_last_seen=time | table suspect_ip, Ticket_num, date_added, date_last_seen | inputlookup append=t suspicious_list.csv | dedup suspect_ip | outputlookup suspicious_list.csv

I would like to have this search remove entries that have a date_last_seen value that is from greater than 30 days ago. I had issues setting this search up to begin with due to the multiple lookups, but I can't seem to figure out the best way to work with comparing the times. I tried something like the following but was not successful.

 sourcetype=blah| lookup suspicious_list.csv suspect_ip as src_ip OUTPUT Ticket_num date_added date_last_seen | search Ticket_num="*" | eval diff=(_time - newt_t) | where diff < 2436985 | rename src_ip as suspect_ip | eval time=strftime(_time, "%H:%M:%S %m-%d-%y") | eval date_last_seen=time | table suspect_ip, Ticket_num, date_added, date_last_seen | inputlookup append=t suspicious_list.csv | dedup suspect_ip | outputlookup suspicious_list.csv
0 Karma
1 Solution

pkeenan87
Communicator

The relative time function should be useful here: https://docs.splunk.com/Documentation/Splunk/7.2.4/SearchReference/DateandTimeFunctions#relative_tim...

Something like this right before your outputlookup command should get rid of all the entries older than thirty days

sourcetype=blah
| lookup suspicious_list.csv suspect_ip as src_ip OUTPUT Ticket_num date_added date_last_seen 
| search Ticket_num="*" 
| rename src_ip as suspect_ip 
| eval date_last_seen=_time  
| table suspect_ip, Ticket_num, date_added, date_last_seen 
| inputlookup append=t suspicious_list.csv 
| where date_last_seen > relative_time(now(), "-30d@d")
| outputlookup suspicious_list.csv

It looks like right now date_last_seen field is a string, keeping it in the numerical format will help us perform the date math

View solution in original post

0 Karma

woodcock
Esteemed Legend

First of all, always store your times as time_t values (AKA epoch, which is an integer); never as a formatted time.
When you pull the data back in, DO NOT convert it with eval, instead use fieldformat so that when you write it back out, it is still a time_t. Better yet, just keep the time value as _time which has an implied fieldformat already. This makes the math and other work very easy, like this:

index=foo sourcetype=bar
| other command stuff here
| inputlookup append=t YourLookupHere
| dedup YourByFieldsHere
| where _time >= relative_time(now(), "-30d@d")
| outputlookup YourLookupHere
0 Karma

pkeenan87
Communicator

The relative time function should be useful here: https://docs.splunk.com/Documentation/Splunk/7.2.4/SearchReference/DateandTimeFunctions#relative_tim...

Something like this right before your outputlookup command should get rid of all the entries older than thirty days

sourcetype=blah
| lookup suspicious_list.csv suspect_ip as src_ip OUTPUT Ticket_num date_added date_last_seen 
| search Ticket_num="*" 
| rename src_ip as suspect_ip 
| eval date_last_seen=_time  
| table suspect_ip, Ticket_num, date_added, date_last_seen 
| inputlookup append=t suspicious_list.csv 
| where date_last_seen > relative_time(now(), "-30d@d")
| outputlookup suspicious_list.csv

It looks like right now date_last_seen field is a string, keeping it in the numerical format will help us perform the date math

0 Karma

JakeInfoSec
Explorer

Oh wow yeah that relative_time is useful. I still seem to not be getting the correct results that I am looking for, the search still seems to be returning some values that have a date of "22:44:00 11-10-18" and now all my new updated times are in the Unix time format.

Below is what I am using

 sourcetype=blah
 | lookup suspicious_list.csv suspect_ip as src_ip OUTPUT Ticket_num date_added date_last_seen 
 | search Ticket_num="*" 
 | rename src_ip as suspect_ip 
 | eval date_last_seen=_time  
 | table suspect_ip, Ticket_num, date_added, date_last_seen 
 | inputlookup append=t suspicious_list.csv 
 | where date_last_seen > relative_time(now(), "-30d@d")
 | dedup SIRT_suspect_identifier
0 Karma

pkeenan87
Communicator

Since some of the old entries in the existing lookup are using the old definition of date_last_seen it wont work with the data math since they are strings. You will need to purge those manually for now. Going forward the date math will take care of everything

0 Karma
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...