Getting Data In

lookup csv file contains multiple occurrences of items. Need to query these items in an index for each unique time stamp range in csv

ebele
New Member

lookup csv format where EVENT_ID can have multiple SiteID fields and SiteID can have multiple EVENT_IDs. Only SiteID is a field in the splunk index.

YEAR, SiteID, earliest_date, latest_date, EVENT_ID
2019, AB111, 1560988800, 1562112000, ABSE00350
2019, AB111, 1562198400, 1563321600, ABSE00351
2019, AB111, 1548892800, 1550016000, ABSE00352
2019, AB112, 1548892800, 1550016000, ABSE00352

I use the lookup to query an index, to calculate a KPI for each row.

Ideal query Output (KPI computed for unique combination of SiteID & EVENT_ID):

SiteID, KPI, EVENT_ID
AB111, 68.4, ABSE00350
AB111, 74.3, ABSE00351
AB111, 22.1, ABSE00352
AB112, 34.5, ABSE00352

This is the top of my code, where I do the inputlookup, before proceeding to calculate the KPI from data in the index. However it only gives me a result with a single aggregate of the SiteID, not unique per row as desired.

index=pm_busy_half_hour
[| inputlookup Test2.csv
| rename earliest_date as earliest, latest_date as latest
| table SiteID earliest latest
]
.....
.....
.....

Please advise

0 Karma

ololdach
Builder

Hi ebele,
I assume that only the SiteID field is in your indexed events, not the EVENT_ID. If the assumption is correct, then a possible solution could be:

index=pm_busy_half_hour | dedup SiteID | lookup Test2.csv SiteID as SiteID output EVENT_ID | mvexpand EVENT_ID | lookup Test2.csv EVENT_ID output earliest_date,latest_date  | eval KPI=round(random() % 1000/10,1)

First, select all SiteID's that you'd like the KPI on and make sure they are unique (dedup). Then lookup all EVENT_IDs for the SiteIDs that you are interested in. They will be returned as MV multivalue field. Use mvexpand to make every EVENT_ID into a unique event. Use the same lookup file again, using the EVENT_ID as key this time, to retrieve the dates for each EVENT_ID. From here on do whatever you do to calculate the KPI. The random is just to show the concept in a search that does somethin.

Hope it helps
Oliver

0 Karma

ebele
New Member

Thanks @ololdach for the feedback on using multivalue field. Yes, only the SiteID field is in the indexed events.

After I try this, it seems the search is not recognizing 'earliest_date' and 'latest_date' as time fields. It does not calculate the KPI for each row between these time windows. Instead uses the default time range picker. I believe that that is why I did this renaming in my earlier sample code:

| rename earliest_date as earliest, latest_date as latest

Also, I need to apply this to a lookup csv with same format, but where EVENT_ID can have multiple SiteID fields and SiteID can have multiple EVENT_IDs. (added fourth row to illustrate this):

YEAR, SiteID, earliest_date, latest_date, EVENT_ID
2019, AB111, 1560988800, 1562112000, ABSE00350
2019, AB111, 1562198400, 1563321600, ABSE00351
2019, AB111, 1548892800, 1550016000, ABSE00352
2019, AB112, 1548892800, 1550016000, ABSE00352

Ideal query Output: (KPI computed for unique combination of SiteID & EVENT_ID)
SiteID, KPI, EVENT_ID
AB111, 68.4, ABSE00350
AB111, 74.3, ABSE00351
AB111, 22.1, ABSE00352
AB112, 34.5, ABSE00352

0 Karma

ololdach
Builder

Hi ebele,

you have epoch timestamps in your sample data. If you want the the timestamps to display as human readable dates, use an

 

... | eval human_earliest=strftime(earliest_date,"%c") | ...

 

 If you have a dashboard with a time range picker that populates a token called "time_selection" and would like to  have your search to deliver only the events in the selected time range add the following to your search:

 

...| where earliest_date>=$time_selection.earliest$ AND latest_date<$time_selection.latest$ | ...

 

Last, but not least, if you would like to honor the timestamps in your standard search bar, using the standard time selector on the right of the splunk search field... add the following to your search:

...| addinfo | where earliest_time>=info_min_time AND latest_time<info_max_time | ...

I assume that you only want those KPIs that have both, the earliest and the latest time inside the time range. You may need to tweak that to meet your requirements.

Hope it helps,

Oliver

0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...