Splunk Search

How to join data from two indexes

poojak2579
Explorer

I want to get data from joining two indexes out of which one is summary index.
Summary Index has more than 500000 records
I have two fields Asset and Date in the summary index as well as in the other index.

I am planning to schedule a query that will check for any new asset in today's records and if it is a new, it will insert that record in the summary index.

I tried to do it by leftjoin but it works if I specify a particular value for the Asset.

Below is the query that works(ASSETNAME is the hard coded value)

index=I ASSETNAME earliest=-1d@d latest=now
|fields Asset Date
|join type=left Combo(index=summary ASSETNAME earliest=-1Y@Y latest=now|eval Asset1=Asset
|where isnull(Asset1)

The same query does not work if I remove the asset name and run it with all the records in the Summary Index.

It shows me null values in the column 'Asset1' for the assets that are there in the summary index.

I am not sure if it is because of the the limit of the records a subsearch can return.
Please suggest that if there is a better way of querying instead of using join.

I tried to do it by this way also but it is not showing me complete set of records.

(index=I earliest=-1d@d latest=now) OR (index=summary earliest=-1Y@Y latest=now)

Tags (1)
0 Karma

poojak2579
Explorer

am sorry for the mistake, I wrote just part of the query .

Index=I has many other fields along with Asset and Date.

I want to get the Earliest Date when that asset was tracked.

Here is the query

index=I earliest=-1d@d latest=now
|eval combo = Asset+""+ID
|stats min(First_Found_Date) as Earliest_FF by combo
|join type=left combo[search index=summary earliest=-1Y@Y latest=now|eval combo=Asset+"
"+ID|stats min(Earliest_FF) as Earliest_FF by combo|eval combo1=combo]

|where isnull(combo1)

0 Karma

DalJeanis
Legend

@poojak2579 - did you ever get this issue resolved?

0 Karma

DalJeanis
Legend
  • Always mark your code as code (the button marked 101 010 for example) so that the web interface doesn't
    strip out HTML-like constructs.

  • Your code as posted can't work, because the subsearch isn't in square braces.

  • According to the posted code, you are left-joining on a field named Combo, and we can't see what that is.

  • From your description, I don't know whether Date is a required part of the asset information, or whether Asset is a unique key. The right answer depends on that.

  • We don't know the format of the field Date. It may be significant to the solution.

  • Your final line as posted is correct. (I'd suggest using latest=@d for the index=I, but that's quibbling.) The next part after that is the processing that determines which ones need to be added.

This version assumes that Date1 is the Date field on the index=I record, that it is called Date2 on the summary record, and that Asset is a unique key regardless of the Date value.

(index=I Asset=* earliest=-1d@d latest=@d) OR (index=summary Asset=* earliest=-1Y@Y latest=now)
| eval NewDate = if(index=I, Date1, null())
| stats latest(NewDate) as Date2, count(eval(index="summary")) as SumExists by Asset
| where SumExists=0
| table Asset Date2

poojak2579
Explorer

Thanks for the reply.
Not sure why OR is not working for me.
I tried this but it is not showing all the Assets.

index=I earliest=-1d@d latest=now) OR (index=summary earliest=-1Y@Y latest=now)

0 Karma

DalJeanis
Legend

That part you posted is missing the initial open parenthesis.

Hey, @cmerriman - If you could, please run a quick run-anywhere search to verify "earliest" can be used like that, with two different values inside different parenthesis? I've never actually run something like that and I'm not at work today/ tomorrow so I can't test it.

0 Karma

cmerriman
Super Champion

i ran it with some of my own data using a earliest=-30d@d latest=@d and earliest=-1d@d latest=now and my events went from an average of 400 events/day to 100k yesterday, so i'd say it worked. i see both sourcetypes are coming through. @DalJeanis

0 Karma

poojak2579
Explorer

I shows me both the sourcetypes if I use "append" but only one sourcetype if I use "OR"

0 Karma

poojak2579
Explorer

So you are getting events for yesterday and today only and not 30 days?

0 Karma

cmerriman
Super Champion

I am getting events all 30 days for one of my events and only yesterday for the other. It seems to be working for me

0 Karma

poojak2579
Explorer

Dont know why it is not working for me.
I guess I have to use another approach than the summary indexing to make those queries faster that checks for earliest first found among million of records.

Thanks everyone for your help.

0 Karma

poojak2579
Explorer

Thanks Dal Jeanis
I have that initial open parenthesis in my query.
(index=I earliest=-1d@d latest=now) OR (index=summary earliest=-1Y@Y latest=now)

Actually I was also not sure when I posted this question if we can use earliest like that with two different values in two different parenthesis.

0 Karma

poojak2579
Explorer

I think we cant use earliest like that.
I tried this query on the same index and got today's results only.
(index=I earliest=-3d@d latest=now) OR (index=I earliest=-5d@d latest=now).

Is there any other way to get data from two different indexes with different time frame and without using join or append?

0 Karma

cmerriman
Super Champion

i would try this:

(index=I earliest=-1d@d latest=now) OR (index=summary earliest=-1Y@Y latest=now)|stats values(importantFieldFromIndex1) values(importantFieldFromSummaryIndex) by Asset Date

since both have Asset and Date fields, you should be returned with a lot of data from either index shared by the two common fields.

otherwise, if the join is the way you want to work it out, i'd try flipping them around, though i suggest trying to work through using the stats command, since join has limitations.

index=summary ASSETNAME earliest=-1Y@Y latest=now|eval Asset1=Asset
|where isnull(Asset1)|stats count by Asset Date
|join type=left Asset Date [index=I earliest=-1d@d latest=now
|fields Asset Date ]
0 Karma

poojak2579
Explorer

Thanks for the reply.

0 Karma

poojak2579
Explorer

It will be great if anybody can help me understand why Or is not working for me.

0 Karma

cmerriman
Super Champion

Try this and see what happens. What type of data comes back?

(index=I earliest=-1d@d latest=now) OR (index=summary earliest=-1Y@Y latest=now)
|eval combo = Asset+""+ID|eval combo1=If(index="summary",combo,null())
|stats min(First_Found_Date) as FFD min(Earliest_FF) as EFF values(combo1) as combo1 by combo
|where isnull(combo1)
0 Karma

poojak2579
Explorer

If I am not wrong "OR" should return everything from the index=I as well as Index=summary.

(index=qualys_summary earliest=-1Y@Y latest=now) returns 8000 Assets

(index=I earliest=-1d@d latest=now) returns 6000 Assets

(index=I earliest=-1d@d latest=now) OR (index=summary earliest=-1Y@Y latest=now) returns 2500 Assets from Index=I only

Both the indexes have different fields. But Asset and Id is common to both.

0 Karma

cmerriman
Super Champion

are you using index=qualys_summary or index=summary? you have it listed both ways in your previous comment

0 Karma

poojak2579
Explorer

Please ignore that. I am using summary for all my queries. Please ignore that mistake.

0 Karma

sbbadri
Motivator

try this

1) (index=A OR index=B) rest of your query
2) index=A earliest=-1d@d latest=now | append [ search index=summary earliest=-1Y@Y latest=now]

Get Updates on the Splunk Community!

Announcing Scheduled Export GA for Dashboard Studio

We're excited to announce the general availability of Scheduled Export for Dashboard Studio. Starting in ...

Extending Observability Content to Splunk Cloud

Watch Now!   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to leverage ...

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...