Hi. I've been a very basic user of Splunk for a while, but now have a need to perform more advanced searches. I have two different sourcetypes within the same index. Examples of the fields are below.
index=vehicles
Sourcetype=autos
VIN
MAKE
MODEL
Sourcetype=cars
SN
MANUFACTURER
PRODUCT
I'd like to search and table VIN, MAKE, MODEL, MANUFACTURER and PRODUCT where -
VIN=SN
MAKE <> MANUFACTURER
OR
MODEL<>PRODUCT
Basically, where VIN and SN match, if one or both of the other fields don't match, show me.
I'm not sure if a join (VIN and SN) statement is the best approach in this case. I've researched and found questions and answers related to searching and comparing multiple sourcetypes. But, I've been unable to find examples that include conditions. Any suggestions you can provide would be greatly appreciated.
Thank you!
Here is a very simple example of "joining" two different datasets together based on their common ID. Almost all of the example is just setting up some example data. What you really need are the last 3 lines.
If you paste this to a search window it will randomly return you some results if the PRODUCT contains MISMATCH - if you remove the last line of the example you will all results of the made up data.
| makeresults
| fields - _time
``` Make some data for Sourcetype=autos ```
| eval sourcetype="autos"
| eval MAKE=split("Audi,Porsche,Mercedes",",")
| mvexpand MAKE
| eval MODEL=case(MAKE="Audi", split("AU-123,AU-988", ","), MAKE="Porsche", split("PO-123,PO-988", ","), MAKE="Mercedes", split("MX-123,MX-988", ","))
| mvexpand MODEL
| eval VIN=case(MAKE="Audi", split("AU-VIN:12345678,AU-VIN:9876543", ","), MAKE="Porsche", split("PO-VIN:12345678,PO-VIN:9876543", ","), MAKE="Mercedes", split("MX-VIN:12345678,MX-VIN:9876543", ","))
| mvexpand VIN
| eval VIN=MODEL.":".VIN
``` Make some identical data for Sourcetype=autos ```
| append [
| makeresults
| fields - _time
| eval sourcetype="cars"
| eval MANUFACTURER=split("Audi,Porsche,Mercedes",",")
| mvexpand MANUFACTURER
| eval PRODUCT=case(MANUFACTURER="Audi", split("AU-123,AU-988", ","), MANUFACTURER="Porsche", split("PO-123,PO-988", ","), MANUFACTURER="Mercedes", split("MX-123,MX-988", ","))
| mvexpand PRODUCT
| eval SN=case(MANUFACTURER="Audi", split("AU-VIN:12345678,AU-VIN:9876543", ","), MANUFACTURER="Porsche", split("PO-VIN:12345678,PO-VIN:9876543", ","), MANUFACTURER="Mercedes", split("MX-VIN:12345678,MX-VIN:9876543", ","))
| mvexpand SN
| eval SN=PRODUCT.":".SN
| eval PRODUCT=PRODUCT.if(random() % 100 < 10, "-MISMATCH", "")
]
``` Take the common field ```
| eval COMMON_ID=if(sourcetype="autos", VIN, SN)
| stats values(*) as * by COMMON_ID
| where MAKE!=MANUFACTURER OR MODEL!=PRODUCT
Don't ever consider JOIN as the first option - it's not a Splunk way of doing things and has numerous limitations. Splunk uses stats ... BY COMMON_FIELD.
Hope this helps
Actually, the further I review this, the more confused I get. In your example, why did you split makes and models? Is it necessary to append data from one sourcetype to the other? I assume so, otherwise the where command would be invalid.
You're right, though. The last three commands are key to the search. As powerful as Splunk is, I'd sure think there's a much simpler process to search multiple sourcetypes with conditions applied. (There truly is no comparison between the two, but I could create this query using Access in about 30 seconds. However, the amount of data I'm searching is far too large for Access...)
Thanks for any feedback you can provide.
Thanks for your response, bowesmana! You've got me headed in the right direction.