Getting Data In

Splunk DB Connect: How to combine data from 3 different sourcetypes into single event with join, transaction or stats?

LiquidTension
Path Finder

Hi all,
I have been giving my best shot an attempting to use stat or transaction without success. I don’t even know if they are the best way to accomplish what I am looking for.

I have 3 queries which are populating 3 sourcetypes and this is working great:

Query to make sourcetype 1

| dbquery "app_stage" "SELECT FILING_ID, JOB_ID FROM custom.app_1_FILING"

(this table does not contain a timestamp but should be the first one written by the app)

Query to make sourcetype 2

| dbquery " app_stage " "SELECT FILING_EVENT_ID, FILING_ID, to_char(FILING_DATE,'YYYY-MM-DD HH24:MI:SS') FILING_DATE, A_NUMBER, RESPONSE_SUBJECT, RESPONSE_SUBJECT FROM custom. 1_FILING_EVENT" limit=1000

Query to make sourcetype 3

| dbquery "jms_stage" "SELECT JOB_ID, PROJECT_ID, PROJECT_ID, JOB_NUMBER, to_char(CREATE_DATE,'YYYY-MM-DD HH24:MI:SS') CREATE_DATE, CREATE_BY_NAME FROM custom. app_JOB" limit=1000

JOB_ID in sourcetype 1 is equal to JOB_ID in sourcetype 3
FILING_ID in sourcetype 1 is equal to FILING_ID in sourcetype 2

I want to eventually present various dashboards to display the status of this applications workflow. The big use case will be when a JOB_ID generates a FILING_ID but does not yield an A_NUMBER and RESPONSE_SUBJECT (from sourcetype 2).

I have tried various searches using join, transaction, and stat but the most I have been able to achieve is ending up with a table showing the JOB_ID and FILING_ID aligned as a single event but missing all of the other field data.

This is an example of one of these using stats for testing (I would optimize the search later by being as specific as possible.

index=* host=* sourcetype=stage*  JOB_ID=*
| stats values(FILING_DATE) AS FilingDate, values(A_NUMBER) AS ANum, values(RESPONSE_SUBJECT) AS ResponseSubject, values(PROJECT_ID) AS ProjectID, values(JOB_NUMBER) AS JobNum by FILING_ID, JOB_ID

Anybody able to help steer me in the right direction?

0 Karma

aholzer
Motivator

Look into the "join" command. Note that subsearches have a default limit of 10k results though.

Also - have you considered doing the combination while you are running the SQL instead of in Splunk? If you are familiar with SQL why not do all the necessary joins before storing the data in Splunk? Is the data coming from different databases?

LiquidTension
Path Finder

So I took your advice and developed a single query that joins everything together from the 3 separate tables and it works great.

I do have a question specific to DB connect now. When I set the rising column do I have to worry that some columns may not populate for a time while the "rising column" may have incremented hundreds of times?

For context, 2 of the columns (the important ones) are only filled when an email response comes and the process is a success. The rising columns are incremented every time a user starts this process.

This table will have over a million rows and if splunk does a differential based on the rising column only that would not help for our use case.

0 Karma

jcoates_splunk
Splunk Employee
Splunk Employee

Yes, I think that it is going to do that differential based on the rising column.

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...