Hi all,
I have been giving it my best shot at using stats or transaction, without success. I don’t even know if they are the best way to accomplish what I am looking for.
I have 3 queries populating 3 sourcetypes, and this is working great:
Query to make sourcetype 1
| dbquery "app_stage" "SELECT FILING_ID, JOB_ID FROM custom.app_1_FILING"
(this table does not contain a timestamp but should be the first one written by the app)
Query to make sourcetype 2
| dbquery " app_stage " "SELECT FILING_EVENT_ID, FILING_ID, to_char(FILING_DATE,'YYYY-MM-DD HH24:MI:SS') FILING_DATE, A_NUMBER, RESPONSE_SUBJECT, RESPONSE_SUBJECT FROM custom. 1_FILING_EVENT" limit=1000
Query to make sourcetype 3
| dbquery "jms_stage" "SELECT JOB_ID, PROJECT_ID, PROJECT_ID, JOB_NUMBER, to_char(CREATE_DATE,'YYYY-MM-DD HH24:MI:SS') CREATE_DATE, CREATE_BY_NAME FROM custom. app_JOB" limit=1000
JOB_ID in sourcetype 1 is equal to JOB_ID in sourcetype 3, and FILING_ID in sourcetype 1 is equal to FILING_ID in sourcetype 2.
I eventually want to present various dashboards displaying the status of this application's workflow. The big use case will be when a JOB_ID generates a FILING_ID but does not yield an A_NUMBER and RESPONSE_SUBJECT (from sourcetype 2).
I have tried various searches using join, transaction, and stats, but the most I have been able to achieve is a table showing the JOB_ID and FILING_ID aligned as a single event, with all of the other field data missing.
This is an example of one of these, using stats for testing (I would optimize the search later by being as specific as possible):
index=* host=* sourcetype=stage* JOB_ID=*
| stats values(FILING_DATE) AS FilingDate, values(A_NUMBER) AS ANum, values(RESPONSE_SUBJECT) AS ResponseSubject, values(PROJECT_ID) AS ProjectID, values(JOB_NUMBER) AS JobNum by FILING_ID, JOB_ID
Anybody able to help steer me in the right direction?
Look into the "join" command. Note that subsearches have a default limit of 10k results though.
Also, have you considered doing the combination while you are running the SQL instead of in Splunk? If you are familiar with SQL, why not do all the necessary joins before storing the data in Splunk? Is the data coming from different databases?
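If everything is reachable from a single connection, the combined query might look roughly like this (a sketch only; I'm guessing the FILING_EVENT table name from your earlier queries, and the LEFT JOINs are there so jobs and filings without a response are still kept as rows):

| dbquery "app_stage" "SELECT j.JOB_ID, j.PROJECT_ID, j.JOB_NUMBER, j.CREATE_BY_NAME, f.FILING_ID, to_char(e.FILING_DATE,'YYYY-MM-DD HH24:MI:SS') FILING_DATE, e.A_NUMBER, e.RESPONSE_SUBJECT FROM custom.app_JOB j LEFT JOIN custom.app_1_FILING f ON f.JOB_ID = j.JOB_ID LEFT JOIN custom.app_1_FILING_EVENT e ON e.FILING_ID = f.FILING_ID" limit=1000

Something along those lines gives you one row per job, with NULL A_NUMBER / RESPONSE_SUBJECT wherever the response has not arrived yet.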
So I took your advice and developed a single query that joins everything together from the 3 separate tables and it works great.
I do have a question specific to DB Connect now. When I set the rising column, do I have to worry that some columns may not populate for a while, even though the rising column may have incremented hundreds of times in the meantime?
For context, 2 of the columns (the important ones) are only filled when an email response comes and the process is a success. The rising columns are incremented every time a user starts this process.
This table will have over a million rows, and if Splunk only does a differential based on the rising column, that will not help our use case.
Yes, I think that it is going to do that differential based on the rising column.
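Conceptually, a rising-column input remembers the highest value of that column it has seen (the checkpoint) and only asks for rows above it on the next run. Very roughly, and using FILING_EVENT_ID purely as a stand-in for whatever your actual rising column is (simplified to a single table just for illustration, not the literal SQL DB Connect generates):

SELECT FILING_EVENT_ID, FILING_ID, A_NUMBER, RESPONSE_SUBJECT
FROM custom.app_1_FILING_EVENT
WHERE FILING_EVENT_ID > ?   -- ? is replaced with the last checkpointed value of the rising column
ORDER BY FILING_EVENT_ID ASC

Since the WHERE clause only looks at the rising column, a row that was indexed before its A_NUMBER arrived will not be read again when that column is later filled in.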