I have 2 datasets:
1: Windows events to review that have a DoneBy
user and a DoneTo
user.
2: Work Orders in a DB that should have been the cause for some/most of the Windows events, each row having 3 fields: DoneBy
, Notes
, and Description
.
The Work Order DB is too big to deal with directly (I won't get into the details), Each Work Order DB row consists mainly of a Note
that has new content but other than the modified_date
usually has no other changes (although occasionally somebody adjusts the Description
field which is usually static row-to-row for every Work Order), so there is no good way to tie them directly together in a single SQL query. What I mean is that maybe there is a WO screener who assigns the WO to somebody else so that DB row has the correct DoneTo
embedded in the Notes
but the wrong DoneBy
(because it is the screener, not the admin who will do the work). Then a later row may have the correct DoneTo
but he may not reiterate what he did so there is no DoneBy
in the Notes
(which may just say something like "Done"). Because of this, I cannot write SQL to say, "Give me the row where DoneBy
=DoneBy
AND ( Notes
contains DoneTo
OR Description
contains DoneTo
) because these may be fractured among multiple distinct rows.
What I had to do is create 3 SQL filters, one for each single-field match, then joining all the Notes
rows together by Work Order ID
.
So now I have both datasets together and I need to overlay the Work Order dataset onto the Windows Event dataset and that's really where the problem is. Keep in mind that the Windows Event dataset is necessary to generate the WorK Order dataset so the searches cannot be separated. I need to end up with just the Windows Event dataset where each row has a (sometimes multi-valued) Work Order ID
field that is the only vestige of the other dataset.
I see no way of moving forward without saving off the Work Order dataset as a lookup file, which I definitely can do.
However, I need the wildcard capability to work opposite to the match_type
capability because the Event data will have a short DoneTo
field whose value will be embedded deep inside the lookup file's Notes
or Description
field. Is this possible with lookup
?
The only other option that I can think to use is to encode all of Windows Event data into a single KVP field string, feed this to map
, have map
reconstitute each Windows Event table row (using rex
to break apart the amalgam KVP single-field back into the original constituent multiple-fields row), then pulling in just the portions of the lookup file that match the fields of each single row. This is tedious and inefficient but I do not see any other way; do you?
The answer to this question, is:
You should not ever have to, so don't!
My problem with making the Event dataset the lookup file is that in it, as-is, there is nothing to use as the OUTPUT field of the lookup. So because there is nothing unique, it cannot be done. Then I realized (DUH!) that we can easily manufacture like this:
... | streamstats count AS serial
will manufacture a unique key in the event dataset that can be used as the OUTPUT field of the lookup. I am working on this now.
Then I have an Events dataset lookup that looks like this:
DoneBy,DoneTo,serial
*XXX\gwoodcock*,*XXX\luser1*,1
*XXX\gwoodcock*,*YYY\luser2*,2
Now I can do this from the Work Order dataset
... | lookup LookupName DoneBy OUTPUT serial AS DoneBySerial
... | lookup LookupName DoneTo OUTPUT serial AS DoneToSerial
The only tricky part is that I have to enable both match_type = WILDCARD(DoneBySerial), WILDCARD(DoneToSerial)
and also case_sensitive_match = false
for it all to work.
I can enhance the Work Order dataset with serial numbers, drop those with no lookup matches, pull back in the Events dataset and use stats
to merge on the *serial
fields (after some normalizing) to get my final mashup.
The answer to this question, is:
You should not ever have to, so don't!
My problem with making the Event dataset the lookup file is that in it, as-is, there is nothing to use as the OUTPUT field of the lookup. So because there is nothing unique, it cannot be done. Then I realized (DUH!) that we can easily manufacture like this:
... | streamstats count AS serial
will manufacture a unique key in the event dataset that can be used as the OUTPUT field of the lookup. I am working on this now.
Then I have an Events dataset lookup that looks like this:
DoneBy,DoneTo,serial
*XXX\gwoodcock*,*XXX\luser1*,1
*XXX\gwoodcock*,*YYY\luser2*,2
Now I can do this from the Work Order dataset
... | lookup LookupName DoneBy OUTPUT serial AS DoneBySerial
... | lookup LookupName DoneTo OUTPUT serial AS DoneToSerial
The only tricky part is that I have to enable both match_type = WILDCARD(DoneBySerial), WILDCARD(DoneToSerial)
and also case_sensitive_match = false
for it all to work.
I can enhance the Work Order dataset with serial numbers, drop those with no lookup matches, pull back in the Events dataset and use stats
to merge on the *serial
fields (after some normalizing) to get my final mashup.
thanks woodcock. this is very complicated and you answered with breeze
So... in what way would a wildcarded lookup for *value*
on the Notes
field not work?
Sample data would really help illustrate this.
Actually after more thought, it can be made to work. To answer your question (pre-theough):
Because there is no way to map a match in the one dataset back to a particular event. In other words, what will the OUTPUT field of the lookup field be that will be unique to the original dataset? Because there is nothing unique, it cannot be done. That is UNLESS we manufacture one! I realized that I can use |streamstats count AS serial
will manufacture a unique key in the event dataset that can be used as the OUTPUT field of the lookup. I am working on this now.
I also had a breakthrough on the other way (using map
) and will probably have both solutions done at about the same time and can do side-by-side comparison. I know the lookup
solution will beat the map
solution but I am curious how severely.
How big are the notes fields? Would it be possible/accurate and not too expensive to break each word in that field into an event, eg split the notes field by space and mvexpand to put them in each in their own event? And then join the windows event data to that subsearch on the doneby doneto fields?
No, Notes
and Description
are up to 4000 characters each.
The other thought i had was whether it would help at all to make the lookup from the windows event data, since you could wildcard doneby and doneto, and then that could be used to match against the notes field. Not sure if that would be at all helpful/relevant thought
I'm distilling "I have a field Notes
that may contain a field DoneTo
inside its value, and want to add a lookup to that DoneTo
field"?
If so, extract the DoneTo
field from the Notes
field and do the lookup in the usual manner.
If not, do elaborate - ideally with sample data.
s/may contain a field/may contain the value of field/