I have telephony log data containing multiple record types, each with its own set of numerically tagged data fields. Example format showing 2 records:
001 002 110622080011 003 110622080011 004 022306a25710 030 0000000000 005 0000000427712649 006 0000000000300000 054 00030 055 155 052 000 056 02 094 0 137 0 049 0000000000000006 138 06 144 1
005 002 110622075952 003 110622080011 004 012306a2570f 007 23062ec957 005 0417606265 0080000010401148949 009 00008 010 000 049 006 011 00 012 0 013 0 014 0 015 0 024 0 096 0000000000000000 052 52 050 000 145 0 062 000000000000000000 063 0000010401148949
Each record has a 3-digit record type followed by its numerically tagged data fields.
Preferably without explicit coding for each record type, I want to index these as key/value pairs, e.g. F_001_002=110622075952, F_001_003=110622080011, etc.
There are about 30 record types each with up to about 30 field types.
Can I do this with transforms.conf?
If I script it (which is fairly simple), what's the best way to ensure that each source file (in a read-only directory) is indexed only once?
I'll try the suggestion above, but I suspect that the variable number of fields in each record type will not allow a consistent regex match.
I don't think there's a good way to get the record type into the field name along with the value, but you can do this:
props:
[mysourcetype]
EXTRACT-recordtype = ^(?<recordtype>\d+)
REPORT-vals = vals
transforms:
[vals]
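# The leading \s skips the record type at the start of the line;
# \S+ also captures non-decimal values such as 022306a25710.
REGEX = \s(\d{3})\s+(\S+)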
FORMAT = $1::$2
MV_ADD = true
CLEAN_KEYS = false
This will give you field name-value sets like 002=110622075952, which doesn't include the record type. However, for every record, the associated record type will also be in the field named recordtype, so you should be able to do just about anything, albeit possibly with a slightly more complex eval, e.g., instead of:
... | stats sum(F_001_002)
you might need
... | stats sum(eval(if(recordtype=="001",'002',null()))) as val
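To see what this kind of tag/value extraction yields and how the conditional sum behaves, here is a rough Python sketch that runs outside Splunk. The pattern `\s(\d{3})\s+(\S+)` and the helper names `extract` and `conditional_sum` are my own assumptions for illustration, not Splunk settings, and the two sample lines are shortened versions of the records in the question.

```python
import re

# Assumed pattern: a whitespace-delimited 3-digit tag followed by its value.
# The leading \s means the record type at the start of the line is not
# mistaken for a tag.
PAIR = re.compile(r"\s(\d{3})\s+(\S+)")


def extract(line):
    """Return (record_type, {tag: value}) for one log line."""
    rec_type = re.match(r"\d+", line).group(0)
    return rec_type, dict(PAIR.findall(line))


def conditional_sum(lines, want_type, tag):
    """Sum one tag's values, but only for records of the wanted type."""
    total = 0
    for line in lines:
        rec_type, fields = extract(line)
        if rec_type == want_type and tag in fields:
            total += int(fields[tag])
    return total


lines = [
    "001 002 110622080011 003 110622080011 004 022306a25710 054 00030",
    "005 002 110622075952 003 110622080011 009 00008",
]

print(extract(lines[0])[0])                 # 001
print(conditional_sum(lines, "001", "054"))  # 30
```

The filter on the record type plays the same role as the `if(recordtype=="001", ...)` test in the search above: records of other types simply contribute nothing to the sum.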
Untested, but I believe this will work:
# transforms.conf
[some_stanza]
REGEX = ^(\d+)\s+(\d+)\s+(\S+)\s+(\d+)\s+(\S+)...and so on
FORMAT = F_$1_$2::$3 F_$1_$4::$5 ...and so on
# props.conf
[my_sourcetype]
REPORT-tel_extract = some_stanza
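If the number of fields per record varies too much for a fixed regex, the scripted route mentioned in the question could look something like the sketch below. It is only an illustration under a few assumptions: tokens are whitespace-separated, tags are exactly 3 digits, and the function name `to_kv` is hypothetical. Tokens that do not look like a tag are skipped rather than treated as an error.

```python
import re

TAG = re.compile(r"\d{3}")  # assumption: every field tag is exactly 3 digits


def to_kv(line):
    """Rewrite one record as 'F_<rectype>_<tag>=<value>' pairs."""
    tokens = line.split()
    if not tokens:
        return ""
    rec_type = tokens[0]
    out = []
    i = 1
    while i + 1 < len(tokens):
        if TAG.fullmatch(tokens[i]):
            # tokens[i] is a tag, tokens[i + 1] is its value
            out.append(f"F_{rec_type}_{tokens[i]}={tokens[i + 1]}")
            i += 2
        else:
            # Not a tag (for example a stray or malformed token): skip it
            i += 1
    return " ".join(out)


print(to_kv("001 002 110622080011 003 110622080011 004 022306a25710"))
# F_001_002=110622080011 F_001_003=110622080011 F_001_004=022306a25710
```

Output in this key=value shape needs no per-record-type configuration, since the record type is already folded into each field name.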