Extracting Fields from Structured HL7 Data

dmbreton · 08-07-2014

I am trying to figure out how to extract structured data from an HL7 2.x message.

The entire message is wrapped in an HL7 MLLP wrapper, <VT><payload><FS><CR>, which I use in the source type I created to break the stream into individual messages. The grammar of this message is MSH PID PV1 OBR { OBX }. This means the message has 4 segments (strings), each delimited by a <CR>, followed by 1 to n OBX segments, each also delimited by a <CR>. Each segment represents a different set of information:

MSH => Message Header
PID => Patient Info
PV1 => Patient Visit/Encounter Info
OBR => Observation Request
OBX => Observation/Result
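
For anyone who wants to check the framing and grammar outside Splunk, here is a minimal Python sketch. It assumes the usual MLLP byte values (<VT> = 0x0B, <FS> = 0x1C, <CR> = 0x0D); the function names are made up for illustration and the test frame is a shortened stand-in for a real message.

```python
VT, FS, CR = "\x0b", "\x1c", "\r"  # MLLP start block, end block, carriage return


def unwrap_mllp(frame):
    """Strip the <VT> ... <FS><CR> framing and return the bare HL7 payload."""
    if not (frame.startswith(VT) and frame.endswith(FS + CR)):
        raise ValueError("not an MLLP-framed message")
    return frame[len(VT):-len(FS + CR)]


def split_segments(payload):
    """Split the payload on <CR> and drop the empty string after the last one."""
    return [seg for seg in payload.split(CR) if seg]


def segment_ids(segments):
    """Return the three-letter ID at the start of each segment."""
    return [seg.split("|", 1)[0] for seg in segments]


# Shortened stand-in for a framed MSH PID PV1 OBR { OBX } message.
frame = VT + "MSH|^~\\&|App\rPID|||MRN19\rPV1||I\rOBR|||\rOBX|1|NM|Temperature\r" + FS + CR
print(segment_ids(split_segments(unwrap_mllp(frame))))
```

The printed list should follow the grammar: the four required IDs in order, then one entry per OBX segment.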

Because the first 4 segments are required and always appear in order, I was able to extract all of their fields using a regex.

Example:

Message (excluding the message wrapper)

MSH|^~\&|Sending Application|N|||20140731105559||ORU^R01|47311055594607d|P|2.3||||||8859/1
PID|||MRN19||PV1^19||19000101|M||||||||||CSN19
PV1||I|SNGH GICU||||||||||||||||ECN123456
OBR|||||||20140731105559
OBX|1|ST|<Observation_Identifier>||<Observation_Value>|<Observation_Units>|||||<Observation_Status>|||<Observation_Time>||
OBX|2|NM|Temperature||98.6|Celsius|||||F|||20140731105559||
OBX|3|ST|Heart Rate||60|/min|||||F|||20140731105559||

Regex to extract all fields from the MSH segment

(?m).*MSH\|(?:(?:(?:$)|(?:\n)|(?<encoding_characters>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<sending_application>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<sending_facility>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<receiving_application>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<receiving_facility>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<date_time_of_message>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<security>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<message_type>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<message_control_id>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<processing_id>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<version_id>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<sequence_id>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<continuation_pointer>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<accept_acknowledge_type>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<application_acknowledge_type>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<country_code>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<character_set>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<principal_language_of_message>[^|\n]*)|(?:\|))(?:\|?))
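
As a cross-check on what that regex should return, the same MSH fields can be pulled out by splitting on the field separator. This is only a sketch in Python, not something Splunk runs: the field names are copied from the capture groups above, the separator is read from the fourth character of the segment, and `parse_msh` is a hypothetical helper name.

```python
MSH_FIELDS = [
    "encoding_characters", "sending_application", "sending_facility",
    "receiving_application", "receiving_facility", "date_time_of_message",
    "security", "message_type", "message_control_id", "processing_id",
    "version_id", "sequence_id", "continuation_pointer",
    "accept_acknowledge_type", "application_acknowledge_type",
    "country_code", "character_set", "principal_language_of_message",
]


def parse_msh(segment):
    """Map the fields of one MSH segment onto the names used in the regex."""
    segment = segment.rstrip("\r\n")
    if not segment.startswith("MSH") or len(segment) < 4:
        raise ValueError("not an MSH segment")
    separator = segment[3]  # the character right after 'MSH' is the field separator
    values = segment.split(separator)[1:len(MSH_FIELDS) + 1]
    # Trailing fields may be absent; pad so every name gets a value.
    values += [""] * (len(MSH_FIELDS) - len(values))
    return dict(zip(MSH_FIELDS, values))


msh = r"MSH|^~\&|Sending Application|N|||20140731105559||ORU^R01|47311055594607d|P|2.3||||||8859/1"
fields = parse_msh(msh)
print(fields["message_type"], fields["version_id"], fields["character_set"])
```

Comparing this dictionary with the fields Splunk extracts is a quick way to spot an off-by-one in the regex.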

In the message above, each OBX segment represents a measurement.

OBX 1 => Example with field names
OBX 2 => Temperature Measurement
OBX 3 => Heart Rate Measurement

So for any given message I need to extract each measurement plus its attributes (value, units, time, ...). There can be 1 to n OBX segments, including several instances of the same measurement type at different times.
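
To make the target output concrete, here is a rough Python sketch of the per-measurement records I am after. The field positions are taken from the OBX 1 example above (identifier in field 3, value in 5, units in 6, status in 11, time in 14); the dictionary keys and the function name are illustrative only.

```python
import re

# Field positions counted from the OBX 1 example line (index 0 is 'OBX').
OBX_POSITIONS = {
    "set_id": 1,
    "value_type": 2,
    "identifier": 3,
    "value": 5,
    "units": 6,
    "status": 11,
    "time": 14,
}


def parse_obx_segments(payload):
    """Return one dict per OBX segment, in the order the segments appear."""
    measurements = []
    for segment in re.split(r"[\r\n]+", payload):
        fields = segment.split("|")
        if fields[0] != "OBX":
            continue  # skip MSH, PID, PV1, OBR and empty strings
        measurements.append({
            name: fields[pos] if pos < len(fields) else ""
            for name, pos in OBX_POSITIONS.items()
        })
    return measurements


payload = (
    "OBR|||||||20140731105559\r"
    "OBX|2|NM|Temperature||98.6|Celsius|||||F|||20140731105559||\r"
    "OBX|3|ST|Heart Rate||60|/min|||||F|||20140731105559||\r"
)
for measurement in parse_obx_segments(payload):
    print(measurement["identifier"], measurement["value"], measurement["units"])
```

Each record keeps a measurement together with its own units and time, which is the grouping I need to preserve.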

The only way I have been able to get this to work so far is to deconstruct the message before ingesting it into Splunk, generating a new message for each measurement. This is less than ideal, and I would prefer to do the extraction in Splunk itself.
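
For reference, the pre-processing workaround looks roughly like the Python sketch below. It is a simplified illustration rather than the exact code in use: it assumes every non-OBX segment is shared context, copies those segments into each output message, and leaves the message control ID unchanged. `explode_by_obx` is a hypothetical name.

```python
CR = "\r"


def explode_by_obx(payload):
    """Return one HL7 payload per OBX segment, each repeating the shared segments."""
    segments = [seg for seg in payload.split(CR) if seg]
    shared = [seg for seg in segments if not seg.startswith("OBX|")]
    return [
        CR.join(shared + [seg]) + CR
        for seg in segments
        if seg.startswith("OBX|")
    ]


payload = (
    "MSH|^~\\&|Sending Application|N|||20140731105559||ORU^R01|47311055594607d|P|2.3\r"
    "PID|||MRN19\r"
    "PV1||I|SNGH GICU\r"
    "OBR|||||||20140731105559\r"
    "OBX|2|NM|Temperature||98.6|Celsius|||||F|||20140731105559||\r"
    "OBX|3|ST|Heart Rate||60|/min|||||F|||20140731105559||\r"
)
for message in explode_by_obx(payload):
    print(message.replace(CR, "\n"))
```

Every output message then matches MSH PID PV1 OBR OBX exactly, so a fixed regex works, at the cost of duplicating the header segments for each measurement.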

Any suggestions would be greatly appreciated.

dstuder · 09-19-2016

A TA for parsing HL7 has been released since this question was asked.

https://splunkbase.splunk.com/app/3283/

somesoni2 · 08-07-2014

Something like this (OBX only, assuming there are 15 fields after the keyword OBX):

PROPS.CONF:

[YourSourceType]
REPORT-mv_obx = xf-obx

TRANSFORMS.CONF:

[xf-obx]
REGEX = (?:^|[\r\n])OBX\|(?<field1>[^|\r\n]*)\|(?<field2>[^|\r\n]*)\|(?<field3>[^|\r\n]*)\|.....write others...\|(?<field15>[^|\r\n]*)\|
MV_ADD = true
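
To see what a repeated match like this would collect, the idea can be tried outside Splunk. The Python sketch below only approximates the multivalue behaviour and is not how Splunk implements it. It assumes segments are separated by <CR>, spells out just the first six OBX fields, and uses Python's `(?P<name>...)` syntax in place of `(?<name>...)`; the group names are illustrative.

```python
import re

FIELD = r"[^|\r\n]*"  # one HL7 field: anything up to the next pipe or line break

# Anchor on the start of the event or on the character after a line break,
# because Python's '^' does not treat a bare carriage return as a line start.
OBX_RE = re.compile(
    r"(?:^|(?<=[\r\n]))OBX\|"
    rf"(?P<set_id>{FIELD})\|(?P<value_type>{FIELD})\|(?P<identifier>{FIELD})\|"
    rf"{FIELD}\|(?P<value>{FIELD})\|(?P<units>{FIELD})\|"
)


def multivalue_fields(event):
    """Collect every match of each named group into a list, one list per field."""
    fields = {}
    for match in OBX_RE.finditer(event):
        for name, value in match.groupdict().items():
            fields.setdefault(name, []).append(value)
    return fields


event = (
    "OBR|||||||20140731105559\r"
    "OBX|2|NM|Temperature||98.6|Celsius|||||F|||20140731105559||\r"
    "OBX|3|ST|Heart Rate||60|/min|||||F|||20140731105559||\r"
)
for name, values in multivalue_fields(event).items():
    print(name, values)
```

Note that the result is a set of parallel lists: the nth identifier belongs with the nth value and the nth units, so anything downstream has to pair them up by position.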

somesoni2 · 08-07-2014

You can set up multivalue field extraction using transforms.conf.

Reference:
http://answers.splunk.com/answers/112311/multi-value-field-extraction
http://answers.splunk.com/answers/11777/field-extraction-into-multivalue-field
