Splunk Search

Extracting Fields from Structured HL7 Data

dmbreton
New Member

I am trying to figure out how to extract structured data from an HL7 2.x message

The entire message is wrapped in a hl7 mlp wrapper, <VT><payload><FS><CR>, which I am using in the source type I created to extract individual messages. The grammar of this message is MSH PID PV1 OBR { OBX }. Essentially what this means is that the message will have 4 segments(strings) delimited by a <CR> followed by 1 to n OBX segments each delimited by a <CR>. Each segment represents a different set of information:

  • MSH => Message Header
  • PID => Patient Info
  • PV1 => Patient Visit/Encounter Info
  • OBR => Observation Request
  • OBX => Observation/Result

Because the first 4 segments are required and in order I was able to extract all fields using a regex.

Example:

Message(excluding message wrapper)

MSH|^~\&|Sending Application|N|||20140731105559||ORU^R01|47311055594607d|P|2.3||||||8859/1
PID|||MRN19||PV1^19||19000101|M||||||||||CSN19
PV1||I|SNGH GICU||||||||||||||||ECN123456
OBR|||||||20140731105559
OBX|1|ST|<Observation_Identifier>||<Observation_Value>|<Observation_Units>|||||<Observation_Status>|||<Observation_Time>||
OBX|2|NM|Temperature||98.6|Celsius|||||F|||20140731105559||
OBX|3|ST|Heart Rate||60|/min|||||F|||20140731105559||

Regex to extract all fields from the MSH segment

(?m).*MSH\|(?:(?:(?:$)|(?:\n)|(?<encoding_characters>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<sending_application>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<sending_facility>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<receiving_application>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<receiving_facility>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<date_time_of_message>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<security>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<message_type>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<message_control_id>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<processing_id>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<version_id>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<sequence_id>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<continuation_pointer>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<accept_acknowledge_type>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<application_acknowledge_type>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<country_code>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<character_set>[^|\n]*)|(?:\|))(?:\|?))(?:(?:(?:$)|(?:\n)|(?<principal_language_of_message>[^|\n]*)|(?:\|))(?:\|?))

In the message above message each OBX segment represents a measurement.

  • OBX 1 => Example with field names
  • OBX 2 => Temperature Measurement
  • OBX 3 => Heart Rate Measurement

So for any given message I need to be able to extract each measurement plus the attributes of the measurement, Value, Units, Time, .... and there can be 1 to n instances of the OBX segments or even of the same measurement type at a different time.

The only way I have been able to get this to work so far is to deconstruct the message before injecting it into splunk and generating a new message for each measurement. This is a less than ideal solution and I would prefer to get this to work using splunk.

Any suggestions would be greatly appreciated.

0 Karma

dstuder
Communicator

There is now a TA for parsing HL7 that was released subsequent to this question being asked.

https://splunkbase.splunk.com/app/3283/

0 Karma

somesoni2
SplunkTrust
SplunkTrust

Something like this (just for OBX, assuming there are 15 fields after the keyword OBX)

[YourSourceType]
REPORT-mv_obx = xf-obx

TRANSFORMS.CONF:

[xf-obx]
REGEX = ^OBX\|(?<field1>.*)\|(?<field2>.*)\|(?<field3>.*)\|.....write others...\|(?<field15>.*)\|
MV_ADD = true
0 Karma

somesoni2
SplunkTrust
SplunkTrust
0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

Splunk is officially part of Cisco

Revolutionizing how our customers build resilience across their entire digital footprint.   Splunk ...

Splunk APM & RUM | Planned Maintenance March 26 - March 28, 2024

There will be planned maintenance for Splunk APM and RUM between March 26, 2024 and March 28, 2024 as ...