Splunk Search

(Field) Extracting lines from a single file based on the leading word

pvdijssel
Engager

Hi,

I have a device generating CDRs. Each CDR file contains multiple types of CDR, and each record starts with its type: START, ATTEMPT or STOP. Because each type has a different length and populates different fields, I can't use a single field extraction in transforms.conf.

This is how I currently have it set up:
Splunk Forwarder (inputs.conf):
[batch://opt/cdrs]
move_policy = sinkhole
index = cdrs
sourcetype = cdrs

Splunk Indexer (inputs.conf):
[cdrs]
SHOULD_LINEMERGE=false
KV_MODE = none
REPORT-fields = cdrs_extractions

Splunk Searcher (props.conf):
[cdrs]
SHOULD_LINEMERGE=false
KV_MODE = none
REPORT-fields = cdrs_extractions

Splunk Searcher (transforms.conf):
[cdrs_extractions]
DELIMS = ","
FIELDS = "Type", etc....

Any idea how to solve this within inputs.conf, props.conf or transforms.conf? I could also use a shell script to break the files into 3 separate files, but I'd like to keep it all in Splunk.


pvdijssel
Engager

OK, I didn't have time to work on this issue until last week. Some things have changed, though:

[machine that generates logs/CDRs] --> [dumps files to the Splunk server into /tmp/cdrs/ using SSH]:

[root@Splunk local]# ll /tmp/cdrs/
total 64
-rw-r--r--. 1 root root  3052 May  9 09:38 cdr.20160509093852.1000016.ACT
-rw-r--r--. 1 root root 11681 May  9 09:54 cdr.20160509095452.1000017.ACT

This is what my props.conf looks like:

[root@Splunk-IX local]# cat props.conf
[cdrs]
SHOULD_LINEMERGE=false
KV_MODE = none
REPORT-cdrs = start_fields

This hands off to transforms.conf, which currently only has a regex for the CDRs that start with 'START':

[root@Splunk local]# cat transforms.conf
[start_fields]
REGEX = (?=[^S]*(?:START|S.*START))^\w+,(?P<GatewayName>[^,]+),(?P<AccountingID>\d+\w+),(?P<StartTimeSystemTicks>[^,]+),(?P<NodeTimeZone>[^,]+),(?P<StartTimeMMDDYYYY>[^,]+),(?P<StartTimeHHMMSSs>[^,]+),(?P<TicksfromSetupMsgtoPolicyRespons>[^,]+),(?P<TicksfromSetupMsgtoAlertProcProg>[^,]+),(?P<TicksfromSetupMsgtoServiceEst>[^,]+),(?P<ServiceDelivered>[^,]+),(?P<CallDirection>[^,]+),(?P<ServiceProvider>[^,]+),(?P<TransitNetworkSelectionCode>[^,]+),(?P<CallingNumber>[^,]+),(?P<CalledNumber>[^,]+),(?P<ExtraCalledAddressDigits>[^,]+),(?P<NumberofCalledNumTranslation>[^,]+),(?P<CalledNumberBeforeTranslation1>[^,]+),(?P<TranslationType1>[^,]+),(?P<CalledNumberBeforeTranslation2>[^,]+),(?P<TranslationType2>[^,]+),(?P<BillingNumber>[^,]+),(?P<RouteLabel>[^,]+),(?P<RouteAttemptNumber>[^,]+),(?P<RouteSelected>[^,]+),(?P<EgressLocalSignalingIPAddr>[^,]+),(?P<EgressRemoteSignalingIPAddr>[^,]+),(?P<IngressTrunkGroupName>[^,]+),(?P<IngressPSTNCircuitEndPoint>[^,]+),(?P<IngressIPCircuitEndPoint>[^,]+),(?P<EgressPSTNCircuitEndPoint>[^,]+),(?P<EgressIPCircuitEndPoint>[^,]+),(?P<OriginatingLineInformation>[^,]+),(?P<JurisdictionInfParameter>[^,]+),(?P<CarrierCode>[^,]+),(?P<CallGroupID>[^,]+),(?P<TicksfromSetupMsgtoRxofEXM>[^,]+),(?P<TicksfromSetupMsgtoGenofEXM>[^,]+),(?P<CallingPartyNatureofAddress>[^,]+),(?P<CalledPartyNatureofAddress>[^,]+),(?P<IngressProtVariantSpecificData>[^,]+),(?P<I_ProtocolVariant>[^,]+),(?P<I_CallID>[^,]+),(?P<I_FromField>[^,]+),(?P<I_ToField>[^,]+),(?P<I_RedirectAttemptCount>[^,]+),(?P<I_Reserved>[^,]+),(?P<I_DisplaynameofSIPURIPAIhdr>[^,]+),(?P<I_UserfPKCallForwardingLasthdr>[^,]+),(?P<I_UserHostnameofSIPRequestURIhdr>[^,]+),(?P<I_UserHostnameofSIPURIPAIhdr>[^,]+),(?P<I_UsernameparameterProxyAuthhdr>[^,]+),(?P<I_DisplaynameofTelURIPAIhdr>[^,]+),(?P<I_INVITEContacthdr>[^,]+),(?P<I_200OKINVITEContacthdr>[^,]+),(?P<I_RedirectingReasonPKCallFwdOrig>[^,]+),(?P<I_UserinfoofTelURIPAIhdr>[^,]+),(?P<I_ContractorNumberPSigInfohdr>[^,]+),(?P<I_ACKReceivedfor200OK>[^,]+),(?P<I_StatusMsgforCallRelease>[^,]+),(?P<I_ReasonhdrvalueQ850>[^,]+),(?P<I_NAPTStatusSIPSGforSignaling>[^,]+),(?P<I_NAPTStatusSIPSGforMedia>[^,]+),(?P<I_OriginalPeerSDPAddressforNAPT>[^,]+),(?P<I_UUISendingCount>[^,]+),(?P<I_UUIReceivingCount>[^,]+),(?P<I_ServiceInformation>[^,]+),(?P<I_ICID>[^,]+),(?P<I_GeneratedHost>[^,]+),(?P<I_OriginatingIOI>[^,]+),(?P<I_TerminatingIOI>[^,]+),(?P<I_PKAdnhdrNumber>[^,]+),(?P<I_IPAddressforFQDNcalls>[^,]+),(?P<I_TransportProtocol>[^,]+),(?P<I_DirectMediaCall>[^,]+),(?P<I_InboundSMMIndicator>[^,]+),(?P<I_OutboundSMMIndicator>[^,]+),(?P<I_OriginatingChargeArea>[^,]+),(?P<I_TerminatingChargeArea>[^,]+),(?P<I_FeatureTaginContacthdr>[^,]+),(?P<I_FeatureTaginAcceptContacthdr>[^,]+),(?P<I_PChargingFunctionAddress>[^,]+),(?P<I_PCalledPartyId>[^,]+),(?P<I_PVisitedNetworkId>[^,]+),(?P<I_DirectMediawithNAPTCall>[^,]+),(?P<I_IngressSMMProfileName>[^,]+),(?P<I_EgressSMMProfileName>[^,]+),(?P<IngressSignalingType>[^,]+),(?P<EgressSignalingType>[^,]+),(?P<IngressFarEndSwitchType>[^,]+),(?P<EgressFarEndSwitchType>[^,]+),(?P<CarrierCodewhoOwnsiTGFarEnd>[^,]+),(?P<CarrierCodewhoOwnseTGFarEnd>[^,]+),(?P<CallingPartyCategory>[^,]+),(?P<DialedNumber>[^,]+),(?P<CarrierSelectionInformation>[^,]+),(?P<CalledNumberNumberingPlan>[^,]+),(?P<GenericAddressParameter>[^,]+),(?P<EgressTrunkGroupName>[^,]+),(?P<EgressProtocolVariant>[^,]+),(?P<E_ProtocolVariant>[^,]+),(?P<E_CallID>[^,]+),(?P<E_FromField>[^,]+),(?P<E_ToField>[^,]+),(?P<E_RedirectAttemptCount>[^,]+),(?P<E_Reserved>[^,]+),(?P<E_DisplaynameofSIPURIPAIhdr>[^,]+),(?P<E_UserPrmofPKCallFwdLasthdr>[^,]+),(?P<E_UserHostnameSIPReqURIhdr>[^,]+),(?P<E_UserHostnameofSIPURIPAIhdr>[^,]+),(?P<E_UsernameprmofProxyAuthhdr>[^,]+),(?P<E_DisplaynameofTelURIPAIhdr>[^,]+),(?P<E_INVITEContacthdr>[^,]+),(?P<E_200OKINVITEContacthdr>[^,]+),(?P<E_RedirectingReasonPKCallFwdOrig>[^,]+),(?P<E_UserinfoofTelURIPAIhdr>[^,]+),(?P<E_ContractorNumberPSigInfohdr>[^,]+),(?P<E_ACKReceivedfor200OK>[^,]+),(?P<E_StatusMsgforCallRelease>[^,]+),(?P<E_ReasonhdrvalueQ850>[^,]+),(?P<E_NAPTStatusoftheSIPSGforSig>[^,]+),(?P<E_NAPTStatusoftheSIPSGforMedia>[^,]+),(?P<E_OriginalPeerSDPAddressforNAPT>[^,]+),(?P<E_UUISendingCount>[^,]+),(?P<E_UUIReceivingCount>[^,]+),(?P<E_ServiceInformation>[^,]+),(?P<E_ICID>[^,]+),(?P<E_GeneratedHost>[^,]+),(?P<E_OriginatingIOI>[^,]+),(?P<E_TerminatingIOI>[^,]+),(?P<E_PKAdnhdrNumber>[^,]+),(?P<E_IPAddressforFQDNcalls>[^,]+),(?P<E_TransportProtocol>[^,]+),(?P<E_DirectMediaCall>[^,]+),(?P<E_InboundSMMIndicator>[^,]+),(?P<E_OutboundSMMIndicator>[^,]+),(?P<E_OriginatingChargeArea>[^,]+),(?P<E_TerminatingChargeArea>[^,]+),(?P<E_FeatureTaginContactHdr>[^,]+),(?P<E_FeatureTaginAcceptContactHdr>[^,]+),(?P<E_PChargingFunctionAddress>[^,]+),(?P<E_PCalledPartyId>[^,]+),(?P<E_PVisitedNetworkId>[^,]+),(?P<E_DirectMediawithNAPTCall>[^,]+),(?P<E_IngressSMMProfileName>[^,]+),(?P<E_EgressSMMProfileName>[^,]+),(?P<IncomingCallingNumber>[^,]+),(?P<AMACallType>[^,]+),(?P<MessageBillingIndicatorMBI>[^,]+),(?P<LATA>[^,]+),(?P<RouteIndexUsed>[^,]+),(?P<CallingPartyPresentationRestric>[^,]+),(?P<IncomingISUPChargeNumber>[^,]+),(?P<IncomingISUPNatureOfAddress>[^,]+),(?P<DialedNumberNatureofAddress>[^,]+),(?P<GlobalCallIDGCID>[^,]+),(?P<ChargeFlag>[^,]+),(?P<AMAslpID>[^,]+),(?P<AMABAFModule>[^,]+),(?P<AMASetHexABIndication>[^,]+),(?P<ServiceFeatureID>[^,]+),(?P<FEParameter>[^,]+),(?P<SatelliteIndicator>[^,]+),(?P<PSXBillingInfo>[^,]+),(?P<OriginatingTDMTrunkGroupType>[^,]+),(?P<TerminatingTDMTrunkGroupType>[^,]+),(?P<IngressTrunkMemberNumber>[^,]+),(?P<EgressTrunkGroupID>[^,]+),(?P<EgressSwitchID>[^,]+),(?P<IngressLocalATMAddress>[^,]+),(?P<IngressRemoteATMAddress>[^,]+),(?P<EgressLocalATMAddress>[^,]+),(?P<EgressRemoteATMAddress>[^,]+),(?P<PSXCallType>[^,]+),(?P<OutgoingRouteTrunkGroupID>[^,]+),(?P<OutgoingRouteMessageID>[^,]+),(?P<IncomingRouteID>[^,]+),(?P<CallingName>[^,]+),(?P<CallingNameType>[^,]+),(?P<IncomingCallingPartyNumberingPln>[^,]+),(?P<OutgoingCallingPartyNumberingPln>[^,]+),(?P<CallingPartyBusinessGroupID>[^,]+),(?P<CalledPartyBusinessGroupID>[^,]+),(?P<CallingPartyPPDN>[^,]+),(?P<TicksfromSetupMsgtoLastRouteAtt>[^,]+),(?P<BillingNumberNatureofAddress>[^,]+),(?P<IncomingCallingNmbrNatureofAddr>[^,]+),(?P<EgressTrunkMemberNumber>[^,]+),(?P<SelectedRouteType>[^,]+),(?P<CumulativeRouteIndex>[^,]+),(?P<ISDNPRICallingPartySubaddress>[^,]+),(?P<OutgoingTrunkGroupNumberinEXM>[^,]+),(?P<IngressLocalSignalingIPAddress>[^,]+),(?P<IngressRemoteSignalingIPAddress>[^,]+),(?P<RecordSequenceNumber>[^,]+),(?P<TransmissionMediumRequirement>[^,]+),(?P<InformationTransferRate>[^,]+),(?P<USIUserInfoLayer1>[^,]+),(?P<UnrecogRawISUPCallingPartyCat>[^,]+),(?P<FSDEgressReleaseLinkTrunking>[^,]+),(?P<FSDTwoBChannelTransfer>[^,]+),(?P<CallingPartyBusinessUnit>[^,]+),(?P<CalledPartyBusinessUnit>[^,]+),(?P<FSDRedirecting>[^,]+),(?P<FSDIngressReleaseLinkTrunking>[^,]+),(?P<PSXID>[^,]+),(?P<PSXCongestionLevel>[^,]+),(?P<PSXProcessingTimemilliseconds>[^,]+),(?P<ScriptName>[^,]+),(?P<IngressExternalAccountingData>[^,]+),(?P<EgressExternalAccountingData>[^,]+),(?P<AnswerSupervisionType>[^,]+),(?P<IngressSipReferorSipReplacesFeat>[^,]+),(?P<EgressSipReferorSipReplacesFeat>[^,]+),(?P<NetworkTransfersFeatSpecificData>[^,]+),(?P<CallCondition>[^,]+),(?P<TollIndicator>[^,]+),(?P<GenericNumber>[^,]+),(?P<GenericNumberPresResIndicator>[^,]+),(?P<GenericNumberNumberingPlan>[^,]+),(?P<GenericNumberNatureofAddress>[^,]+),(?P<GenericNumberType>[^,]+),(?P<OriginatingTrunkType>[^,]+),(?P<TerminatingTrunkType>[^,]+),(?P<VPNCallingPublicPresenceNumber>[^,]+),(?P<VPNCallingPrivatePresenceNumber>[^,]+),(?P<ExternalFurnishChargingInfo>[^,]+),(?P<AnnouncementId>[^,]+),(?P<NetworkDataSourceInformation>[^,]+),(?P<NetworkDataPartitionID>[^,]+),(?P<NetworkDataNetworkID>[^,]+),(?P<NetworkDataNCOS>[^,]+),(?P<ISDNaccessIndicator>[^,]+),(?P<NetworkCallReferenceCallIdentity>[^,]+),(?P<NetworkCallRefSigPointCode>[^,]+),(?P<IngressMIMEProtSpecificData>[^,]+),(?P<EgressMIMEProtSpecificData>[^,]+),(?P<VideoDataBndwCallDurIPEndpoint>[^,]+),(?P<SVSCustomer>[^,]+),(?P<SVSVendorDeprecatedin722>[^,]+),(?P<RemoteGSXBillingIndicator>[^,]+),(?P<CallToTestPSX>[^,]+),(?P<PSXOverlapRouteRequests>[^,]+),(?P<CallSetupDelay>[^,]+),(?P<RequestLatencymsec>[^,]+),(?P<DownstreamLatencymsec>[^,]+),(?P<ResponseLatencymsec>[^,]+),(?P<UpstreamLatencymsec>[^,]+),(?P<OverloadStatus>[^,]+),(?P<reserved251>[^,]+),(?P<reserved252>[^,]+),(?P<MLPPPrecedenceLevel>[^,]+),(?P<reserved254>[^,]+),(?P<reserved255>[^,]+),(?P<reserved256>[^,]+),(?P<reserved257>[^,]+),(?P<reserved258>[^,]+),(?P<reserved259>[^,]+),(?P<reserved260>[^,]+),(?P<reserved261>[^,]+),(?P<GlobalChargeReference>[^,]+)

This only works if I limit the regex to a few fields; it fails when I include all 262 fields.

Is there an easier way to extract the fields from records that start with 'START', or to work with a simple delimiter instead of the cumbersome regex? Is there a way to troubleshoot why the regex in transforms.conf doesn't work?

Thanks in advance, Paul
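
One way to troubleshoot a long comma-anchored regex outside Splunk is to grow it one capture group at a time and note where it first stops matching. The sketch below does that in Python. It assumes Python's `re` module behaves like Splunk's regex engine for this simple pattern style, it simplifies every group to `[^,]+`, and it uses only the first four field names from the stanza above. `first_failing_field` is a hypothetical helper name, and the second test line is made up to contain an empty field.

```python
import re

def first_failing_field(field_names, line):
    """Return the first field whose capture group stops the regex from matching.

    The pattern is grown one ",(?P<name>[^,]+)" group at a time, so the
    returned name is the earliest group the line cannot satisfy.
    Returns None when every group matches.
    """
    pattern = r"^\w+"
    for name in field_names:
        pattern += rf",(?P<{name}>[^,]+)"
        if re.match(pattern, line) is None:
            return name
    return None

# First four field names from the start_fields stanza.
fields = ["GatewayName", "AccountingID", "StartTimeSystemTicks", "NodeTimeZone"]

# Head of a record with every field populated.
full = "START,sbx41,0x0001448A0000000C,260726403,GMT+05:30-Calcutta"
# Made-up variant where the accounting ID is empty.
with_gap = "START,sbx41,,260726403,GMT+05:30-Calcutta"

print(first_failing_field(fields, full))      # None
print(first_failing_field(fields, with_gap))  # AccountingID
```

Running the same loop over the full field list and a real record would point at the first column the pattern cannot consume.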


woodcock
Esteemed Legend

I have done exactly this (with a better RegEx) with up to 172 fields and it works flawlessly. I did have to go through MANY iterations of the RegEx, running a search ending in ... | where isnull(MyLastField) over and over until all events had MyLastField.

pvdijssel
Engager

Found the error in my regex. I replaced each + with * so the regex also matches when fields are empty.

Thanks Woodcock for the support!
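
The difference between the two quantifiers can be checked on the head of the sample START record posted in this thread, where the two fields before the called number are empty. This is a sketch in Python rather than Splunk, assuming the field order from the transforms.conf stanza and that Python's `re` treats these patterns the same way. `build_regex` is a hypothetical helper.

```python
import re

# First 15 field names after the record type, taken from the start_fields stanza.
FIELDS = [
    "GatewayName", "AccountingID", "StartTimeSystemTicks", "NodeTimeZone",
    "StartTimeMMDDYYYY", "StartTimeHHMMSSs", "TicksfromSetupMsgtoPolicyRespons",
    "TicksfromSetupMsgtoAlertProcProg", "TicksfromSetupMsgtoServiceEst",
    "ServiceDelivered", "CallDirection", "ServiceProvider",
    "TransitNetworkSelectionCode", "CallingNumber", "CalledNumber",
]

def build_regex(quantifier):
    """Build ^START followed by one named, comma-delimited group per field."""
    groups = "".join(f",(?P<{name}>[^,]{quantifier})" for name in FIELDS)
    return re.compile("^START" + groups)

# Head of the sample START record; the two fields after DEFAULT are empty.
line = ("START,sbx41,0x0001448A0000000C,260726403,GMT+05:30-Calcutta,"
        "08/20/2013,11:06:38.7,0,8,10,VoIP,IP-TO-IP,DEFAULT,,,9988002222")

strict = build_regex("+").match(line)   # every field needs at least one character
lenient = build_regex("*").match(line)  # empty fields are allowed

print(strict)                          # None
print(lenient.group("CalledNumber"))   # 9988002222
```

With `+`, a single empty column is enough to make the whole event fail to match, which would explain why the extraction only worked when limited to the first few fields.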


woodcock
Esteemed Legend

OK, please click "Accept" on an answer to close the question.


woodcock
Esteemed Legend

Assuming that the first field of your CDR indicates type (values={"START", "ATTEMPT", "STOP"}), you can do it like this (adjust accordingly if your order/format is different):

Your inputs.conf on your Forwarders is fine but delete the one on your indexers.

props.conf (Splunk Search Head):

[cdrs]
SHOULD_LINEMERGE=false
KV_MODE = none
REPORT-fields = start_fields, attempt_fields, stop_fields

transforms.conf (Splunk Search Head):

[start_fields]
REGEX = ^(START),([^,]*), ...
FORMAT = CDR_TYPE::$1 SecondField::$2 ...

[attempt_fields]
REGEX = ^(ATTEMPT),([^,]*), ...
FORMAT = CDR_TYPE::$1 SecondField::$2 ...

[stop_fields]
REGEX = ^(STOP),([^,]*), ...
FORMAT = CDR_TYPE::$1 SecondField::$2 ...
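
The effect of listing three type-anchored transforms under one REPORT can be imitated outside Splunk: every pattern is tried against every event, and only the one whose leading keyword matches contributes fields. Below is a rough Python sketch of that idea. It assumes the first two columns after the type are a gateway name and an accounting ID for all three record types, which is a guess for ATTEMPT. The field names `GatewayName` and `AccountingID` and the function `extract` are illustrative only.

```python
import re

# Shortened stand-ins for the three transforms; real CDRs carry many more columns.
TRANSFORMS = {
    f"{kind.lower()}_fields": re.compile(
        rf"^(?P<CDR_TYPE>{kind}),(?P<GatewayName>[^,]*),(?P<AccountingID>[^,]*)"
    )
    for kind in ("START", "ATTEMPT", "STOP")
}

def extract(line):
    """Try every transform; only the one anchored on the line's type matches."""
    fields = {}
    for regex in TRANSFORMS.values():
        match = regex.match(line)
        if match:
            fields.update(match.groupdict())
    return fields

print(extract("START,sbx41,0x0001448A0000000C,260726403"))
print(extract("STOP,sbx41,0x0001448A0000000C,260726403"))
print(extract("OTHER,sbx41,0x0001448A0000000C"))  # no transform matches: {}
```

Because each pattern is anchored with `^` on a different keyword, at most one of them can match a given line, so the three field layouts never interfere with each other.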

pvdijssel
Engager

Here are some sample CDRs:

START,sbx41,0x0001448A0000000C,260726403,GMT+05:30-Calcutta,08/20/2013,11:06:38.7,0,8,10,VoIP,IP-TO-IP,DEFAULT,,,9988002222,,0,,0,,0,,PER6187_EGRL_1,1,sbx41:PER6187_TG_AS_1,10.54.9.179,10.54.80.7,PER6187_TG_IAD,,10.54.8.179:1046/10.54.80.7:9078,,10.54.9.179:1046/10.54.80.7:7046,0,,,0x00800002,,,,2,"SIP,1-13231@10.54.80.7,%22sipp%22;tag=13231SIPpTag001,%22sut%22;tag=gK00800467,0,,,,sip:9988002222@10.54.8.179:5060,,,,sip:sipp@10.54.80.7:7080,sip:9988002222@10.54.8.179:5060,,,,,,,0,0,,0,0,,,,,,,,1,0,0,1,,,,,,,,,SMM_ING_IN,SMM_ING_OUT",12,12
STOP,sbx41,0x0001448A0000000C,260726403,GMT+05:30-Calcutta,08/20/2013,11:06:38.7,0,8,10,08/20/2013,11:07:08.8,1,3001,16,VoIP,IP-TO-IP,DEFAULT,,,9988002222,,0,,0,,0,,PER6187_EGRL_1,1,sbx41:PER6187_TG_AS_1,10.54.9.179,10.54.80.7,PER6187_TG_IAD,,10.54.8.179:1046/10.54.80.7:9078,,10.54.9.179:1046/10.54.80.7:7046,,,,,0,,,0x00800002,,,,,2,"SIP,1-13231@10.54.80.7,%22sipp%22;tag=13231SIPpTag001,%22sut%22;tag=gK00800467,0,,,,sip:9988002222@10.54.8.179:5060,,,,sip:sipp@10.54.80.7:7080,sip:9988002222@10.54.8.179:5060,,,,1,BYE,,0,0,,0,0,,,,,,,,1,0,0,1,,,,,,,,,SMM_ING_IN,SMM_ING_OUT",12,12,0,5,,,0x0a,9988002222,1,1,,1,0,0,0,PER6187_TG_AS_1,"SIP,3_125032777@10.54.9.179,%22sipp%22;tag=gK000005f7,;tag=13230,0,,,,sip:+19988002222@10.54.80.7:8090;user=phone,,,,sip:sipp@10.54.9.179:5060,sip:10.54.80.7:8090;transport=UDP,,,,,BYE

One file can hold multiple START, ATTEMPT and STOP CDRs, and they appear in random order.

@Woodcock: isn't it easier to just match on the START, ATTEMPT or STOP field and leave the rest up to DELIMS and FIELDS to extract? I've tried to get the following to work in transforms.conf, but it didn't work (the Splunk docs weren't too clear on this item, imho):

[start_fields]
REGEX = ^(START)([^,]*)
FORMAT = CDR_TYPE::$1
DELIMS = ","
FIELDS = "Gateway_Name", "Accounting_ID", "Start_Time_in_System_Ticks", "Node_Time_Zone",

woodcock
Esteemed Legend

You are mixing 2 different ways of doing it: you can either do REGEX/FORMAT or DELIMS/FIELDS. We need to use the former because we need to discriminate between the 3 types of CDRs. So "No": "easier" has nothing to do with it; it is impossible.
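
To see why a single delimiter-based field list cannot serve all record types, compare what one positional list yields for the START and STOP samples posted in this thread. The Python sketch below only imitates positional splitting and is not how Splunk implements DELIMS/FIELDS. The record heads are shortened copies of those samples, the field names come from the START regex posted earlier, and `delims_extract` is a hypothetical helper.

```python
# One positional field list, as a single DELIMS/FIELDS transform would use.
FIELDS = [
    "Type", "GatewayName", "AccountingID", "StartTimeSystemTicks",
    "NodeTimeZone", "StartTimeMMDDYYYY", "StartTimeHHMMSSs",
    "TicksfromSetupMsgtoPolicyRespons", "TicksfromSetupMsgtoAlertProcProg",
    "TicksfromSetupMsgtoServiceEst", "ServiceDelivered",
]

def delims_extract(line):
    """Split on commas and assign the values to FIELDS purely by position."""
    return dict(zip(FIELDS, line.split(",")))

# Shortened heads of the two sample records.
start = ("START,sbx41,0x0001448A0000000C,260726403,GMT+05:30-Calcutta,"
         "08/20/2013,11:06:38.7,0,8,10,VoIP")
stop = ("STOP,sbx41,0x0001448A0000000C,260726403,GMT+05:30-Calcutta,"
        "08/20/2013,11:06:38.7,0,8,10,08/20/2013,11:07:08.8")

print(delims_extract(start)["ServiceDelivered"])  # VoIP
print(delims_extract(stop)["ServiceDelivered"])   # 08/20/2013
```

The STOP record carries extra columns before the service type, so the same position holds a date there. A positional list has no way to switch layouts per record type, whereas a regex anchored on the leading keyword does.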


pvdijssel
Engager

Agree, this config is incorrect. That's why I replied yesterday starting with a blank sheet. And I'm guessing your response is directed to the post from October 15th (?)

Please discard all previous posts, and have a look at the comments I made yesterday.


richgalloway
SplunkTrust

Please share some sample data.
