Splunk Search

(Field) Extracting lines from a single file based on the leading word

pvdijssel
Engager

Hi,

I have a device generating CDRs. Each CDR file contains multiple types of CDR, and each record starts with its type: START, ATTEMPT or STOP. Because each type has a different length and populates different fields, I can't use a single field extraction in transforms.conf.

This is how I currently have it set up:
Splunk Forwarder (inputs.conf):
[batch:///opt/cdrs]
move_policy = sinkhole
index = cdrs
sourcetype = cdrs

Splunk Indexer (inputs.conf):
[cdrs]
SHOULD_LINEMERGE=false
KV_MODE = none
REPORT-fields = cdrs_extractions

Splunk Searcher (props.conf):
[cdrs]
SHOULD_LINEMERGE=false
KV_MODE = none
REPORT-fields = cdrs_extractions

Splunk Searcher (transforms.conf):
[cdrs_extractions]
DELIMS = ","
FIELDS = "Type", etc....

Any idea how to solve this within inputs.conf, props.conf or transforms.conf? I could also use a shell script to break the files into 3 separate files, but I'd like to keep it all in Splunk.


pvdijssel
Engager

OK, I didn't have time to work on this issue until last week. Some things have changed, though:

[machine that generates logs/CDRs] --> [dumps files to Splunk server into /tmp/cdrs/ using SSH]:

[root@Splunk local]# ll /tmp/cdrs/
total 64
-rw-r--r--. 1 root root  3052 May  9 09:38 cdr.20160509093852.1000016.ACT
-rw-r--r--. 1 root root 11681 May  9 09:54 cdr.20160509095452.1000017.ACT

This is what my props.conf looks like:

[root@Splunk-IX local]# cat props.conf
[cdrs]
SHOULD_LINEMERGE=false
KV_MODE = none
REPORT-cdrs = start_fields

This hands off to transforms.conf, which currently only has a regex for the CDRs that start with 'START':

[root@Splunk local]# cat transforms.conf
[start_fields]
REGEX = (?=[^S]*(?:START|S.*START))^\w+,(?P<GatewayName>[^,]+),(?P<AccountingID>\d+\w+),(?P<StartTimeSystemTicks>[^,]+),(?P<NodeTimeZone>[^,]+),(?P<StartTimeMMDDYYYY>[^,]+),(?P<StartTimeHHMMSSs>[^,]+),(?P<TicksfromSetupMsgtoPolicyRespons>[^,]+),(?P<TicksfromSetupMsgtoAlertProcProg>[^,]+),(?P<TicksfromSetupMsgtoServiceEst>[^,]+),(?P<ServiceDelivered>[^,]+),(?P<CallDirection>[^,]+),(?P<ServiceProvider>[^,]+),(?P<TransitNetworkSelectionCode>[^,]+),(?P<CallingNumber>[^,]+),(?P<CalledNumber>[^,]+),(?P<ExtraCalledAddressDigits>[^,]+),(?P<NumberofCalledNumTranslation>[^,]+),(?P<CalledNumberBeforeTranslation1>[^,]+),(?P<TranslationType1>[^,]+),(?P<CalledNumberBeforeTranslation2>[^,]+),(?P<TranslationType2>[^,]+),(?P<BillingNumber>[^,]+),(?P<RouteLabel>[^,]+),(?P<RouteAttemptNumber>[^,]+),(?P<RouteSelected>[^,]+),(?P<EgressLocalSignalingIPAddr>[^,]+),(?P<EgressRemoteSignalingIPAddr>[^,]+),(?P<IngressTrunkGroupName>[^,]+),(?P<IngressPSTNCircuitEndPoint>[^,]+),(?P<IngressIPCircuitEndPoint>[^,]+),(?P<EgressPSTNCircuitEndPoint>[^,]+),(?P<EgressIPCircuitEndPoint>[^,]+),(?P<OriginatingLineInformation>[^,]+),(?P<JurisdictionInfParameter>[^,]+),(?P<CarrierCode>[^,]+),(?P<CallGroupID>[^,]+),(?P<TicksfromSetupMsgtoRxofEXM>[^,]+),(?P<TicksfromSetupMsgtoGenofEXM>[^,]+),(?P<CallingPartyNatureofAddress>[^,]+),(?P<CalledPartyNatureofAddress>[^,]+),(?P<IngressProtVariantSpecificData>[^,]+),(?P<I_ProtocolVariant>[^,]+),(?P<I_CallID>[^,]+),(?P<I_FromField>[^,]+),(?P<I_ToField>[^,]+),(?P<I_RedirectAttemptCount>[^,]+),(?P<I_Reserved>[^,]+),(?P<I_DisplaynameofSIPURIPAIhdr>[^,]+),(?P<I_UserfPKCallForwardingLasthdr>[^,]+),(?P<I_UserHostnameofSIPRequestURIhdr>[^,]+),(?P<I_UserHostnameofSIPURIPAIhdr>[^,]+),(?P<I_UsernameparameterProxyAuthhdr>[^,]+),(?P<I_DisplaynameofTelURIPAIhdr>[^,]+),(?P<I_INVITEContacthdr>[^,]+),(?P<I_200OKINVITEContacthdr>[^,]+),(?P<I_RedirectingReasonPKCallFwdOrig>[^,]+),(?P<I_UserinfoofTelURIPAIhdr>[^,]+),(?P<I_ContractorNumberPSigInfohdr>[^,]+),(?P<I_ACKReceivedfor200OK>[^,]+),(?P<I_StatusMsgforCallRelease>[^,]+),(?P<I_ReasonhdrvalueQ850>[^,]+),(?P<I_NAPTStatusSIPSGforSignaling>[^,]+),(?P<I_NAPTStatusSIPSGforMedia>[^,]+),(?P<I_OriginalPeerSDPAddressforNAPT>[^,]+),(?P<I_UUISendingCount>[^,]+),(?P<I_UUIReceivingCount>[^,]+),(?P<I_ServiceInformation>[^,]+),(?P<I_ICID>[^,]+),(?P<I_GeneratedHost>[^,]+),(?P<I_OriginatingIOI>[^,]+),(?P<I_TerminatingIOI>[^,]+),(?P<I_PKAdnhdrNumber>[^,]+),(?P<I_IPAddressforFQDNcalls>[^,]+),(?P<I_TransportProtocol>[^,]+),(?P<I_DirectMediaCall>[^,]+),(?P<I_InboundSMMIndicator>[^,]+),(?P<I_OutboundSMMIndicator>[^,]+),(?P<I_OriginatingChargeArea>[^,]+),(?P<I_TerminatingChargeArea>[^,]+),(?P<I_FeatureTaginContacthdr>[^,]+),(?P<I_FeatureTaginAcceptContacthdr>[^,]+),(?P<I_PChargingFunctionAddress>[^,]+),(?P<I_PCalledPartyId>[^,]+),(?P<I_PVisitedNetworkId>[^,]+),(?P<I_DirectMediawithNAPTCall>[^,]+),(?P<I_IngressSMMProfileName>[^,]+),(?P<I_EgressSMMProfileName>[^,]+),(?P<IngressSignalingType>[^,]+),(?P<EgressSignalingType>[^,]+),(?P<IngressFarEndSwitchType>[^,]+),(?P<EgressFarEndSwitchType>[^,]+),(?P<CarrierCodewhoOwnsiTGFarEnd>[^,]+),(?P<CarrierCodewhoOwnseTGFarEnd>[^,]+),(?P<CallingPartyCategory>[^,]+),(?P<DialedNumber>[^,]+),(?P<CarrierSelectionInformation>[^,]+),(?P<CalledNumberNumberingPlan>[^,]+),(?P<GenericAddressParameter>[^,]+),(?P<EgressTrunkGroupName>[^,]+),(?P<EgressProtocolVariant>[^,]+),(?P<E_ProtocolVariant>[^,]+),(?P<E_CallID>[^,]+),(?P<E_FromField>[^,]+),(?P<E_ToField>[^,]+),(?P<E_RedirectAttemptCount>[^,]+),(?P<E_Reserved>[^,]+),(?P<E_DisplaynameofSIPURIPAIhdr>[^,]+),(?P<E_UserPrmofPKCallFwdLasthdr>[^,]+),(?P<E_UserHostnameSIPReqURIhdr>[^,]+),(?P<E_UserHostnameofSIPURIPAIhdr>[^,]+),(?P<E_UsernameprmofProxyAuthhdr>[^,]+),(?P<E_DisplaynameofTelURIPAIhdr>[^,]+),(?P<E_INVITEContacthdr>[^,]+),(?P<E_200OKINVITEContacthdr>[^,]+),(?P<E_RedirectingReasonPKCallFwdOrig>[^,]+),(?P<E_UserinfoofTelURIPAIhdr>[^,]+),(?P<E_ContractorNumberPSigInfohdr>[^,]+),(?P<E_ACKReceivedfor200OK>[^,]+),(?P<E_StatusMsgforCallRelease>[^,]+),(?P<E_ReasonhdrvalueQ850>[^,]+),(?P<E_NAPTStatusoftheSIPSGforSig>[^,]+),(?P<E_NAPTStatusoftheSIPSGforMedia>[^,]+),(?P<E_OriginalPeerSDPAddressforNAPT>[^,]+),(?P<E_UUISendingCount>[^,]+),(?P<E_UUIReceivingCount>[^,]+),(?P<E_ServiceInformation>[^,]+),(?P<E_ICID>[^,]+),(?P<E_GeneratedHost>[^,]+),(?P<E_OriginatingIOI>[^,]+),(?P<E_TerminatingIOI>[^,]+),(?P<E_PKAdnhdrNumber>[^,]+),(?P<E_IPAddressforFQDNcalls>[^,]+),(?P<E_TransportProtocol>[^,]+),(?P<E_DirectMediaCall>[^,]+),(?P<E_InboundSMMIndicator>[^,]+),(?P<E_OutboundSMMIndicator>[^,]+),(?P<E_OriginatingChargeArea>[^,]+),(?P<E_TerminatingChargeArea>[^,]+),(?P<E_FeatureTaginContactHdr>[^,]+),(?P<E_FeatureTaginAcceptContactHdr>[^,]+),(?P<E_PChargingFunctionAddress>[^,]+),(?P<E_PCalledPartyId>[^,]+),(?P<E_PVisitedNetworkId>[^,]+),(?P<E_DirectMediawithNAPTCall>[^,]+),(?P<E_IngressSMMProfileName>[^,]+),(?P<E_EgressSMMProfileName>[^,]+),(?P<IncomingCallingNumber>[^,]+),(?P<AMACallType>[^,]+),(?P<MessageBillingIndicatorMBI>[^,]+),(?P<LATA>[^,]+),(?P<RouteIndexUsed>[^,]+),(?P<CallingPartyPresentationRestric>[^,]+),(?P<IncomingISUPChargeNumber>[^,]+),(?P<IncomingISUPNatureOfAddress>[^,]+),(?P<DialedNumberNatureofAddress>[^,]+),(?P<GlobalCallIDGCID>[^,]+),(?P<ChargeFlag>[^,]+),(?P<AMAslpID>[^,]+),(?P<AMABAFModule>[^,]+),(?P<AMASetHexABIndication>[^,]+),(?P<ServiceFeatureID>[^,]+),(?P<FEParameter>[^,]+),(?P<SatelliteIndicator>[^,]+),(?P<PSXBillingInfo>[^,]+),(?P<OriginatingTDMTrunkGroupType>[^,]+),(?P<TerminatingTDMTrunkGroupType>[^,]+),(?P<IngressTrunkMemberNumber>[^,]+),(?P<EgressTrunkGroupID>[^,]+),(?P<EgressSwitchID>[^,]+),(?P<IngressLocalATMAddress>[^,]+),(?P<IngressRemoteATMAddress>[^,]+),(?P<EgressLocalATMAddress>[^,]+),(?P<EgressRemoteATMAddress>[^,]+),(?P<PSXCallType>[^,]+),(?P<OutgoingRouteTrunkGroupID>[^,]+),(?P<OutgoingRouteMessageID>[^,]+),(?P<IncomingRouteID>[^,]+),(?P<CallingName>[^,]+),(?P<CallingNameType>[^,]+),(?P<IncomingCallingPartyNumberingPln>[^,]+),(?P<OutgoingCallingPartyNumberingPln>[^,]+),(?P<CallingPartyBusinessGroupID>[^,]+),(?P<CalledPartyBusinessGroupID>[^,]+),(?P<CallingPartyPPDN>[^,]+),(?P<TicksfromSetupMsgtoLastRouteAtt>[^,]+),(?P<BillingNumberNatureofAddress>[^,]+),(?P<IncomingCallingNmbrNatureofAddr>[^,]+),(?P<EgressTrunkMemberNumber>[^,]+),(?P<SelectedRouteType>[^,]+),(?P<CumulativeRouteIndex>[^,]+),(?P<ISDNPRICallingPartySubaddress>[^,]+),(?P<OutgoingTrunkGroupNumberinEXM>[^,]+),(?P<IngressLocalSignalingIPAddress>[^,]+),(?P<IngressRemoteSignalingIPAddress>[^,]+),(?P<RecordSequenceNumber>[^,]+),(?P<TransmissionMediumRequirement>[^,]+),(?P<InformationTransferRate>[^,]+),(?P<USIUserInfoLayer1>[^,]+),(?P<UnrecogRawISUPCallingPartyCat>[^,]+),(?P<FSDEgressReleaseLinkTrunking>[^,]+),(?P<FSDTwoBChannelTransfer>[^,]+),(?P<CallingPartyBusinessUnit>[^,]+),(?P<CalledPartyBusinessUnit>[^,]+),(?P<FSDRedirecting>[^,]+),(?P<FSDIngressReleaseLinkTrunking>[^,]+),(?P<PSXID>[^,]+),(?P<PSXCongestionLevel>[^,]+),(?P<PSXProcessingTimemilliseconds>[^,]+),(?P<ScriptName>[^,]+),(?P<IngressExternalAccountingData>[^,]+),(?P<EgressExternalAccountingData>[^,]+),(?P<AnswerSupervisionType>[^,]+),(?P<IngressSipReferorSipReplacesFeat>[^,]+),(?P<EgressSipReferorSipReplacesFeat>[^,]+),(?P<NetworkTransfersFeatSpecificData>[^,]+),(?P<CallCondition>[^,]+),(?P<TollIndicator>[^,]+),(?P<GenericNumber>[^,]+),(?P<GenericNumberPresResIndicator>[^,]+),(?P<GenericNumberNumberingPlan>[^,]+),(?P<GenericNumberNatureofAddress>[^,]+),(?P<GenericNumberType>[^,]+),(?P<OriginatingTrunkType>[^,]+),(?P<TerminatingTrunkType>[^,]+),(?P<VPNCallingPublicPresenceNumber>[^,]+),(?P<VPNCallingPrivatePresenceNumber>[^,]+),(?P<ExternalFurnishChargingInfo>[^,]+),(?P<AnnouncementId>[^,]+),(?P<NetworkDataSourceInformation>[^,]+),(?P<NetworkDataPartitionID>[^,]+),(?P<NetworkDataNetworkID>[^,]+),(?P<NetworkDataNCOS>[^,]+),(?P<ISDNaccessIndicator>[^,]+),(?P<NetworkCallReferenceCallIdentity>[^,]+),(?P<NetworkCallRefSigPointCode>[^,]+),(?P<IngressMIMEProtSpecificData>[^,]+),(?P<EgressMIMEProtSpecificData>[^,]+),(?P<VideoDataBndwCallDurIPEndpoint>[^,]+),(?P<SVSCustomer>[^,]+),(?P<SVSVendorDeprecatedin722>[^,]+),(?P<RemoteGSXBillingIndicator>[^,]+),(?P<CallToTestPSX>[^,]+),(?P<PSXOverlapRouteRequests>[^,]+),(?P<CallSetupDelay>[^,]+),(?P<RequestLatencymsec>[^,]+),(?P<DownstreamLatencymsec>[^,]+),(?P<ResponseLatencymsec>[^,]+),(?P<UpstreamLatencymsec>[^,]+),(?P<OverloadStatus>[^,]+),(?P<reserved251>[^,]+),(?P<reserved252>[^,]+),(?P<MLPPPrecedenceLevel>[^,]+),(?P<reserved254>[^,]+),(?P<reserved255>[^,]+),(?P<reserved256>[^,]+),(?P<reserved257>[^,]+),(?P<reserved258>[^,]+),(?P<reserved259>[^,]+),(?P<reserved260>[^,]+),(?P<reserved261>[^,]+),(?P<GlobalChargeReference>[^,]+)

This only works if I limit the regex to a few fields, not for all 262 fields.

Is there an easier way to extract the fields that start with 'START' or work with a simple file delimiter instead of the cumbersome regex? Is there a way to troubleshoot why the regex in transforms.conf doesn't work?

Thanks in advance, Paul


woodcock
Esteemed Legend

I have done exactly this (with better RegEx) with up to 172 fields and it works flawlessly. I did have to go through MANY iterations of the RegEx, repeatedly running a search ending in ... | where isnull(MyLastField), until every event had MyLastField.

pvdijssel
Engager

Found the error in my regex. I replaced each + with * so the regex also matches when fields are empty.
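
For anyone who wants to reproduce this outside Splunk, here is a minimal Python sketch, assuming Python's `re` module behaves like Splunk's regex engine for patterns this simple. It grows the regex one field at a time and reports the first field at which the match fails. The field names are the first 15 from the posted regex, the sample line is the beginning of one START record from this device, and `first_failing_field` is a hypothetical helper name, not a Splunk feature.

```python
import re

# First 15 field names from the posted regex (the leading type word is matched by \w+).
FIELDS = [
    "GatewayName", "AccountingID", "StartTimeSystemTicks", "NodeTimeZone",
    "StartTimeMMDDYYYY", "StartTimeHHMMSSs", "TicksfromSetupMsgtoPolicyRespons",
    "TicksfromSetupMsgtoAlertProcProg", "TicksfromSetupMsgtoServiceEst",
    "ServiceDelivered", "CallDirection", "ServiceProvider",
    "TransitNetworkSelectionCode", "CallingNumber", "CalledNumber",
]

# Beginning of a START record; the two fields before the called number are empty.
SAMPLE = ("START,sbx41,0x0001448A0000000C,260726403,GMT+05:30-Calcutta,"
          "08/20/2013,11:06:38.7,0,8,10,VoIP,IP-TO-IP,DEFAULT,,,9988002222")


def first_failing_field(line, field_names, quantifier):
    """Return the first field name whose addition breaks the match, or None."""
    pattern = r"^\w+"
    for name in field_names:
        # Append one more named group and retry the match on the same line.
        pattern += f",(?P<{name}>[^,]{quantifier})"
        if re.match(pattern, line) is None:
            return name
    return None


failing_with_plus = first_failing_field(SAMPLE, FIELDS, "+")
failing_with_star = first_failing_field(SAMPLE, FIELDS, "*")
print(failing_with_plus)
print(failing_with_star)
```

With this sample, the `+` variant stops at TransitNetworkSelectionCode, the first empty field, while the `*` variant matches all 15 fields and the helper returns None.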

Thanks Woodcock for the support!


woodcock
Esteemed Legend

OK, please click "Accept" on an answer to close the question.


woodcock
Esteemed Legend

Assuming that the first field of your CDR indicates type (values={"START", "ATTEMPT", "STOP"}), you can do it like this (adjust accordingly if your order/format is different):

Your inputs.conf on your Forwarders is fine but delete the one on your indexers.

props.conf (Splunk Search Head):

[cdrs]
SHOULD_LINEMERGE=false
KV_MODE = none
REPORT-fields = start_fields, attempt_fields, stop_fields

transforms.conf (Splunk Search Head):

[start_fields]
REGEX = ^(START),([^,]*), ...
FORMAT = CDR_TYPE::$1 SecondField::$2 ...

[attempt_fields]
REGEX = ^(ATTEMPT),([^,]*), ...
FORMAT = CDR_TYPE::$1 SecondField::$2 ...

[stop_fields]
REGEX = ^(STOP),([^,]*), ...
FORMAT = CDR_TYPE::$1 SecondField::$2 ...
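
With a few hundred fields per type, writing REGEX and FORMAT by hand is error-prone, so one option is to generate the stanzas from a list of field names. The following is only a sketch in Python (not part of Splunk): `build_stanza` is a hypothetical helper, the two field names are placeholders for the full list, and it assumes the type word is followed by a comma and that every field may be empty.

```python
def build_stanza(cdr_type, field_names):
    """Build a transforms.conf stanza with one capture group per field."""
    # Group 1 is the leading type word; each field adds ",([^,]*)".
    regex = "^(" + cdr_type + ")" + "".join(",([^,]*)" for _ in field_names)
    # FORMAT pairs each field name with its capture group, starting at $2.
    pairs = ["CDR_TYPE::$1"]
    pairs += [f"{name}::${index}" for index, name in enumerate(field_names, start=2)]
    return "\n".join([
        f"[{cdr_type.lower()}_fields]",
        f"REGEX = {regex}",
        "FORMAT = " + " ".join(pairs),
    ])


# Placeholder field list; a real one would name every field of the record type.
stanza = build_stanza("START", ["GatewayName", "AccountingID"])
print(stanza)
```

Running it once per record type, each with its own field list, yields the three stanzas referenced by REPORT-fields.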

pvdijssel
Engager

Here are some sample CDRs:

START,sbx41,0x0001448A0000000C,260726403,GMT+05:30-Calcutta,08/20/2013,11:06:38.7,0,8,10,VoIP,IP-TO-IP,DEFAULT,,,9988002222,,0,,0,,0,,PER6187_EGRL_1,1,sbx41:PER6187_TG_AS_1,10.54.9.179,10.54.80.7,PER6187_TG_IAD,,10.54.8.179:1046/10.54.80.7:9078,,10.54.9.179:1046/10.54.80.7:7046,0,,,0x00800002,,,,2,"SIP,1-13231@10.54.80.7,%22sipp%22;tag=13231SIPpTag001,%22sut%22;tag=gK00800467,0,,,,sip:9988002222@10.54.8.179:5060,,,,sip:sipp@10.54.80.7:7080,sip:9988002222@10.54.8.179:5060,,,,,,,0,0,,0,0,,,,,,,,1,0,0,1,,,,,,,,,SMM_ING_IN,SMM_ING_OUT",12,12
STOP,sbx41,0x0001448A0000000C,260726403,GMT+05:30-Calcutta,08/20/2013,11:06:38.7,0,8,10,08/20/2013,11:07:08.8,1,3001,16,VoIP,IP-TO-IP,DEFAULT,,,9988002222,,0,,0,,0,,PER6187_EGRL_1,1,sbx41:PER6187_TG_AS_1,10.54.9.179,10.54.80.7,PER6187_TG_IAD,,10.54.8.179:1046/10.54.80.7:9078,,10.54.9.179:1046/10.54.80.7:7046,,,,,0,,,0x00800002,,,,,2,"SIP,1-13231@10.54.80.7,%22sipp%22;tag=13231SIPpTag001,%22sut%22;tag=gK00800467,0,,,,sip:9988002222@10.54.8.179:5060,,,,sip:sipp@10.54.80.7:7080,sip:9988002222@10.54.8.179:5060,,,,1,BYE,,0,0,,0,0,,,,,,,,1,0,0,1,,,,,,,,,SMM_ING_IN,SMM_ING_OUT",12,12,0,5,,,0x0a,9988002222,1,1,,1,0,0,0,PER6187_TG_AS_1,"SIP,3_125032777@10.54.9.179,%22sipp%22;tag=gK000005f7,;tag=13230,0,,,,sip:+19988002222@10.54.80.7:8090;user=phone,,,,sip:sipp@10.54.9.179:5060,sip:10.54.80.7:8090;transport=UDP,,,,,BYE

One file can hold multiple START, ATTEMPT and STOP CDRs, and they appear in random order.

@Woodcock: isn't it easier to just match on the START, ATTEMPT or STOP field and leave the rest to DELIMS and FIELDS? I tried the following in transforms.conf, but it didn't work (the Splunk docs weren't too clear on this, imho):

[start_fields]
REGEX = ^(START)([^,]*)
FORMAT = CDR_TYPE::$1
DELIMS = ","
FIELDS = "Gateway_Name", "Accounting_ID", "Start_Time_in_System_Ticks", "Node_Time_Zone",

woodcock
Esteemed Legend

You are mixing 2 different ways of doing it: you can either do REGEX/FORMAT or DELIMS/FIELDS. We need to use the former because we need to discriminate between the 3 types of CDRs. So "No": "easier" has nothing to do with it; it is impossible.
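
To illustrate why a purely positional extraction cannot cover all three record types, here is a rough Python sketch using the first few fields of the sample START and STOP records posted above. Splitting on commas assigns meaning by position, and the same position holds different data in each type. The STOP index for the service field is only a guess from where "VoIP" appears in the sample, and `service_delivered` is a hypothetical helper.

```python
# Leading fields of the sample START and STOP records (truncated for brevity).
START = ("START,sbx41,0x0001448A0000000C,260726403,GMT+05:30-Calcutta,"
         "08/20/2013,11:06:38.7,0,8,10,VoIP,IP-TO-IP")
STOP = ("STOP,sbx41,0x0001448A0000000C,260726403,GMT+05:30-Calcutta,"
        "08/20/2013,11:06:38.7,0,8,10,08/20/2013,11:07:08.8,1,3001,16,VoIP")

# Position of the service field per record type.
# Assumption: the STOP index is inferred from the sample, not from a spec.
SERVICE_DELIVERED_INDEX = {"START": 10, "STOP": 15}


def service_delivered(line):
    """Pick the service field using the layout that matches the leading word."""
    values = line.split(",")
    return values[SERVICE_DELIVERED_INDEX[values[0]]]


# One shared position list mislabels STOP records: index 10 holds a date there.
start_at_10 = START.split(",")[10]
stop_at_10 = STOP.split(",")[10]
print(start_at_10, stop_at_10)

# Dispatching on the leading word recovers the right value for both types.
print(service_delivered(START), service_delivered(STOP))
```

The per-type REGEX stanzas do the same dispatch inside Splunk: each regex is anchored on its own leading word, so only the matching layout is applied to an event.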


pvdijssel
Engager

Agree, this config is incorrect. That's why I replied yesterday starting with a blank sheet. And I'm guessing your response is directed to the post from October 15th (?)

Please discard all previous posts, and have a look at the comments I made yesterday.


richgalloway
SplunkTrust

Please share some sample data.

---
If this reply helps you, Karma would be appreciated.