Keep specific part of a textfile / email and discard the rest

eichfuss
Path Finder

Hi there,

I know the docs and the search function on answers.splunk.com, but I think I'm missing something obvious. I hope someone can point me in the right direction or help me with my problem.

I want to log emails, but instead of indexing all the headers I only want to index one part of each mail. Here is an example of a similar mail.
I just want the part from "Object: Sensor A" through "Time: 2013-01-27 11:58:23" and want to send the rest to the null queue.

Thanks a lot
Cheers, Sven

##################################

Content-Type: multipart/alternative; boundary=Apple-Mail-3A77049A-4A01-443F-B1DB-C1AA16C7497D
Content-Transfer-Encoding: 7bit
Subject: blablablablabla
From: Doc Snider blablabla@blablabl.de
Message-Id: 92D35476-1711-4B3451-A4B5-8D14534351E@gmail.com
Date: Mon, 27 Jan 2014 11:30:57 +0100
To: doc@blablabla.de
Mime-Version: 1.0 (1.0)
X-Mailer: iPhone Mail (11A501)

--Apple-Mail-3A77049A-4A01-443F-B1DB-C1AA16C7497D
Content-Type: text/plain;
charset=utf-8
Content-Transfer-Encoding: quoted-printable

Here are the infos

Object: Sensor A
Temperature: 42
Humidity: 32
Time: 2013-01-27 11:58:23

here is more uninteresting text.
blablablablablabla

############################################

kristian_kolb
Ultra Champion

I guess you could (permanently) remove the unwanted stuff with a sed script, invoked through SEDCMD in props.conf, like so:

props.conf

[your_email_sourcetype]
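# props.conf expects the form SEDCMD-<class>; "keep_sensor" is just an arbitrary class name
SEDCMD-keep_sensor = s/(?m).*[\r\n](Object:.*[\r\n]Time:\s[\d-]+\s[\d:]+)/\1/g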
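
To preview what a substitution like this is meant to do before touching a live index, here is a minimal Python sketch of the same idea. It is not Splunk's implementation, and two things in it are my own assumptions: in Python, "." only crosses line breaks with the (?s) flag, so the sketch uses (?s) instead of (?m), and it adds a trailing .* so the text after the Time line is dropped as well. Whether the Splunk expression needs the same adjustments is worth checking against sample data. The function name keep_sensor_block and the shortened SAMPLE mail are made up for illustration.

```python
import re

# Shortened version of the sample mail from the question (illustrative only).
SAMPLE = """Subject: blablablablabla
X-Mailer: iPhone Mail (11A501)

Here are the infos

Object: Sensor A
Temperature: 42
Humidity: 32
Time: 2013-01-27 11:58:23

here is more uninteresting text.
blablablablablabla
"""

# Assumption: (?s) makes "." match line breaks in Python, and the trailing
# ".*" also swallows everything after the Time line.
PATTERN = re.compile(r"(?s).*[\r\n](Object:.*[\r\n]Time:\s[\d-]+\s[\d:]+).*")


def keep_sensor_block(raw: str) -> str:
    """Return only the Object..Time block; leave the text unchanged if absent."""
    return PATTERN.sub(r"\1", raw, count=1)


if __name__ == "__main__":
    print(keep_sensor_block(SAMPLE))
```

If the pattern does not match, the text comes back untouched, which makes it easy to spot mails where the expression needs tuning.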

Just make sure that the events also get indexed with the correct timestamp, since the header and the message body seem to carry different timestamps. So perhaps you should also add the following to the stanza above:

TIME_FORMAT = %Y-%m-%d %H:%M:%S
TIME_PREFIX = Time:+\s
MAX_TIMESTAMP_LOOKAHEAD = 400
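
As a rough illustration of how these three settings work together, here is a small Python sketch. It only mimics the idea (find the prefix, look a limited number of characters past it, parse with the given format) and is not how Splunk implements timestamp recognition. The function name extract_timestamp is hypothetical, and treating the lookahead as a character window that starts right after the prefix match is my reading of the setting.

```python
import re
from datetime import datetime

TIME_PREFIX = re.compile(r"Time:+\s")   # same expression as TIME_PREFIX above
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"       # same strptime format as TIME_FORMAT
MAX_LOOKAHEAD = 400                     # characters inspected after the prefix

# Shape of a timestamp that TIME_FORMAT can parse.
STAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def extract_timestamp(event: str):
    """Return the datetime following 'Time: ', or None if nothing usable is found."""
    prefix = TIME_PREFIX.search(event)
    if prefix is None:
        return None
    # Only look at a limited window right after the prefix match.
    window = event[prefix.end():prefix.end() + MAX_LOOKAHEAD]
    stamp = STAMP.match(window)
    if stamp is None:
        return None
    return datetime.strptime(stamp.group(0), TIME_FORMAT)


if __name__ == "__main__":
    event = "Object: Sensor A\nTemperature: 42\nHumidity: 32\nTime: 2013-01-27 11:58:23"
    print(extract_timestamp(event))
```

Because the prefix anchors on "Time:", the Date header from the mail is ignored and the sensor's own timestamp is used.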

Read more here:

http://docs.splunk.com/Documentation/Splunk/6.0.1/Data/Anonymizedatausingconfigurationfiles

eichfuss
Path Finder

Thanks a lot, Kristian,
that's the way.
