How do I extract data from the raw data of each event before it is sent to the indexer?

lsmkelvin
New Member

Hi all,

I am new to Splunk and am stuck on how to extract data from the original log before indexing it.

Below is my original log:

160.19.104.25 2013-05-21 15:46:50 160.80.38.178:15010 GET /lbHealthMon/index.jsp HTTP/1.1 200 322 - 5249409c3873c79f:-7d44c4c8:13e78cbed15:-7fc5-000000000002f28c 28 1369122410946 0.0020
160.19.104.25 2013-05-21 15:46:50 160.80.38.178:15010 GET /lbHealthMon/index.jsp HTTP/1.1 200 322 - 5249409c3873c79f:-7d44c4c8:13e78cbed15:-7fc5-000000000002f28c 28 1369122410946 0.0020

I want to extract some of the information (let's say "IP address", "date" and "time") and use a Heavy Forwarder to send it to the indexer for indexing.

Can anyone please kindly help me to figure it out?

Best regards,

Kelvin


lsmkelvin
New Member

Yes, you are right. I want to remove some unused data before forwarding to the indexer.

For example:
Original log:
160.19.104.25 2013-05-21 15:46:50 160.80.38.178:15010 GET /lbHealthMon/index.jsp HTTP/1.1 200 322 - 5249409c3873c79f:-7d44c4c8:13e78cbed15:-7fc5-000000000002f28c 28 1369122410946 0.0020

After forwarding to the indexer:
2013-05-21 15:46:50 /lbHealthMon/index.jsp 0.0020


dwaddle
SplunkTrust

If I understand your question, I might suggest you are thinking too much in a 'relational database' mindset. You do not need to do any preprocessing of the data prior to indexing it in order to associate "160.19.104.25" with the name "IP address"; the same goes for date and time.

Splunk will by default produce a full text index of all of the "tokens" from your event. "160.19.104.25" is a token, as are "2013-05-21", "15:46:50", and "GET" (and so on). There is no need to tell it in advance that 160.19.104.25 is the IP address.

When you run a search, Splunk will apply various rules at that time to associate field names with values. These rules can include regular expressions, searching for key=value, or delimiter-based operations.

The date and time are parsed at index time in order to create an epoch time for the event, which is stored in the index. This is key to Splunk's whole time-series data approach.
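To make the idea of an epoch time concrete, here is a minimal Python sketch that converts the sample event's timestamp into epoch seconds. It is only an illustration, not Splunk's timestamp processor. The function name `to_epoch` is hypothetical, and the UTC+8 offset is an assumption: it is inferred from the `1369122410946` field in the sample log, which looks like the same instant expressed in epoch milliseconds.

```python
from datetime import datetime, timedelta, timezone


def to_epoch(timestamp: str, utc_offset_hours: int) -> int:
    """Convert a 'YYYY-MM-DD HH:MM:SS' string to epoch seconds.

    utc_offset_hours is the assumed time zone of the host that wrote the log.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    return int(parsed.timestamp())


# Assumption: the log was written in UTC+8.
print(to_epoch("2013-05-21 15:46:50", 8))  # 1369122410
```

Under that assumption the result matches the first ten digits of the millisecond value in the event itself.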

The net of it is that you can still do searches on stuff like ip_address=160.19.104.25, and Splunk will use the full-text index in combination with your rules for field extraction to find your results. But, it does not require you to define at index time the rules (or schema) for finding these results.
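As a rough illustration of "rules applied at search time", the Python sketch below keeps the raw events untouched and only attaches field names when a search runs. It is not how Splunk is implemented; the field names (`ip_address`, `date`, `time`) and the helper functions are hypothetical, and the pattern assumes the space-delimited layout of the sample log.

```python
import re

# Hypothetical search-time rule: name the first three space-delimited tokens.
FIELD_PATTERN = re.compile(r"^(?P<ip_address>\S+) (?P<date>\S+) (?P<time>\S+) ")


def extract_fields(raw_event: str) -> dict:
    """Apply the rule to one raw event; return {} if it does not match."""
    match = FIELD_PATTERN.match(raw_event)
    return match.groupdict() if match else {}


def search(events, **criteria):
    """Return the raw events whose extracted fields equal every criterion."""
    return [
        event
        for event in events
        if all(extract_fields(event).get(k) == v for k, v in criteria.items())
    ]


events = [
    "160.19.104.25 2013-05-21 15:46:50 160.80.38.178:15010 GET "
    "/lbHealthMon/index.jsp HTTP/1.1 200 322 - "
    "5249409c3873c79f:-7d44c4c8:13e78cbed15:-7fc5-000000000002f28c "
    "28 1369122410946 0.0020",
]

# The stored events never change; the field names exist only during the search.
print(search(events, ip_address="160.19.104.25"))
```

Changing `FIELD_PATTERN` later changes what every future search sees without touching the stored events, which is the point of not fixing a schema at index time.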


It's also possible I have entirely misinterpreted your question. If so, please elaborate/clarify. 🙂

lsmkelvin
New Member

Anyway, thank you all for your attention to my question.
^^


dwaddle
SplunkTrust

Yeah, I completely misunderstood what you were saying. In typical Splunk lingo the word "extract" is strongly associated with the idea of pulling data out of an event and giving it a name. I jumped to the wrong conclusion. Glad you were able to get SEDCMD to work.


lsmkelvin
New Member

Some of the data is not useful or meaningful for analysis in Splunk, although that unused data is useful for other purposes.
In my case, I just want to analyze each URL's response time together with the timestamp. If I index every complete line, the cost is too high.

However, it seems I found the answer: using "SEDCMD".
http://docs.splunk.com/Documentation/Splunk/latest/Data/Anonymizedatausingconfigurationfiles
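For readers looking for the concrete solution, the thread does not show the expression that was actually used. The Python sketch below emulates one sed-style substitution that turns the sample event into the desired output. The pattern is an assumption based only on the sample log line. In props.conf, a comparable expression would go in a `SEDCMD-<class>` setting of the form `s/<regex>/\1 \2 \3/` under the relevant stanza; the class name and stanza are placeholders, and the expression has not been tested against Splunk here.

```python
import re

# Assumed layout: client IP, date, time, server:port, method, URL, ..., response time.
# Keep date + time (group 1), URL (group 2) and the last field (group 3).
TRIM_PATTERN = re.compile(r"^\S+ (\S+ \S+) \S+ \S+ (\S+) .* (\S+)$")


def trim_event(raw_event: str) -> str:
    """Emulate a sed-style s/regex/replacement/ on a single raw event.

    Events that do not match the pattern are returned unchanged.
    """
    return TRIM_PATTERN.sub(r"\1 \2 \3", raw_event)


original = (
    "160.19.104.25 2013-05-21 15:46:50 160.80.38.178:15010 GET "
    "/lbHealthMon/index.jsp HTTP/1.1 200 322 - "
    "5249409c3873c79f:-7d44c4c8:13e78cbed15:-7fc5-000000000002f28c "
    "28 1369122410946 0.0020"
)

trimmed = trim_event(original)
print(trimmed)  # 2013-05-21 15:46:50 /lbHealthMon/index.jsp 0.0020
print(len(original), "->", len(trimmed), "characters")
```

Trying the regular expression on a few real events outside Splunk first, as done here, is a cheap way to catch lines with a different layout before changing the forwarder configuration.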


amiritc
New Member

Hi, dear Splunker.
If you have solved your problem, could you please share your solution? I have the same problem now.


Ayn
Legend

Not that it can't be done, but why would you need to remove data? To save license costs?...


lsmkelvin
New Member

Maybe my question was not clear. Anyway, thanks for your reply. Let me try to explain with the example below.

Original log before forwarding:
"160.19.104.25 2013-05-21 15:46:50 160.80.38.178:15010 GET /lbHealthMon/index.jsp HTTP/1.1 200 322 - 5249409c3873c79f:-7d44c4c8:13e78cbed15:-7fc5-000000000002f28c 28 1369122410946 0.0020"

After forwarding to the indexer:
"2013-05-21 15:46:50 /lbHealthMon/index.jsp 0.0020"

I just want to remove the unused data before indexing.

Thanks so much for your kind help!

Best regards.


sbrant_splunk
Splunk Employee

By extract, do you mean that you want to remove the data prior to indexing?
