Getting Data In

Split already indexed data into new events?

TonyLeeVT
Builder

Does anyone know of a way to create new events from already indexed data? Here is my issue:

1) I am monitoring a directory where random files with random file names are deposited for parsing
2) I need to index the data to figure out the sourcetype and set it
3) Once the data has been indexed (which is when the sourcetype gets determined), I cannot seem to split the events any further (using LINE_BREAKER or anything else)

It would be ideal if I could split the log file into separate events after I index it to determine sourcetype.


s2_splunk
Splunk Employee

I would recommend you address the randomness of files, filenames and directories and make your life a heck of a lot easier.
What somesoni2 states above is correct, but depending on how you determine the sourcetype, you may be able to do it at index time using props/transforms mechanisms available in Splunk (take a look here).

I would either try to separate sourcetypes out by directories or filename conventions (maybe preprocessing the log files outside of Splunk and moving them to a better directory structure) or try the sourcetype override approach.

There are other approaches, for example what's discussed here, but you are making your life harder at search time.
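For illustration only (the directory layout and sourcetype names below are made up), the per-directory approach is just a couple of props.conf stanzas on the forwarder or indexer:

[source::/data/tool_output/network/*.xml]
sourcetype = tool_network_xml

[source::/data/tool_output/host/*.xml]
sourcetype = tool_host_xml

If you can't reorganize the files, the content-based override is a TRANSFORMS- entry on the catch-all sourcetype that rewrites MetaData:Sourcetype based on a REGEX match against the raw data.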


TonyLeeVT
Builder

Trust me, I would... I am limited by another tool. I run the tool and it creates XML files with random file names, all in one directory. I am stuck with what I have, unfortunately, unless I want to do some pre-processing with an external script. Painful.

My current approach is inputs.conf (monitor) --> props.conf (line_break intelligently and send to transforms) --> transforms.conf (determine sourcetype based on content) --> props.conf (extract fields, but no more line_breaking occurs).
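In config terms it looks roughly like this (paths, stanza names, and the regex are examples, not my exact setup):

inputs.conf:

[monitor:///opt/tool/output]
sourcetype = tool_xml_generic

props.conf:

[tool_xml_generic]
SHOULD_LINEMERGE = false
# the "intelligent" break; the real pattern is more involved
LINE_BREAKER = ([\r\n]+)(?=<networkInfo>)
TRANSFORMS-set_st = set_st_by_content

transforms.conf:

[set_st_by_content]
REGEX = <networkArray>
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::tool_network_xml

props.conf:

[tool_network_xml]
KV_MODE = xml

By the time the transform rewrites the sourcetype, line breaking has already run under the generic sourcetype, so the [tool_network_xml] stanza is only consulted for search-time settings such as field extractions, which is exactly where I am stuck.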

I think I am stuck at the mercy of search-time manipulation. I wish Splunk would allow users to create new events from existing indexed data/events.


richgalloway
SplunkTrust

I find processing XML using Splunk tools to be painful and prefer to let a Python script do the parsing for me. I have the script convert the XML into key=value form which Splunk processes effortlessly. As a bonus, the resulting data is usually less verbose than XML so we use less of our license.
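My scripts are specific to each data source, but a minimal sketch of the idea (assuming the XML holds repeated record elements like <networkInfo> and you want one key=value line per record; the tag name and paths are illustrative) looks something like this:

#!/usr/bin/env python
# Sketch: turn each repeated record element in an XML file into one key=value
# line that Splunk can break and extract natively. The record tag ("networkInfo")
# is only an example; adjust for the real XML layout.
import sys
import xml.etree.ElementTree as ET

def flatten(elem, prefix=""):
    """Yield (key, value) pairs for every leaf element under elem."""
    for child in elem:
        key = prefix + child.tag
        if len(child):                       # has children -> recurse
            for pair in flatten(child, key + "."):
                yield pair
        elif (child.text or "").strip():
            yield key, child.text.strip()

def main(path, record_tag="networkInfo"):
    tree = ET.parse(path)
    for record in tree.getroot().iter(record_tag):
        pairs = ['%s="%s"' % (k, v) for k, v in flatten(record)]
        print(" ".join(pairs))               # one output line = one Splunk event

if __name__ == "__main__":
    main(sys.argv[1])

Each output line becomes one event, and Splunk's automatic key=value extraction picks up the fields without any props/transforms work.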

---
If this reply helps you, Karma would be appreciated.

TonyLeeVT
Builder

Thanks for the recommendation, Rich. That's just it, though: XML in Splunk should not be painful and should not require an external Python script to convert it to key=value pairs. Some of the tools contained within the xmlutils app should already be native to Splunk if it ever hopes to gain ground on improving XML parsing. The ability to generate new events from already indexed data would also help XML parsing. 😐

Thanks to everyone who commented on this thread, but it should remain open until already indexed data can be split and indexed as new events.


richgalloway
SplunkTrust

You may be able to use the API to extract events, split them, then submit the new events for indexing, but then you'd be paying to index the same data twice.
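Purely as a sketch (I have not tried this against your data; the index, sourcetype, and split regex below are placeholders), with the Splunk Python SDK it would look roughly like:

# Pull indexed events back out, split each one on <networkInfo> blocks,
# and resubmit the pieces as new events. Note: the resubmitted data counts
# against the license a second time.
import re
import splunklib.client as client
import splunklib.results as results

service = client.connect(host="localhost", port=8089,
                         username="admin", password="changeme")

stream = service.jobs.oneshot('search index=main sourcetype=tool_xml_generic', count=0)
target = service.indexes["main"]

for item in results.ResultsReader(stream):
    if isinstance(item, dict):
        raw = item.get("_raw", "")
        # naive split: one new event per <networkInfo>...</networkInfo> block
        for piece in re.findall(r"<networkInfo>.*?</networkInfo>", raw, re.S):
            target.submit(piece, sourcetype="network_xml_split")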

---
If this reply helps you, Karma would be appreciated.

somesoni2
Revered Legend

Data, once ingested, can't be altered. You would be able to change the format/update the content at search time (the indexed data will remain the same). If you could provide some sample indexed events and show what you want to do with that data, we can suggest some search-time options.


TonyLeeVT
Builder

That is what I was afraid of... The data is XML array data which does not always parse (and especially display) nicely within Splunk. Some sanitized data is shown below. I am using the xmlutils app to separate the XML arrays into separate events as best I can, e.g.: xmlsplit field="networkInfo" | xmlkvrecursive fatten=true

<networkArray><networkInfo><adapter>{-7C00-4AD3-8950-B7C134CF12C4}</adapter>
<description>Check Point Virtual Network Adapter For Endpoint VPN Client</description>
<MAC>d0-53-0d</MAC>
<ipArray><ipInfo><ipv6Address>fe64:1d71</ipv6Address>
</ipInfo>
</ipArray>
<ipGatewayArray/>
<dhcpServerArray/>
<dhcpLeaseExpires>1970-01-01T00:00:00Z</dhcpLeaseExpires>
<dhcpLeaseObtained>1970-01-01T00:00:00Z</dhcpLeaseObtained>
</networkInfo>
<networkInfo><adapter>{4AC2-8217-AB804EE1B4CA}</adapter>
<description>Intel(R) PRO/1000 MT Network Connection</description>
<MAC>-0f-8f</MAC>
<ipArray><ipInfo><ipv6Address>80f:ab39</ipv6Address>
</ipInfo>
<ipInfo><ipAddress>152.144</ipAddress>
<subnetMask>255.0</subnetMask>
</ipInfo>
</ipArray>
<ipGatewayArray><ipGateway>152.2</ipGateway>
</ipGatewayArray>
<dhcpServerArray><dhcpServer>152.254</dhcpServer>
</dhcpServerArray>
<dhcpLeaseExpires>2016-02-04T19:42:30Z</dhcpLeaseExpires>
<dhcpLeaseObtained>2016-02-04T19:12:30Z</dhcpLeaseObtained>
</networkInfo>
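
The best pure search-time split I can come up with looks something like this (untested beyond samples like the one above; the sourcetype name is a placeholder and the exact field names spath produces may differ):

index=main sourcetype=tool_network_xml
| rex max_match=0 "(?s)(?<net><networkInfo>.*?</networkInfo>)"
| mvexpand net
| spath input=net
| table networkInfo.*

That would give one result row per adapter, but it all happens at search time.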

Is there a better way?
