Splunk Search

When to use Hunk - when to use Splunk

pinVie
Path Finder

Hello all,

I am currently struggling a bit to understand the difference between Splunk and Hunk, and I hope someone can explain it to me.
I have already read several documents about Hunk and know that it uses Hadoop as a "replacement" for indexers and forwarders, so the technical layers should be clear.

As I understand it, Hunk is an easier way to make use of huge amounts of data stored in Hadoop. What I am missing are the use cases in which I would use Hunk.

Of course I would use Hunk if I already had a lot of data stored in Hadoop, but would I use Hunk for a completely new "Big Data" environment, or would I start directly with Splunk?

What I have understood so far is that Hunk plus Hadoop is a better fit for long-running analyses or reports covering huge amounts of data. What would be a classic use case that is best implemented in Hunk? I thought about something like "all users who created a connection to an IP that was never seen before in my network", but I see no reason not to do this in Splunk.
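
To make that example concrete, here is a minimal Python sketch of the "first-seen IP" logic, independent of Splunk or Hunk. The `Connection` record and the function name are hypothetical and only illustrate the idea: the whole history has to be scanned once to build the set of known IPs, which is why this kind of question tends to come up with large, long-retained datasets.

```python
from typing import Dict, Iterable, List, NamedTuple


class Connection(NamedTuple):
    """Hypothetical connection record: who connected to which destination IP."""
    user: str
    dest_ip: str


def users_contacting_new_ips(
    history: Iterable[Connection], recent: Iterable[Connection]
) -> Dict[str, List[str]]:
    """Map each user in `recent` to the destination IPs that never occur in `history`."""
    # One full pass over the historical data to collect every IP seen so far.
    known_ips = {conn.dest_ip for conn in history}

    new_ips_by_user: Dict[str, set] = {}
    for conn in recent:
        if conn.dest_ip not in known_ips:
            new_ips_by_user.setdefault(conn.user, set()).add(conn.dest_ip)

    # Sort the IPs so the result is deterministic.
    return {user: sorted(ips) for user, ips in new_ips_by_user.items()}
```

The same logic could be expressed as a search in either product; the sketch only shows what the search has to compute.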

Thank you very much!

1 Solution

kschon_splunk
Splunk Employee

Hunk has two main advantages over Splunk Enterprise. The first is that storing a given dataset in Hadoop often takes much less disk space than storing the same data in Splunk Enterprise. For an example, see this blog post:
http://blogs.splunk.com/2015/09/23/hunk-size-matters/

This means that you need less total hardware, and so your costs are likely lower. Of course, your total cost of operation will depend on a number of factors, including whether your organization has existing Hadoop expertise.

The other advantage is that Hunk allows you to keep data in Hadoop in a large number of formats, and still run performant queries and analyses on it. Unlike Splunk Enterprise, there is no intake process—the data stays where you put it. If you use other software systems that analyze data in Hadoop, this means you don’t need to keep two copies of the data around—Hunk and your other systems can have the same backing store. If you are starting a new Big Data environment in which you only want to use Splunk products, then this advantage does not matter to you. If you will eventually want to use Hive, Pig, Mahout, etc. on the same data, then this is convenient.

Splunk Enterprise has some significant advantages as well.
• Some searches, particularly searches that return few events, can run much faster.
• For long-running searches, the latency (i.e. the time until you get back your first results) is generally much better.
• Splunk Enterprise can do real-time searches, which Hunk cannot.
• Splunk Enterprise and Splunk Forwarders can be configured to ingest most data sources without additional software. If your data is not already in Hadoop and you don’t want to put it in Splunk (non-virtual) indexes and use archiving, you’ll need to find a mechanism to continuously move it to Hadoop.

As for the question of vendor lock-in, there are a number of ways to retrieve your original data from Splunk Enterprise. Here is an overview:
http://docs.splunk.com/Documentation/Splunk/6.2.2/Search/Exportsearchresults

It is true that data being actively managed by Splunk is in a proprietary format, but as you can see there are many options for exporting it.
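
As a rough illustration of one of those options, the following Python sketch builds (but does not send) a request to Splunk's REST export endpoint, `services/search/jobs/export`, on the management port. The host name and the query are hypothetical, authentication is deliberately left out, and the helper name is made up for this example, so treat it as a starting point rather than a finished export script.

```python
import urllib.parse
import urllib.request


def build_export_request(
    base_url: str, search: str, output_mode: str = "csv"
) -> urllib.request.Request:
    """Build a POST request for the export endpoint without sending it."""
    # The export endpoint expects the query to start with a command,
    # so prepend "search" unless the caller already supplied one.
    stripped = search.lstrip()
    if stripped.startswith("search ") or stripped.startswith("|"):
        query = stripped
    else:
        query = "search " + stripped

    body = urllib.parse.urlencode(
        {"search": query, "output_mode": output_mode}
    ).encode("utf-8")

    return urllib.request.Request(
        base_url.rstrip("/") + "/services/search/jobs/export",
        data=body,
        method="POST",
    )


# Hypothetical host; 8089 is the default management port.
request = build_export_request(
    "https://splunk.example.com:8089", "index=main earliest=-1h"
)
```

To actually run the export you would add credentials (for example an Authorization header) and pass the request to `urllib.request.urlopen`, then stream the response to a file.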

I hope this helps.

jaredlaney
Contributor

@pinVie - IMHO, I think you're asking the right question. Here is my take on the pros/cons:

Hunk
Pros:
1. Easy to use with pre-existing Hadoop clusters
2. No forwarder or indexer setup required
Cons:
1. Has the inherent delay in search results that comes with traditional MapReduce jobs
2. You have to manage a Hadoop cluster unless you use something like Amazon (and even there you'll have to manage it)

Splunk
Pros:
1. Speed (we tested the same scenario on both; with the same computing power, Splunk won)
2. Splunk manages many of the details for you
3. Only need to manage one stack and not an outside cloud stack
4. I have heard that Splunk is a map/reduce implementation written in C++ with a Python front end, and that it will be hard to beat
5. More mature product, even though the Hunk team is very responsive

Cons:
1. Configuration will be more complex on the Splunk side. You will probably need an indexer cluster and forwarders to get your data into Splunk.
2. Vendor lock-in. I've had customers say that once the data is in Splunk they cannot manipulate it for other purposes, and that they were worried about being locked in to the vendor.
