Getting Data In

I have a question about raw data and indexed data on the indexer. Please help me understand.

seksit
Explorer

Hi friend,

I have a server with Splunk already installed. The server holds many log files (tar.gz) that were imported from another server.

I would like Splunk to monitor these logs by directory, for example /var/log/2016/01 and /var/log/2016/02.

If Splunk monitors the directory, will it store its own copy of the raw data from the log files (so that the raw data exists twice)?

Please help me to understand it.

Thank you

sorry for my english

0 Karma

esix_splunk
Splunk Employee

Seksit-
I think you should understand how Splunk processes and stores files. That should lead to a better understanding of what's going on and how it relates to your use case.

When you 'monitor' a file or directory, regardless of whether the files are copied manually or generated by an app, Splunk will read the files and index them. The indexing process reads in the 'raw' data and performs various operations on it, such as assigning sourcetypes, placing it in a defined index, and extracting timestamps and hostnames. The indexed data is written to buckets (directories on disk) on the indexers, and associated metadata is created and stored with the buckets. When you search in Splunk, this is what is searched. Typically the indexed data takes less space than the original files because the raw data is stored compressed.

So with that in mind, once you have indexed the monitored files, they can be deleted or rotated out. Of course, you need to consider your retention and legal compliance policies before deciding whether you can delete the files.

On another note, compressed files and Splunk are a sticking point. Splunk's unarchiving tool is single-threaded, so when Splunk encounters a tar/zip/gzip/tgz file, it has to extract it before it can read it. If you are dealing with a lot of files at once, this will slow down your system and use more memory.
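To illustrate why an archive costs more than a plain log file, here is a minimal Python sketch (not Splunk's own unarchiver) that walks a tar.gz one member at a time and decompresses each member before any line can be read. The function name `iter_archive_lines` is hypothetical and exists only for this example.

```python
import tarfile


def iter_archive_lines(archive_path):
    """Yield (member_name, line) pairs from a .tar.gz, one member at a time.

    Illustration only: every member has to be decompressed before its
    lines are available, and members are handled strictly in sequence.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and other non-file entries
            handle = tar.extractfile(member)
            for raw_line in handle:
                text = raw_line.decode("utf-8", errors="replace")
                yield member.name, text.rstrip("\n")
```

With many archives arriving at once, that sequential extract-then-read work is what piles up.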

0 Karma

renjith_nair
Legend

That's my understanding, and that's what I was trying to convey in answer to seksit's question as well. The question was asked by seksit, not by me 🙂

Happy Splunking!
0 Karma

esix_splunk
Splunk Employee

Updated, I misread the first comment!

0 Karma

Murali2888
Communicator

Hi seksit,

In your case, Splunk will index the data from the log files (those in directories such as /var/log/2016/01 and /var/log/2016/02) and store it in compressed format in the Splunk index directory, $SPLUNK_HOME/var/lib/splunk/.

In simple words, this is a copy of the source data, but the size and format of the data are not the same. Splunk stores the data in a series of index files.
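As a rough illustration of why the stored copy is not the same size as the source, the Python sketch below compresses some made-up, repetitive log lines with gzip and reports the ratio. The sample lines and the function names are assumptions for the example, and gzip is used only as a stand-in; this does not reproduce Splunk's on-disk format, which also includes index files on top of the compressed raw data.

```python
import gzip


def build_sample_log(lines=1000):
    """Build repetitive, made-up log text similar in shape to a web log."""
    template = b"2016-01-%02d 10:00:00 host=web01 status=200 uri=/index.html\n"
    return b"".join(template % (i % 28 + 1) for i in range(lines))


def compression_ratio(raw):
    """Return compressed size divided by original size (smaller is better)."""
    return len(gzip.compress(raw)) / len(raw)


if __name__ == "__main__":
    sample = build_sample_log()
    print(f"original bytes: {len(sample)}")
    print(f"compressed/original: {compression_ratio(sample):.3f}")
```

Real logs are less repetitive than this sample, so expect a smaller saving in practice.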

To read more about how Splunk stores indexes, please refer to http://docs.splunk.com/Documentation/Splunk/6.3.1511/Indexer/HowSplunkstoresindexes

Hope this answers your questions to some extent.

0 Karma

renjith_nair
Legend

If you have configured Splunk to monitor a directory, Splunk picks up the files irrespective of whether they were copied manually or generated by some app. Splunk checks the first bytes of each file to determine whether it was indexed previously, and stores the events from files it has not seen. If you want to exclude some files from a directory, that's also possible.
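The "first bytes" check can be pictured with a minimal Python sketch like the one below. It is an illustration of the idea, not Splunk's actual code: the 256-byte head size, the CRC32 choice, and the names `head_fingerprint` and `pick_new_files` are all assumptions made for the example, and the real product tracks more state than this.

```python
import zlib

HEAD_BYTES = 256  # assumption for illustration: how much of the file head to fingerprint


def head_fingerprint(path, head_bytes=HEAD_BYTES):
    """Return a CRC32 checksum of the first head_bytes bytes of the file."""
    with open(path, "rb") as f:
        return zlib.crc32(f.read(head_bytes))


def pick_new_files(paths, seen):
    """Return the files whose head fingerprint is not yet in seen.

    Fingerprints of the returned files are added to seen, so a second
    pass over the same files returns nothing.
    """
    new_files = []
    for path in paths:
        fingerprint = head_fingerprint(path)
        if fingerprint not in seen:
            seen.add(fingerprint)
            new_files.append(path)
    return new_files
```

One consequence of a check like this: two different files that begin with identical bytes (a long common header, for example) look like the same file, so the second one is skipped.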

Happy Splunking!
0 Karma

renjith_nair
Legend

Sorry, but what do you mean by double raw data? Splunk picks up files from the directory and indexes them. It won't pick up the same file twice; Splunk checks the first few bytes of a file to see whether it was already indexed.

Happy Splunking!
0 Karma

seksit
Explorer

Hi renjith.nair, thank you for your advice.

The log files were imported manually (copied from an external HDD), without using a Splunk forwarder.

If Splunk monitors the directory, will it store the raw data in the Splunk directory?

0 Karma