What are the default sourcetypes and how are they determined?

Yancy
Path Finder

Sometimes Splunk sets the sourcetype on an incoming file as breakable_text or too_small. What determines these sourcetypes? Are there other common sourcetypes that Splunk sets?


hulahoop
Splunk Employee

Hi Yancy,

You have several options for setting the sourcetype when you configure a data input.

  1. If a sourcetype is not set, Splunk will attempt to auto-recognize the data format and assign one. This is why you sometimes get breakable_text or too_small as the sourcetype.
  2. Set a sourcetype manually. Name it anything your heart desires.
  3. Choose from a list of sourcetypes already known to Splunk (e.g. syslog, weblogic_stdout, access_combined). This gives you some configuration out of the box for those sourcetypes, such as field extractions, timestamp recognition, and host identification.
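The precedence in the list above (an explicitly set sourcetype wins; otherwise Splunk tries to recognize the format) can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not Splunk's actual classifier: the regexes in `KNOWN_PATTERNS`, the `MIN_LINES` threshold, the 80% match ratio, and the rule for choosing between `too_small` and `breakable_text` are all hypothetical. Only the two label names come from this thread.

```python
import re
from typing import Optional

# HYPOTHETICAL patterns standing in for sourcetypes Splunk already knows.
# Illustrative only; these are NOT Splunk's real recognition rules.
KNOWN_PATTERNS = {
    "syslog": re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \S+ "),
    "access_combined": re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "[A-Z]+ [^"]*" \d{3} '),
}

# HYPOTHETICAL threshold: samples shorter than this are treated as too small to classify.
MIN_LINES = 5


def assign_sourcetype(lines: list[str], configured: Optional[str] = None) -> str:
    """Toy model of choosing a sourcetype for one input file."""
    # Options 2 and 3: an explicitly configured sourcetype (custom or known) always wins.
    if configured:
        return configured
    # Option 1: nothing configured, so try to recognize the format from a sample.
    if len(lines) < MIN_LINES:
        return "too_small"
    for name, pattern in KNOWN_PATTERNS.items():
        # Require most sample lines to match before trusting the guess (hypothetical ratio).
        matches = sum(1 for line in lines if pattern.match(line))
        if matches / len(lines) >= 0.8:
            return name
    # No known format recognized: fall back to a generic label.
    return "breakable_text"
```

For example, a handful of lines like `Jan  5 10:00:01 host sshd[1]: msg` would be labelled `syslog` in this model, while passing `configured="my_app"` short-circuits recognition entirely.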

The options above are available when you configure a data input in the Manager UI. But what if you want to do something more advanced? For example, what if you have a directory full of logs in several different data formats? Or what if your syslog server collects data from multiple sources with different formats?
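Both scenarios come down to choosing a sourcetype per file or per event rather than once per input. The Python sketch below models that idea with hypothetical rules: `SOURCE_RULES` loosely mirrors matching on a file's path (roughly the role of props.conf `[source::...]` stanzas), and `EVENT_RULES` mirrors matching on each event's content for a shared syslog feed. The glob patterns, regexes, and the sourcetype names `db2_error` and `linux_secure` are assumptions for illustration; the real configuration syntax is in the documentation linked in the next paragraph.

```python
import fnmatch
import re

# HYPOTHETICAL per-file rules: the first glob matching a file's path decides its sourcetype.
SOURCE_RULES = [
    ("/var/log/app/db2*.log", "db2_error"),
    ("/var/log/app/access*.log", "access_combined"),
]

# HYPOTHETICAL per-event rules for a shared syslog feed: the first regex found in an event wins.
EVENT_RULES = [
    (re.compile(r"\bSQL\d{4,5}N\b"), "db2_error"),  # db2-style message codes, e.g. SQL0204N
    (re.compile(r"\bsshd\["), "linux_secure"),
]


def sourcetype_for_file(path: str, default: str = "breakable_text") -> str:
    """Pick a sourcetype for a whole file from its path."""
    for pattern, sourcetype in SOURCE_RULES:
        # fnmatchcase is case-sensitive on every platform, unlike fnmatch.fnmatch.
        if fnmatch.fnmatchcase(path, pattern):
            return sourcetype
    return default


def sourcetype_for_event(event: str, default: str = "syslog") -> str:
    """Pick a sourcetype for a single event from its content."""
    for regex, sourcetype in EVENT_RULES:
        if regex.search(event):
            return sourcetype
    return default
```

Ordering matters in both rule lists: the first match wins, so more specific rules should come before broader ones.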

More advanced sourcetype configuration is detailed here: http://www.splunk.com/base/Documentation/4.0.11/Knowledge/Aboutsourcetypes (The link refers to version 4.0, but the concepts and configuration also apply to 3.x and 4.1.)

Why is it important to get sourcetyping right? Organizing your data into sensible sourcetypes makes it easier to apply other configuration, such as field extractions and lookups, and may also simplify rules for access controls. It also makes for a more powerful and succinct search experience. For example, if you have a repository of web access logs, db2 error logs, and syslog, wouldn't it be nice to search just the db2 error logs, or just syslog? Sourcetyping lets you do that.
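The search payoff can be sketched in plain Python: once every event carries a sourcetype, narrowing a search is a simple equality filter, and summarizing by sourcetype is a simple count. The `Event` class and both helper functions are hypothetical stand-ins; in Splunk itself the equivalents are a search term such as `sourcetype=syslog` and a search ending in `stats count by sourcetype`.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Event:
    """HYPOTHETICAL minimal event: its assigned sourcetype plus the raw text."""
    sourcetype: str
    raw: str


def search_by_sourcetype(events: list[Event], sourcetype: str) -> list[Event]:
    """Keep only events of one sourcetype, like a sourcetype=... search term."""
    return [e for e in events if e.sourcetype == sourcetype]


def count_by_sourcetype(events: list[Event]) -> Counter:
    """Count events per sourcetype, like a 'stats count by sourcetype' summary."""
    return Counter(e.sourcetype for e in events)
```

With a mixed repository of web access, db2, and syslog events, `search_by_sourcetype(events, "db2_error")` returns only the db2 events, which is the "just db2 error logs" search described above.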
