Getting Data In

Why is a deleted sourcetype still getting indexed?

proylea
Contributor

I have removed a sourcetype from my inputs.conf:

[monitor:///data01/.../current/logs/*.log]
disabled = 0
sourcetype = log4j
index = oms
blacklist = gc\.(web|Node)[1-4]\.log

It's been changed to split up the sourcetypes as follows:

[monitor:///data01/.../current/logs/fxoms*.log]
disabled = 0
sourcetype = oms
index = oms
blacklist = gc\.(web|Node)[1-4]\.log

[monitor:///data01/.../current/logs/fxlm*.log]
disabled = 0
sourcetype = lm
index = oms
blacklist = gc\.(web|Node)[1-4]\.log

[monitor:///data01/.../current/logs/tomcat*.log]
disabled = 0
sourcetype = tomcat
index = oms
blacklist = gc\.(web|Node)[1-4]\.log

[monitor:///data01/.../current/logs/*gc*.log]
disabled = 0
sourcetype = sun_jvm
crcSalt = <SOURCE>
index = jmx

[monitor:///data01/app/oms-holiday-adapter/current/logs/*.log]
disabled = 0
sourcetype = oms
index = oms

[monitor:///data01/app/oms-client-account-adapter/current/logs/*.log]
disabled = 0
sourcetype = oms
index = oms

I have restarted the forwarder and can now see the 3 new sourcetypes lm, oms, and tomcat, but I am still getting a couple of log files being ingested with the sourcetype log4j.

There is no longer any reference to log4j in the config on the host.

How is it doing this?


Jeremiah
Motivator

It sounds like you are dealing with a single forwarder and a single inputs.conf file. However, it doesn't hurt to check the following:

1) Another inputs.conf on the same host that is still configured to send the sourcetype. You can use btool to check and see what is monitored on your host (see the sketch after this list):

splunk cmd btool inputs list

2) Other forwarders sending the same sourcetype as well. Even if the host field matches, you could potentially have another forwarder with the same host=hostname stanza in the inputs.conf. I've seen this a few times when people have copied their forwarder installs or inputs.conf files around.
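If you want to see exactly which file each monitor stanza and setting comes from, btool's --debug flag prefixes every output line with its source file. A minimal sketch (the paths shown are only illustrative; your install location and app names will differ):

splunk cmd btool inputs list --debug

Example output, one conf file path per line:

/opt/splunkforwarder/etc/apps/myapp/local/inputs.conf  [monitor:///data01/.../current/logs/*.log]
/opt/splunkforwarder/etc/apps/myapp/local/inputs.conf  sourcetype = log4j

Any line still showing sourcetype = log4j points you straight at the file that is setting it.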

If neither of those options pan out, assuming you modified the right file and restarted the forwarder, you might be seeing events that were timestamped in the future (or had misinterpreted timestamps that were indexed with a future date). This would result in you seeing data that looks "current" but was actually indexed in the past. Try this search:

sourcetype=log4j | eval indextime=strftime(_indextime,"%Y-%m-%d %H:%M:%S") | table _time, indextime, _raw

Do _time and indextime match?
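If they don't match, a quick way to list any remaining future-dated log4j events is a search along these lines (a sketch using the same fields as above, nothing add-on specific):

sourcetype=log4j | where _time > _indextime | eval indextime=strftime(_indextime,"%Y-%m-%d %H:%M:%S") | table _time, indextime, source, _raw

The source column will also tell you which file those leftover events originally came from.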

proylea
Contributor

The times are different:

_time                 indextime             _raw
2016-02-04 14:45:44   2016-02-03 10:02:57   03:45:44.937 ,TRACE,FeedLogger......

What does this mean?
Welcome to the twilight zone.
There are no other inputs with log4j configured, so how is this one coming in with that sourcetype?
I'm confused.


Jeremiah
Motivator

The first timestamp is the time that Splunk extracted from the event. The second timestamp is the time that Splunk indexed the event.

The way to think about this is that Splunk "received" the event on 2016-02-03 10:02:57 (_indextime or indextime here).

However, when Splunk read the event, it interpreted the timestamp inside it (03:45:44.937) and stamped the event with a future _time (2016-02-04 14:45:44). Your searches have now caught up to that point in time. No new data is coming in; you're just catching up to events that were indexed earlier with a future timestamp.

This is pretty common when you deal with data in different time zones or with partial timestamps like this event, which only has a time and no date. You might need to work on the props.conf configuration on your indexers to get the timestamp extraction working correctly, or talk to the developer/app admin to have them log a more complete timestamp.
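If you do end up adjusting timestamp extraction, a minimal props.conf sketch for an event that begins with a bare time like 03:45:44.937 might look like the following. The stanza name, lookahead, and especially the TZ value are assumptions you would adapt to your environment; this goes on the indexers (or on a heavy forwarder, if one parses the data):

[log4j]
TIME_PREFIX = ^
TIME_FORMAT = %H:%M:%S.%3N
MAX_TIMESTAMP_LOOKAHEAD = 13
TZ = Australia/Sydney

Because the event carries no date, Splunk still has to infer one (typically from the previous event or the file's modification time), so getting the application to log a full date remains the more robust fix.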


proylea
Contributor

Thanks Jeremiah, I think I understand. Great explanation.
