I am running Splunk version 4.2.1.
I have a saved search that runs nightly. This was one of my first queries in Splunk, so there is likely room for improvement. A Perl script runs later in the morning to create a report from the resulting .csv file. I use the collect command to index the search results, but the stash file has been filling the disk of late. The resulting stash file has been as large as 20 GB.
(%ASA-4-106023 OR %ASA-4-733100 OR %ASA-3-710003 OR %ASA-6-106015 OR %ASA-6-106006 OR %ASA-6-725006 OR "invalid user" OR "% Failed User Login" OR "%AAA-W-REJECT" OR "%EMWEB-1-LOGIN_FAILED:" OR "Authent. Failure:" OR %ASA-4-106023 OR %ASA-4-733100 OR %ASA-3-710003 OR %ASA-6-106015 OR %ASA-6-106006 OR "unable to connect" OR "%SNMP-W-SNMPAUTHFAIL" NOT (search OR 106023 OR 106015 OR "Topology Change" OR "%ASA-4-733100" OR "%ASA-3-710003" OR igmp ))| collect | dedup _time | sort -host _time | fields _raw | outputcsv singlefile=true loginfails.csv
It looks like the collect command is duplicating data. For example, 21 unique login failure attempts have generated 4,273,831 events. We are quite certain that the switch in question has not sent 4.2M syslog events to Splunk.
To test this, I created a simple query that looks for login failures by a specific user on a specific day.
username AND REJECTED
I end up with 4,273,831 matching events, most of them at the exact same second. When I modify the query to the following, I get 21 matching events.
username AND REJECTED | dedup _time
What the results of both queries have in common is sourcetype=stash and index=summary. While this query was executing, I did not see any stash files in $SPLUNK_HOME/var/spool/splunk, where I normally see them.
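To see how the duplicates are spread out, a search along these lines could count the summary events per timestamp. This is only a sketch: it assumes the duplicated events all sit in index=summary with sourcetype=stash, as observed above, and "username" stands in for the actual account name.

```
index=summary sourcetype=stash username AND REJECTED
| stats count by _time
| sort -count
```

If the duplication is what I suspect, this should return about 21 rows, with very large counts on a few of them.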
If I change the query to "index=main username AND REJECTED" or "sourcetype=udp:514 username AND REJECTED", I get only 1 matching event.
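A breakdown by index and sourcetype might make the difference between these counts easier to see. The search below is a rough sketch, not something I have verified: it assumes my role is allowed to search all indexes with index=*, and "username" is again a placeholder.

```
index=* username AND REJECTED
| stats count by index, sourcetype, source
```

Each row would show how many matching events live in a given index and sourcetype, so the main and summary counts can be compared side by side.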
I cannot reconcile that "username AND REJECTED | dedup _time" produces 21 results while "index=main username AND REJECTED" produces only 1 result. I am also wondering if there is a bug in Splunk that would cause the summary index to fill with 4,273,831 matching events.
Any help unravelling this would be appreciated.