
Indexing Multiple File Formats inside Zip Files Using the Upload/Oneshot Method and Automatic Sourcetyping

averlie_lina
New Member

Hello everyone,

For endpoint security analysis, we gather logs from machines using tools that generate archives containing many files in different formats, such as XML, JSON, SQLite, txt, log, evt, evtx, bin, etc.

My aim is to index all of this data manually (using the web upload method or the CLI oneshot command) so the team can use Splunk's search capabilities to simplify the analysis process.

Therefore, I'm trying to do the following:

  • I created sourcetypes for a sample of each file type inside these archives (TextLog and XML).
  • I created [source::] stanzas for the files inside the archives to assign these sourcetypes automatically.

After I uploaded the zip file, some extensions were indexed successfully, while others got a sourcetype of "unknown1".

One of the successfully indexed file types was "*.bin" (a plain-text log file with timestamped lines), and its source field was as follows:

KYPD_GSD_OFFICE_2018_04_22_16_06.zip:.\ThisIsaSample Folder Logs/Documents and Settings/All Users/Application Data/setupdownloader.1524402598.bdinstall.bin

One of the unsuccessful ones was "*.XML" (a typical XML file with no timestamps), and its source field was as follows:

KYPD_GSD_OFFICE_2018_04_22_16_06.zip:.\ThisIsaSample Folder Logs/output.xml
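Both source values above have the same shape: the archive name, a colon, then the member path, which starts with `.\` and continues with forward slashes, so any [source::] pattern has to cope with that mixed separator. A minimal Python sketch for pulling such a value apart, assuming every archive member's source follows this `<archive>.zip:<member path>` shape (`split_archive_source` is a hypothetical helper, not a Splunk function):

```python
from typing import Tuple


def split_archive_source(source: str, archive_ext: str = ".zip") -> Tuple[str, str]:
    """Split '<archive>.zip:<member path>' into (archive name, member path).

    Assumes the shape of the source values quoted above. The member path is
    normalized to forward slashes and its leading './' is dropped.
    """
    archive, sep, member = source.partition(archive_ext + ":")
    if not sep:
        raise ValueError(f"not an archive member source: {source!r}")
    member = member.replace("\\", "/")  # '.\Folder/file' -> './Folder/file'
    if member.startswith("./"):
        member = member[2:]
    return archive + archive_ext, member


archive, member = split_archive_source(
    r"KYPD_GSD_OFFICE_2018_04_22_16_06.zip:.\ThisIsaSample Folder Logs/output.xml"
)
print(archive)  # KYPD_GSD_OFFICE_2018_04_22_16_06.zip
print(member)   # ThisIsaSample Folder Logs/output.xml
```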

I tried the following props.conf settings in the system/local folder:

[KYPD_XML]
LINE_BREAKER = (<\?\w++.*\?>)|<\/\w+>(\s*)|\/>(\s*)|<\w+>(\s*)<\w+
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
category = MachineLogs
disabled = false
pulldown_type = true
SHOULD_LINEMERGE = true
MAX_EVENTS = 9999999
TRUNCATE = 0
KV_MODE = none
BREAK_ONLY_BEFORE = (?!)
#LEARN_MODEL = false
#=================================================================
[source::(?i)KYPD_[\w\-]+_[\d_]+[.]zip[:].[\\\/]ThisIsaSample Folder Logs[\\\/]output.xml]
sourcetype = KYPD_XML
priority = 100
#=================================================================
[KYPD_TXTLog]
DATETIME_CONFIG = CURRENT
category = MachineLogs
disabled = false
pulldown_type = true
SHOULD_LINEMERGE = false
TRUNCATE = 0
LEARN_MODEL = false
LEARN_SOURCETYPE = false
#=================================================================
[source::(?i)KYPD_[\w\-]+_[\d_]+[.]zip[:].[\\\/]ThisIsaSample Folder Logs...(.txt|.bdx|.log|.log.1|.bin|.dbg|.dbg.old|_debug.txt(.old)?)]
sourcetype = KYPD_TXTLog
priority = 99
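In the [KYPD_XML] stanza, SHOULD_LINEMERGE = true together with BREAK_ONLY_BEFORE = (?!) means no line ever starts a new event, because the empty negative lookahead `(?!)` can never match; in a simplified view, lines keep merging until the MAX_EVENTS line cap is hit. The Python sketch below models only that merging step. It assumes the input lines were already produced by LINE_BREAKER (not modeled) and ignores other settings such as BREAK_ONLY_BEFORE_DATE and TRUNCATE; `merge_lines` is a hypothetical name, not Splunk's code.

```python
import re
from typing import List


def merge_lines(lines: List[str], break_only_before: str, max_events: int) -> List[str]:
    """Simplified model of SHOULD_LINEMERGE = true with BREAK_ONLY_BEFORE.

    A new event starts only when a line matches break_only_before, or when the
    current event already holds max_events lines (the MAX_EVENTS cap).
    """
    breaker = re.compile(break_only_before)
    events: List[str] = []
    current: List[str] = []
    for line in lines:
        if current and (breaker.search(line) or len(current) >= max_events):
            events.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        events.append("\n".join(current))
    return events


xml_lines = [
    '<?xml version="1.0"?>',
    "<root>",
    "  <item>a</item>",
    "  <item>b</item>",
    "</root>",
]

# (?!) never matches, so every line is merged into a single event.
print(len(merge_lines(xml_lines, r"(?!)", 9999999)))        # 1
# A breaker that does match starts a new event at each <item> line.
print(len(merge_lines(xml_lines, r"^\s*<item>", 9999999)))  # 3
```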

I also tried the following, but the XML file "output.xml" still gets "unknown1" as its sourcetype:

[KYPD_XML]
LINE_BREAKER = (<\?\w++.*\?>)|<\/\w+>(\s*)|\/>(\s*)|<\w+>(\s*)<\w+
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
category = MachineLogs
disabled = false
pulldown_type = true
SHOULD_LINEMERGE = true
MAX_EVENTS = 9999999
TRUNCATE = 0
KV_MODE = none
BREAK_ONLY_BEFORE = (?!)
#LEARN_MODEL = false
#=================================================================
# also tried: [source::*output.xml], [source::*.zip[:][.]…output.xml], [source::*.zip[:][.]\\…/output.xml] and [source::*.zip[:][.]\\ThisIsaSample Folder Logs/output.xml]
[source::…output.xml]
sourcetype = KYPD_XML
priority = 100
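A [source::] pattern can be sanity-checked outside Splunk by translating it into a Python regex and testing it against the source values quoted earlier. The sketch below is an approximation built on assumptions taken loosely from the props.conf specification: `...` matches anything including path separators, `*` matches anything except a separator (treated here as either `/` or `\`), the whole source must match, and every other character keeps its ordinary regex meaning. Splunk's real translation may differ in details, so agreement here is not proof; `source_stanza_to_regex` and `stanza_matches` are hypothetical helpers.

```python
import re

XML_SOURCE = r"KYPD_GSD_OFFICE_2018_04_22_16_06.zip:.\ThisIsaSample Folder Logs/output.xml"
BIN_SOURCE = (r"KYPD_GSD_OFFICE_2018_04_22_16_06.zip:.\ThisIsaSample Folder Logs/Documents and Settings"
              r"/All Users/Application Data/setupdownloader.1524402598.bdinstall.bin")

XML_STANZA = r"(?i)KYPD_[\w\-]+_[\d_]+[.]zip[:].[\\\/]ThisIsaSample Folder Logs[\\\/]output.xml"
TXT_STANZA = (r"(?i)KYPD_[\w\-]+_[\d_]+[.]zip[:].[\\\/]ThisIsaSample Folder Logs..."
              r"(.txt|.bdx|.log|.log.1|.bin|.dbg|.dbg.old|_debug.txt(.old)?)")


def source_stanza_to_regex(pattern: str) -> "re.Pattern[str]":
    """Approximate a [source::<pattern>] spec as a compiled Python regex.

    Assumed rules (a simplification, not Splunk's exact translation):
    three dots match anything, including path separators; a star matches
    anything except a slash or backslash; backslash escapes are kept as-is;
    every other character keeps its ordinary regex meaning.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("...", i):
            out.append(".*")
            i += 3
        elif pattern[i] == "\\" and i + 1 < len(pattern):
            out.append(pattern[i:i + 2])  # keep escapes such as \w or \\ intact
            i += 2
        elif pattern[i] == "*":
            out.append(r"[^/\\]*")
            i += 1
        else:
            out.append(pattern[i])
            i += 1
    return re.compile("".join(out))


def stanza_matches(pattern: str, source: str) -> bool:
    # The pattern has to match the whole source value, not just a substring.
    return source_stanza_to_regex(pattern).fullmatch(source) is not None


for name, pattern, source in [
    ("XML stanza vs output.xml", XML_STANZA, XML_SOURCE),
    ("TXT stanza vs .bin file", TXT_STANZA, BIN_SOURCE),
    ("*output.xml vs output.xml", "*output.xml", XML_SOURCE),
    ("...output.xml vs output.xml", "...output.xml", XML_SOURCE),
]:
    print(f"{name}: {stanza_matches(pattern, source)}")
```

Under these assumptions the regex-style XML stanza does match the quoted source, which would suggest the pattern text itself is not the only thing to check. A bare `*output.xml` cannot match, because `*` does not cross path separators and this source contains both `\` and `/`.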

To be honest, I'm about to give up on the whole idea...

I've tried many things, but I can't understand why the XML file is not getting its sourcetype assigned automatically.

I'd appreciate it if you could tell me whether I'm missing something here.


averlie_lina
New Member

Hello all,

I tried extracting the files and creating a monitor stanza for them, and it worked.

But the upload method is much more convenient in our case.
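The extract-then-monitor workaround can be scripted so it is closer to the convenience of an upload. A minimal Python sketch, assuming a landing directory that an existing monitor input already watches recursively and one subfolder per archive; `extract_for_monitoring` and that folder layout are hypothetical:

```python
import zipfile
from pathlib import Path


def extract_for_monitoring(archive: Path, landing_dir: Path) -> Path:
    """Extract one archive into its own subfolder of landing_dir.

    Assumption: a monitor input already watches landing_dir recursively, so
    files appear in Splunk once they land there. The subfolder is named after
    the archive (without .zip) to keep each collection separate.
    """
    target = landing_dir / archive.stem
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        # extractall strips absolute paths and '..' parts from member names.
        zf.extractall(target)
    return target
```

For example, `extract_for_monitoring(Path("KYPD_GSD_OFFICE_2018_04_22_16_06.zip"), Path("/opt/landing"))` would place the files under `/opt/landing/KYPD_GSD_OFFICE_2018_04_22_16_06/` (both paths are placeholders). Monitored files get their filesystem path as the source, so [source::] stanzas would need to match that form rather than the `archive.zip:` form.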

I also noticed that previously uploaded files don't index any new data (even after deleting and recreating the index, or deleting events with the | delete search command).

After some reading, I learned that Splunk doesn't re-index duplicate files (it computes a CRC of the first 256 bytes of a file), and that one can configure crcSalt in inputs.conf.

But I'm not sure whether this works for files indexed by web upload or by the CLI (oneshot).
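The initial-CRC check mentioned above can be modeled in a few lines of Python to show why a re-uploaded file with an unchanged beginning looks familiar, and what a salt changes. This is a simplified model under stated assumptions: `zlib.crc32` stands in for Splunk's internal checksum, the window is the documented default `initCrcLength` of 256 bytes, and the salt (standing in for `crcSalt = <SOURCE>`, which adds the full source path to the CRC) is simply prepended. `initial_crc` is a hypothetical helper, and the sketch says nothing about whether uploads or oneshot inputs consult this check.

```python
import zlib
from typing import Optional

INIT_CRC_LENGTH = 256  # default initCrcLength in inputs.conf


def initial_crc(data: bytes, salt: Optional[str] = None) -> int:
    """Checksum over the first INIT_CRC_LENGTH bytes, optionally salted.

    zlib.crc32 is only a stand-in for Splunk's internal checksum, and simply
    prepending the salt is an assumption about how crcSalt is mixed in.
    """
    head = data[:INIT_CRC_LENGTH]
    if salt is not None:
        head = salt.encode("utf-8") + head
    return zlib.crc32(head)


shared_header = b"#header line repeated by the collection tool\n".ljust(INIT_CRC_LENGTH, b"x")
first_upload = shared_header + b"events from the first run\n"
second_upload = shared_header + b"different events from a later run\n"

# Same first 256 bytes -> same checksum, so the second file may look "already seen".
print(initial_crc(first_upload) == initial_crc(second_upload))  # True
# With a per-path salt (like crcSalt = <SOURCE>) the checksums differ.
print(initial_crc(first_upload, "/logs/run_1.log") == initial_crc(second_upload, "/logs/run_2.log"))  # False
```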

The other thing I came across (I'm also not sure this is right, since it came from the forums) is that Splunk caches the automatically assigned sourcetype of a file for 5 minutes and will not recalculate it before that time elapses.

I spent hours modifying the props.conf [source::] stanzas and repeatedly deleting the indexed events (also deleting the entire index and recreating it) and re-indexing the same archive files with no luck, and I suspect this may be the reason.
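The five-minute caching claim above is unverified, and the sketch below does not confirm it. It only models what such a cache would look like from the outside, as a generic time-to-live cache keyed by file name, with 300 seconds taken from that claim. `TtlCache` is a hypothetical class, not Splunk code.

```python
import time
from typing import Callable, Dict, Tuple


class TtlCache:
    """Keep each computed value for ttl_seconds; recompute it after that."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the expiry can be simulated
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str, compute: Callable[[str], str]) -> str:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: the earlier answer is reused
        value = compute(key)  # missing or expired: compute a new answer
        self._entries[key] = (now, value)
        return value


fake_now = [0.0]
cache = TtlCache(300, clock=lambda: fake_now[0])
current_rule = {"output.xml": "unknown1"}

print(cache.get("output.xml", lambda k: current_rule[k]))  # unknown1
current_rule["output.xml"] = "KYPD_XML"  # e.g. the stanza was fixed
fake_now[0] = 120.0
print(cache.get("output.xml", lambda k: current_rule[k]))  # unknown1 (cached)
fake_now[0] = 301.0
print(cache.get("output.xml", lambda k: current_rule[k]))  # KYPD_XML
```

If the claim is accurate, waiting more than five minutes after changing a stanza before re-uploading would be a simple way to rule it out.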

I'd appreciate it if someone could correct me if I'm wrong, or suggest a solution.

Thanks
