Hi All,
I am monitoring files that land in the same directory, and I want them treated as different source types. The way
I want to distinguish them is by their names. There will be three different source types, and all of them will be csv files.
The naming conventions will be time_*.csv, pulse_*.csv, and flow_*.csv.
I actually have this working using the following in inputs.conf:
[monitor://C:\tpg\leamcsv\dualgamma_logs\...\pulse_*.csv]
sourcetype = DGC_PULSE
index=main
host_segment = 4
crcSalt = <SOURCE>
[monitor://C:\tpg\leamcsv\dualgamma_logs\...\flow_*.csv]
sourcetype = DGC_FLOW
index=main
host_segment = 4
crcSalt = <SOURCE>
[monitor://C:\tpg\leamcsv\dualgamma_logs\...\time_*.csv]
sourcetype = DGC_TIME
index=main
host_segment = 4
crcSalt = <SOURCE>
This works exactly as I want. The crcSalt setting turns out to be necessary because many of the files begin with identical
meta information, and salting with the source path forces Splunk to consider them all.
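Why crcSalt helps can be illustrated with a small sketch. Splunk decides whether it has already seen a file by checksumming the start of the file (the first 256 bytes by default), so files whose opening bytes are identical look like the same file; crcSalt = <SOURCE> mixes the file's path into that checksum. The Python below is only an illustration under assumptions: zlib.crc32 stands in for Splunk's internal checksum, and the file contents and paths are hypothetical.

```python
import zlib


def initial_crc(content: bytes, salt: str = "") -> int:
    """Checksum of a file's first 256 bytes, optionally salted.

    zlib.crc32 is a stand-in here; it is not Splunk's actual algorithm.
    """
    return zlib.crc32(content[:256] + salt.encode("utf-8"))


# Hypothetical: two files that open with byte-for-byte identical metadata.
shared_header = b"station,DG14\nunits,counts\n" * 20
file_a = shared_header + b"1,0.5\n"
file_b = shared_header + b"2,0.7\n"

# Hypothetical paths that differ in a single character.
path_a = r"C:\tpg\leamcsv\dualgamma_logs\pulse_1.csv"
path_b = r"C:\tpg\leamcsv\dualgamma_logs\pulse_2.csv"

unsalted_collide = initial_crc(file_a) == initial_crc(file_b)
salted_collide = initial_crc(file_a, path_a) == initial_crc(file_b, path_b)
print("without crcSalt, same checksum:", unsalted_collide)
print("with crcSalt = <SOURCE>, same checksum:", salted_collide)
```

Without the salt the two files are indistinguishable by their opening bytes; adding the path makes each checksum unique to its source.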
As I said, the above works fine as long as the files to be monitored are landed as .csv files. My requirements have changed
and I will now be landing *.zip files containing the desired .csv files.
It is not clear to me why, but Splunk is not indexing the zip files with the above configuration. Everything I have read seems
to indicate that it should index the zip files. Perhaps the monitor stanza is excluding the zip files; I haven't been able to
figure that one out.
I can say that if the monitor stanza is left open ([monitor://C:\tpg\leamcsv\dualgamma_logs\...\]), it will index the
contents of the zip files, but that leaves me unable to distinguish the different sourcetypes (at least not in the way
that I was doing it).
After doing some research, I read that attempting to index multiple sourcetypes from a common directory could lead to inconsistent
results (I don't have that link handy at the moment). At any rate, the suggestion was to use a more open qualification, as I mentioned
in the previous paragraph, and assign the sourcetype on a per-event basis or in props.conf. I chose to do this in props.conf. I
am using the following configuration:
inputs.conf:
[monitor://C:\tpg\leamcsv\dualgamma_logs\...\]
index=main
host_segment = 4
crcSalt = <SOURCE>
props.conf
[source::...\pulse_*\.csv]
sourcetype=DGC_PULSE
[source::...\flow_*\.csv]
sourcetype=DGC_FLOW
[source::...\time_*\.csv]
sourcetype=DGC_TIME
The problem I see now is that none of my expected sourcetypes are assigned. Instead, I get csv, csv1, csv2, etc. as sourcetypes.
I suspect the issue is with the regular expressions I have used in props.conf. From everything I have read, they look
correct, but I haven't been able to figure out what I am missing.
Does anyone have any suggestions about my approach, and/or what might be wrong with my regular expressions?
Thanks
Copy and paste these into the identified conf files. Then restart each instance they are deployed to. Be sure to change your time format in the props.conf.
inputs.conf
[monitor://C:\tpg\leamcsv\dualgamma_logs\...\]
sourcetype = DGC_TIME
index=main
host_segment = 4
crcSalt = <SOURCE>
transforms.conf
[extract_pulse_sourcetype]
SOURCE_KEY = MetaData:Source
REGEX = pulse_.*\.csv
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::DGC_PULSE
[extract_flow_sourcetype]
SOURCE_KEY = MetaData:Source
REGEX = flow_.*\.csv
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::DGC_FLOW
props.conf
[DGC_TIME]
TRANSFORMS-transform_1 = extract_pulse_sourcetype
TRANSFORMS-transform_2 = extract_flow_sourcetype
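# Replace "timeformat" with the strptime-style format of your timestamps
TIME_FORMAT = timeformat
# Use true instead if your events span multiple lines
SHOULD_LINEMERGE = false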
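To see what this configuration is meant to do, here is a simplified Python model of it (an illustration, not how Splunk is implemented): every event starts with the DGC_TIME sourcetype set in inputs.conf, and each transform overwrites the sourcetype when its REGEX matches the source. Python's re module stands in for Splunk's PCRE, which behaves the same for patterns this simple. The SOMEHOST directory and the exact file names are hypothetical, modeled on the naming convention in the question.

```python
import re

# Stand-ins for the two transforms.conf stanzas above:
# (stanza name, REGEX, sourcetype named in FORMAT = sourcetype::<name>)
TRANSFORMS = [
    ("extract_pulse_sourcetype", r"pulse_.*\.csv", "DGC_PULSE"),
    ("extract_flow_sourcetype", r"flow_.*\.csv", "DGC_FLOW"),
]


def assign_sourcetype(source: str, default: str = "DGC_TIME") -> str:
    """Start from the inputs.conf sourcetype, then let each matching
    transform (SOURCE_KEY = MetaData:Source) overwrite it."""
    sourcetype = default
    for _stanza, regex, new_sourcetype in TRANSFORMS:
        # An unanchored REGEX matches anywhere in the source, like re.search.
        if re.search(regex, source):
            sourcetype = new_sourcetype
    return sourcetype


# SOMEHOST is a hypothetical directory; the file names are illustrative.
base = "C:\\tpg\\leamcsv\\dualgamma_logs\\SOMEHOST\\"
for prefix in ("time", "pulse", "flow"):
    source = base + prefix + "_DGC_DG14_23_2013_10_09_09_07_37.csv"
    print(source, "->", assign_sourcetype(source))
```

Because no file name can match both patterns, the order in which the two transforms run does not matter in this case.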
Did that work for you?
How about...
[monitor://C:\tpg\leamcsv\dualgamma_logs\...\pulse_*]
sourcetype = DGC_PULSE
index=main
host_segment = 4
crcSalt = <SOURCE>
That would work regardless of whether they are .zip or .csv files.
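The difference between the two monitor paths comes down to the final wildcard segment: pulse_*.csv can only match names ending in .csv, while pulse_* matches any extension, which likely also explains why the zip files were skipped under the original stanzas. A minimal sketch using Python's fnmatchcase as a stand-in for Splunk's * wildcard (an assumption: Splunk's * does not cross directory separators, which makes no difference for bare file names); the file names, including the .zip one, are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical file names following the pulse_ naming convention.
names = [
    "pulse_DGC_DG14_23_2013_10_09_09_07_37.csv",
    "pulse_DGC_DG14_23_2013_10_09_09_07_37.zip",
]
# Last path segment of the original stanza, and of the suggested one.
patterns = ["pulse_*.csv", "pulse_*"]

results = {(p, n): fnmatchcase(n, p) for p in patterns for n in names}
for (pattern, name), matched in results.items():
    print(f"{pattern:12} {name:42} {matched}")
```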
Are they being bundled inside of a single .zip?
If so:
inputs.conf
[monitor://C:\tpg\leamcsv\dualgamma_logs\...\]
sourcetype = DGC_TIME
index=main
host_segment = 4
crcSalt = <SOURCE>
transforms.conf
[transform_name1]
SOURCE_KEY = MetaData:Source
REGEX = pulse_*\.csv
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::DGC_PULSE
[transform_name2]
SOURCE_KEY = MetaData:Source
REGEX = flow_*\.csv
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::DGC_FLOW
props.conf
[DGC_TIME]
TRANSFORMS-transform_name = transform_name1, transform_name2
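# Replace "timeformat" with the strptime-style format of your timestamps
TIME_FORMAT = timeformat
# Use true instead if your events span multiple lines
SHOULD_LINEMERGE = false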
I will post a new answer in the answer field so I can get the code bit to work.
Looks like I had the same problem you had: the last one should have been "pulse_.*\.csv", as you put in your last comment.
The source names look something like this:
time_DGC_DG14_23_2013_10_09_09_07_37.csv
So are you saying the regex ought to look something like this:
pulse_.csv or pulse_..csv or pulse_.\csv? None of those seem obvious to me.
The iPad isn't letting me format this as code: "_.*\.csv"
Ah, also the last bit goes in props.conf.
What we are doing is saying that, by default, all data from the inputs path is to be known as sourcetype DGC_TIME. Then in props.conf (by way of transforms.conf) we say that if the source matches pulse_*.csv, its sourcetype should be DGC_PULSE, and if it matches flow_*.csv, it should be sourcetype DGC_FLOW.
And I just noticed I did not escape the '.'. So, replace _*.csv in the regex with _.*\.csv.
By the way, you can rename transform_name1 and transform_name2 to anything you want; I was just using them as filler names. Just make sure the same names are referenced in props.conf.
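The distinction being discussed is that in a regex, * repeats the character before it rather than standing for "anything" as it does in a file wildcard. A quick check with Python's re module (a stand-in for Splunk's PCRE; the two agree on patterns this simple), run against a hypothetical pulse_ name modeled on the time_ file name above:

```python
import re

# Hypothetical pulse_ file name, modeled on the time_ example in the thread.
name = "pulse_DGC_DG14_23_2013_10_09_09_07_37.csv"

candidates = {
    r"pulse_*\.csv": "'_*' is zero or more underscores, then a literal '.csv'",
    r"pulse_*.csv": "as above, but '.' matches any single character",
    r"pulse_.*\.csv": "'.*' is any run of characters, then a literal '.csv'",
}

# REGEX in transforms.conf is unanchored, so use re.search, not re.fullmatch.
results = {regex: bool(re.search(regex, name)) for regex in candidates}
for regex, matched in results.items():
    print(f"{regex:15} {matched!s:5} {candidates[regex]}")
```

Only the last pattern matches, because the first two require ".csv" (or any character followed by "csv") to come right after "pulse" and its underscores.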
What are the source names?
This is what I am using in transforms.conf:
[transform_name1]
SOURCE_KEY = MetaData:Source
REGEX = pulse_*.csv
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::DGC_PULSE
and this is what I am using in props.conf:
[DGC_PULSE]
TRANSFORMS-transform_name = transform_name1
I am not sure about this one, in particular how the stanza name maps to the sourcetype, although I must admit I haven't looked at the docs on this yet...
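On the question of how the stanza name maps to the sourcetype: a props.conf stanza such as [DGC_PULSE] only applies to events that already carry that sourcetype, so a transform listed there cannot be what assigns DGC_PULSE in the first place. Below is a simplified Python model of that lookup (hypothetical, not Splunk's implementation), comparing the configuration above with one keyed on DGC_TIME, the sourcetype set in inputs.conf in the suggested answer. The SOMEHOST path is hypothetical.

```python
import re

# Hypothetical model: props.conf stanza (a sourcetype) -> transforms it runs,
# each transform being (REGEX on the source, sourcetype it assigns).
as_posted = {"DGC_PULSE": [(r"pulse_*.csv", "DGC_PULSE")]}
keyed_on_default = {"DGC_TIME": [(r"pulse_.*\.csv", "DGC_PULSE")]}


def parse(source: str, sourcetype: str, props: dict) -> str:
    # Only the stanza named after the event's current sourcetype is consulted.
    for regex, new_sourcetype in props.get(sourcetype, []):
        if re.search(regex, source):
            sourcetype = new_sourcetype
    return sourcetype


source = "C:\\tpg\\leamcsv\\dualgamma_logs\\SOMEHOST\\pulse_DGC_DG14.csv"
print(parse(source, "csv", as_posted))              # events arrive as "csv"
print(parse(source, "DGC_TIME", keyed_on_default))  # inputs.conf sets DGC_TIME
```

In the first call no stanza matches the incoming sourcetype, so nothing runs; in the second the stanza matches and the transform can rename the sourcetype.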
Thanks for the input. I tried this and am still getting csv, csv_1, etc. for sourcetype. I ran splunk clean all on both my Splunk instance and my universal forwarder.
I think I understand what you have suggested and it looks very similar to what I was initially trying. Is it substantially different?
I am guessing that it is still failing on the regexes being used.
After you do this, you will need to go to yoursplunkurl:8000/info and click reload EAI Objects wherever these configs are deployed: UF (will need an instance restart), indexer, etc.
You may even want to restart the instance just for good measure.