Getting Data In

Ingest only rows containing certain text from log file

joesrepsolc
Communicator

I have a very large log file (20,000+ lines per file), and I only need the rows that contain "tell_group.pl". Some lines start with that text; others have a "+ " before it. I'm hoping to build a props.conf that ingests only those lines from the log, as a single event (1 log file = 1 event). So for each source file, I need every full line that contains "tell_group.pl".

ROWS
ROWS
ROWS
# --------------------------------------------------------------------
tell_group.pl MSG "NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                  : $MEDSA_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                 : 1245
# --------------------------------------------------------------------
tell_group.pl MSG "NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                  : $MEDSB_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                 : 350
# --------------------------------------------------------------------
tell_group.pl MSG "NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                  : $MEDSC_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                 : 164
# --------------------------------------------------------------------
tell_group.pl MSG "NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                  : $MEDSD_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                 : 0
ROWS
ROWS
ROWS

THANKS IN ADVANCE!

Joe


darrenfuller
Contributor

Try this:

[answers786699]
disabled = false
DATETIME_CONFIG = CURRENT
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)\*\*\*\*xxxfail
TRUNCATE = 10000

SEDCMD-01-Remove_lines_part_1 = s/[\r\n]+(?!.*(tell_group\.pl)).*//g
SEDCMD-02-Remove_lines_part_2 = s/^(?!.*(tell_group\.pl)).*[\r\n]//g

Explanation:

1) ingest the whole file as a single event...

This is done with this line:
LINE_BREAKER = ([\r\n]+)\*\*\*\*xxxfail

This tells Splunk to break only when it reaches a newline followed by the exact string "****xxxfail". If your files could be larger than 10,000 bytes, also raise "TRUNCATE =" above your largest file size (TRUNCATE is measured in bytes, and probably include a buffer above that). In the unlikely event that you actually do have ****xxxfail in your data, just change this to an even more ridiculous and unlikely string... like It\sturns\sout\sthat\sthe\searth\sis\sflat or something.
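To sanity-check a LINE_BREAKER pattern outside Splunk, you can approximate it with Python's re.split (a rough sketch, not Splunk's actual ingestion pipeline; the sample text here is abbreviated and made up):

```python
import re

# A LINE_BREAKER only splits where its regex matches; "****xxxfail"
# never occurs in the data, so the file should stay a single event.
LINE_BREAKER = r"([\r\n]+)\*\*\*\*xxxfail"

sample = (
    "ROWS\n"
    "tell_group.pl MSG NUMBER OF UPDATES : 1245\n"
    "ROWS\n"
)

events = re.split(LINE_BREAKER, sample)
print(len(events))  # 1 -> the whole file is one "event"
```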

2) Remove all lines that don't have "tell_group.pl" somewhere in the line.

This is accomplished with the two SEDCMD lines. They operate as follows:

SEDCMD-01-Remove_lines_part_1 = s/[\r\n]+(?!.*(tell_group\.pl)).*//g

This removes all lines from the file that do not have tell_group.pl in them. When this line is applied by itself, the above file ingests as follows:

ROWS




tell_group.pl MSG "NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                  : $MEDSA_AMINF"

+ tell_group.pl MSG NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                 : 1245


tell_group.pl MSG "NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                  : $MEDSB_AMINF"

+ tell_group.pl MSG NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                 : 350


tell_group.pl MSG "NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                  : $MEDSC_AMINF"

+ tell_group.pl MSG NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                 : 164


tell_group.pl MSG "NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                  : $MEDSD_AMINF"

+ tell_group.pl MSG NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                 : 0

That first regex works on every line except the first line in the file (and it leaves a bunch of empty lines behind as well). To get rid of those, I used a variation of the first SEDCMD, anchored at the start of a line and with the [\r\n] moved to the end of the match.
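The effect of that first SEDCMD can be approximated in Python (a rough sketch; Splunk applies SEDCMD with sed semantics to the whole event, and with Unix-style newlines the blank lines collapse as well, but the key behavior, the first line surviving, is the same):

```python
import re

event = "\n".join([
    "ROWS",                     # first line: survives (no preceding newline)
    "# ----",                   # removed
    "tell_group.pl MSG A : 1",  # kept
    "ROWS",                     # removed
])

# [\r\n]+ anchors the match to the newline *before* each line,
# so the very first line of the event can never match.
step1 = re.sub(r"[\r\n]+(?!.*tell_group\.pl).*", "", event)
print(step1)  # -> "ROWS" followed by the tell_group.pl line
```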

SEDCMD-02-Remove_lines_part_2 = s/^(?!.*(tell_group\.pl)).*[\r\n]//g

After this is done, we are left with:

tell_group.pl MSG "NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                  : $MEDSA_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_A TABLE UPDATES REQUIRED THIS RUN                 : 1245
tell_group.pl MSG "NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                  : $MEDSB_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_B TABLE UPDATES REQUIRED THIS RUN                 : 350
tell_group.pl MSG "NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                  : $MEDSC_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_C TABLE UPDATES REQUIRED THIS RUN                 : 164
tell_group.pl MSG "NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                  : $MEDSD_AMINF"
+ tell_group.pl MSG NUMBER OF ALTCARRIER_D TABLE UPDATES REQUIRED THIS RUN                 : 0
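Both substitutions can be checked end-to-end outside Splunk with an equivalent Python sketch (re.MULTILINE stands in for sed's per-line ^ anchor; the sample lines are abbreviated from the file above):

```python
import re

event = "\n".join([
    "ROWS",
    "# ----",
    'tell_group.pl MSG "A : $MEDSA_AMINF"',
    "+ tell_group.pl MSG A : 1245",
    "ROWS",
])

# SEDCMD-01: drop any line *after* a newline that lacks tell_group.pl
step1 = re.sub(r"[\r\n]+(?!.*tell_group\.pl).*", "", event)
# SEDCMD-02: drop a leading line that lacks tell_group.pl
step2 = re.sub(r"^(?!.*tell_group\.pl).*[\r\n]", "", step1,
               flags=re.MULTILINE)

print(step2)  # every remaining line contains tell_group.pl
```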

Which I believe answers your requirements. Hope this helps.
./Darren


supreet
Explorer

Nice, I tried this and it looks like it is working. Question: does this mean only part of my log file is ingested, so the whole log's disk space doesn't count against my license? I actually only want to ingest part of my debug logs (which are huge). Also, can we line-break the events after this conversion, so we have separate events again after ingestion? @darrenfuller @woodcock


woodcock
Esteemed Legend

If this is a one-time effort, use the add oneshot command and filter it first, something like this:

grep "tell_group.pl" /Your/Source/Path/And/Filename/Here > /tmp/ERASEME.txt
$SPLUNK_HOME/bin/splunk add oneshot /tmp/ERASEME.txt -sourcetype YourSourcetypeHere -index YourIndexHere -rename-source "/Your/Source/Path/And/Filename/Here"
rm -f /tmp/ERASEME.txt

supreet
Explorer

For me, it is going to be an ongoing thing, not a one-time effort, so I'm wondering if there is a way to achieve this.


darrenfuller
Contributor

Is there a timestamp anywhere in the file, or should the props just use the index time?
