Splunk Search

Writing a search query that validates file format

mahbs
Path Finder

Hi Guys,

I have a question regarding file validation. Is it possible in Splunk to validate the structure/format of a file? For instance, if you have a table with the following columns:

Name | Surname | Date

This table is ingested into Splunk. How would I write a query to check that in a different file, it contains the same headers, and the format of the values in the second file is in the correct format. For example, a date value should be in the correct format such as dd/mm/yyyy as opposed to mm/dd/yyyy.

So is there a way in Splunk to write a query that validates this?

Thanks

0 Karma
1 Solution

Richfez
SplunkTrust
SplunkTrust

Yes, and there are probably 73 different ways to do this.

One method may be to use regular expressions and some built-in eval functions. This should be fairly flexible, though depending on your skill with regex, there may be some difficulties [See note 1].

Obviously, there will be quite a bit of work in deciding "what's proper" and what isn't. But let's say that dates should be dd/mm/yyyy. You can check any sort of date format with the eval function strptime. For instance, here's a handful of different date "styles" and if they fit into d/m/y or m/d/y. This is run anywhere - just paste it in and press enter and it should give you output.

| makeresults 
| eval Dt="4/3/2017 1/1/2017 12/21/2017 2/30/2017 30/2/2017" 
| makemv delim=" " Dt 
| mvexpand Dt 
| eval MMDDYYYYDate = strptime(Dt, "%m/%d/%Y"), DDMMYYYYDate = strptime(Dt, "%d/%m/%Y")
| eval MMDDYYYYFits = if(isnull(MMDDYYYYDate),"No","Yes"), DDMMYYYYFits = if(isnull(DDMMYYYYDate),"No","Yes")

The first four lines are just "setting the stage". The 5th line tries to fit that created field Dt into a couple of date styles. The last line then checks if it worked - a sample of the logic to find out if it matches or not.

Likewise, you can do at least some very rudimentary checking of strings for certain formatting, but without knowing what exact formatting you need to check for, I'll just throw out a couple of examples.

Let's suppose you have a field that needs to have only 12 to 14 capital letters or numbers (no lowercase, no spaces, etc...), you could test that with something like

.... | rex field="myfield" "^(?<IsValid>[A-Z0-9]{12,14})$"

So, a few sites to get more information about specific use cases:
First, this Regular Expressions example site is very useful for a lot of pattern matching examples for particular use cases.
As always, Regex 101 is also a great site - less about showing you patterns and more about letting you test them easily, but it's fabulous nevertheless.

Note 1: We love REGEX here. Ask away (though generally one question per regex or at least task). Also, if you get on the Splunk Slack group (google for instructions on signing up) there's a channel dedicated to regex where there's a handful of very knowledgeable folks who love to help!

Happy Splunking!
-Rich

View solution in original post

0 Karma

Richfez
SplunkTrust
SplunkTrust

Yes, and there are probably 73 different ways to do this.

One method may be to use regular expressions and some built-in eval functions. This should be fairly flexible, though depending on your skill with regex, there may be some difficulties [See note 1].

Obviously, there will be quite a bit of work in deciding "what's proper" and what isn't. But let's say that dates should be dd/mm/yyyy. You can check any sort of date format with the eval function strptime. For instance, here's a handful of different date "styles" and if they fit into d/m/y or m/d/y. This is run anywhere - just paste it in and press enter and it should give you output.

| makeresults 
| eval Dt="4/3/2017 1/1/2017 12/21/2017 2/30/2017 30/2/2017" 
| makemv delim=" " Dt 
| mvexpand Dt 
| eval MMDDYYYYDate = strptime(Dt, "%m/%d/%Y"), DDMMYYYYDate = strptime(Dt, "%d/%m/%Y")
| eval MMDDYYYYFits = if(isnull(MMDDYYYYDate),"No","Yes"), DDMMYYYYFits = if(isnull(DDMMYYYYDate),"No","Yes")

The first four lines are just "setting the stage". The 5th line tries to fit that created field Dt into a couple of date styles. The last line then checks if it worked - a sample of the logic to find out if it matches or not.

Likewise, you can do at least some very rudimentary checking of strings for certain formatting, but without knowing what exact formatting you need to check for, I'll just throw out a couple of examples.

Let's suppose you have a field that needs to have only 12 to 14 capital letters or numbers (no lowercase, no spaces, etc...), you could test that with something like

.... | rex field="myfield" "^(?<IsValid>[A-Z0-9]{12,14})$"

So, a few sites to get more information about specific use cases:
First, this Regular Expressions example site is very useful for a lot of pattern matching examples for particular use cases.
As always, Regex 101 is also a great site - less about showing you patterns and more about letting you test them easily, but it's fabulous nevertheless.

Note 1: We love REGEX here. Ask away (though generally one question per regex or at least task). Also, if you get on the Splunk Slack group (google for instructions on signing up) there's a channel dedicated to regex where there's a handful of very knowledgeable folks who love to help!

Happy Splunking!
-Rich

0 Karma
Get Updates on the Splunk Community!

Introducing Splunk Enterprise 9.2

WATCH HERE! Watch this Tech Talk to learn about the latest features and enhancements shipped in the new Splunk ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...