Splunk Search

FTP file statistics

srajanbabu
Explorer

I am new to Splunk. I have the sample log below and would like to produce per-user statistics on how many files and bytes were retrieved over FTP. Please guide me on how to accomplish this task.

0004 rajan.#### :: 04/03/13 00:00:34 :: User rajan logged in
0004 rajan.000f :: 04/03/13 00:00:27 :: User logged off, Processing will begin
0004 rajan.000f :: 04/03/13 00:00:27 :: Transmission type set to Binary
0004 rajan.000f :: 04/03/13 00:00:27 :: Customer retrieved file xxxxxxxxx_G3INF0000000108647.CSV.PGP 62619 bytes transferred
0004 rajan.000f :: 04/03/13 00:00:27 :: Customer has successfully retrieved file xxxxxxxxx_G3INF0000000108647.CSV.PGP (123 records/62619 bytes)
0004 rajan.000f :: 04/03/13 00:00:27 :: FTPWATCH: (saf_file) SAF_INFO returned WINT1TBY,153F,123,62619
0004 rajan.000f :: 04/03/13 00:00:29 :: 125 Storing data set xxxxxxxxx.T000029.rajan.S000f
0004 rajan.000f :: 04/03/13 00:00:29 :: 250 Transfer completed successfully.
0004 rajan.000f :: 04/03/13 00:00:31 :: 125 Storing data set xxxxxxxxx.T000030.WINT1TBY.S000f
0004 rajan.000f :: 04/03/13 00:00:31 :: 250 Transfer completed successfully.
0004 rajan.000f :: 04/03/13 00:00:31 :: File xxxxxxxxx_G3INF0000000108647.CSV.PGP was an SAF from WINT1TBY.153F
0004 rajan.000f :: 04/03/13 00:00:31 :: No delete option was set for file xxxxxxxxx_G3INF0000000108647.CSV.PGP
0004 rajan.000f :: 04/03/13 00:00:31 :: Transmission type set to Binary
0004 rajan.000f :: 04/03/13 00:00:31 :: Customer retrieved file xxxxxxxxx_20130402.CSV.PGP 5656 bytes transferred
0004 rajan.000f :: 04/03/13 00:00:31 :: Customer has successfully retrieved file xxxxxxxxx_20130402.CSV.PGP (12 records/5656 bytes)
0004 rajan.000f :: 04/03/13 00:00:31 :: FTPWATCH: (saf_file) SAF_INFO returned WINT1TBY,741F,12,5656
0004 rajan.000f :: 04/03/13 00:00:33 :: 125 Storing data set xxxxxxxxx.T000032.rajan.S000f
0004 rajan.000f :: 04/03/13 00:00:33 :: 250 Transfer completed successfully.

0 Karma
1 Solution

kristian_kolb
Ultra Champion

The example query below will give you the number of files, records, and bytes transferred per user. Assumptions:

  • no fields have been extracted previously, so they are extracted inline with rex,
  • all relevant information can be found in the 'Customer has successfully retrieved...' messages,
  • the user name is found at the beginning of the line (between '0004 ' and '.000f'),
  • timestamps are already being parsed correctly.

index=blah sourcetype=bleh "Customer has successfully retrieved file"
| rex "^\S+\s(?<userid>\S+)\s"
| rex "\((?<record_count>\d+)\srecords/(?<byte_count>\d+)\sbytes\)$"
| stats count as FileCount sum(record_count) as RecordCount sum(byte_count) as ByteCount by userid

EDIT: userid regex now includes all characters between the first and second space.
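
As a rough way to sanity-check the two rex expressions without touching a real index, the sketch below feeds two of the sample lines from the question through the same pipeline. It assumes the makeresults command is available on your Splunk version. The field name line is made up for this test, so each rex is pointed at it with field=line instead of the default _raw.

```spl
| makeresults
| eval line=split("0004 rajan.000f :: 04/03/13 00:00:27 :: Customer has successfully retrieved file xxxxxxxxx_G3INF0000000108647.CSV.PGP (123 records/62619 bytes);0004 rajan.000f :: 04/03/13 00:00:31 :: Customer has successfully retrieved file xxxxxxxxx_20130402.CSV.PGP (12 records/5656 bytes)", ";")
| mvexpand line
| rex field=line "^\S+\s(?<userid>\S+)\s"
| rex field=line "\((?<record_count>\d+)\srecords/(?<byte_count>\d+)\sbytes\)$"
| stats count as FileCount sum(record_count) as RecordCount sum(byte_count) as ByteCount by userid
```

If the extractions behave as intended, this should produce a single row for userid rajan.000f with FileCount 2, RecordCount 135 (123 + 12) and ByteCount 68275 (62619 + 5656).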

Hope this helps,

Kristian

srajanbabu
Explorer

Awesome, thanks a lot

Rajan

0 Karma

kristian_kolb
Ultra Champion

Change your second rex to;

| rex "\s(?<file_name>\S+)\s\((?<record_count>\d+)\srecords/(?<byte_count>\d+)\sbytes\)$"

and rewrite your stats to;

| stats count as FileCount list(file_name) as FileName sum(record_count) as RecordCount sum(byte_count) as ByteCount by userid

Combined, this will add a list of the file names transferred.
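
Put together with the first rex from the original answer, the full search would look roughly like the sketch below. index=blah and sourcetype=bleh are the same placeholders used in the original answer, so substitute your own.

```spl
index=blah sourcetype=bleh "Customer has successfully retrieved file"
| rex "^\S+\s(?<userid>\S+)\s"
| rex "\s(?<file_name>\S+)\s\((?<record_count>\d+)\srecords/(?<byte_count>\d+)\sbytes\)$"
| stats count as FileCount list(file_name) as FileName sum(record_count) as RecordCount sum(byte_count) as ByteCount by userid
```

list() keeps every occurrence, so a file retrieved twice shows up twice. If only distinct names are wanted, values(file_name) is a possible alternative.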

/K

0 Karma

srajanbabu
Explorer

Yes, that works, except the file name is missing from the search result. Below is the search I am using. I would like the result to include the file name, which appears right after "Customer has successfully retrieved file". Can you help me include it in the search result?

index=main sourcetype=summa "Customer has successfully retrieved file"
| rex "^\S+\s(?<userid>\S+)\."
| rex "\((?<record_count>\d+)\srecords/(?<byte_count>\d+)\sbytes\)$"
| stats count as FileCount sum(record_count) as RecordCount sum(byte_count) as ByteCount by userid

0 Karma

kristian_kolb
Ultra Champion

Well, maybe you figured out how to do it yourself, but if you DON'T want the 'second' part, i.e. '.123X', you can write the regex as;

rex "^\S+\s(?<userid>[^.]+)\.\S+\s"

which reads as;

from the start;
- one or more non-space characters (the 0004)
- one space character (space or tab)
- one or more non-dot characters (this is what we capture as userid)
- a single dot (hey, a dot)
- one or more non-space characters (000F, 871F etc)
- one space character (space or tab)
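
To see the effect of this regex in isolation, here is a small sketch using makeresults (assuming it is available on your Splunk version). The first test line is taken from the sample log. The second is a hypothetical line built from one of the user IDs mentioned in this thread, not real log data. The field name line is made up for the test.

```spl
| makeresults
| eval line=split("0004 rajan.000f :: 04/03/13 00:00:27 :: Transmission type set to Binary;0004 YAHOO3SN.871F :: 04/03/13 00:00:27 :: Transmission type set to Binary", ";")
| mvexpand line
| rex field=line "^\S+\s(?<userid>[^.]+)\.\S+\s"
| table line userid
```

With the regex as written, userid should come out as rajan for the first line and YAHOO3SN for the second, i.e. without the '.000f' / '.871F' suffix.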

/K

0 Karma

srajanbabu
Explorer

I was able to work out the regex I wanted. I am done, thanks a lot.

0 Karma

srajanbabu
Explorer

is the user-id. The search is much better with the new regex; thanks a lot for taking the time to answer.

0 Karma

kristian_kolb
Ultra Champion

Updated the regex to capture more characters. See above.

0 Karma

kristian_kolb
Ultra Champion

which part exactly is the username?

  • rajan
  • rajan.000f
0 Karma

srajanbabu
Explorer

The provided search works only for user IDs ending with .000f. A typical user ID is suffixed with some numbers, such as YAHOO3SN.871F, ANALF4DY.874F, FISGMEFX.880F, etc.

0 Karma

kristian_kolb
Ultra Champion

No. Or well. IF the user information is located where I assumed, the search will give the statistics per user.

Note that my regex stipulates that userid's should contain only 'word' characters (\w). So if they contain #.-! etc, they won't be extracted.

0 Karma

srajanbabu
Explorer

Thanks for your quick response, that was useful. However, this query gives results for one particular user only. I am looking to compute the statistics for all users.

0 Karma