Splunk Search

Extract data from URL string

Vfinney
Observer

I am trying to extract the file types, file names, and URLs from proxy logs for monitoring purposes. Here is what I'm looking for. Thanks in advance for any and all assistance.

URL Filetype Filename
http://dmp.truoptik.com .gif sync
http://r14---sn-bvvbax jpl.gvt1 .exe Chrome_updater
http://workforce-ks.com/ .pdf 2019-One-Stop-Advisory-Council-Meeting-Packet

Proxy logs examples:
7/30/19
1:29:52.000 PM

Jul 30 13:29:52 10.140.24.233 Jul 30 13:29:52 Access_Logs_Splunk: Info: 1564511389.352 80 10.140.6.27 TCP_MISS/204 793 GET http://dmp.truoptik.com/239e300e6dca3b53/sync.gif?dm=ib.adnxs.com&fck=6298473322644763945 "DOL\sroth@KDOL_Web_Auth" DIRECT/dmp.truoptik.com - DEFAULT_CASE_12-KDOL_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,"-",-,-,-,-,"-",-,-,-,"-",-,-,"-","-",-,-,-,-,"-","-","-","-","-","-",79.30,0,-,"-","-",-,"-",-,-,"-","-",-,-,"-"> -

7/30/19 1:29:42.000 PM Jul 30 13:29:42 10.140.24.233 Jul 30 13:29:42 Access_Logs_Splunk: Info: 1564511379.248 324 10.140.10.21 TCP_MISS/206 1587824 GET http://r14---sn-bvvbax jpl.gvt1.com/edgedl/release2/chrome/AOnIEhGH7WaH0jVMgWzb_TU_76.0.3809.87/76.0.3809.87_75.0.3770.142_chrome_updater.exe?cms_redirect=yes&mip=165.201.56.130&mm=28&mn=sn-bvvbax-hjpl&ms=nvh&mt=1564511187&mv=m&mvi=13&nh=EAE&pl=16&shardbypass=yes "DOL\dingels@KDOL_Web_Auth" DIRECT/r14---sn-bvvbax-hjpl.gvt1.com application/octet-stream DEFAULT_CASE_12-KDOL_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,"-",-,-,-,-,"-",-,-,-,"-",-,-,"-","-",-,-,-,-,"-","-","-","-","-","-",39205.53,0,-,"-","-",-,"-",-,-,"-","-",-,-,"-"> -

7/30/19
1:33:50.000 PM

Jul 30 13:33:50 10.140.24.234 Jul 30 13:33:50 Access_Logs_Splunk: Info: 1564511627.609 1779 10.140.4.14 TCP_MISS/200 3461685 GET http://workforce-ks.com/wp-content/uploads/2015/05/08.01.2019-One-Stop-Advisory-Council-Meeting-Pack... "DOL\nstruckhoff@KDOL_Web_Auth" DIRECT/workforce-ks.com application/pdf DEFAULT_CASE_12-Social_Media_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,"-",-,-,-,-,"-",-,-,-,"-",-,-,"-","-",-,-,-,-,"-","-","-","-","-","-",15566.88,0,-,"-","-",-,"-",-,-,"-","-",-,-,"-"> -

7/30/19
1:33:11.000 PM

Jul 30 13:33:11 10.140.24.234 Jul 30 13:33:11 Access_Logs_Splunk: Info: 1564511588.080 44 10.140.4.104 TCP_MISS/200 35005 GET http://ts.intra.dol.ks.gov/Files/PDF/EmployeeRecognition.pdf "DOL\njanco@KDOL_Web_Auth" DIRECT/ts.intra.dol.ks.gov application/pdf DEFAULT_CASE_12-Social_Media_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,"-",-,-,-,-,"-",-,-,-,"-",-,-,"-","-",-,-,-,-,"-","-","-","-","-","-",6364.55,0,-,"-","-",-,"-",-,-,"-","-",-,-,"-"> -

0 Karma
1 Solution

jacobpevans
Motivator

Greetings @Vfinney,

Please try the following run-anywhere search.

| makeresults
| eval _raw="7/30/19 1:29:52.000 PM Jul 30 13:29:52 10.140.24.233 Jul 30 13:29:52 Access_Logs_Splunk: Info: 1564511389.352 80 10.140.6.27 TCP_MISS/204 793 GET http://dmp.truoptik.com/239e300e6dca3b53/sync.gif?dm=ib.adnxs.com&fck=6298473322644763945 \"DOL\sroth@KDOL_Web_Auth\" DIRECT/dmp.truoptik.com - DEFAULT_CASE_12-KDOL_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,\"-\",-,-,-,-,\"-\",-,-,-,\"-\",-,-,\"-\",\"-\",-,-,-,-,\"-\",\"-\",\"-\",\"-\",\"-\",\"-\",79.30,0,-,\"-\",\"-\",-,\"-\",-,-,\"-\",\"-\",-,-,\"-\"> -"
| append [ makeresults | eval _raw="7/30/19 1:29:42.000 PM Jul 30 13:29:42 10.140.24.233 Jul 30 13:29:42 Access_Logs_Splunk: Info: 1564511379.248 324 10.140.10.21 TCP_MISS/206 1587824 GET http://r14---sn-bvvbaxjpl.gvt1.com/edgedl/release2/chrome/AOnIEhGH7WaH0jVMgWzb_TU_76.0.3809.87/76.0.... \"DOL\dingels@KDOL_Web_Auth\" DIRECT/r14---sn-bvvbax-hjpl.gvt1.com application/octet-stream DEFAULT_CASE_12-KDOL_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,\"-\",-,-,-,-,\"-\",-,-,-,\"-\",-,-,\"-\",\"-\",-,-,-,-,\"-\",\"-\",\"-\",\"-\",\"-\",\"-\",39205.53,0,-,\"-\",\"-\",-,\"-\",-,-,\"-\",\"-\",-,-,\"-\"> -" ]
| append [ makeresults | eval _raw="7/30/19 1:33:50.000 PM Jul 30 13:33:50 10.140.24.234 Jul 30 13:33:50 Access_Logs_Splunk: Info: 1564511627.609 1779 10.140.4.14 TCP_MISS/200 3461685 GET http://workforce-ks.com/wp-content/uploads/2015/05/08.01.2019-One-Stop-Advisory-Council-Meeting-Pack... \"DOL\nstruckhoff@KDOL_Web_Auth\" DIRECT/workforce-ks.com application/pdf DEFAULT_CASE_12-Social_Media_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,\"-\",-,-,-,-,\"-\",-,-,-,\"-\",-,-,\"-\",\"-\",-,-,-,-,\"-\",\"-\",\"-\",\"-\",\"-\",\"-\",15566.88,0,-,\"-\",\"-\",-,\"-\",-,-,\"-\",\"-\",-,-,\"-\"> -" ]
| append [ makeresults | eval _raw="7/30/19 1:33:11.000 PM  Jul 30 13:33:11 10.140.24.234 Jul 30 13:33:11 Access_Logs_Splunk: Info: 1564511588.080 44 10.140.4.104 TCP_MISS/200 35005 GET http://ts.intra.dol.ks.gov/Files/PDF/EmployeeRecognition.pdf \"DOL\njanco@KDOL_Web_Auth\" DIRECT/ts.intra.dol.ks.gov application/pdf DEFAULT_CASE_12-Social_Media_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,\"-\",-,-,-,-,\"-\",-,-,-,\"-\",-,-,\"-\",\"-\",-,-,-,-,\"-\",\"-\",\"-\",\"-\",\"-\",\"-\",6364.55,0,-,\"-\",\"-\",-,\"-\",-,-,\"-\",\"-\",-,-,\"-\"> -" ]
| rex field=_raw     "GET (?<Full_URL>https?://[^\s]+)"
| rex field=Full_URL "(?<URL>https?://[^/]+/)"
| rex field=Full_URL "/(?<Filename>[^/]+)(?<Filetype>\.(gif|exe|pdf))\??"
| table URL Filename Filetype

These are the results:

URL                                    Filename                                              Filetype
http://dmp.truoptik.com/               sync                                                  .gif
http://r14---sn-bvvbaxjpl.gvt1.com/    76.0.3809.87_75.0.3770.142_chrome_updater             .exe
http://workforce-ks.com/               08.01.2019-One-Stop-Advisory-Council-Meeting-Packet   .pdf
http://ts.intra.dol.ks.gov/            EmployeeRecognition                                   .pdf

Assumptions:
- URL is always preceded by "GET " and does not contain spaces.
- Filename does not contain spaces or "/" symbol
- Filetype is either .gif, .exe, or .pdf. You can add | and the new extension after gif|exe|pdf to add others.

Cheers,
Jacob

If you feel this response answered your question, please do not forget to mark it as such. If it did not, but you do have the answer, feel free to answer your own post and accept that as the answer.

View solution in original post

jacobpevans
Motivator

Greetings @Vfinney,

Please try the following run-anywhere search.

| makeresults
| eval _raw="7/30/19 1:29:52.000 PM Jul 30 13:29:52 10.140.24.233 Jul 30 13:29:52 Access_Logs_Splunk: Info: 1564511389.352 80 10.140.6.27 TCP_MISS/204 793 GET http://dmp.truoptik.com/239e300e6dca3b53/sync.gif?dm=ib.adnxs.com&fck=6298473322644763945 \"DOL\sroth@KDOL_Web_Auth\" DIRECT/dmp.truoptik.com - DEFAULT_CASE_12-KDOL_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,\"-\",-,-,-,-,\"-\",-,-,-,\"-\",-,-,\"-\",\"-\",-,-,-,-,\"-\",\"-\",\"-\",\"-\",\"-\",\"-\",79.30,0,-,\"-\",\"-\",-,\"-\",-,-,\"-\",\"-\",-,-,\"-\"> -"
| append [ makeresults | eval _raw="7/30/19 1:29:42.000 PM Jul 30 13:29:42 10.140.24.233 Jul 30 13:29:42 Access_Logs_Splunk: Info: 1564511379.248 324 10.140.10.21 TCP_MISS/206 1587824 GET http://r14---sn-bvvbaxjpl.gvt1.com/edgedl/release2/chrome/AOnIEhGH7WaH0jVMgWzb_TU_76.0.3809.87/76.0.... \"DOL\dingels@KDOL_Web_Auth\" DIRECT/r14---sn-bvvbax-hjpl.gvt1.com application/octet-stream DEFAULT_CASE_12-KDOL_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,\"-\",-,-,-,-,\"-\",-,-,-,\"-\",-,-,\"-\",\"-\",-,-,-,-,\"-\",\"-\",\"-\",\"-\",\"-\",\"-\",39205.53,0,-,\"-\",\"-\",-,\"-\",-,-,\"-\",\"-\",-,-,\"-\"> -" ]
| append [ makeresults | eval _raw="7/30/19 1:33:50.000 PM Jul 30 13:33:50 10.140.24.234 Jul 30 13:33:50 Access_Logs_Splunk: Info: 1564511627.609 1779 10.140.4.14 TCP_MISS/200 3461685 GET http://workforce-ks.com/wp-content/uploads/2015/05/08.01.2019-One-Stop-Advisory-Council-Meeting-Pack... \"DOL\nstruckhoff@KDOL_Web_Auth\" DIRECT/workforce-ks.com application/pdf DEFAULT_CASE_12-Social_Media_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,\"-\",-,-,-,-,\"-\",-,-,-,\"-\",-,-,\"-\",\"-\",-,-,-,-,\"-\",\"-\",\"-\",\"-\",\"-\",\"-\",15566.88,0,-,\"-\",\"-\",-,\"-\",-,-,\"-\",\"-\",-,-,\"-\"> -" ]
| append [ makeresults | eval _raw="7/30/19 1:33:11.000 PM  Jul 30 13:33:11 10.140.24.234 Jul 30 13:33:11 Access_Logs_Splunk: Info: 1564511588.080 44 10.140.4.104 TCP_MISS/200 35005 GET http://ts.intra.dol.ks.gov/Files/PDF/EmployeeRecognition.pdf \"DOL\njanco@KDOL_Web_Auth\" DIRECT/ts.intra.dol.ks.gov application/pdf DEFAULT_CASE_12-Social_Media_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,\"-\",-,-,-,-,\"-\",-,-,-,\"-\",-,-,\"-\",\"-\",-,-,-,-,\"-\",\"-\",\"-\",\"-\",\"-\",\"-\",6364.55,0,-,\"-\",\"-\",-,\"-\",-,-,\"-\",\"-\",-,-,\"-\"> -" ]
| rex field=_raw     "GET (?<Full_URL>https?://[^\s]+)"
| rex field=Full_URL "(?<URL>https?://[^/]+/)"
| rex field=Full_URL "/(?<Filename>[^/]+)(?<Filetype>\.(gif|exe|pdf))\??"
| table URL Filename Filetype

These are the results:

URL                                    Filename                                              Filetype
http://dmp.truoptik.com/               sync                                                  .gif
http://r14---sn-bvvbaxjpl.gvt1.com/    76.0.3809.87_75.0.3770.142_chrome_updater             .exe
http://workforce-ks.com/               08.01.2019-One-Stop-Advisory-Council-Meeting-Packet   .pdf
http://ts.intra.dol.ks.gov/            EmployeeRecognition                                   .pdf

Assumptions:
- URL is always preceded by "GET " and does not contain spaces.
- Filename does not contain spaces or "/" symbol
- Filetype is either .gif, .exe, or .pdf. You can add | and the new extension after gif|exe|pdf to add others.

Cheers,
Jacob

If you feel this response answered your question, please do not forget to mark it as such. If it did not, but you do have the answer, feel free to answer your own post and accept that as the answer.
Get Updates on the Splunk Community!

Welcome to the Splunk Community!

(view in My Videos) We're so glad you're here! The Splunk Community is place to connect, learn, give back, and ...

Tech Talk | Elevating Digital Service Excellence: The Synergy of Splunk RUM & APM

Elevating Digital Service Excellence: The Synergy of Real User Monitoring and Application Performance ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...