Splunk Search

Extract data from URL string

Vfinney
Observer

I am trying to extract the file types, file names, and URLs from proxy logs for monitoring purposes. Here is what I'm looking for. Thanks in advance for any and all assistance.

URL                              Filetype  Filename
http://dmp.truoptik.com          .gif      sync
http://r14---sn-bvvbax jpl.gvt1  .exe      Chrome_updater
http://workforce-ks.com/         .pdf      2019-One-Stop-Advisory-Council-Meeting-Packet

Proxy log examples:
7/30/19 1:29:52.000 PM

Jul 30 13:29:52 10.140.24.233 Jul 30 13:29:52 Access_Logs_Splunk: Info: 1564511389.352 80 10.140.6.27 TCP_MISS/204 793 GET http://dmp.truoptik.com/239e300e6dca3b53/sync.gif?dm=ib.adnxs.com&fck=6298473322644763945 "DOL\sroth@KDOL_Web_Auth" DIRECT/dmp.truoptik.com - DEFAULT_CASE_12-KDOL_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,"-",-,-,-,-,"-",-,-,-,"-",-,-,"-","-",-,-,-,-,"-","-","-","-","-","-",79.30,0,-,"-","-",-,"-",-,-,"-","-",-,-,"-"> -

7/30/19 1:29:42.000 PM Jul 30 13:29:42 10.140.24.233 Jul 30 13:29:42 Access_Logs_Splunk: Info: 1564511379.248 324 10.140.10.21 TCP_MISS/206 1587824 GET http://r14---sn-bvvbax jpl.gvt1.com/edgedl/release2/chrome/AOnIEhGH7WaH0jVMgWzb_TU_76.0.3809.87/76.0.3809.87_75.0.3770.142_chrome_updater.exe?cms_redirect=yes&mip=165.201.56.130&mm=28&mn=sn-bvvbax-hjpl&ms=nvh&mt=1564511187&mv=m&mvi=13&nh=EAE&pl=16&shardbypass=yes "DOL\dingels@KDOL_Web_Auth" DIRECT/r14---sn-bvvbax-hjpl.gvt1.com application/octet-stream DEFAULT_CASE_12-KDOL_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,"-",-,-,-,-,"-",-,-,-,"-",-,-,"-","-",-,-,-,-,"-","-","-","-","-","-",39205.53,0,-,"-","-",-,"-",-,-,"-","-",-,-,"-"> -

7/30/19 1:33:50.000 PM

Jul 30 13:33:50 10.140.24.234 Jul 30 13:33:50 Access_Logs_Splunk: Info: 1564511627.609 1779 10.140.4.14 TCP_MISS/200 3461685 GET http://workforce-ks.com/wp-content/uploads/2015/05/08.01.2019-One-Stop-Advisory-Council-Meeting-Pack... "DOL\nstruckhoff@KDOL_Web_Auth" DIRECT/workforce-ks.com application/pdf DEFAULT_CASE_12-Social_Media_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,"-",-,-,-,-,"-",-,-,-,"-",-,-,"-","-",-,-,-,-,"-","-","-","-","-","-",15566.88,0,-,"-","-",-,"-",-,-,"-","-",-,-,"-"> -

7/30/19 1:33:11.000 PM

Jul 30 13:33:11 10.140.24.234 Jul 30 13:33:11 Access_Logs_Splunk: Info: 1564511588.080 44 10.140.4.104 TCP_MISS/200 35005 GET http://ts.intra.dol.ks.gov/Files/PDF/EmployeeRecognition.pdf "DOL\njanco@KDOL_Web_Auth" DIRECT/ts.intra.dol.ks.gov application/pdf DEFAULT_CASE_12-Social_Media_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,"-",-,-,-,-,"-",-,-,-,"-",-,-,"-","-",-,-,-,-,"-","-","-","-","-","-",6364.55,0,-,"-","-",-,"-",-,-,"-","-",-,-,"-"> -

1 Solution

jacobpevans
Motivator

Greetings @Vfinney,

Please try the following run-anywhere search.

| makeresults
| eval _raw="7/30/19 1:29:52.000 PM Jul 30 13:29:52 10.140.24.233 Jul 30 13:29:52 Access_Logs_Splunk: Info: 1564511389.352 80 10.140.6.27 TCP_MISS/204 793 GET http://dmp.truoptik.com/239e300e6dca3b53/sync.gif?dm=ib.adnxs.com&fck=6298473322644763945 \"DOL\sroth@KDOL_Web_Auth\" DIRECT/dmp.truoptik.com - DEFAULT_CASE_12-KDOL_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,\"-\",-,-,-,-,\"-\",-,-,-,\"-\",-,-,\"-\",\"-\",-,-,-,-,\"-\",\"-\",\"-\",\"-\",\"-\",\"-\",79.30,0,-,\"-\",\"-\",-,\"-\",-,-,\"-\",\"-\",-,-,\"-\"> -"
| append [ makeresults | eval _raw="7/30/19 1:29:42.000 PM Jul 30 13:29:42 10.140.24.233 Jul 30 13:29:42 Access_Logs_Splunk: Info: 1564511379.248 324 10.140.10.21 TCP_MISS/206 1587824 GET http://r14---sn-bvvbaxjpl.gvt1.com/edgedl/release2/chrome/AOnIEhGH7WaH0jVMgWzb_TU_76.0.3809.87/76.0.... \"DOL\dingels@KDOL_Web_Auth\" DIRECT/r14---sn-bvvbax-hjpl.gvt1.com application/octet-stream DEFAULT_CASE_12-KDOL_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,\"-\",-,-,-,-,\"-\",-,-,-,\"-\",-,-,\"-\",\"-\",-,-,-,-,\"-\",\"-\",\"-\",\"-\",\"-\",\"-\",39205.53,0,-,\"-\",\"-\",-,\"-\",-,-,\"-\",\"-\",-,-,\"-\"> -" ]
| append [ makeresults | eval _raw="7/30/19 1:33:50.000 PM Jul 30 13:33:50 10.140.24.234 Jul 30 13:33:50 Access_Logs_Splunk: Info: 1564511627.609 1779 10.140.4.14 TCP_MISS/200 3461685 GET http://workforce-ks.com/wp-content/uploads/2015/05/08.01.2019-One-Stop-Advisory-Council-Meeting-Pack... \"DOL\nstruckhoff@KDOL_Web_Auth\" DIRECT/workforce-ks.com application/pdf DEFAULT_CASE_12-Social_Media_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,\"-\",-,-,-,-,\"-\",-,-,-,\"-\",-,-,\"-\",\"-\",-,-,-,-,\"-\",\"-\",\"-\",\"-\",\"-\",\"-\",15566.88,0,-,\"-\",\"-\",-,\"-\",-,-,\"-\",\"-\",-,-,\"-\"> -" ]
| append [ makeresults | eval _raw="7/30/19 1:33:11.000 PM  Jul 30 13:33:11 10.140.24.234 Jul 30 13:33:11 Access_Logs_Splunk: Info: 1564511588.080 44 10.140.4.104 TCP_MISS/200 35005 GET http://ts.intra.dol.ks.gov/Files/PDF/EmployeeRecognition.pdf \"DOL\njanco@KDOL_Web_Auth\" DIRECT/ts.intra.dol.ks.gov application/pdf DEFAULT_CASE_12-Social_Media_Access_Policy-KDOL_Web_Identity-NONE-NONE-NONE-DefaultGroup <-,-,-,\"-\",-,-,-,-,\"-\",-,-,-,\"-\",-,-,\"-\",\"-\",-,-,-,-,\"-\",\"-\",\"-\",\"-\",\"-\",\"-\",6364.55,0,-,\"-\",\"-\",-,\"-\",-,-,\"-\",\"-\",-,-,\"-\"> -" ]
| rex field=_raw     "GET (?<Full_URL>https?://[^\s]+)"
| rex field=Full_URL "(?<URL>https?://[^/]+/)"
| rex field=Full_URL "/(?<Filename>[^/]+)(?<Filetype>\.(gif|exe|pdf))\??"
| table URL Filename Filetype

These are the results:

URL                                    Filename                                              Filetype
http://dmp.truoptik.com/               sync                                                  .gif
http://r14---sn-bvvbaxjpl.gvt1.com/    76.0.3809.87_75.0.3770.142_chrome_updater             .exe
http://workforce-ks.com/               08.01.2019-One-Stop-Advisory-Council-Meeting-Packet   .pdf
http://ts.intra.dol.ks.gov/            EmployeeRecognition                                   .pdf
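
The three rex steps can also be checked outside Splunk. The following is a minimal Python sketch, not part of the original search: it ports the same patterns to Python's re module, which writes named groups as (?P<name>...) rather than (?<name>...), and runs them against shortened copies of the first and fourth sample events. The function name extract and the trimmed event strings are assumptions made for illustration.

```python
import re

# Same patterns as the three rex commands, with Python-style named groups.
FULL_URL_RE = re.compile(r"GET (?P<Full_URL>https?://[^\s]+)")
URL_RE = re.compile(r"(?P<URL>https?://[^/]+/)")
FILE_RE = re.compile(r"/(?P<Filename>[^/]+)(?P<Filetype>\.(gif|exe|pdf))\??")


def extract(raw):
    """Return (URL, Filename, Filetype) for one event, or None if a step fails."""
    full = FULL_URL_RE.search(raw)
    if full is None:
        return None
    full_url = full.group("Full_URL")
    base = URL_RE.search(full_url)
    name = FILE_RE.search(full_url)
    if base is None or name is None:
        return None
    return base.group("URL"), name.group("Filename"), name.group("Filetype")


# Shortened copies of the first and fourth sample events (trailing fields dropped).
events = [
    r'Info: 1564511389.352 80 10.140.6.27 TCP_MISS/204 793 GET '
    r'http://dmp.truoptik.com/239e300e6dca3b53/sync.gif?dm=ib.adnxs.com&fck=6298473322644763945 '
    r'"DOL\sroth@KDOL_Web_Auth" DIRECT/dmp.truoptik.com -',
    r'Info: 1564511588.080 44 10.140.4.104 TCP_MISS/200 35005 GET '
    r'http://ts.intra.dol.ks.gov/Files/PDF/EmployeeRecognition.pdf '
    r'"DOL\njanco@KDOL_Web_Auth" DIRECT/ts.intra.dol.ks.gov application/pdf',
]

for event in events:
    print(extract(event))
```

An event whose URL does not end in one of the listed extensions returns None here, which corresponds to empty Filename and Filetype fields in the Splunk results.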

Assumptions:
- The URL is always preceded by "GET " and contains no spaces.
- The filename contains no spaces or "/" characters.
- The file type is .gif, .exe, or .pdf. To match others, append | and the new extension to gif|exe|pdf (for example, gif|exe|pdf|zip).
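
If the set of extensions is not known in advance, one alternative to the regex alternation is to parse the URL and split the last path segment at its final dot. The sketch below uses Python's standard urllib.parse and posixpath modules rather than rex, so it is a different technique from the search above. The helper name split_url and the example.com URL are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def split_url(full_url):
    """Return (base URL, filename, extension) without a fixed extension list."""
    parts = urlsplit(full_url)
    base = f"{parts.scheme}://{parts.netloc}/"
    # urlsplit keeps the query string out of .path, so only the file remains.
    last_segment = posixpath.basename(parts.path)
    filename, filetype = posixpath.splitext(last_segment)
    return base, filename, filetype


print(split_url("http://ts.intra.dol.ks.gov/Files/PDF/EmployeeRecognition.pdf"))
print(split_url("http://dmp.truoptik.com/239e300e6dca3b53/sync.gif?dm=ib.adnxs.com&fck=6298473322644763945"))
# Hypothetical URL with an extension that is not in gif|exe|pdf.
print(split_url("http://example.com/downloads/tool.zip"))
```

Because the query string is dropped before the split, anything after the ? cannot affect the extracted extension.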

Cheers,
Jacob

If you feel this response answered your question, please do not forget to mark it as such. If it did not, but you do have the answer, feel free to answer your own post and accept that as the answer.
