Hello,
I am trying to extract the entire URL up to, but not including, the question mark (if there is one). Generally the data looks like this:
{"cf_app_id":"6304b330-c026-4ea2-a6cf-41226d5357ad","cf_app_name":"app","cf_ignored_app":false,"cf_org_id":"ff8d1329-74e1-4d13-852f-5cea389de951","cf_org_name":"apporg","cf_origin":"firehose","cf_space_id":"79a0055d-36ba-4051-b3ea-825023d617b2","cf_space_name":"prod-web","deployment":"p-isolation-segment-dbd885e4d164ead74648","event_type":"LogMessage","ip":"192.168.1.1","job":"isolated_router","job_index":"6cbb1296-8dac-4f14-859e-63292ea984e8","message_type":"OUT","msg":"app.web.state.bizsunit.company.com - [2019-07-09T03:38:28.088+0000] \"POST /api/contact/contactQuestions HTTP/1.1\" 200 52 3861 \"https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/contact?devicecd=PC\u0026zip=78250\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134\" \"192.168.1.1:35268\" \"192.168.9.12:1010\" x_forwarded_for:\"192.168.8.1, 192.168.8.11\" x_forwarded_proto:\"https\" vcap_request_id:\"af7f0c6f-eff3-48c4-5f33-2e50c81e1104\" response_time:0.650701403 app_id:\"6304b330-c026-4ea2-a6cf-41226d5357ad\" app_index:\"1\" x_request_id:\"6bdccae0-300f-4acf-9772-4264b18b7db4\" x_b3_traceid:\"20fc67710967dfb2\" x_b3_spanid:\"20fc67710967dfb2\" x_b3_parentspanid:\"-\"\n","origin":"gorouter","source_instance":"1","source_type":"RTR","timestamp":1562643508739740164}
https:/app.web.state.bizsunit.company.com/apps2
https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/stuff
https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/contact?devicecd=PC\u0026zip=78250
I want to extract these as the URL:
https:/app.web.state.bizsunit.company.com/apps2
https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/stuff
https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/contact
I tried this:
(?<msg_uri>https:\/\/[^?]+)
It seems to match in regexr, but not correctly in Splunk.
Any help is appreciated!
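Try this. Your data has only one slash after https:, so \/\/? accepts either one or two slashes, and the named group needs the (?<url> syntax:
(?<url>https?:\/\/?[^\s\?"\\]+)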
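To sanity-check a pattern of the form https?:\/\/?[^\s?"\\]+ outside Splunk, here is a minimal Python sketch run against the sample URLs from the question. Python's `re` spells named groups `(?P<name>...)` instead of `(?<name>...)`, so only the group syntax differs from what Splunk accepts. `RAW_SNIPPET` is a hypothetical, shortened piece of `_raw` based on the sample event; in the JSON `msg` string the quotes around the referer are escaped as `\"`, which is why the class also excludes the double quote and the backslash. The sketch also runs the question's `https:\/\/` attempt, which requires two slashes and so finds nothing in this data.

```python
import re

# Sample referer URLs from the question. The data has only ONE slash after "https:".
SAMPLES = [
    "https:/app.web.state.bizsunit.company.com/apps2",
    "https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/stuff",
    "https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/contact?devicecd=PC\\u0026zip=78250",
]

# Hypothetical fragment of _raw: inside the JSON "msg" string, quotes are escaped as \".
RAW_SNIPPET = '3861 \\"https:/app.web.state.bizsunit.company.com/apps2\\" \\"Mozilla/5.0'

# The question's attempt, rewritten with Python's (?P<name>...) group syntax.
ORIGINAL = r"(?P<msg_uri>https:\/\/[^?]+)"

# One or two slashes; stop at whitespace, "?", a double quote, or a backslash.
FIXED = r'(?P<url>https?:\/\/?[^\s?"\\]+)'


def extract(pattern, text, group="url"):
    """Return the named group from the first match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group(group) if m else None


if __name__ == "__main__":
    for text in SAMPLES + [RAW_SNIPPET]:
        print("original:", extract(ORIGINAL, text, "msg_uri"))
        print("fixed:   ", extract(FIXED, text))
```

With this data the `original` lines print `None` every time, while the `fixed` lines print each URL without its query string.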
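Escaping the ? doesn't hurt, although inside brackets it is literal anyway. More importantly, stop the match at characters that can't be part of the URL path, otherwise a URL without a ? runs on into the rest of the event:
| rex field=_raw "(?<url>https?:\/\/?[\w.:\/~%-]+)"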
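The pattern https[^\?]* only stops when it reaches a ?, so when the referer has no query string it keeps consuming the rest of the event. The minimal Python sketch below shows this on a hypothetical, shortened `_raw` fragment built from the sample event, with the referer's query string removed. It compares that against a class limited to common URL-path characters, `[\w.:\/~%-]`; that character set is an assumption that fits the sample URLs, so widen it if your paths contain other characters. Python's `(?P<url>...)` stands in for Splunk's `(?<url>...)`.

```python
import re

# Hypothetical, shortened _raw fragment based on the sample event, but with a
# referer that has no query string. JSON escapes the inner quotes as \".
RAW = ('"msg":"app.web.state.bizsunit.company.com - \\"POST /api/contact/contactQuestions HTTP/1.1\\" '
       '200 52 3861 \\"https:/app.web.state.bizsunit.company.com/apps2/biz/ecomm/stuff\\" '
       '\\"Mozilla/5.0 (Windows NT 10.0; Win64; x64)\\" x_forwarded_proto:\\"https\\""')

# Stops only at "?": with no "?" in the text it runs to the end of the event.
UNBOUNDED = r"(?P<url>https[^?]*)"

# Stops at the first character outside the assumed URL-path set
# (here, the backslash of the escaped closing quote).
BOUNDED = r"(?P<url>https?:\/\/?[\w.:\/~%-]+)"


def first_url(pattern, text):
    """Return the 'url' group of the first match, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group("url") if m else None


if __name__ == "__main__":
    print("unbounded:", first_url(UNBOUNDED, RAW))
    print("bounded:  ", first_url(BOUNDED, RAW))
```

The unbounded result includes the user agent and the trailing x_forwarded_proto field, while the bounded one ends at `/apps2/biz/ecomm/stuff`.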