Hello Team,
I want to extract the useragent information. I am using an Apache server, and
I added the data as apache logs, ISS. The data is formatted this way:
66.249.xx.xx - - - [02/Jun/2012:04:02:12 -0400] "GET /robots.txt HTTP/1.1" 200 1792 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
However, I've added a couple of files, and useragent is never listed in the field column. I am interested in extracting these: Mozilla/5.0 (compatible; Googlebot/2.1)
Any recommendations?
Where in the Splunk Manager can I set access_combined? I am a non-programmer and new to Splunk, so I am having difficulty setting it up.
Your log files do not need to contain the name of the field. useragent is the field name.
If the logs are apache logs, you can set the sourcetype to access_combined or access_combined_wcookie in inputs.conf (or via the Splunk Manager GUI). These sourcetypes are predefined in Splunk for apache logs; I think that access_combined may be the right choice for your situation.
When you use a predefined sourcetype, Splunk will automatically perform the field extractions that are defined for that sourcetype. The access_combined sourcetype defines a field named useragent.
BTW, this change will only affect new data that is added to Splunk. The data that has already been indexed will not change to the access_combined sourcetype.
Lisa is ultimately correct with this answer. Correct source typing is the solution. You may want to consider attending Splunk training courses.
There are several choices:
1. Extract the field at search time with | rex command like so: source=/opt/log* | rex field=_raw "\"-\"\s+(?
2. Use the interactive field extractor. See http://docs.splunk.com/Documentation/Splunk/4.3.3/User/InteractiveFieldExtractionExample
3. Or use Splunk's Manager view to create an extracted field.
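
For anyone who wants to see the idea behind option 1 outside of Splunk: the rex pattern above is cut off, so the sketch below is not that pattern. It is only a Python illustration, assuming the useragent is the last double-quoted field on the line and comes right after the quoted referer ("-" in the sample). The names USERAGENT_RE, BOT_RE, extract_useragent and extract_bot are made up for this example.

```python
import re

# Sample line copied from the question (IP octets masked as in the original post).
LINE = (
    '66.249.xx.xx - - - [02/Jun/2012:04:02:12 -0400] '
    '"GET /robots.txt HTTP/1.1" 200 1792 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

# Assumption: the last two quoted fields on the line are referer and useragent.
USERAGENT_RE = re.compile(r'"(?P<referer>[^"]*)"\s+"(?P<useragent>[^"]*)"\s*$')

# Assumption: a crawler identifies itself with a token like "Somethingbot/1.2".
BOT_RE = re.compile(r'(?P<bot>[A-Za-z]+bot)/(?P<version>[\d.]+)')


def extract_useragent(line):
    """Return the useragent string, or None if the line does not match."""
    match = USERAGENT_RE.search(line)
    return match.group("useragent") if match else None


def extract_bot(useragent):
    """Return (bot_name, version) from a useragent string, or None."""
    match = BOT_RE.search(useragent)
    return (match.group("bot"), match.group("version")) if match else None


if __name__ == "__main__":
    ua = extract_useragent(LINE)
    print(ua)
    print(extract_bot(ua))
```

In Splunk itself the named group would go inside the rex pattern, and the extracted field would then show up under the group's name at search time.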
You can use the | rex command or use the Interactive field extractor to extract your own field.
http://docs.splunk.com/Documentation/Splunk/4.3.3/User/InteractiveFieldExtractionExample
Thanks for your answer.
But I just realized my log files don't have the word useragent listed.
How can I extract the Googlebot information instead?