Hi,
I spoke too soon. Please allow me to repost my question:
How can I extract country codes from a series of numeric values that have no fixed length? The country code is in a field that starts with 001001 (a fixed-length, 6-digit prefix), followed by the country code (not fixed in length), and lastly by the MIN (mobile identification number), which is also not fixed in length. I just need the country codes.
I tried the accepted answer, but it didn't work. I've gotten as far as the search below, but it only has me running in circles: it just returns all the events with the complete country code and mobile number, which is where it all started. Basically, I need to be able to create reports/graphs on those country codes.
index=xyz [|inputlookup country_code_lookup.csv | eval x="tel:001001".cc."" | fields x | rename x as MCA]
I just need the country codes, but I'm at my wits' end on how to go about it, since neither the country code nor the MIN is fixed in length. BTW, I have a lookup table, but the country codes in it aren't fixed in length either. I tried prefixing a couple of zeros in the lookup table, but that isn't feasible because the actual data doesn't have leading zeros. Here is some sample data:
tel:001001323353
tel:001001974555
tel:00100196659261
tel:001001966505998
tel:001001966015201
tel:001001338141015
tel:001001955009976
tel:001001965601621
tel:0010013203532
tel:00100163170000
tel:0010014647016
tel:00100197551559
tel:001001333532000
tel:0010013033532090
tel:001001323532000
I got this answer, but I'm not sure how to go about it:
If there is no way of determining where the country code ends, you'd have to provide a list of all unique country codes that could possibly match, like
001001(1|21|33|35|47|46)
and so on.
Hi, as Ayn said in his response to your earlier question, you'd have to create a rather long regex of all possible country codes, since you don't know whether a code is one, two or three digits long (sometimes even four or five). For the sake of simplicity, you might not want to get into sub-country codes like Pitcairn (located under New Zealand) or Zanzibar (located under Tanzania).
If you do want that level of precision, where there is possible ambiguity, you'd probably want to list the country codes from most specific to least specific, so that Jamaica (1876) is not classified as part of the US/Canada (1).
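To see why the order matters, here is a minimal Python sketch (not Splunk itself) of the same alternation behaviour. Both PCRE, which rex uses, and Python's re module try the alternatives from left to right and keep the first one that lets the match succeed. The phone number is a made-up example, and note that Python spells a named group (?P<cc>...) where rex uses (?<cc>...).

```python
import re

# Alternatives are tried left to right; the first one that matches wins,
# so the order of the codes decides which one is captured.
general_first = re.compile(r"001001(?P<cc>1|1876)")
specific_first = re.compile(r"001001(?P<cc>1876|1)")

number = "00100118765545499"  # Jamaica (1876) followed by a made-up MIN

print(general_first.match(number).group("cc"))   # captures the US/Canada code 1
print(specific_first.match(number).group("cc"))  # captures the Jamaica code 1876
```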
So, assuming you have a field in your events called tel that contains values like 001001201111212, 00100118002322, 00100118765545499, etc., you'd extract the country code like this:
... | rex field=tel "001001(?<cc>(1876|1869|1|20|211|212|213|216|218|220))" | ...
which would give you 20, 1, and 1876 as values for cc, respectively.
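Rather than ordering a long list by hand, you could generate the alternation. Below is a minimal Python sketch, assuming the codes are plain digit strings; the CODES list is a hypothetical subset (a real one would come from the cc column of your lookup), and build_alternation is a made-up helper name. Sorting the codes longest first guarantees that a sub-code like 1876 is tried before its prefix 1.

```python
import re

# Hypothetical subset of calling codes; a real list would come from the
# cc column of the country-code lookup CSV.
CODES = ["1", "20", "211", "212", "213", "216", "218", "220", "1869", "1876"]


def build_alternation(codes):
    """Join digit-only codes into a regex alternation, longest codes first.

    Longest-first ordering means a specific code such as 1876 is tried
    before its shorter prefix 1, so no manual ordering is needed.
    """
    ordered = sorted(set(codes), key=lambda c: (-len(c), c))
    return "|".join(ordered)


alternation = build_alternation(CODES)

# Pattern string for the Splunk rex command (PCRE named-group syntax).
rex_pattern = "001001(?<cc>" + alternation + ")"
print(rex_pattern)

# Same pattern in Python's named-group syntax, used only to check the logic.
py_pattern = re.compile("001001(?P<cc>" + alternation + ")")
for tel in ["001001201111212", "00100118002322", "00100118765545499"]:
    print(tel, "->", py_pattern.match(tel).group("cc"))
```

The printed rex_pattern string is what you would paste into the rex command; the py_pattern copy only exists to check that each sample value maps to the expected code (20, 1 and 1876).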
Then you'd probably want to create a lookup for the country codes as well:
cc,country
1,North America
1876,Jamaica
1869,StKitts_Nevis
1868,Trinidad_Tobago
1809,Dominican_Republic
1829,Dominican_Republic
1849,Dominican_Republic
20,Egypt
211,South Sudan
212,Morocco
213,Algeria
...
It should be a CSV file, and you could configure the lookup to run automatically through props.conf settings, or run it on demand with the lookup search command.
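To check what the lookup step should produce before wiring it into Splunk, here is a minimal Python sketch of the same join-and-count idea, roughly what you would chart in a report of events per country. The inline CSV text and the list of extracted codes are hypothetical stand-ins for your lookup file and the cc values coming out of rex.

```python
import csv
import io
from collections import Counter

# Hypothetical lookup contents in the cc,country layout described above;
# in practice this would be read from the CSV file on disk.
LOOKUP_CSV = """cc,country
1,North America
1876,Jamaica
20,Egypt
211,South Sudan
"""

# Map each calling code (kept as a string, like the rex output) to a country.
cc_to_country = {
    row["cc"]: row["country"] for row in csv.DictReader(io.StringIO(LOOKUP_CSV))
}

# cc values as they might come out of the rex extraction (hypothetical).
extracted = ["20", "1", "1876", "1", "211"]

# Count events per country; codes missing from the lookup stay visible.
counts = Counter(cc_to_country.get(cc, "unknown (" + cc + ")") for cc in extracted)
print(counts.most_common())
```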
See http://en.wikipedia.org/wiki/List_of_country_calling_codes for a full list of country codes.
For more info on lookups, see:
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Lookup
http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Addfieldsfromexternaldatasources#Set_up...
Hope this helps,
K