I have a CSV file with a single column and no header. The data consists of values for userid, property1, property2, property3, then again userid, property1, property2, property3, and so on. How can I extract the fields userid, property1, property2 and property3?
I tried something like the following (e.g. for userid), but it does not work:
.....| rex field=_raw "(?<userid>^(.*)\n)"
Hi ashabc,
take a look at this answer http://answers.splunk.com/answers/305727/why-is-my-rex-statement-unable-to-extract-the-fiel.html#ans... to learn about pcregextest and how you can test a regex in Splunk.
cheers, MuS
Based on the examples you just provided, you can try this:
| gentimes start=-1 | eval foo="user1
101253
DTZ
Penrith, Cumberland
user2
2151614
FCC
Balnd, Temora" | rex max_match=0 field=foo "user\d[\r\n](?<userID>[^\r\n]*)[\r\n](?<property1>[^\r\n]*)[\r\n](?<property2>[^\r\n]*)" | table userID, property1, property2
Or use the built-in pcregextest like this:
$SPLUNK_HOME/bin/splunk cmd pcregextest mregex="user\d[\r\n](?<userID>[^\r\n]*)[\r\n](?<property1>[^\r\n]*)[\r\n](?<property2>[^\r\n]*)" test_str="user1
> 101253
> DTZ
> Penrith, Cumberland
> user2
> 2151614
> FCC
> Balnd, Temora"
Original Pattern: 'user\d[\r\n](?<userID>[^\r\n]*)[\r\n](?<property1>[^\r\n]*)[\r\n](?<property2>[^\r\n]*)'
Expanded Pattern: 'user\d[\r\n](?<userID>[^\r\n]*)[\r\n](?<property1>[^\r\n]*)[\r\n](?<property2>[^\r\n]*)'
Regex compiled successfully. Capture group count = 3. Named capturing groups = 3.
SUCCESS - match against: 'user1
101253
DTZ
Penrith, Cumberland
user2
2151614
FCC
Balnd, Temora'
#### Capturing group data #####
Group | Name | Value
--------------------------------------
1 | userID | 101253
2 | property1 | DTZ
3 | property2 | Penrith, Cumberland
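If you want to sanity-check the same pattern outside Splunk, here is a minimal Python sketch. It assumes Unix line endings, and the names PATTERN, SAMPLE and extract_all are made up for illustration. Python's re module writes named groups as (?P<name>...) rather than (?<name>...), and finditer returns every match, which is roughly what max_match=0 does in rex. Note that the pattern as posted captures the three lines after each userN line.

```python
import re

# Same pattern as the rex above, with Python's (?P<name>...) group syntax.
PATTERN = re.compile(
    r"user\d[\r\n](?P<userID>[^\r\n]*)[\r\n]"
    r"(?P<property1>[^\r\n]*)[\r\n](?P<property2>[^\r\n]*)"
)

# The two sample records from this thread, joined with "\n" (an assumption).
SAMPLE = (
    "user1\n101253\nDTZ\nPenrith, Cumberland\n"
    "user2\n2151614\nFCC\nBalnd, Temora"
)


def extract_all(text):
    """Return one dict of named groups per match found in text."""
    return [m.groupdict() for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    for row in extract_all(SAMPLE):
        print(row)
```

This only checks the regex logic; it does not replace testing with pcregextest or rex against the real events.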
It kind of works.
What I still don't get is that you used eval foo="data_string". That is fine for 2 sets of sample data, but when I have thousands of records in the CSV file, how can I handle that?
He was using gentimes and eval only as a way to test the approach. If you run the search as
... | rex max_match=0 field=foo "user\d[\r\n](?<userID>[^\r\n]*)[\r\n](?<property1>[^\r\n]*)[\r\n](?<property2>[^\r\n]*)" | table userID, property1, property2
It should work. Just be sure to change the "field=" part of the rex command to fit the sourcetype and field that contains the data.
Can you post an example of the data? Does the data contain just the values, or is there something unique to each line that could be useful to key on for the extraction?
Here is sample data for 2 users. It basically contains a set of strings and numbers. The userid is a string, followed by another id (property1) in numeric form, then two other properties, both strings, and so on.
user1
101253
DTZ
Penrith, Cumberland
user2
2151614
FCC
Balnd, Temora
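Since the layout is a fixed cycle of four lines, one option is to group the column into records by position, without relying on what a userid looks like. Below is a rough Python sketch of that idea, done outside Splunk. It assumes every record has exactly four non-empty lines and that blank lines can be skipped; FIELDS and records_from_lines are hypothetical names used only for illustration.

```python
from itertools import islice

# Field names in the order the values repeat in the single-column file.
FIELDS = ("userid", "property1", "property2", "property3")


def records_from_lines(lines):
    """Yield one dict per consecutive group of four non-empty lines."""
    cleaned = (line.strip() for line in lines)
    values = (line for line in cleaned if line)  # skip blank lines
    while True:
        chunk = list(islice(values, len(FIELDS)))
        if len(chunk) < len(FIELDS):
            return  # end of input; an incomplete trailing group is dropped
        yield dict(zip(FIELDS, chunk))


if __name__ == "__main__":
    sample = [
        "user1", "101253", "DTZ", "Penrith, Cumberland",
        "user2", "2151614", "FCC", "Balnd, Temora",
    ]
    for record in records_from_lines(sample):
        print(record)
```

The same function works on a file object opened for reading, so it scales to thousands of rows. If a value can be missing in the real file, positional grouping will drift and a pattern-based approach would be safer.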