I have a large CSV data file (CDR) that has some 300 fields. Looks something like:
value1,value2,value3,...,value51,"subvalue52.1,subvalue52.2,...,subvalue52.20",value53,...,value300
The gotcha is field52. field51 is properly extracted, but field52 isn't. I'm not worried yet about the subextraction--right now, I just want field52 to be the whole thing inside the quotes.
from transforms.conf:
[my-report-stanza-name]
DELIMS = ","
FIELDS = f1,f2,f3,...,f300 (where f1-300 are LONG NAMES)
Is it because my f1-f300 are LONG?
Do I have the syntax for DELIMS wrong? (like, is that saying the delim char can be any of " OR , OR ' ?)
Once I do get this right, what's the best way to subextract f52?
adTHANKSvance gang!
-tv
It appears from my testing that there is a line length limit on the "FIELDS =" definition. So I am now extracting the fields under short names ("F001", "F002", etc.) and then using FIELDALIASes to give them longer names.
Now all fields (including the CSV embedded inside another field in quotes) are properly extracted. I then sub-extract the embedded field with another stanza.
To be more precise:
props.conf:
[myBigCSV]
REPORT-foo = BigCSV, SubCSV
FIELDALIAS-F001 = F001 AS MyFirstBigFieldName
FIELDALIAS-F003c = F003c AS MyThirdSubField
transforms.conf:
[BigCSV]
DELIMS = ","
FIELDS = "F001","F002","F003"
[SubCSV]
SOURCE_KEY = F003
DELIMS = ","
FIELDS = "F003a","F003b","F003c"
A very elegant and easy to maintain config.
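For anyone who wants to sanity-check the two-stage split outside Splunk, here is a minimal Python sketch of the same idea: split the line once with a quote-aware parser, then split the one embedded field again on commas. The function name split_two_stage and the sample line are made up for illustration, and this only mimics the logic; it is not how Splunk implements DELIMS or SOURCE_KEY.

```python
import csv
import io

def split_two_stage(line, sub_index):
    """Split a CSV line, then split one of its fields again on commas.

    Hypothetical helper for illustration only.
    """
    # First pass: csv.reader honours double quotes, so the embedded
    # comma-separated list comes back as a single field.
    fields = next(csv.reader(io.StringIO(line)))
    # Second pass: same delimiter, applied only to the chosen field.
    subfields = fields[sub_index].split(",")
    return fields, subfields

# Made-up sample line: the third field holds an embedded CSV list.
sample = 'a,b,"x1,x2,x3",d'
fields, subfields = split_two_stage(sample, 2)
print(fields)
print(subfields)
```

The first list should keep the quoted field whole, and the second should contain its pieces, which is the same shape as the BigCSV and SubCSV stanzas.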
-tv
You have DELIMS set up correctly, but how are the subfields delimited? Commas?
You will probably have to write a custom field extraction for the big f52, and for all of the subfields too.
That seems like a great data set.
Good luck!