Getting Data In

Exporting SSNs to a CSV file trims leading zeros. How do we preserve them?

ddrillic
Ultra Champion

When we export Social Security Numbers to a CSV file, the leading zeros of the SSNs are trimmed. We wonder how we can preserve these leading zeros.

Any ideas?

1 Solution

sundareshr
Legend

It's probably because the CSV converts it to numbers. Try changing the format in Splunk to a string. Something like this:
... | rex mode=sed field=ssn "s/(\d{3})(\d{2})(\d{4})/\1-\2-\3/g"
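
For example, assuming the field is literally named ssn (substitute your own field name), running that sed-style rex over the base search rewrites a value like 041234567 in place as 041-23-4567:

your_base_search
| rex mode=sed field=ssn "s/(\d{3})(\d{2})(\d{4})/\1-\2-\3/g"
| table ssn

Once the dashes are in the value it is no longer purely numeric, so Excel keeps the leading zero when the exported CSV is opened.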


jwiedemann_splu
Splunk Employee

In the testing I've done, the limitation is mostly (if not exclusively) within Excel. Splunk is correctly preserving the leading zeros in the CSV that is generated, for example 041234567. However, testing also shows that Splunk does not (and cannot, even when forced with tostring()) write those values to the CSV wrapped in quotes, for example "041234567".

The reason I say this is an Excel limitation is that whether or not your CSV data is wrapped in quotes, 041234567 vs. "041234567", Excel will convert both to numeric and trim the leading zero. I'm fairly certain this behavior is consistent regardless of the source system that generates the CSV, Splunk or anything else.
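
To make that concrete, both ssn values in this tiny made-up CSV come up as the number 41234567 when the file is simply double-clicked open in Excel, quotes or no quotes:

employee_id,ssn
1001,041234567
1002,"041234567"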

With all that being said, presuming the ultimate goal is a CSV that does not lose the leading zeros, you have only a few options:
1. Force the data in Splunk to look like an SSN/string by reinserting the dashes, as proposed in the SED command above (see the sketch after this list).
2. Use Excel's text import wizard to open the CSV. That gives you the option to forcibly specify a column as text, in which case the leading zeros will display.
3. Apply column formatting in Excel to re-display the leading zeros. The simplest option is to highlight the column -> Format Cells -> Special -> Social Security Number.
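
As a rough end-to-end sketch of option 1 (the index, sourcetype, and extra field names here are made up for illustration), the dashes can be reinserted right before the export, and outputcsv can even write the file from the search head itself:

index=main sourcetype=hr_records
| rex mode=sed field=ssn "s/(\d{3})(\d{2})(\d{4})/\1-\2-\3/g"
| table employee_id ssn
| outputcsv ssn_export

outputcsv saves the results as ssn_export.csv on the search head, so the dashed values reach the file exactly as they appear in the search results.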

Admittedly, option #3 is partially flawed in that it will display 041234567 and 41234567 in exactly the same format (041-23-4567), which is arguably bad or wrong, as you can't easily tell that the second value is actually a bogus SSN because it only has 8 digits, not 9.

If anyone else can find a better solution, or thinks I've missed or misstated something above, I'm all ears for a better option.


ddrillic
Ultra Champion

jwiedemann, very kind of you to explain the issue so clearly.

The bigger question is about data typing, especially with Hunk, where strong typing exists via Hive. I opened a separate thread for it: "data typing and hunk".

Any feedback would be appreciated.


sundareshr
Legend

@ddrillic Assuming this worked for you, I converted it to an answer so you can accept it.


ddrillic
Ultra Champion

Great. I really wondered about it... so, we can convert a comment to an answer, right?


ddrillic
Ultra Champion

I also wonder whether we can add quotes and "force" the number to be a string in the CSV file. Any ideas?


sundareshr
Legend

You could try | eval ssn=tostring(ssn) and see if the CSV treats that as a string. It might. Then you don't have to deal with formatting.


ddrillic
Ultra Champion

Interesting - the | eval ssn=tostring(ssn) doesn't have any effect on the data type. The UI still shows the ssn values as numeric.

When doing | eval ssn="\"" + tostring(ssn) + "\"", I do get quotes around the ssn in the CSV file.
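
Spelled out as a full search (your_base_search is just a placeholder for whatever feeds the export), that quoting approach looks like this:

your_base_search
| eval ssn="\"" + tostring(ssn) + "\""
| table ssn

The added quotes become literal characters in the field value, which is why they show up in the CSV; whether Excel then treats the cell as text still depends on how the file is opened, so the dash formatting from the accepted answer may remain the more reliable route.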


ddrillic
Ultra Champion

Interesting -

I ran -

basesearch | eval rex mode=sed field=ces_ssn "s/(\d{5})(\d{2})(\d{4})/\1-\2-\3/g"

It says -

Error in 'eval' command: The expression is malformed. 

sundareshr
Legend

There should not be an eval before the rex.
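
In other words, drop the eval and pipe straight into rex. Assuming the standard 3-2-4 SSN grouping from the original answer (the \d{5} above looks like a typo, so adjust the digit counts if ces_ssn really has a different layout), the corrected search would be:

basesearch
| rex mode=sed field=ces_ssn "s/(\d{3})(\d{2})(\d{4})/\1-\2-\3/g"

rex is a search command in its own right, not an eval function, which is why wrapping it in eval produces the "expression is malformed" error.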


ddrillic
Ultra Champion

Gorgeous - thank you!!!
