Getting Data In

How to extract data from a CSV file with multiple lines when the timestamp is on a different line?

anasar
New Member

Hi,

My log has a timestamp followed by CSV rows. For example, given two records:

Sun Feb 14 07:01:05 EST 2016

customer_name,cust_id, response_code, response_time, size
abc, 1002304,200, 0.111,120
def, 1002203,200,0.112,150
ghi, 1002206,500,0.113,160

Sun Feb 14 07:04:55 EST 2016

customer_name,cust_id, response_code, response_time, size
abc, 1002304,200, 0.114,110
def, 1002203,200,0.118,190
ghi, 1002206,500,0.117,130

How do I index these so that every CSV row gets the timestamp that precedes it? Please help.

1 Solution

Lowell
Super Champion

There's no real easy way to do this. The problem is that your file is a mix between a traditional "log" file (with timestamps between events) and a tabular (CSV) data format. This isn't a typical use, and therefore isn't well supported. A few thoughts/ideas are listed below.

Options:

  1. Modify the log-producing program so that it writes one timestamp per row (line).
  2. Pre-process the CSV file so that the timestamp repeats for each row. This would be relatively simple to do in Python, Perl, AWK, or whatever tool you prefer; see the sketch after this list.
  3. Let Splunk ingest each chunk of tabular data as a single event; then use some search-time magic to split each event into multiple events, one per CSV row. I know I wrote a custom search command a few years back that did this exact thing. (I may be able to find it lying around; I never got around to actually releasing it.) But this probably isn't the best answer and may not perform as well, depending on your use case.

There could be some other clever approaches to this as well, but most of them will be either a pain to implement, hard to understand, or subject to serious limitations. I'd try option 1 or 2 first.
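
For option 2, here's a minimal Python sketch. It assumes the timestamp lines always use the day/month/time/zone/year layout shown in your sample, and that the repeated header row should simply be dropped; the script name and column handling are illustrative, not a definitive implementation:

#!/usr/bin/env python3
# prepend_ts.py -- sketch: prepend the most recent timestamp line to
# every CSV data row, so each row becomes a self-contained event.
# The timestamp pattern is an assumption based on the sample data.
import re
import sys

# Matches lines like "Sun Feb 14 07:01:05 EST 2016".
TIMESTAMP_RE = re.compile(r"^\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \w+ \d{4}$")

def prepend_timestamps(lines):
    current_ts = None
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue                       # skip blank separator lines
        if TIMESTAMP_RE.match(line):
            current_ts = line              # remember the latest timestamp
        elif line.startswith("customer_name"):
            continue                       # drop the repeated CSV header
        elif current_ts is not None:
            yield current_ts + "," + line  # timestamp becomes the first column

if __name__ == "__main__":
    for row in prepend_timestamps(sys.stdin):
        print(row)

You'd run it as a pre-processing step before indexing, e.g. python prepend_ts.py < original.log > splunk_ready.csv, which turns the first row of your first record into something like "Sun Feb 14 07:01:05 EST 2016,abc, 1002304,200, 0.111,120". With the timestamp at the start of every line, Splunk's ordinary per-line timestamp extraction can take over from there.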


