How to define a sourcetype based on a TSV file with a long list of fields?

dominiquevocat
SplunkTrust

I have datasets in TSV format with no header row in the file. I tried to import the data with the wizard: I based the sourcetype on TSV and entered the (long) list of header names as a custom header. For some reason, the custom headers were not accepted. Does anyone have a working sample props.conf for a TSV file with a custom header? 😞

Maybe the header is too long; I don't know.
Here are the header fields as a comma-separated list:
accept_language,browser,browser_height,browser_width,c_color,campaign,channel,click_action,click_action_type,click_context,click_context_type,click_sourceid,click_tag,code_ver,color,connection_type,cookies,country,ct_connect_type,curr_factor,curr_rate,currency,cust_hit_time_gmt,cust_visid,daily_visitor,date_time,domain,duplicate_events,duplicate_purchase,duplicated_from,evar1-250,event_list,exclude_hit,first_hit_page_url,first_hit_pagename,first_hit_referrer,first_hit_time_gmt,geo_city,geo_country,geo_dma,geo_region,geo_zip,hier1-5,hier3,hier4,hier5,hit_source,hit_time_gmt,hitid_high,hitid_low,homepage,hourly_visitor,ip,ip2,j_jscript,java_enabled,javascript,language,last_hit_time_gmt,last_purchase_num,last_purchase_time_gmt,mcvisid,mobile* post_mobile*,mobile_id,monthly_visitor,mvvar1-3,namespace,new_visit,os,p_plugins,page_event,page_event_var1,page_event_var2,page_event_var3,page_type,page_url,pagename,paid_search,partner_plugins,persistent_cookie,plugins,post_ page_event,post_ page_type,post_browser_height,post_browser_width,post_campaign,post_channel,post_cookies,post_currency,post_cust_hit_time_gmt,post_cust_visid,post_evar1-75,post_event_list,post_hier1-5,post_java_enabled,post_keywords,post_mvvar1-3,post_page_event_var1,post_page_event_var2,post_page_event_var3,post_page_url,post_pagename,post_pagename_no_url,post_partner_plugins,post_persistent_cookie,post_product_list,post_prop1-75,post_purchaseid,post_referrer,post_search_engine,post_state,post_survey,post_t_time_info,post_tnt,post_tnt_action,post_transactionid,post_visid_high,post_visid_low,post_visid_type,post_zip,prev_page,product_list,product_merchandising,prop1-75,purchaseid,quarterly_visitor,ref_domain,ref_type,referrer,resolution,s_resolution,sampled_hit,search_engine,search_page_num,secondary_hit,service,social*,post_social*,sourceid,state,stats_server,t_time_info,tnt,tnt_action,tnt_post_vista,transactionid,truncated_hit,ua_color,ua_os,ua_pixels,user_agent,user_hash,user_server,userid,username,va_closer_detail,va_closer_id,va_finder_detail,va_finder_id,va_instance_event,va_new_engagement,video*,post_video*,visid_high,visid_low,visid_new,visid_timestamp,visid_type,visit_keywords,visit_num,visit_page_num,visit_referrer,visit_search_engine,visit_start_page_url,visit_start_pagename,visit_start_time_gmt,weekly_visitor,yearly_visitor,zip

1 Solution

dominiquevocat
SplunkTrust

There were several things off:

  • The onboarding wizards differ; in this case the one under "Data inputs » Files & directories" worked better than the one available from the "Data inputs" dialog.
  • The extra "," in the header delimiter came from my attempts to get the headers to match (I was looking in the wrong place, i.e. suspecting an issue with long header lists) and is unnecessary.
  • The headers as documented online do not match the data; they change from time to time and are delivered in a separate .tsv file (rejoice, rejoice).
  • The data import preview fails to reflect how the data looks after import (i.e. correct ... ).

woodcock
Esteemed Legend

Your configuration files look fine, but I would keep only the FIELD_NAMES and INDEXED_EXTRACTIONS = TSV lines (change tsv to TSV) and remove everything else. Then double-check this list:

  • The sourcetype matches mysourcetype exactly (casing, punctuation, etc.).
  • The props.conf and transforms.conf configuration files are deployed to the Indexers or Heavy Forwarders (or Universal Forwarders in some cases, such as INDEXED_EXTRACTIONS = TSV).
  • The inputs.conf configuration file is deployed to the Forwarder.
  • You must restart/bounce all Splunk instances on the servers where you deploy these files.
  • There are no configuration errors during restart (watch the response text during startup on one server of each type).
  • You are verifying proper current function by looking at NEW data (post-deploy/post-bounce), not previously indexed data (which is immutable).
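
A minimal sketch of what the trimmed-down configuration could look like, assuming the sourcetype name mysourcetype used in this thread. The FIELD_NAMES list is shortened to its first four entries for readability; the real setting needs one name per column, in file order. The monitor path in inputs.conf is a hypothetical placeholder and does not come from the original posts.

```ini
# props.conf (on the instance that parses the file; with INDEXED_EXTRACTIONS
# this can be the Universal Forwarder itself)
[mysourcetype]
INDEXED_EXTRACTIONS = TSV
# Shortened for illustration: list every column, in file order.
FIELD_NAMES = accept_language,browser,browser_height,browser_width

# inputs.conf (on the forwarder)
# Hypothetical path: replace with the real location of the TSV files.
[monitor:///path/to/tsv/files]
sourcetype = mysourcetype
```

FIELD_DELIMITER and HEADER_FIELD_DELIMITER are left out on purpose: with INDEXED_EXTRACTIONS = TSV the tab delimiter should already be implied, and a file without a header row has no header line to delimit.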

dominiquevocat
SplunkTrust

Yeah, there were several things off:

  • The onboarding wizards differ; in this case the one under "Data inputs » Files & directories" worked better than the one available from the "Data inputs" dialog.
  • The extra "," in the header delimiter came from my attempts to get the headers to match (I was looking in the wrong place, i.e. suspecting an issue with long header lists) and is unnecessary.
  • The headers as documented online do not match the data; they change from time to time and are delivered in a separate .tsv file (rejoice, rejoice).
  • The data import preview fails to reflect how the data looks after import (i.e. correct ... ).

It seems to work nicely with these settings. How do I close this question when no reply is quite correct? 🙂

woodcock
Esteemed Legend

Answer your own question and then click "Accept" on it.

woodcock
Esteemed Legend

What is in your configuration files right now?

dominiquevocat
SplunkTrust
[mysourcetype]
FIELD_DELIMITER = tab
FIELD_NAMES = accept_language,browser,browser_height,browser_width,c_color,campaign,channel,click_action,click_action_type,click_context,click_context_type,click_sourceid,click_tag,code_ver,color,connection_type,cookies,country,ct_connect_type,curr_factor,curr_rate,currency,cust_hit_time_gmt,cust_visid,daily_visitor,date_time,domain,duplicate_events,duplicate_purchase,duplicated_from,evar1-250,event_list,exclude_hit,first_hit_page_url,first_hit_pagename,first_hit_referrer,first_hit_time_gmt,geo_city,geo_country,geo_dma,geo_region,geo_zip,hier1-5,hier3,hier4,hier5,hit_source,hit_time_gmt,hitid_high,hitid_low,homepage,hourly_visitor,ip,ip2,j_jscript,java_enabled,javascript,language,last_hit_time_gmt,last_purchase_num,last_purchase_time_gmt,mcvisid,mobile* post_mobile*,mobile_id,monthly_visitor,mvvar1-3,namespace,new_visit,os,p_plugins,page_event,page_event_var1,page_event_var2,page_event_var3,page_type,page_url,pagename,paid_search,partner_plugins,persistent_cookie,plugins,post_ page_event,post_ page_type,post_browser_height,post_browser_width,post_campaign,post_channel,post_cookies,post_currency,post_cust_hit_time_gmt,post_cust_visid,post_evar1-75,post_event_list,post_hier1-5,post_java_enabled,post_keywords,post_mvvar1-3,post_page_event_var1,post_page_event_var2,post_page_event_var3,post_page_url,post_pagename,post_pagename_no_url,post_partner_plugins,post_persistent_cookie,post_product_list,post_prop1-75,post_purchaseid,post_referrer,post_search_engine,post_state,post_survey,post_t_time_info,post_tnt,post_tnt_action,post_transactionid,post_visid_high,post_visid_low,post_visid_type,post_zip,prev_page,product_list,product_merchandising,prop1-75,purchaseid,quarterly_visitor,ref_domain,ref_type,referrer,resolution,s_resolution,sampled_hit,search_engine,search_page_num,secondary_hit,service,social*,post_social*,sourceid,state,stats_server,t_time_info,tnt,tnt_action,tnt_post_vista,transactionid,truncated_hit,ua_color,ua_os,ua_pixels,user_agent,user_hash,user_server,userid,username,va_closer_detail,va_closer_id,va_finder_detail,va_finder_id,va_instance_event,va_new_engagement,video*,post_video*,visid_high,visid_low,visid_new,visid_timestamp,visid_type,visit_keywords,visit_num,visit_page_num,visit_referrer,visit_search_engine,visit_start_page_url,visit_start_pagename,visit_start_time_gmt,weekly_visitor,yearly_visitor,zip
HEADER_FIELD_DELIMITER = tab
INDEXED_EXTRACTIONS = tsv
disabled = false

dominiquevocat
SplunkTrust

Hm, the preview and guidance in the older import wizard seem to fail me; the indexed data looks fine 😕

MuS
Legend

Why do you set HEADER_FIELD_DELIMITER if there is no header in your file?
