Getting Data In

Index based on Raw Data?

Cuyose
Builder

I run a Python script to get data from MDB files into an indexer. It basically creates events with source, host, sourcetype, and raw data. We are almost always concerned only with reporting on the raw data. Millions of rows are generated in CSV format, and I have created custom fields within the raw data, which Splunk has no problem identifying.

The issue is that a search to find 2 distinct values in those 12 million+ rows takes forever: it parses all 12 million rows before returning the values.


Cuyose
Builder

index = perfdata | dedup LR_Run_Name

Here LR_Run_Name is in the raw data, and we extracted the field value from it. Out of millions of rows, there are only a handful of unique values in the indexed raw data.

I checked fields.conf and nothing in there marks the fields we extracted as indexed. The stanzas in it all look like this:

[sourcetype]
INDEXED = True
INDEXED_VALUE = False

Would I add something like this?
[LR_Run_Name]
INDEXED = True
INDEXED_VALUE = False
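
For context on what such a stanza would be for: if `LR_Run_Name` really were created as an index-time field (an assumption here; the posts above describe it as a search-time extraction), a search along the following lines could list its distinct values from the index files without reading the raw events. This is only a sketch, since `tstats` does not see search-time extracted fields.

```
| tstats count where index=perfdata by LR_Run_Name
```

Each row of the result would be one distinct `LR_Run_Name` value with its event count.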


sideview
SplunkTrust

Can you paste the exact search you're using?

In a nutshell, if Splunk is having to read all the data off disk, the most likely reason is that your search terms are not in the initial search clause, i.e. you're doing something like

`sourcetype=foo | <some other command(s)> | search <searchterms>`
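
As a rough illustration of that difference, here are two searches against the `perfdata` index mentioned in this thread. The value `run_01` and the `eval` step are hypothetical, not taken from the original posts. The first search filters only after every event has been read. The second puts the term in the initial search clause, so the index can narrow down the events first.

```
index=perfdata | eval marker=1 | search LR_Run_Name="run_01"

index=perfdata LR_Run_Name="run_01"
```

Depending on the Splunk version, the search optimizer may rewrite some searches shaped like the first one, so treat this as a sketch rather than a benchmark.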

Then there are a lot of other, stranger possibilities. To take a random example, you could have a foo="bar" term in the initial search clause, but for some reason something could have configured INDEXED_VALUE in fields.conf to be false for that field.

In any event, without seeing the search it's hard to speculate on the answer, but there most likely is an answer, and it's probably fixable.


Supriya
Path Finder

Could you please help me with how to search for multiple words in raw data?


Simeon
Splunk Employee

What keyword are you searching for, and what is the exact query? How often does it occur in the raw data?
