Good afternoon, fellow splunkthiasts, I need your help with data anonymization.
Situation: An application on a server with a universal forwarder (UFW) produces a log. Most of it is boring operational stuff, but certain records contain a field considered sensitive. The log is needed by two audiences: ordinary Ops admins, who must see all records but don't need the actual sensitive field value, and privileged troubleshooters, who need to see the sensitive data too.
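For illustration only (field names and values are invented), a sensitive record might look like this, with "token" being the field to protect:

    2024-05-14 12:03:11 INFO action=login user=jdoe token=SECRET123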
Architecture: Data is produced on the server with the UFW, will be stored on an indexer cluster, and there is one heavy forwarder (HFW) available in my deployment.
Limitations:
1. Due to limited bandwidth between the UFW and the Splunk servers, it is preferred not to increase the volume of data transferred from the UFW (bandwidth between the HFW and the indexers is fine).
2. Because the sensitive field is only valid for a limited time, the delay introduced by a search -> modify -> re-index cycle every few minutes is not acceptable.
3. Indexing the sensitive records twice is OK; indexing the whole log twice would be fun, but too expensive.
Proposed solution: The UFW will forward the log to the heavy forwarder, where it should be duplicated (see the sketch below). One copy of the data should be anonymized and forwarded to index "operational", while the other should be filtered (keeping only records with the sensitive field) and then forwarded to index "sensitive".
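In picture form, the intended flow (index names as above):

    UFW --- raw log ---> HFW ---+--> copy A: anonymize all records ---> index=operational
                                +--> copy B: keep sensitive only -----> index=sensitive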
Problem: I know how to route data, how to anonymize data, and how to filter data before routing, but I am not sure how to connect the dots in the manner described. Specifically, I don't know how to duplicate the data on the HFW and make sure each copy is treated differently.
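To show where I'm stuck, here is roughly how I would do each piece on its own (a sketch only; sourcetype, index, and field names are placeholders, not a working combined config). Anonymization and routing via props.conf on the HFW:

    [my_app_log]
    # mask the sensitive value at parse time (hypothetical field "token")
    SEDCMD-mask_token = s/token=\S+/token=####MASKED####/g
    # send events containing the sensitive field to another index
    TRANSFORMS-route_sensitive = route_sensitive

and the routing/filtering side in transforms.conf:

    [route_sensitive]
    REGEX = token=
    DEST_KEY = _MetaData:Index
    FORMAT = sensitive

The trouble is that both settings act on the same single stream of events for that sourcetype, so the SEDCMD masks the "sensitive" copy as well; I see no obvious way to fork the stream so that each copy passes through a different set of transforms.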
Can you help, or perhaps propose a better solution?