Symptoms and tests to confirm
The entire cluster becomes unstable, with the Cluster Master showing indexers flapping between Up and Down. The environment sits behind a farm of two-layer proxy servers in front of S3.
You will see intermittent HTTP errors when uploading to SmartStore:
10-07-2019 15:13:42.821 +0100 ERROR RetryableClientTransaction - transactionDone(): groupId=(nil) rTxnId=… transactionId=…. success=N HTTP-statusCode=502 HTTP-statusDescription="network error" retries=0 retry=N no_retry_reason="no retry policy" remainingTxns=0
10-07-2019 15:13:42.821 +0100 ERROR CacheManager - action=upload, cache_id="bid|_internal~….|", status=failed, unable to check if receipt exists at path=_internal/db/…/receipt.json(0,-1,), error="network error"
10-07-2019 15:13:42.821 +0100 ERROR CacheManager - action=upload, cache_id="bid|_internal~…|", status=failed, elapsed_ms=15016
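To gauge how widespread these failures are, you can count the failed-upload CacheManager errors in splunkd.log. A minimal sketch; it builds a self-contained sample file here, but on a real peer you would point the grep at $SPLUNK_HOME/var/log/splunk/splunkd.log:

```shell
# Create a small sample log (stand-in for splunkd.log) using the error shape above.
cat > sample_splunkd.log <<'EOF'
10-07-2019 15:13:42.821 +0100 ERROR CacheManager - action=upload, status=failed, error="network error"
10-07-2019 15:13:42.821 +0100 ERROR CacheManager - action=upload, status=failed, elapsed_ms=15016
10-07-2019 15:13:43.100 +0100 INFO  CacheManager - action=upload, status=succeeded
EOF
# Count only the failed uploads.
grep -c 'CacheManager.*action=upload.*status=failed' sample_splunkd.log   # prints 2
```

A sudden spike in this count lining up with 502s from the proxy is a strong hint the problem is the network path, not the indexer itself.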
Crash logs contain:
[build 7651b7244cf2] 2019-10-07 11:17:36
Received fatal signal 6 (Aborted).
Cause:
Signal sent by PID 2599 running under UID 0.
Crashing thread: cachemanagerUploadExecutorWorker-180
Testing: ./splunk cmd splunkd rfs -- ls --starts-with volume:XXXXXXX returns no results because the connection times out with 502 Bad Gateway.
Testing: wget against the AWS S3 endpoint also returns 502 Bad Gateway.
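The same probe can be scripted so peers report the failure class consistently. A minimal sketch; the classify_status helper and the endpoint shown are illustrative, not part of any Splunk tooling:

```shell
# Classify an HTTP status code the way the tests above interpret it.
classify_status() {
  case "$1" in
    2??|3??) echo "reachable" ;;
    502|504) echo "bad-gateway-or-timeout" ;;   # what the proxy returned in this case
    *)       echo "connection-problem" ;;
  esac
}

# Live probe (requires network access; commented out to keep the sketch self-contained):
# status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 https://s3-us-west-2.amazonaws.com)
classify_status 502   # prints bad-gateway-or-timeout
```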
To confirm the issue with a reproduction:
Step 1. Set the parameters below in server.conf to 200:
[cachemanager]
max_concurrent_downloads = 200
max_concurrent_uploads = 200
Step 2. Block the connection from peers to S3 using
echo "127.0.0.1 s3-us-west-2.amazonaws.com" >> /etc/hosts
What was observed:
1. Peers were unable to upload buckets to remote storage (expected, given the block).
2. Peers constantly retried uploading the buckets.
3. Peers were marked Down by the CM: they could not heartbeat to the CM because so many threads were busy retrying uploads in parallel, which put extra pressure on the CMSlave lock.
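Observation #3 is a classic starvation pattern: one thread holds a shared lock across a slow remote call, and the heartbeat misses its deadline waiting for it. A toy sketch of the pattern, using flock(1) as a stand-in for the CMSlave mutex (process names and timings are illustrative):

```shell
lockfile=$(mktemp)

# "Uploader": grabs the lock and holds it for the duration of a slow S3 call.
( flock 9; sleep 3 ) 9>"$lockfile" &
sleep 0.5   # let the uploader acquire the lock first

# "Heartbeat": needs the same lock but can only wait 1 second before its deadline.
if flock -w 1 9 9>"$lockfile"; then result="sent"; else result="missed"; fi
echo "heartbeat $result"   # prints: heartbeat missed
wait
```

Multiply the "uploader" by 200 concurrent workers, each blocking on a dead S3 endpoint, and the heartbeat thread rarely gets the lock in time, so the CM marks the peer Down.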
Below is the pstack I collected from one of the indexers.
Thread holding the CMSlave lock while making an S3 HEAD request to check whether a file is present on S3:
Thread 14 (Thread 0x7f8b04dff700 (LWP 8834)):
0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
1 0x00005639b0d24e27 in EventLoop::run() ()
2 0x00005639b0dece00 in TcpOutboundLoop::run() ()
3 0x00005639b08928e9 in RetryableClientTransaction::_run_sync(bool) ()
4 0x00005639b0930c44 in S3StorageInterface::fileExists(StorageObject const&, Str*, RemoteRetryPolicy*) ()
5 0x00005639b04eb4b0 in cachemanager::CacheManagerBackEnd::isRemoteBucketPresent(cachemanager::CacheId const&, Pathname const&, bool, ScopedPointer*) const ()
6 0x00005639b04f2bc1 in cachemanager::CacheManagerBackEnd::isBucketStable(cachemanager::CacheId const&, cachemanager::CacheManagerBackEnd::CheckScope, bool, ScopedPointer*) ()
7 0x00005639b03435c7 in DatabaseDirectoryManager::isBucketStable(cachemanager::CacheId const&, cachemanager::CacheManagerBackEnd::CheckScope, bool, bool, ScopedPointer*) ()
8 0x00005639b0f92f64 in CMSlave::manageReplicatedBucketsTimeoutS2_locked() ()
9 0x00005639b0f93c9d in CMSlave::service(bool) ()
10 0x00005639b00e09f3 in CallbackRunnerThread::main() ()
11 0x00005639b0dedfa9 in Thread::callMain(void*) ()
12 0x00007f8b0d9614a4 in start_thread (arg=0x7f8b04dff700) at pthread_create.c:456
13 0x00007f8b0d6a3d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
Meanwhile, other threads, such as the heartbeat thread, are waiting for this lock to be released.
Heartbeat thread waiting for the lock:
Thread 60 (Thread 0x7f8afa7ff700 (LWP 9053)):
0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
1 0x00007f8b0d963bb5 in GI_pthread_mutex_lock (mutex=0x7f8b0d0818f8) at ../nptl/pthread_mutex_lock.c:80
2 0x00005639b0dedcd9 in PthreadMutexImpl::lock() ()
3 0x00005639b0f71f55 in CMSlave::getHbInfo(Str&, Str&, unsigned int&, CMPeerStatus::ManualDetention&, bool&, long&, unsigned long&) ()
4 0x00005639b1005b8c in CMHeartbeatThread::when_expired(Interval*) ()
5 0x00005639b0df634c in TimeoutHeap::runExpiredTimeouts(MonotonicTime&) ()
6 0x00005639b0d24d86 in EventLoop::run() ()
7 0x00005639b01225da in CMServiceThread::main() ()
8 0x00005639b0dedfa9 in Thread::callMain(void*) ()
9 0x00007f8b0d9614a4 in start_thread (arg=0x7f8afa7ff700) at pthread_create.c:456
10 0x00007f8b0d6a3d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
Even searches can be blocked on this lock:
Thread 81 (Thread 0x7f8afe1ff700 (LWP 10428)):
0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
1 0x00007f8b0d963bb5 in GI_pthread_mutex_lock (mutex=0x7f8b0d0818f8) at ../nptl/pthread_mutex_lock.c:80
2 0x00005639b0dedcd9 in PthreadMutexImpl::lock() ()
3 0x00005639b0f948ec in CMSlave::writeBucketsToSearch(unsigned long, Clustering::SiteId const&, Clustering::SummaryAction, Str&) ()
4 0x00005639b13a0822 in DispatchCommand::dumpClusterSlaveBuckets(SearchResultsInfo&) ()
5 0x00005639b1429152 in StreamedSearchDataProvider::handleStreamConnectionImpl(HttpCompressingServerTransaction&, SearchResultsInfo*, Str*) ()
6 0x00005639b142bbb5 in StreamedSearchDataProvider::handleStreamConnection(HttpCompressingServerTransaction&) ()
7 0x00005639b0c38d4d in MHTTPStreamDataProvider::streamBody() ()
8 0x00005639b07db115 in ServicesEndpointReplyDataProvider::produceBody() ()
9 0x00005639b07d28ff in RawRestHttpHandler::getBody(HttpServerTransaction*) ()
10 0x00005639b0d558fb in HttpThreadedCommunicationHandler::communicate(TcpSyncDataBuffer&) ()
11 0x00005639b0119e42 in TcpChannelThread::main() ()
12 0x00005639b0dedfa9 in Thread::callMain(void*) ()
13 0x00007f8b0d9614a4 in start_thread (arg=0x7f8afe1ff700) at pthread_create.c:456
14 0x00007f8b0d6a3d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
This explains why the cluster was so unstable while bucket uploads were failing, and accounts for observations #1 and #3.
This dependency on the CMSlave lock has already been fixed in 8.0.1.
As for #2: because the customer set max_concurrent_downloads/max_concurrent_uploads = 200, there were so many concurrent uploads to S3 through the proxy that the proxy became overloaded and started backing up. Eventually it closed the connections to the indexers, upload retries kicked in, and the timeouts appeared.
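This is why bounding concurrency matters: with a sane cap, a slow or failing proxy sees a few in-flight connections instead of 200. A toy sketch of bounded parallelism using xargs -P (bucket names, the worker command, and the cap of 2 are illustrative):

```shell
# Run at most 2 "uploads" at a time, instead of all 8 at once.
out=$(printf 'bucket_%s\n' 1 2 3 4 5 6 7 8 \
  | xargs -P 2 -I{} sh -c 'sleep 0.05; echo "uploaded {}"')
echo "$out" | wc -l   # prints 8: all uploads still complete, just 2 at a time
```

The cachemanager's max_concurrent_uploads plays the role of -P here; raising it to 200 trades backpressure for a thundering herd against the proxy.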