About Glasses2

Glasses2

The actual steps are: 1 find the corrupted bucket location with the dbinspect query 2 enable maintenance-mode on the IDXCM 3 take the indexer offline (where you want to repair the bucket) 4 run the fsck repair command on the stopped indexer 5 start the indexer when finished 6 disable maintenance-mode on the IDXCM 7 let the IDXCluster heal 8 repeat steps for the next bucket Large buckets 10G take about 25min to repair Goodluck

Glasses2

The other day a few alerts surfaced showing I had 6 large windows data buckets stuck "Fixup Task - In Progress". I ran a query | dbinspect index=windows corruptonly=true | search bucketId IN (windows~nnnn~guid,...) | fields bucketId, path, splunk_server, corruptReason, state and found all the primary db_<buckets> from the alerts were corrupt. You can also see it on the IDXCM bucket status. I tried a few fsck repairs commands on the indexers where the primary buckets resided, but it failed due to error >>> failReason=No bloomfilter then I tried >>> ./splunk fsck repair --one-bucket --bucket-path=/<path> --index-name=<indexName> --debug --v --backfill-never After that it cleared and splunkd.log showed >>> Successfully released lock for bucket with path... I hope this information helps.

Glasses2 · ‎04-02-2024

@siemless This may be best to discuss in the Slack Users but I could not find you so I will respond here. In some cases, I have boxes where the /opt/splunk dir is mounted to a separate drive w/ mount point /opt/splunk. In that case you can just swap the disk, I learned this method from AWS support. But that takes preplanning. In some cases, I have boxes that are jacked up, either volume issues across multiple disks or just not setup to swap disks. In that case you can use the Splunk docs >>> https://docs.splunk.com/Documentation/Splunk/9.2.1/Installation/MigrateaSplunkinstance I argued with Splunk about the documentation steps but they claim the steps are correct, although I still believe confusing. FWIW this is what I did... 1 >Create a new host with new OS (in my case I rename /re-IP to the original afterward). 2 > Install the same version of Splunk on new host (I used a .tar), set systemd, set same admin pwd, then stop Splunkd, maybe test a restart and reboot, to verify. 3 > Stop Splunkd on old host, tar up /opt/splunk, copy over the old.tar to new box, untar over the new install, then start Splunkd. That worked for me, and going fwd all new hosts will be configured for the disk-swappable process. Good luck

Glasses2 · ‎03-01-2024

In some cases, I have Splunk installed on server builds where /opt/splunk is a mount point and separate disk/LVM. That is /opt/splunk dir is on a different disk than the OS. Would anyone know if there are any problems, detaching the LVM from a RHEL 7 box and then attaching the LVM to a RHEL 9 box? Thank you

Glasses2 · ‎03-01-2024

Ok thank you for the clarity... I think someone should revise those steps then, its ambiguous.

Glasses2 · ‎03-01-2024

Thank you for the reply. RE: "swap" method, yeah I thought about that and also share your apprehension. RE: " deploy new component", yeah I agree with that method for idxc peers and shc members... But check this out... please LMK what you think Per Splunk docs >>> docs.splunk.com/Documentation/Splunk/9.2.0/Installation/MigrateaSplunkinstance "Migrate a Splunk Enterprise instance from one physical machine to another" "When to migrate" "Your Splunk Enterprise installation is on an operating system that either your organization or Splunk no longer supports, and you want to move it to an operating system that does have support." "How to migrate" The Steps say >>> Stop Splunk Enterprise services on the host from which you want to migrate. Copy the entire contents of the $SPLUNK_HOME directory from the old host to the new host. Copying this directory also copies the mongo subdirectory. Install Splunk Enterprise on the new host. The way I read this is... 1) Stop Splunk on the old box 2) tar up the /opt/splunk on the old box e.g. > tar -cjvf $(hostname)_splunk-og.tar.bz2 --exclude=./var/* --exclude=./$(hostname)*bz2 ./ 3) move and untar the .bz2 file on the new box in /opt e.g. > tar -xjvf <hostname>_splunk-og.tar.bz2 -C /opt/splunk 4 ) install a clean copy (downloaded from Splunk) of the same version of Splunk on top of the old copies Apparently someone that documented this believes this is the way to go... What do you think? RE: my --exclude=./var/* that is for boxes that don't contain indexed data RE: my --exclude=./$(hostname)*bz2./ this is because I am running the tar from /opt/splunk dir Thank you

Glasses2 · ‎02-29-2024

I have a distributed deployment at version 9.0.4.1 Everything in running on RHEL 7 and the system/server team does not want to do in place upgrades to RHEL 9. I have been tasked to migrate each node to a new replacement server (which will be renamed / IP- addressed to match the existing). From what I have read this is possible, but I have a few questions. Lets consider I start with standalone nodes, like a SHC-deployer, Monitoring Console, License Manager... These are the general steps I have gathered 1 Install Splunk (same version) on the new server 2 Stop Splunk on the old server 3 Copy old configs to new server ?? <<< which configs? is there a check list documented somewhere 4 Start new Splunk server and verify I could go thru each directory copying configs, but any advice to expedite this step is appreciated. Thank you

Glasses2 · ‎11-29-2023

The numbers are not exact, from the DS Forwarder Management > 1275, dc(h) from metrics > 1287, and the total stats count from the final query > 1166 so its not accurate. I will need to create a lookup of UFs. Thank you for your support.

Glasses2 · ‎11-28-2023

index=_internal source=*license_usage.log earliest=-1d@d latest=now [search index=_internal source=*metrics.log fwdType=uf earliest=-1d@d latest=now | rename hostname as h | fields h] | stats sum(b) as total_usage_bytes by h | eval total_usage_gb = round(total_usage_bytes/1024/1024/1024, 2) | fields - total_usage_bytes | addcoltotals label="Total" labelfield="h" total_usage_gb I think this is what I wanted, unless someone thinks its inaccurate? Please advise. TY

Glasses2 · ‎11-28-2023

Thank you for the reply. I also looked at this log but it requires curating an exact list of the UFs, bc I have some pollution, e.g. h= HFs, SC4S, etc. The license_usage log may be the best route if I can put together a lookup of just UFs.

Glasses2 · ‎11-28-2023

Thank you for your reply, do you have a method of querying to get an answer for my question? I am not finding the key logs containing UF data thruput or ingest information.

Glasses2 · ‎11-28-2023

Hi I am working on a query to determine the hourly (or daily) totals of all indexed data (in GBs) coming from UFs. In our deployment, UFs send directly to the Indexer Cluster. The issue I am having w/ the following query, is that the volume is not realistic, and I am probably misunderstanding the _internal metrics log. Perhaps the kb field is not the correct field to sum as data thruput? index=_internal source=*metrics.log group=tcpin_connections fwdType=uf | eval GB = kb/(1024*1024) | stats sum(GB) as GB Any advice appreciated. Thank you

Glasses2 · ‎07-17-2023

yes thx I accepted that over a year ago

Glasses2 · ‎05-11-2023

Hi Does anyone know how to fix the Proofpoint TAP WebUI for inputs and configuration? When I launched a clean install of the app on a clean install of 9.0.4.1, the inputs and configuration pages are blank. However, when I edit the url to /search it loads properly. Anyone know how to fix this? Thank you

Glasses2 · ‎04-27-2023

Hi, I have not received any response from Cisco directly on this topic so I thought I would try here. I am cleaning up a messy syslog pipeline containing all sorts of devices, including Cisco. I want to throw everything Cisco in 1 index. But I am not sure Cisco syslog formats are same across all iOS devices: switches, routers, etc. I would assume it would be or very compatible, syslog/CEF... Can anyone confirm or speak to this question? Ideally I will move to using SC4S but in the meantime I want to cleanup the existing and use available TAs to parse/format the data. Any advice appreciated. Thank you

Glasses2 · ‎04-10-2023

Curious, is TZ=GB for UK valid or did I misread something?

Glasses2 · ‎04-10-2023

Hi, I am forced to set individual TZ for individual hosts in a SeverClass because the hosts' OS time is not standardized. I have noticed TZ = US/Eastern, TZ = US/Central, and TZ = US/Pacific, all account for Daylight Savings Time automatically. However, I have servers in the following Time Zones and I am hoping someone can confirm what TZ settings I should use to automatically adjust for DST. AUS/Eastern <<< using TZ=Australia/Sydney AWST <<< using TZ=Australia/West Etc/GMT+12 <<<< cannot find alternate GB (for UK BST) <<<< using TZ=GB (for UK locations w/ BST) HKT <<<< cannot find alternate Hopefully that is correct... I was given these by the host admin. Please refer me to doc, as I don't find these TZs in Splunk docs, other than a ref to wikipedia. Thank you

Glasses2 · ‎03-08-2023

POA 1) disable KVstore on the indexer instances only 2) update KVstore all other Splunk instances with a "ready" enabled KVstore to be safe. **In my case, all instances are default enabled ** Hopefully everything goes well! I will post if it goes sideways.

Glasses2 · ‎03-08-2023

Well, in line with your recommendations... I am being advised by Internal Splunk Personnel, to upgrade everything except indexers... My feeling is to disable the kvstore on the indexers but not the other nodes like IDXcluster master etc... Would you agree? (you can conclude that I value your opinion with equal weight as support 😉, maybe more)

Glasses2 · ‎03-08-2023

Thanks for the clarification (and persevering thru my questions). I really do appreciate it! My concerns, re: IDXCluster and KVstore, are due to all the non-sensical configurations the previous admins left for me.... It would not surprise me to see something unusual here, as I have seen some really goofy stuff. For instance , I am concerned they did some sort of remote kvstore replication/sync to the indexer peers etc... maybe that doesn't matter, not 100% about that. Would you advise that I "disable" the KVstore on the indexers? for safety? per Splunk docs>>> https://docs.splunk.com/Documentation/Splunk/8.1.3/Admin/AboutKVstore RE: "upgrade only Search Heads and Heavy Forwarders" , I see collections (probably default) on my MC/LM instance and Deployment Server instance as well. So I am thinking upgrade those as well? IDK. when I run this on my MC/LM instance >>> | rest splunk_server="local" "/services/search/distributed/peers/" | search disabled=0 | rename peerName AS splunk_server | fields host splunk_server | join type=left splunk_server [| rest splunk_server="*" "/services/kvstore/status" | table splunk_server current.status current.replicationStatus current.storageEngine ] | sort - current.storageEngine I see every instance (including indexers) has a default enabled kvstore in "ready"... So now, knowing this information, does that change your advice? RE: support, IDK, I do feel support should be able to answer these questions. I get support is "break/fix" but do I really have to break it first to get help or a couple definitive answers? I am sure Support will kick it to our Sales Rep and they will say we need to engage PS for 2 week$ 🤣... Thank you!

Glasses2 · ‎03-07-2023

Thank you for the response. You answered me before in another post >> https://community.splunk.com/t5/Deployment-Architecture/In-a-distributed-and-clustered-environ-which-kvstores-need/td-p/633203 upgrade all nodes. So I presume there is no harm upgrading a kvstore engine and version on an instance w/ or w/out any active collections? I am scrambled and support won't even give me a definite answer.

Glasses2 · ‎03-07-2023

In order to upgrade Splunk from 8.1.3 to 9.0.4, I need to migrate/upgrade the KVstore engine from MMAPv1 to WiredTiger and then upgrade the KVstore (mongoDB) version to 4.2. I believe that I only need to upgrade the KVstore on instances with active KVstore collections. And I should stop scheduled searches writing to collections or wait until they are finished before running the migration. By default the KVstore is enabled on all instances which is a bit confusing, BUT the monitoring console (MC) KVstore dashboard gives me some indication that only the instances with active kvstore collections are the SHC and some HFs... but not my MC/license master or the Deployment Server or the Index Cluster master. Splunk docs mention that you can disable kvstore and ignore certain collections. My question is how to verify and identify the active collections on my instances? Where is the kvstore files location? (apparently the directory seems to be /opt/splunk/var/lib/splunk/kvstore/mongo) but the files are not human-readable clear... Is there a query to see the individual KVstore collection file sizes with human-readable names? This gives me I think most of them... not sure | rest /servicesNS/-/-/data/transforms/lookups splunk_server=local | table eai:acl.app title collection | search collection!="" Any advice appreciated. Thank you

Glasses2 · ‎03-06-2023

I think, better safe than sorry, looks like some apps use KVstores...

Glasses2 · ‎03-03-2023

I am about to upgrade 8.1.3 distributed / clustered environment to 9.0.4. Per Docs> Migrate your App Key Value Store storage engine from the Memory Mapped (MMAP) storage engine to the Wired Tiger storage engine, and update your MongoDB version from 3.6 to 4.2. These updates are required in Splunk Enterprise 9.0. See Migrate the KV store storage engine in the Admin manual to plan your migration. Back up your App Key Value Store (KV Store) databases prior to starting an upgrade. If you run version 7.1 and lower of Splunk Enterprise, you must stop Splunk Enterprise instances first. I am going to do this part prior to the Upgrade. I inherited this deployment so IDK where all the KVstore(s) are located or which nodes have a default KVstore needing the same migration/upgrade (e.g. MC or DS). My Environment> SHC(4), SHC-deployer, IDXCM, IDXC(10), MC/LM, DS, HFs, UFs Any advice appreciated. If someone can share knowledge such as, how to validate KVstore locations and which KVstore(s) need update... that would help. Thank you Thank you

Glasses2 · ‎02-27-2023

Well I understand your point about "this"... but that's the problem, I couldn't find an error with the skipped searches... unless I am missing something. Since I did the rolling restart (reset) there are no more skipped searches. Previously I looked for the longest running searches and none were over-running their schedules, that I could see. For example one search took an hour approx., but it ran every 4 hours. Since I did some optimizing there were only 3 scheduled searches that produced the warning which I identified with index="_internal" sourcetype="scheduler" | eval scheduled=strftime(scheduled_time, "%Y-%m-%d %H:%M:%S") | stats values(scheduled) as scheduled values(savedsearch_name) as search_name values(status) as status values(reason) as reason values(run_time) as run_time values(dm_node) as dm_node values(sid) as sid by _time,savedsearch_name | sort -scheduled | table scheduled, search_name, status, reason, run_time When I looked back at those 3 specific searches, they were not over-running the schedules, so I was wondering how it got stuck thinking it was "piling up" vs "still running". I am trying to understand/investigate, if a search is "skipped" then when the shc scheduler retries that previously skipped search at its next runtime, "how can I see that the shc CPT thinks its still running"? And looking back at the "skipped" events, they don't contain "run_time"... so I look back historically to find a day with a high value. But when the searches were running they took max 4 seconds with avg of 2 seconds to complete, which is why I thought the scheduled searches were piling up. Hope that makes sense. The only other variable I can think of is that these searches are using the "| dbxquery" cmd from Splunk DB Connect app. So did it the SHC just get stuck? Any further thoughts appreciated. TY

Posts	63
Solutions	8
Karma Given	22
Karma Received	0
Member Since	‎06-22-2022

Online Status	Offline
Date Last Visited	Friday

How to resolve index buckets stuck in "Fixup Tasks...

PART2 How to migrate a distributed, clustered Splu...

How to migrate a distributed, clustered Splunk (9....

How can I determine the total amount of data recei...

TA Proofpoint TAP v1.3.150 WebUI not displaying co...

How to onboard Cisco appliances syslog data?

Need guidance with props Time Zone settings

How to identify active KVstore collections on all ...

In a distributed and clustered environ, which kvst...

How to do a RCA on "The maximum number of concurre...

Re: How to resolve index buckets stuck in "Fixup T...

How to resolve index buckets stuck in "Fixup Tasks...

Re: How to migrate a distributed, clustered Splunk...

PART2 How to migrate a distributed, clustered Splu...

Re: How to migrate a distributed, clustered Splunk...

Re: How to migrate a distributed, clustered Splunk...

How to migrate a distributed, clustered Splunk (9....

Re: How can I determine the total amount of data r...

Re: How can I determine the total amount of data r...

Re: How can I determine the total amount of data r...

Re: How can I determine the total amount of data r...

How can I determine the total amount of data recei...

Re: How do I change the default index name in the ...

TA Proofpoint TAP v1.3.150 WebUI not displaying co...

How to onboard Cisco appliances syslog data?

Re: Need guidance with props Time Zone settings

Need guidance with props Time Zone settings

Re: How to identify active KVstore collections on ...

Re: How to identify active KVstore collections on ...

Re: How to identify active KVstore collections on ...

Re: How to identify active KVstore collections on ...

How to identify active KVstore collections on all ...

Re: In a distributed and clustered environ, which ...

In a distributed and clustered environ, which kvst...

Re: How to do a RCA on "The maximum number of conc...