Solved: How to minimise Missed Slot Leader checks / missed blocks with stake pool ADA Cardano node

If this information helps you, consider staking some ADA to the [BLKBX] Cardano Stake Pool.

Assumptions:

This guide was built for earlier versions of stake pool nodes and is likely now out of date.
This guide assumes you are using a Linux-based server and a Cardano node built using the CoinCashew guide; however, you can easily adapt it if you installed using other methods.

Garbage Collection

Are you getting a high number of missed blocks / missed slot leader checks in gLiveView or on your Grafana dashboard? Not to worry, so are most people. There are a couple of things you can do to reduce this.

Assuming your node is healthy and correctly configured, the usual cause of missed slot leader checks is garbage collection.

Depending on how you have installed your node the defaults can vary, but generally garbage collection uses the copying collector (copying-gc), which pauses every thread of the node while a collection runs. With 1.29.0 you will typically see GCs of around 2.5 GB, and if you have monitoring such as Grafana you can usually see that the missed slot leader checks occur at the same time as the garbage collection events.

To resolve this you can switch garbage collection to the non-moving collector (--nonmoving-gc). This is the newer concurrent mark-and-sweep garbage collector, and it does not stop the rest of the node for the duration of a collection.

To change this you will need to modify your .bashrc script.

You can typically open this file with the following command; however, depending on your install, yours may be in a different location. It is your own file, so sudo is not needed.

nano ~/.bashrc

Add the following line to your .bashrc file (credit orpheus-ant):

export GHCRTS='-N -T -I0 -A16m --disable-delayed-os-memory-return --nonmoving-gc'

Then reload bash with

source ~/.bashrc

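Then restart your Cardano node. Depending on how you have installed the node, your restart command may vary. Note that a systemd service does not read ~/.bashrc, so if the node runs as a service, make sure GHCRTS is also set in the environment the service starts with (for example in the start script).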

sudo systemctl restart cardano-node
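The .bashrc edit above can be scripted so that it is safe to run more than once, and the result can be checked against the running node. The following is a minimal sketch, not part of the original guide: the function names add_ghcrts and show_process_ghcrts are hypothetical, and the second function assumes a Linux /proc filesystem.

```shell
# Sketch only: function names are hypothetical, not from the original guide.

GHCRTS_LINE="export GHCRTS='-N -T -I0 -A16m --disable-delayed-os-memory-return --nonmoving-gc'"

# Append the export line to the given rc file unless it already exports GHCRTS.
add_ghcrts() {
  local rc_file="$1"
  if grep -q '^export GHCRTS=' "$rc_file" 2>/dev/null; then
    echo "GHCRTS already set in $rc_file; leaving it unchanged"
  else
    printf '%s\n' "$GHCRTS_LINE" >> "$rc_file"
    echo "Added GHCRTS to $rc_file"
  fi
}

# Print the GHCRTS value a running process was started with (Linux /proc only).
# Reading a process owned by another user may require sudo.
show_process_ghcrts() {
  local pid="$1"
  tr '\0' '\n' < "/proc/$pid/environ" | grep '^GHCRTS=' \
    || echo "GHCRTS not set for PID $pid"
}
```

Typical use would be add_ghcrts ~/.bashrc, followed after the restart by show_process_ghcrts "$(pidof cardano-node)". If the second command reports that GHCRTS is not set, the node did not inherit the variable and the options are not in effect.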

What the GC options do:

--disable-delayed-os-memory-return

Optional; makes the resident memory usage reported by tools such as top more accurate

--nonmoving-gc

Enable the concurrent mark-and-sweep garbage collector

-N

Use all available processor cores. This should not be confused with the lower-case -n option, which sets the allocation area chunk size and is described as follows (details from Haskell):

When set to a non-zero value, this option divides the allocation area (-A value) into chunks of the specified size. During execution, when a processor exhausts its current chunk, it is given another chunk from the pool until the pool is exhausted, at which point a collection is triggered.

This option is only useful when running in parallel (-N2 or greater). It allows the processor cores to make better use of the available allocation area, even when cores are allocating at different rates. Without -n, each core gets a fixed-size allocation area specified by the -A, and the first core to exhaust its allocation area triggers a GC across all the cores. This can result in a collection happening when the allocation areas of some cores are only partially full, so the purpose of the -n is to allow cores that are allocating faster to get more of the allocation area. This means less frequent GC, leading to a lower GC overhead for the same heap size.

This is particularly useful in conjunction with larger -A values, for example -A64m -n4m is a useful combination on larger core counts (8+).

-T

Collects GC stats – you can query the data using the GHC.Stats module

-I0

Disables idle Garbage collection

-A16m

Set the allocation area size to 16 MB

Ensure your node is running in multicore mode

Again, this depends on how cardano-node was compiled and installed on your server. So long as it was compiled in threaded mode, you can pass in the additional options when starting your node.
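One way to check for threaded mode is to ask the binary for its runtime information. GHC-compiled programs accept +RTS --info, which prints a list of key/value pairs including an "RTS way" entry. The sketch below assumes that a threaded build reports a value containing "thr" (for example rts_thr); the helper name is_threaded_rts is hypothetical.

```shell
# Sketch only: is_threaded_rts is a hypothetical helper name.
# Reads the output of "<binary> +RTS --info -RTS" on stdin and succeeds
# when the "RTS way" entry contains "thr" (assumed to mean threaded).
is_threaded_rts() {
  grep -q '"RTS way", *"[^"]*thr'
}
```

It could be used as cardano-node +RTS --info -RTS | is_threaded_rts && echo "threaded runtime". If nothing is printed, inspect the raw --info output by hand before concluding anything.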

Edit the script you use to start cardano-node. If you followed the CoinCashew guide, that is the following file; otherwise edit whichever script starts cardano-node on your server.

sudo nano $NODE_HOME/startBlockProducingNode.sh

Edit the part where you have "cardano-node run" and change it to "cardano-node run +RTS -N -RTS".

-N will use all available cores. If you want to pin the number of cores, you can use, for example, -N4 to use 4 cores. If you plan to ever resize your VM or change your processor, it's probably safer to use -N.
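Because each core gets its own allocation area of the size given by -A, the core count also determines how much memory the allocation areas take in total. A small sketch of that arithmetic follows; the function name total_allocation_mb is hypothetical.

```shell
# Sketch only: total_allocation_mb is a hypothetical helper name.
# Total allocation area in MB when each core gets its own -A sized area.
total_allocation_mb() {
  local cores="$1" area_mb="$2"
  echo $(( cores * area_mb ))
}

total_allocation_mb 4 16   # prints 64
```

On the server itself, total_allocation_mb "$(nproc)" 16 gives the figure for the current machine with -A16m. This is worth rechecking after resizing a VM, since plain -N follows the new core count.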

Save the script and restart your node

sudo systemctl restart cardano-node
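The start-script edit can also be made non-interactively. The following is a rough sketch assuming GNU sed (for -i) and a script that contains the literal text "cardano-node run" with no RTS options yet; the function name enable_multicore is hypothetical.

```shell
# Sketch only: enable_multicore is a hypothetical helper name.
# Adds "+RTS -N -RTS" after "cardano-node run" in the given start script.
# Leaves the file alone if it already contains RTS options.
enable_multicore() {
  local script="$1"
  if grep -qF -- '+RTS' "$script"; then
    echo "RTS options already present in $script; edit them by hand"
    return 0
  fi
  cp "$script" "$script.bak"   # keep a backup next to the original
  sed -i 's/cardano-node run/cardano-node run +RTS -N -RTS/' "$script"
  grep 'cardano-node run' "$script"
}
```

For a CoinCashew-style install this would be called as enable_multicore "$NODE_HOME/startBlockProducingNode.sh", followed by the restart above. The early return is deliberate: a script that already passes RTS options is better edited by hand than rewritten blindly.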

Turn off TraceMempool

If you are still having issues, you can consider disabling TraceMempool, which can consume additional CPU and cause issues on some installations. Typically, if you are not dancing on the edge of CPU and memory minimums, you shouldn't have a problem.

Edit mainnet-config.json and change the following line from

"TraceMempool": true,

to

"TraceMempool": false,

Then restart your node

sudo systemctl restart cardano-node
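The same change can be applied from the command line. This is a minimal sketch assuming GNU sed and a config file that contains the key exactly as shown above; the function name disable_trace_mempool is hypothetical.

```shell
# Sketch only: disable_trace_mempool is a hypothetical helper name.
# Flips "TraceMempool": true to false in the given config file,
# keeps a .bak copy, then prints the resulting line for a visual check.
disable_trace_mempool() {
  local config="$1"
  cp "$config" "$config.bak"
  sed -i 's/"TraceMempool"[[:space:]]*:[[:space:]]*true/"TraceMempool": false/' "$config"
  grep '"TraceMempool"' "$config"
}
```

Call it with the path to your mainnet-config.json and restart the node afterwards. To re-enable the trace later, restore the .bak copy or change the value back by hand.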

One comment

Andrej Podzimek

December 22, 2021 / 9:24 AM

--disable-delayed-os-memory-return looks like a bad idea; it can only make things (slightly) slower, not faster, i.e. it goes against the missed slot leader checks elimination, in a sense.
