GRANDPA Equivocation and sysinfo Process Collection Result in Slashing on the Kusama Network: A Post-Mortem

Multiple bugs caused nodes to drop out of the Kusama network and lose the database that records which blocks they had validated. As a result, those nodes double-signed blocks after restarting. The slashes caused by this issue have been reverted via Kusama Council motions.

By Polkadot, August 18, 2020

On Friday, July 31, two Kusama validators on runtime version v2019 reported that their nodes were crashing every few minutes with two distinctive errors. At first glance, the problem seemed to be related to the validators' keys. This was ruled out once the affected validators confirmed that they had not changed keys. The issue also appeared only on the Kusama network, not on Polkadot.

Going a bit further down the rabbit hole, the team realised that the slashes stemmed from GRANDPA equivocations, which were in turn triggered by a file descriptor leak that caused nodes to crash. The leak also prevented nodes from writing the GRANDPA voter state (the votes cast in a given round) to disk. Nodes that lost this data voted again after restarting, this time for a block newer than their original choice. Two conflicting votes in the same round constitute an equivocation.
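The mechanism above can be illustrated with a minimal sketch. It is a deliberate simplification and not the GRANDPA implementation: the `Vote`, `Voter` and `find_equivocations` names are hypothetical, a plain dictionary stands in for the on-disk voter state, and the distinction between prevotes and precommits is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    round_number: int
    block_hash: str


class Voter:
    """Hypothetical voter; `state` stands in for the voter state persisted to disk."""

    def __init__(self, name, state=None):
        self.name = name
        # Maps round number -> block hash already voted for in that round.
        self.state = {} if state is None else state

    def vote(self, round_number, best_block):
        # Reuse the recorded vote if there is one; otherwise vote for the current best block.
        block = self.state.setdefault(round_number, best_block)
        return Vote(self.name, round_number, block)


def find_equivocations(votes):
    """Return (first_vote, conflicting_vote) pairs: same voter and round, different block."""
    seen = {}
    equivocations = []
    for vote in votes:
        first = seen.setdefault((vote.voter, vote.round_number), vote)
        if first.block_hash != vote.block_hash:
            equivocations.append((first, vote))
    return equivocations
```

In this sketch, a voter restarted with its state intact repeats its earlier vote even when a newer block is available, so nothing is flagged. A voter restarted with an empty state votes for the newer block in the same round, and `find_equivocations` reports the pair.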

Validators started being slashed through this combination of events at some point after v0.8.15 (v2015 in Kusama) was released and the network was upgraded. The Authority Discovery feature had already been in place for some time at the runtime module level but had not been enabled by default on the client, and this version also enabled GRANDPA to report equivocations via unsigned extrinsics.

With this information in hand, the team's main hypothesis was that equivocations caused by the file descriptor leak could have started happening some time earlier but were only reported after the v0.8.15 upgrade in July: once running this version, nodes started reporting their own equivocations after crashing, which attracted the attention of the teams involved. Still, an investigation of the logs of nodes run by Parity did not find any earlier equivocations (they would have been logged to the terminal).

Further investigation into the root causes of the file descriptor leak pointed at two main culprits: authority discovery and metrics collection. Authority discovery was using an excessive number of sockets to query data from the DHT (i.e. to discover other authorities' IP addresses). For system metrics collection (e.g. CPU and memory), we relied on the sysinfo crate, which kept a cache of file descriptors for every process on the system and for every thread of each process (it fetches the data by reading from /proc).
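To make the caching pattern concrete, here is a small sketch contrasting a reader that keeps one open handle per file with one that opens and closes on every read. It is not the sysinfo crate's code: `CachingReader`, `read_once` and `count_open_fds` are hypothetical names, and `count_open_fds` assumes a Linux-style /proc/self/fd directory (it returns None elsewhere).

```python
import os


def count_open_fds():
    """Number of descriptors open in this process, or None where /proc is unavailable."""
    fd_dir = "/proc/self/fd"
    if not os.path.isdir(fd_dir):
        return None
    return len(os.listdir(fd_dir))


class CachingReader:
    """Keeps one open handle per path it has ever read (the leak-prone pattern)."""

    def __init__(self):
        self._handles = {}

    def read(self, path):
        handle = self._handles.get(path)
        if handle is None:
            # The handle is stored and never closed until close() is called.
            handle = self._handles[path] = open(path, "rb")
        handle.seek(0)
        return handle.read()

    def open_handles(self):
        return len(self._handles)

    def close(self):
        for handle in self._handles.values():
            handle.close()
        self._handles.clear()


def read_once(path):
    """Open, read and close: no descriptor outlives the call."""
    with open(path, "rb") as handle:
        return handle.read()
```

With the caching reader, the number of open descriptors grows with the number of distinct files read, which on a busy machine means one per process and thread entry under /proc. With `read_once`, the count returns to its baseline after every call, at the cost of reopening the file each time.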

The short-term solution was to disable the Authority Discovery feature by default and to stop collecting system metrics. The Authority Discovery module will be re-enabled in a future release once there is a proper fix for the excessive use of sockets.

Until a new version was available, the Parity team recommended manually disabling Authority Discovery. Additionally, whenever a node crashed, validators were advised to wait 1-2 minutes before restarting it. This reduces the likelihood of the node equivocating in GRANDPA if its votes were not persisted to disk.
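One way to apply the restart delay is a small supervisor loop around the node process. The following is only a sketch under stated assumptions: `run_with_restart_delay` is a hypothetical helper, the 120-second default simply reflects the upper end of the 1-2 minute advice, and many operators would configure the same behaviour in their process manager instead.

```python
import subprocess
import time


def run_with_restart_delay(command, delay_seconds=120, max_restarts=3, sleep=time.sleep):
    """Run `command`, waiting `delay_seconds` before each restart after a non-zero exit.

    Stops after a clean exit or once `max_restarts` restarts have been used.
    Returns the list of exit codes observed, in order.
    """
    exit_codes = []
    while True:
        exit_codes.append(subprocess.run(command).returncode)
        if exit_codes[-1] == 0 or len(exit_codes) > max_restarts:
            return exit_codes
        # Pause before restarting so the node does not immediately vote again.
        sleep(delay_seconds)
```

The `sleep` parameter exists only so the delay can be replaced in tests. `command` is whatever argument list starts the node, and is left to the operator.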

After some discussions and developments, Polkadot v0.8.22 was released, including the short-term fixes detailed above. All validators should upgrade to this version and monitor the results. All slashes caused by this bug were reverted by the Kusama Council. In the same spirit, a new discussion was opened regarding the reversion of validators' economic loss but not their nomination loss.

To keep up with developments, there are plenty of ways to get plugged in to the Kusama community. Join the discussion on the Direction Channel. Learn more about Kusama on our website and in the Kusama Wiki. Want to join the core growth team behind Kusama? Join the Ambassador Program.
