What Comprises an Ethereum Fullnode Implementation?

Demystifying Ethereum Fullnodes

May 14, 2019

Note: This post was originally published May 13, 2019 on the Amentum blog: https://medium.com/amentum/what-comprises-an-ethereum-fullnode-implementation-a9113ce3fe3a

Like clockwork, misinterpretations of blockchain-based “fullnodes” get a bit out of hand. When we see this confusion, we want to do our part to dispel it, educating the community and pushing it forward, as I did with Bitcoin initially.

Validating transactions on Ethereum is a bit different from Bitcoin, because the two architectures handle, store, and archive state updates differently, and we want to dispel any confusion between the two. This post will only discuss fullnodes. For a full in-depth breakdown of how Ethereum works, check out this article; or, to simply understand the ETH transaction lifecycle, this post here.

What Is a Fullnode?

On Ethereum, validating the current state of your transactions only requires knowing the current state at the block that your transaction will be included in, once it is propagated to the network and accepted by miners.

Because Ethereum takes your smart contract code, compiles it to bytecode, and then submits your transaction to the EVM to update its state, maintaining the older historical state outside of the block with your transaction is not necessary for immediate validation (though some implementations will store the state of the last few hundred blocks, just for additional security against block reorgs, more on that later).

For Ethereum, a “fullnode” is a copy of the entire chain’s history of state changes, from one account to another. By default, all intermediary states of contract transactions and contract calls are computed on the fly during your initial sync (unless you adjust this setting specifically before you sync), and whatever is unneeded (older state transitions) is pruned away (removed to save storage).
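To make the idea of pruning concrete, here is a minimal sketch of a store that keeps state snapshots only for the most recent blocks. It is a toy model, not how any client actually lays out its database: the class name `PrunedStateStore`, the `keep_recent` parameter, and the plain balance dictionaries are all assumptions made for illustration.

```python
from collections import OrderedDict


class PrunedStateStore:
    """Toy store that keeps account-state snapshots for recent blocks only."""

    def __init__(self, keep_recent=128):
        # keep_recent is an illustrative retention window (hypothetical value)
        self.keep_recent = keep_recent
        self.snapshots = OrderedDict()  # block number -> {account: balance}

    def commit(self, block_number, state):
        # Store a copy so later mutations do not alter the snapshot
        self.snapshots[block_number] = dict(state)
        # Prune the oldest snapshots once the window is exceeded
        while len(self.snapshots) > self.keep_recent:
            self.snapshots.popitem(last=False)

    def state_at(self, block_number):
        # Returns None when the snapshot was pruned or never existed
        return self.snapshots.get(block_number)


store = PrunedStateStore(keep_recent=3)
balance = 0
for number in range(1, 6):
    balance += 10
    store.commit(number, {"alice": balance})

print(store.state_at(5))  # {'alice': 50}
print(store.state_at(1))  # None, because block 1 was pruned
```

The node still knows the latest state; it has simply dropped older snapshots it no longer needs for validation.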

This is a feature, not a bug, since Ethereum abstracts state updates into contracts and executes them via the EVM before the results are eventually saved on-chain and the state root is updated; this architecture gives you greater flexibility in how you can configure your node to be most efficient, depending on your personal use-case.

Richard Moore @ricmoo

If you have a 2 column database, with "word" and "definition", and a row for each word in the English language, this is a valid “full dictionary”. By adding an index to "word" it becomes an “archive dictionary”, which isn’t “fuller”, it’s just a lot faster for certain operations.

This guy gets it. Nodes are your economic representation on a blockchain-based network.

Importance of Ethereum Fullnodes

As we stated above, all intermediary states of all contracts that interact on chain are computed by default, but not all of them are stored. It is important to note, however, that those intermediate states can be re-computed and the data’s integrity can be checked at any time, since a default fullnode stores every block and transaction on disk from genesis.
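As a rough illustration of re-computing an intermediate state, the sketch below replays simple value transfers from a genesis state up to a chosen block. It assumes a toy model in which a transaction is just a `(sender, recipient, amount)` tuple; the function names `apply_transaction` and `replay` are hypothetical, and real clients execute EVM transactions rather than plain transfers.

```python
def apply_transaction(state, tx):
    """Return a new state with a simple value transfer applied."""
    sender, recipient, amount = tx
    if state.get(sender, 0) < amount:
        raise ValueError("insufficient balance")
    new_state = dict(state)
    new_state[sender] = new_state.get(sender, 0) - amount
    new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state


def replay(genesis_state, blocks, up_to_block):
    """Recompute the state as of block `up_to_block` by replaying from genesis."""
    state = dict(genesis_state)
    for number, transactions in enumerate(blocks, start=1):
        if number > up_to_block:
            break
        for tx in transactions:
            state = apply_transaction(state, tx)
    return state


genesis = {"alice": 100, "bob": 0, "carol": 0}
blocks = [
    [("alice", "bob", 30)],                          # block 1
    [("bob", "carol", 10), ("alice", "carol", 5)],   # block 2
    [("carol", "alice", 15)],                        # block 3
]

print(replay(genesis, blocks, 1))  # {'alice': 70, 'bob': 30, 'carol': 0}
print(replay(genesis, blocks, 3))  # {'alice': 80, 'bob': 20, 'carol': 0}
```

Nothing but the genesis state and the transactions is stored here, yet the state at any block can be rebuilt on demand, at the cost of CPU time.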

This process is, of course, CPU intensive, and storing all those intermediate state updates (that don’t even belong to you!) can take up a lot of space on your hard disk. This is why, by default, those unnecessary states are “pruned” (periodically removed from the state trie so as to reduce bloat when initially syncing your node).

Not everyone needs to know the state of every contract interaction in the world (unless of course you’re a data analytics firm, like Etherscan, that needs to store all states to provide you a block explorer).

Example of a state update flow, from an externally controlled ETH account, to the transaction with a contract, and the subsequent internal contract calls. This image was taken from Preethi Kasireddy’s “How Does Ethereum Work, Anyways?”

In Bitcoin, the transition from one unspent output (UTXO model) to a new input, when you send from one address to another, is the state update itself. This is why many in the Bitcoin space will claim a pruned fullnode is not a “fullnode” in the extreme purist sense: spent TXs are removed from the state when you fully sync a pruned node. Once they are removed, you cannot re-compute the state; you must re-sync from genesis to get the full snapshot of the blockchain once more.
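For contrast, here is a minimal sketch of a UTXO-style state update, where spending an output removes it from the set and creates new ones. This is a simplified model under our own assumptions: the function `apply_utxo_transaction` and the `(tx_id, output_index) -> (owner, amount)` layout are hypothetical, and signatures, scripts, and fees are ignored.

```python
def apply_utxo_transaction(utxo_set, tx_id, inputs, outputs):
    """Spend `inputs` and create `outputs`, returning the new UTXO set.

    utxo_set maps (tx_id, output_index) -> (owner, amount).
    """
    missing = [ref for ref in inputs if ref not in utxo_set]
    if missing:
        raise ValueError(f"unknown or already spent outputs: {missing}")
    spent_total = sum(utxo_set[ref][1] for ref in inputs)
    created_total = sum(amount for _, amount in outputs)
    if created_total > spent_total:
        raise ValueError("outputs exceed inputs")
    # Spent outputs disappear from the set; this removal is the state update
    new_set = {ref: out for ref, out in utxo_set.items() if ref not in inputs}
    for index, output in enumerate(outputs):
        new_set[(tx_id, index)] = output
    return new_set


utxos = {("tx0", 0): ("alice", 50)}
utxos = apply_utxo_transaction(
    utxos, "tx1", inputs=[("tx0", 0)], outputs=[("bob", 30), ("alice", 20)]
)
print(utxos)  # {('tx1', 0): ('bob', 30), ('tx1', 1): ('alice', 20)}
```

Once `("tx0", 0)` is gone from the set, the set alone no longer tells you it ever existed, which is the property the purist argument above is about.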

Here are two analogies from Twitter we think will help:

Nick Johnson (arachnid.eth) @nicksdjohnson

Understanding the difference between a full node and an archive node doesn't have to be difficult. By comparison with your bank account, a full node has a record of every transaction in or out of your account. An archive node also stores the balance after each transaction.

A very simple analogy from Nick Johnson (ENS), comparing an Ethereum fullnode to a traditional bank account.

Richard Moore @ricmoo

Re-computing intermediate states just formulates a full index for reference, in Ethereum.

Now that we have a quick understanding of why fullnodes work the way they do on Ethereum, let’s dive into all the types of economic agents (nodes) that are possible on the network.

Types of Ethereum Nodes

Remember, Ethereum fullnodes are flexible by design, but they still need to do a few things. In short, fullnodes have to be able to: validate transactions that are being mined (if mining); apply the block reward (2 ETH/block at the time of this writing, following EIP-1234 in the Constantinople upgrade); and, the most important task for the majority of users, verify that state and the resulting state changes are applied properly and follow the consensus rules (the result is checked against the state root in each new block header).
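The three duties above can be sketched as a single toy validation function. Note the assumptions: `toy_state_root` is a stand-in that hashes a JSON encoding with SHA-256, whereas Ethereum derives its state root from a Merkle Patricia trie; the block layout, the function `validate_block`, and the reward value are all hypothetical and chosen only for illustration.

```python
import hashlib
import json


def toy_state_root(state):
    """Stand-in for a state root: SHA-256 over a canonical JSON encoding."""
    encoded = json.dumps(state, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def validate_block(parent_state, block):
    """Re-execute a block and check the result against its header."""
    state = dict(parent_state)
    # 1. Validate and apply each transaction (simple transfers only)
    for sender, recipient, amount in block["transactions"]:
        if state.get(sender, 0) < amount:
            return False, parent_state
        state[sender] = state.get(sender, 0) - amount
        state[recipient] = state.get(recipient, 0) + amount
    # 2. Apply the block reward to the miner
    state[block["miner"]] = state.get(block["miner"], 0) + block["reward"]
    # 3. Compare the computed root with the root committed in the header
    if toy_state_root(state) != block["header"]["state_root"]:
        return False, parent_state
    return True, state


parent = {"alice": 10, "bob": 0}
expected = {"alice": 6, "bob": 4, "miner": 2}
block = {
    "transactions": [("alice", "bob", 4)],
    "miner": "miner",
    "reward": 2,  # illustrative value
    "header": {"state_root": toy_state_root(expected)},
}
ok, new_state = validate_block(parent, block)
print(ok)  # True

# A header committing to a different state is rejected
tampered = dict(block, header={"state_root": toy_state_root({"alice": 0})})
bad_ok, _ = validate_block(parent, tampered)
print(bad_ok)  # False
```

The point of the sketch is that a node never has to trust the header: it re-executes the block itself and only accepts it when the roots match.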

However, sometimes your use-case may require a certain type of access to information (or as little information as possible, due to system constraints). Here are the various types of clients you can configure on Ethereum:

Light Clients: Perform no validation of their own; they request the current state from the P2P network (fine for processing payments and simple contract calls), so your validation is outsourced to other fullnodes with the necessary information.
Fast Node (Fast Sync/Warp Sync): Will not validate intermediate states during the initial sync, but will validate everything after that. This allows older data that likely has nothing to do with your transactions to be pruned.
Full Node: As described above, validates everything, prunes old intermediate states from storage, and keeps a record of state trie updates going forward.
Archive Node (Historical Node): Validates and stores all intermediate states; nothing is pruned, and all state transitions for accounts are retrievable. Future state trie updates and full intermediate states are stored.
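As a rough guide to how these modes map onto a client, the sketch below builds Geth command lines for each of them. It assumes the `--syncmode`, `--gcmode`, and `--datadir` flags as they existed in Geth releases around the time of this post; flag names and accepted values change between releases, so check `geth --help` for your version. The helper `geth_command` and the `GETH_MODES` table are our own hypothetical names, and Parity uses different flags that are not covered here.

```python
# Mode -> Geth flags (assumed to match 2019-era Geth releases)
GETH_MODES = {
    "light": ["--syncmode", "light"],
    "fast": ["--syncmode", "fast"],
    "full": ["--syncmode", "full"],
    # An archive node is a full sync with state pruning switched off
    "archive": ["--syncmode", "full", "--gcmode", "archive"],
}


def geth_command(mode, datadir=None):
    """Build the argument list for starting Geth in the given mode."""
    if mode not in GETH_MODES:
        raise ValueError(f"unknown mode: {mode}")
    command = ["geth"] + GETH_MODES[mode]
    if datadir:
        command += ["--datadir", datadir]
    return command


for mode in GETH_MODES:
    print(" ".join(geth_command(mode)))
# geth --syncmode light
# geth --syncmode fast
# geth --syncmode full
# geth --syncmode full --gcmode archive
```

The function only builds the command; it does not start a node, so it is safe to run while you compare the options.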

Here are some more reactions from the community, responding to concerns about Ethereum that stem from public misunderstanding of how its different economic nodes function (the misinformation is very widespread).

Nick Johnson (arachnid.eth) @nicksdjohnson

@mriou @antiprosynth @Catheryne_N @VitalikButerin A fast synced node includes every block, and therefore every transaction. By replaying those, you can reconstruct the state at any time in the past, if you need it (but by design, most DApps shouldn't). Practically: `geth dump`, and then `geth import` with a different datadir.

Etherscan.io @etherscan

@rolflobker @nicksdjohnson @5chdn @DZack23 @StopAndDecrypt @pedrouid @0xstark @VitalikButerin @socrates1024 @ethhub_io As of Today : - Default full synch on Parity (~135GB) - Default full synch on Geth (~150GB) - Archive synch on Parity (~1.8TB) - Archive synch on Geth (~1.9TB) For all intents and purposes the default full synch is a "Full Synch".

If you’re looking to sync a Geth or Parity node of your own (Ethereum’s two largest implementations), you can follow this guide here to get up to speed on the various modes prior to syncing.

Dispelling Myth and Confusion

We firmly believe that Ethereum’s flexibility in how it stores and updates its state root gives it a real strategic advantage over platforms following UTXO models. In Ethereum, there are many types of clients that record state, and in the future, perhaps even clients that don’t require state at all!

As Ethereum continues to mature and is embedded in more devices, its flexibility will begin to shine and continue to improve (the incoming transition to ETH 2.0 brings a fresh blank state, making things even more efficient storage-wise).

We see a future of many clients, all acting as independent economic agents, with different data availability requirements. Regardless, a fullnode on Ethereum is in fact just that: a fullnode. And how one syncs it should not be conflated with how syncing works on other chains like Bitcoin, as long as you’re using the right type of node for the job.

References and Other Relevant Educational Resources:

Ethereum node configuration modes cheat sheet (dev.to)
This document is a quick cheat sheet of most common geth and parity configurations explained - usually, everything you…

How does Ethereum work, anyway? (medium.com)

The Blockchain Stack (medium.com)
Co-authored by Alexis Gauba

Synchrony and Timing Assumptions in Consensus Algorithms Used in Proof of Stake Blockchains (medium.com)
Co-authored with Alexis Gauba.

Amentum Capital
