We saved $250,000 by running our own RPC nodes

👋 merkle specializes in protecting and monetizing orderflow, in that order. We guarantee best-in-class inclusion & ordering while paying the most for your orderflow.

At merkle, we serve hundreds of millions of RPC requests every month, and we need to make sure our infrastructure can handle the load. We use a load balancer we built in-house to distribute traffic across our RPC nodes, achieving high availability while keeping costs low.

RPC services

When merkle started, we used Alchemy. However, after receiving a $1,600 bill for just 2 days of usage, we quickly realized it wouldn't scale:

Alchemy bill

Extrapolated, this would have cost us $24,000 per month, which is way too much for a startup. We decided to build our own RPC nodes, and we've been using them ever since.

Building our own RPC nodes

Our goals were simple:

  • High availability: we need to be able to handle billions of requests per month, and we can't afford to have downtime.
  • Low cost: we're a startup, and we need to keep our costs under $1,500 per month.
  • Low latency: we need to serve requests quickly.
  • Easy to scale: we need to be able to scale up and down easily.
  • Low maintenance: we don't want to spend a lot of time maintaining our infrastructure.
  • Multi-chain: we need to be able to support multiple chains / add new chains quickly with zero downtime.

Picking a cloud

We decided to use OVH, a French cloud provider, because they offer a lot of flexibility and low prices for beefy machines. We also use AWS for some services, but we prefer OVH for our RPC nodes.

Specifically, we use ADVANCE-2 servers, which have 16 cores, 32GB of RAM, and 2x 1.92TB NVMe SSDs. They cost $200 per month (less with commitments), which is a great deal.

For Polygon and BSC, we use the same server with higher disk capacity (2x 3.84TB NVMe SSDs) for $250 per month.

But the real value in OVH servers is the unlimited outgoing/incoming bandwidth.

We run a minimum of 2 nodes per chain, adding up to 6 nodes in total, at a monthly cost of ~$1,000 (thanks to long-term commitments and discounts from OVH).

Picking a load balancer

Nginx is a great general-purpose load balancer, but RPC nodes have different needs, so we decided to build our own load balancer in Go. merkle products are mostly Rust, but we use Go for some high-traffic services, and it's a great fit for this use case.

Building a load balancer for RPC nodes

We needed a high-throughput, low-latency load balancer that could handle hundreds of millions of requests per month, and we needed to build it quickly.

The architecture

To keep track of all upstream nodes, the load balancer connects to them over multiple WebSockets (we don't use HTTP).

Load balancer design

At all times, the load balancer maintains between 5 and 10 WebSocket connections to every upstream server.
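As a rough illustration, here's a minimal connection pool in Go using the gorilla/websocket package. The struct, pool size, and round-robin selection are illustrative assumptions, not our production code:

```go
// Illustrative sketch, not merkle's actual code: each upstream keeps a
// small pool of persistent WebSocket connections, handed out round-robin.
package main

import (
	"sync/atomic"

	"github.com/gorilla/websocket"
)

type Upstream struct {
	url   string
	conns []*websocket.Conn
	next  atomic.Uint64 // round-robin cursor
}

// dialPool opens `size` persistent connections to one RPC node.
func dialPool(url string, size int) (*Upstream, error) {
	u := &Upstream{url: url}
	for i := 0; i < size; i++ {
		c, _, err := websocket.DefaultDialer.Dial(url, nil)
		if err != nil {
			return nil, err
		}
		u.conns = append(u.conns, c)
	}
	return u, nil
}

// conn picks the next connection; a real implementation would also
// detect dead connections and re-dial them in the background.
func (u *Upstream) conn() *websocket.Conn {
	i := u.next.Add(1)
	return u.conns[i%uint64(len(u.conns))]
}
```

Handing out long-lived connections this way avoids paying a TCP + TLS + WebSocket handshake on every request, which matters at this request volume.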

Supporting eth_subscribe

Keeping track of the head

State consistency is already an issue with normal web services, but when it comes to RPC nodes, it's a totally different problem. We want to make sure we never route requests to a node that is lagging behind the head of the chain.

For example, suppose we have servers A and B. When A hears of a new block, it quickly processes it and updates its state, but B might not have received the new block yet. You then have two nodes with different state.

To solve this problem, we track each node's view of the head of the network, and we only route requests to nodes that are at the latest head. However, we wait until a majority of nodes have synced to the new head before advertising it to clients; otherwise, all requests would be routed to a single server for a short period, putting a lot of load on that one server.
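Here's a hedged sketch of that head-tracking logic in Go; the names and the simple majority rule are our shorthand for the idea, not the exact implementation:

```go
// Sketch: advance the advertised head only once a majority of upstreams
// have reported it, and route only to nodes at (or past) that head.
package main

import "sync"

type HeadTracker struct {
	mu         sync.Mutex
	heads      map[string]uint64 // upstream URL -> latest block it reported
	advertised uint64            // head currently exposed to clients
	total      int               // number of upstreams
}

func NewHeadTracker(total int) *HeadTracker {
	return &HeadTracker{heads: make(map[string]uint64), total: total}
}

// OnNewHead is called when an upstream's newHeads subscription fires.
func (t *HeadTracker) OnNewHead(upstream string, block uint64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.heads[upstream] = block

	synced := 0
	for _, h := range t.heads {
		if h >= block {
			synced++
		}
	}
	// Waiting for a majority avoids funneling all traffic onto the
	// first node to see the block.
	if synced > t.total/2 && block > t.advertised {
		t.advertised = block
	}
}

// Routable reports whether an upstream is safe to receive requests.
func (t *HeadTracker) Routable(upstream string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.heads[upstream] >= t.advertised
}
```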

eth_subscribe

New blocks:

eth_subscribe is the fastest way to get notified of new blocks and new pending transactions, but we don't want to simply proxy the request and attach a stream to a node, because we want to make sure we never miss an event. And if a node goes down, we want the client to never notice and to keep receiving new blocks.

Thankfully, we already track every new block event to route requests. Therefore, an eth_subscribe call never actually needs to be forwarded to a node: we keep track of the subscription on the load balancer and forward the events to the client ourselves.
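Conceptually, this reduces to a fan-out hub inside the load balancer. A simplified sketch (subscription-id generation and JSON-RPC framing omitted; the names are hypothetical):

```go
// Sketch: newHeads subscriptions terminate at the load balancer. Since
// we already ingest every block to drive routing, we fan the same event
// out to clients without touching an upstream.
package main

import (
	"encoding/json"
	"sync"
)

type Hub struct {
	mu   sync.Mutex
	subs map[string]chan json.RawMessage // subscription id -> client channel
}

func NewHub() *Hub {
	return &Hub{subs: make(map[string]chan json.RawMessage)}
}

// Subscribe registers a client; `id` would be a generated subscription
// id returned in the eth_subscribe response.
func (h *Hub) Subscribe(id string) <-chan json.RawMessage {
	ch := make(chan json.RawMessage, 16)
	h.mu.Lock()
	h.subs[id] = ch
	h.mu.Unlock()
	return ch
}

// Broadcast pushes a new block header to every subscriber.
func (h *Hub) Broadcast(header json.RawMessage) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for _, ch := range h.subs {
		select {
		case ch <- header:
		default: // skip slow consumers; a real impl might disconnect them
		}
	}
}
```

Skipping events for slow consumers keeps one stalled client from blocking the broadcast path for everyone else.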

New pending transactions:

Under the hood, the load balancer connects to our Transaction stream to seamlessly advertise pending transactions as fast as possible.

Caching

We know from experience that as soon as a new block is advertised, the load balancer gets flooded with eth_getTransactionReceipt, eth_getTransactionByHash, eth_getBlockByNumber, and eth_getBlockByHash calls. That's why we cache all of those responses before advertising a new block, which leads to 40-80% cache hit rates.
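A rough sketch of that pre-warming step, with responses simplified to raw JSON bytes and cache keys built from method plus parameter (the key format and helper names are assumptions):

```go
// Sketch: before a new block is advertised, store the responses clients
// are about to ask for, keyed by "method:param".
package main

import "sync"

type Cache struct {
	mu sync.RWMutex
	m  map[string][]byte // "method:param" -> raw JSON-RPC result
}

func NewCache() *Cache {
	return &Cache{m: make(map[string][]byte)}
}

func (c *Cache) put(key string, v []byte) {
	c.mu.Lock()
	c.m[key] = v
	c.mu.Unlock()
}

func (c *Cache) Get(key string) ([]byte, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.m[key]
	return v, ok
}

// WarmBlock stores the lookups that flood in right after a block is
// advertised: the block by number and hash, plus each transaction's
// body and receipt.
func (c *Cache) WarmBlock(number, hash string, block []byte, txs, receipts map[string][]byte) {
	c.put("eth_getBlockByNumber:"+number, block)
	c.put("eth_getBlockByHash:"+hash, block)
	for h, tx := range txs {
		c.put("eth_getTransactionByHash:"+h, tx)
	}
	for h, r := range receipts {
		c.put("eth_getTransactionReceipt:"+h, r)
	}
}
```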

Cache hits

Quality of life improvements

Our engineers used to constantly ask "What is the RPC URL for chain <x>?" So we put our load balancer behind rpc.merkle.net (on our internal network). Now they can just use https://rpc.merkle.net/<chain> for any chain we support.
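On the client-facing side, that's plain path-based routing. A hypothetical handler, assuming one balancer per chain:

```go
// Sketch: one internal hostname, one path segment per chain. The router
// maps the chain name to that chain's load balancer.
package main

import (
	"net/http"
	"strings"
)

func chainRouter(balancers map[string]http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// e.g. https://rpc.merkle.net/ethereum -> "ethereum"
		chain := strings.TrimPrefix(r.URL.Path, "/")
		lb, ok := balancers[chain]
		if !ok {
			http.NotFound(w, r)
			return
		}
		lb.ServeHTTP(w, r)
	})
}
```

Adding a new chain is then just a new entry in the map, which fits the zero-downtime, multi-chain goal above.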

Conclusion

We've been using this load balancer for over 3 months now. It's been working great, it has scaled very well, and we were able to save over $250,000 in the process.

Work with us!
We are looking for talented engineers with Rust experience. Bonus points if you have ever contributed to Reth or built an MEV bot in any language. We are also hiring for Senior Frontend Engineers, Senior Backend Engineers, Data Engineers & more.