Scaling the merkle Private Mempool to 25M tx/day

👋 merkle specializes in protecting and monetizing orderflow, in that order. We guarantee best-in-class inclusion & ordering while paying the most for your orderflow.

It’s been a busy year for us at merkle. We’ve been working on many things, but one of the most interesting is our private mempool and how we managed to scale it to billions of requests per month with unpredictable traffic spikes, especially over the last three months. The mempool is a critical part of our infrastructure, where even a p99 > 1s is highly noticeable, hurts user trust, and can lead to millions in lost revenue.

We are hiring for multiple engineering positions — come join us!

merkle is now powering major wallets, RPC providers, and even Trump's World Liberty Finance through our public Ethereum RPC service! We have cumulatively paid out millions of dollars in revenue to our customers in 2024 and now serve almost 250M requests per day.

[Figure: merkle mempool — transactions received per day over the last three months.]

Challenges

Unpredictable traffic

The main challenge with scaling anything related to trading or markets is the unpredictability of the traffic. It’s impossible to know when Ethereum will drop 20% in a single day or when a DeFi protocol will get hacked. However, some events are predictable — for example, when Trump won the election:

[Figure: our traffic spiked 4x within a few minutes of Trump winning the election.]

Data production

The second challenge is the amount of data we produce. We must store all transactions that are sent to our servers, as well as the state of the mempool at any given time. This means we handle a large volume of data and need to scale our storage to accommodate it. Currently, we produce terabytes of data per month.

Database scaling

Our main datastore is PostgreSQL. We use it to store the state of the mempool and the transactions sent to our servers. However, scaling off-the-shelf PostgreSQL can be a brutal task.

Tackling challenges one by one

Unpredictable traffic

To absorb traffic spikes, we needed to optimize database queries, improve hot paths, and adopt a queue-driven architecture.

Optimizing database queries

We noticed that our database query volume grew disproportionately with traffic. This stemmed from our queues receiving messages in the following format (for example, our broadcast queue):

{
  "transactionId": "0001-2345-4953-1029"
}

Before broadcasting, the queue processor had to look up the transaction in the database. We refactored the queue processors to use a more efficient data structure, guided by one rule: all the data needed to process the queue should be in the message.

{
    "transaction": "0x02....",
    "blockNumber": 17000000,
    "options": {
        // broadcast options
        "maxTimestamp": 17000000,
        // etc ...
    }
}

This allowed the queue to function even if the database was down. It also drastically reduced the number of database queries. After applying this principle to all our queues (20+), we reduced database load by about 70%.
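As a rough sketch of what that rule looks like on the consumer side (the message shape mirrors the example above, but the processor, provider, and broadcast call are illustrative assumptions, not our actual code):

// Illustrative sketch only: the queue wiring, message type, and RPC endpoint
// are hypothetical stand-ins, not merkle's actual implementation.
import { JsonRpcProvider } from "ethers";

interface BroadcastMessage {
  transaction: string;    // raw signed transaction, e.g. "0x02..."
  blockNumber: number;    // block height at enqueue time
  options: {
    maxTimestamp: number; // give up on broadcasting after this unix time
  };
}

const provider = new JsonRpcProvider("https://rpc.example.invalid");

// Everything needed to act on the message is inside the message itself,
// so no database lookup happens on this path.
async function handleBroadcast(msg: BroadcastMessage): Promise<void> {
  if (Date.now() / 1000 > msg.options.maxTimestamp) {
    return; // expired: drop instead of broadcasting a stale transaction
  }
  await provider.broadcastTransaction(msg.transaction);
}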

Hot paths

We optimized our hot paths to be queue-driven rather than database-driven, letting us continue partial operations (receiving and broadcasting transactions) even if the database is down or at max capacity. We achieved this by passing all required data in the queue message and consolidating writes to the database in a single queue.
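A minimal sketch of that shape, assuming a generic message broker (the Queue interface, topic names, and payloads are hypothetical):

// Hypothetical sketch: "Queue" stands in for whatever message broker is used;
// topic names and payload shapes are illustrative only.
interface Queue {
  publish(topic: string, payload: unknown): Promise<void>;
}

// Hot path: accept a raw transaction, fan it out to queues, and return.
// The only database touchpoint is the single "db-writes" queue, whose consumer
// batches inserts; broadcasting keeps working even if the database is down.
async function onTransactionReceived(queue: Queue, rawTx: string): Promise<void> {
  const receivedAt = Math.floor(Date.now() / 1000);

  await queue.publish("broadcast", {
    transaction: rawTx,
    options: { maxTimestamp: receivedAt + 60 },
  });

  await queue.publish("db-writes", {
    transaction: rawTx,
    receivedAt,
  });
}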

Data production

We are fortunate that our data is temporary: once a transaction is mined or expires, it is no longer needed. This means we can optimize our database heavily. We automatically ETL our production data into Snowflake and delete transactions older than seven days from our production database. This keeps indexes light and inserts fast.
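The retention half of that pipeline can be as simple as a scheduled delete. The sketch below assumes a hypothetical transactions table and that the ETL into Snowflake has already copied the rows out:

// Hypothetical retention job: table and column names are placeholders, and the
// Snowflake export is assumed to have already picked up these rows.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Drop mempool transactions older than seven days so indexes stay small and
// inserts stay fast. Runs on a schedule (e.g. an hourly cron).
async function pruneOldTransactions(): Promise<number> {
  const result = await pool.query(
    "DELETE FROM transactions WHERE created_at < now() - interval '7 days'"
  );
  return result.rowCount ?? 0; // rows removed in this pass
}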

Thanks to Snowflake, we can easily scale our data storage and compute to handle the data production and queries for our reports and dashboards.

Database scaling

PostgreSQL is a great database, but it is not easy to scale horizontally out of the box. A single PostgreSQL node may well be faster at smaller scale, but for a database serving thousands of users we value consistent performance over raw speed.

Fortunately, distributed SQL was invented about a decade ago and has matured enough for production use. We are now using CockroachDB to scale our database, and it has been a game-changer.
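One thing that made the migration tractable is that CockroachDB speaks the PostgreSQL wire protocol, so standard Postgres drivers keep working. A minimal connection sketch (the connection string and table are illustrative, not our actual schema):

// Illustrative only: CockroachDB accepts standard PostgreSQL clients, so the
// same "pg" driver used against PostgreSQL connects unchanged. The connection
// string and table below are placeholders.
import { Pool } from "pg";

const pool = new Pool({
  // CockroachDB listens on port 26257 by default.
  connectionString:
    "postgresql://app@cockroach.internal:26257/mempool?sslmode=verify-full",
});

async function countPendingTransactions(): Promise<number> {
  const { rows } = await pool.query("SELECT count(*) AS n FROM transactions");
  return Number(rows[0].n);
}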

Work with us!
We are looking for talented engineers with Rust experience. Bonus points if you have ever contributed to Reth or built an MEV bot in any language. We are also hiring for Senior Frontend Engineers, Senior Backend Engineers, Data Engineers & more.