
Scaling the merkle Private Mempool to 25M tx/day

👋 merkle specializes in protecting and monetizing orderflow in that order. We guarantee best-in-class inclusion & ordering while paying the most for your orderflow.

It's been a busy year for us at merkle. We've been working on a lot of things, but one of the most interesting is our private mempool and how we managed to scale it to billions of requests per month with "random" traffic spikes, especially over the last 3 months. The mempool is a critical piece of infrastructure, where even a p99 latency above 1s is highly noticeable, hurts users' trust, and can lead to lost revenue on the order of millions.

We are hiring for multiple positions for our engineering teams, come join us!

merkle is now powering major wallets, RPC providers and even Trump's World Liberty Finance frontend through our public Ethereum RPC service! We have cumulatively paid out millions of dollars in revenue to our customers in 2024 and serve almost 250M requests per day.

Transactions received per day over the last 3 months.

Challenges

Unpredictable traffic

The main challenge with scaling anything related to trading or markets is the unpredictability of the traffic. You cannot predict when Ethereum is going to drop 20% in one day, or when a DeFi protocol is going to get hacked. There are, however, some predictable events, such as Trump winning the election.

Trump won the election, and our traffic spiked 4x in a few minutes.

Data production

The second challenge is the amount of data we produce. We have to store every transaction sent to our servers, as well as the state of the mempool at any given time. That means a lot of data, and our storage has to scale to handle it. Thankfully, we do not need to keep much historical data in our production database. As of today, we produce terabytes of data per month.

Database scaling

Our main datastore is PostgreSQL. We use it to store the state of the mempool and the transactions that are sent to our servers. However, scaling off-the-shelf PostgreSQL can be a brutal task.

Tackling challenges one by one

Unpredictable traffic

In order to absorb traffic spikes, we had to optimize database queries, optimize hot paths, and become queue-driven.

Optimizing database queries

We noticed that our database query volume grew rapidly with traffic. This was because our queues received data in the following format; take our broadcast queue, for example:

{
    // uuid of the transaction
    "transactionId": "0001-2345-4953-1029"
}

This meant that before broadcasting the transaction, the queue processor had to do a database lookup to fetch it. We refactored the queue processors with a more efficient data structure and one rule: "all the data needed to process the message should be in the message".

{
    "transaction": "0x02....",
    "blockNumber": 17000000,
    "options": {
        // broadcast options
        "maxTimestamp": 17000000,
        // etc ...
    }
}

This allowed the queue to keep doing its work even when the database was down. It also reduced the number of queries many-fold.

We applied this principle across all of our queues (20+), which reduced database load by about 70%.
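As a rough illustration of the "everything in the message" rule, here is a minimal sketch of a broadcast worker (the `handle_broadcast` function and its return values are hypothetical, and the actual broadcast call is elided). Expiry and the raw transaction both come from the payload, so the worker never needs a database lookup:

```python
import json
import time

# Hypothetical sketch: field names follow the example payload above.
# The worker decides everything from the message itself, so it keeps
# working even if the database is down.

def handle_broadcast(raw_message: str) -> str:
    msg = json.loads(raw_message)
    options = msg.get("options", {})

    # Expiry is decided from the payload, not from a DB lookup.
    if time.time() > options.get("maxTimestamp", float("inf")):
        return "expired"

    # broadcast(msg["transaction"]) would go here.
    return "broadcast"

message = json.dumps({
    "transaction": "0x02...",
    "blockNumber": 17000000,
    "options": {"maxTimestamp": 32503680000},  # far-future expiry
})
print(handle_broadcast(message))  # -> broadcast
```

The trade-off is larger messages on the queue, but that cost is paid once at enqueue time instead of as a per-message database round trip.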

Hot paths

We optimized our hot paths to be queue-driven instead of database-driven. This allows us to continue partial operations (receiving and broadcasting transactions) even if the database is down or at max capacity. We achieved this by passing all required data around in queue messages and having a single queue responsible for writing to the database.
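The shape of that hot path can be sketched as follows (queue and function names are illustrative, and in-process `queue.Queue` stands in for a real message broker). Receiving a transaction only fans the payload out to queues; database writes buffer in a dedicated persistence queue and are flushed by a single writer when the database is healthy:

```python
from queue import Queue

# Illustrative sketch of the queue-driven hot path. If the database is
# unavailable, rows simply accumulate in persist_q while the broadcast
# path keeps running.

broadcast_q: Queue = Queue()
persist_q: Queue = Queue()

def receive_transaction(tx: dict) -> None:
    # Hot path: no database access, just fan out the full payload.
    broadcast_q.put(tx)
    persist_q.put(tx)

def drain_persist_queue(db_available: bool) -> int:
    """Single writer: flush buffered rows when the database is healthy."""
    written = 0
    while db_available and not persist_q.empty():
        row = persist_q.get()  # db.insert(row) would go here
        written += 1
    return written

receive_transaction({"transaction": "0x02...", "blockNumber": 17000000})
```

With this split, a database outage degrades persistence latency rather than taking down transaction intake and broadcasting.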

Data production

We are fortunate that our data is temporary: once a transaction is mined or expired, it's no longer needed. This lets us use a lot of tricks to optimize our database. We automatically ETL our production data into Snowflake and delete transactions older than 7 days from our production database. This keeps indexes light and inserts fast.
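A retention job like this might look roughly as follows (a sketch under assumptions: the table name `transactions`, the `created_at` column, and the helper names are all hypothetical, not merkle's actual schema). It assumes rows have already been ETL'd to Snowflake, and uses a Postgres-style ctid subquery to delete in small batches so no single transaction holds locks for long:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical 7-day retention sketch; schema names are illustrative.

RETENTION = timedelta(days=7)

def retention_cutoff(now: datetime) -> datetime:
    """Everything created before this instant is safe to prune."""
    return now - RETENTION

def prune_statement(cutoff: datetime, batch_size: int = 10_000) -> str:
    """Build one batched DELETE; run repeatedly until 0 rows are affected."""
    return (
        "DELETE FROM transactions WHERE ctid IN ("
        "SELECT ctid FROM transactions "
        f"WHERE created_at < '{cutoff.isoformat()}' "
        f"LIMIT {batch_size})"
    )
```

Batching the deletes matters at this volume: one giant `DELETE` over terabytes of aged rows would bloat the WAL and stall concurrent inserts.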

Thanks to Snowflake, we can easily scale our data storage and compute to handle the data production and queries for our reports & dashboards.

Database scaling

PostgreSQL is a great database, but it's not the best at scaling, at least not out of the box. And while PostgreSQL might be faster at smaller scale, you really want consistency over raw speed for a database used by thousands of users.

Fortunately, distributed SQL was invented about 10 years ago and is now mature enough to run in production. We are using CockroachDB to scale our database, and it's been a game changer.

Work with us!
We are looking for talented engineers with Rust experience. Bonus points if you have ever contributed to Reth or built an MEV bot in any language. We are also hiring Senior Frontend Engineers, Senior Backend Engineers, Data Engineers & more.
Snipe API
Add best-in-class sniping into your Bot or Non-custodial Wallet.