Can Java compete with C++/Rust in latency-sensitive applications? Learnings from Running a Java-based Trading Bot
With the same (limited) amount of developer effort, we might get better latency with Java given the faster development speed.
(Coauthored with ChatGPT)
This post is for my friends who use Java instead of C++/Rust/Golang. I think we can stay with Java for now. In the future, with the advancement of ChatGPT and LLMs, we as software engineers just need to speak the language that we feel the most comfortable with and leave the rest (translation) to AI. :)
I would love to hear your thoughts! Please comment on the post.
Note: Centralized cryptocurrency exchanges are currently facing legal challenges in the US, which indicates that this might be a dwindling use case. However, the learnings from low-latency Java programming highlighted in this article can be applied in various other domains and are not exclusively relevant to cryptocurrency trading.
High-frequency trading (HFT) in the cryptocurrency market is a highly competitive environment, where milliseconds or even microseconds can make a difference between profits and losses. The choices of programming languages, infrastructure, and optimization techniques are crucial for success. In this article, we will explore the learnings from running a Java-based high-frequency trading bot on centralized crypto exchanges.
1. Level Playing Field in Centralized Crypto Exchanges
Unlike traditional stock exchanges where physical proximity to the exchange servers can offer an edge in trading, centralized crypto exchanges are often hosted on public clouds. This creates a level playing field as there is no option for colocated trading setups (which are prohibitively expensive for small players).
Still, the competition is intense, and reducing the latency of the trading bot is the key.
2. The Importance of Latency
Latency determines how quickly our trading bot can respond to market changes and capitalize on arbitrage opportunities. In HFT, reducing latency is paramount, as opportunities may vanish within milliseconds.
3. Common Language Choices: C++ and Rust
Most HFT bots are written in C++ or Rust due to their low-level control and performance optimizations. These languages are known for their minimal runtime overhead, which is crucial for reducing latency.
4. The Case for Java
Despite the popularity of C++ and Rust, Java can also be a viable option for HFT. If we have existing algorithms and logic in Java, it may be a better route to optimize it within Java than to translate it to C++ or Rust, since Java has a rich ecosystem and can boost developer productivity. Furthermore, the Java community is actively working to reduce runtime overhead, with GraalVM being a notable development.
5. A Shift in Focus: From Throughput to Latency
In popular Java application domains like big data processing, throughput is often the main concern. However, in HFT, latency takes precedence. Optimizing Java for low latency requires a different set of strategies and tools.
6. Top Learnings in Java Latency Optimization
6.0. Measurement is Key
Start by measuring. Java's System.nanoTime() can be used to measure the time taken by different parts of our code. Use thread-local arrays of longs for storing the nanoTime values, plus parallel arrays of strings labeling the last block of code executed. This significantly reduces the amount of code change needed for manual instrumentation (method signatures don't need to change), while keeping the instrumentation overhead very low. Compared with automatic instrumentation, this method doesn't rely on the runtime, so it won't interfere with JIT compilation or block AOT compilation.
Accurate measurement allows us to find out the biggest culprit of the performance so that we can dramatically improve the ROI of our latency-optimization effort.
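Here is a minimal sketch of that kind of manual instrumentation; the class and method names (LatencyTrace, checkpoint, dump) are my own, not from any library, and a fixed maximum number of checkpoints per thread is assumed:

```java
// Sketch only: thread-local arrays of longs (nanoTime) and strings (labels)
// for low-overhead manual instrumentation of the critical path.
final class LatencyTrace {
    private static final int MAX = 256;

    private static final ThreadLocal<long[]> TIMES =
            ThreadLocal.withInitial(() -> new long[MAX]);
    private static final ThreadLocal<String[]> LABELS =
            ThreadLocal.withInitial(() -> new String[MAX]);
    private static final ThreadLocal<int[]> COUNT =
            ThreadLocal.withInitial(() -> new int[1]);

    // Hot path: just a nanoTime() call and two array stores.
    static void checkpoint(String label) {
        int i = COUNT.get()[0]++;
        TIMES.get()[i] = System.nanoTime();
        LABELS.get()[i] = label;
    }

    // Called off the critical path (e.g. after the order has been sent):
    // prints the time spent between consecutive checkpoints.
    static void dump() {
        long[] t = TIMES.get();
        String[] l = LABELS.get();
        int n = COUNT.get()[0];
        for (int i = 1; i < n; i++) {
            System.out.println(l[i] + ": " + (t[i] - t[i - 1]) + " ns");
        }
        COUNT.get()[0] = 0;
    }
}
```

On the hot path we only sprinkle calls like LatencyTrace.checkpoint("parsed") and LatencyTrace.checkpoint("orderSent"); dump() runs afterwards, so the printing cost never lands on the critical path.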
6.1. Minimizing Garbage Collection Pauses
The time scale for HFT is often below 1 ms, while garbage collection (GC) in Java can cause stop-the-world pauses of tens of milliseconds. Instead of trying to shorten individual GC pauses, focus on reducing the frequency of stop-the-world collections. Unless we are optimizing for P99.9 latency, GC pause time usually doesn't matter that much: for trading bots, it's totally OK to miss 0.1% or 1% of the opportunities.
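To see how often collections actually happen, the standard GarbageCollectorMXBean API can be polled; this is a small sketch (not from the original bot) that prints cumulative collection counts and total pause time per collector:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch: report GC frequency and aggregate pause time since JVM start.
public final class GcWatch {
    public static void report() {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount()/getCollectionTime() are cumulative values.
            System.out.println(gc.getName()
                    + " collections=" + gc.getCollectionCount()
                    + " totalPauseMs=" + gc.getCollectionTime());
        }
    }
}
```

Calling report() periodically (or before and after a trading session) makes it easy to tell whether allocation-reduction work is actually lowering the number of stop-the-world collections.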
6.2. Network Latency Considerations
Network latency between AWS availability zones in the same region is in the range of 0.1-1 ms, which is significant for HFT. If possible, we should place our machines in the same zone as the exchange's execution engine to reduce network latency. In addition, if the server's host name resolves to multiple IP addresses, we should find the fastest one.
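As an illustration (the host name below is hypothetical), we can resolve all IPs behind the exchange host and compare TCP connect times as a rough proxy for the network latency to each address:

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch: time a TCP connect to every IP the host name resolves to.
public final class FastestIp {
    public static void main(String[] args) throws Exception {
        String host = "api.example-exchange.com";  // hypothetical host name
        for (InetAddress addr : InetAddress.getAllByName(host)) {
            long start = System.nanoTime();
            try (Socket s = new Socket()) {
                s.connect(new InetSocketAddress(addr, 443), 1000);  // 1 s timeout
                long micros = (System.nanoTime() - start) / 1_000;
                System.out.println(addr.getHostAddress() + " connect: " + micros + " us");
            } catch (Exception e) {
                System.out.println(addr.getHostAddress() + " unreachable: " + e);
            }
        }
    }
}
```

A real bot would repeat the measurement and pin its connections to the winner, re-checking occasionally since DNS entries and routing can change.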
6.3. Library Choices and Customizations
Many Java libraries for network protocols like WebSocket and HTTP are optimized for flexibility and throughput, not latency. Single-step through the libraries, understand the logic, and strip out unused features for better performance. Jetty is a good example of that.
Some other libraries, like Netty, have lower overhead and better throughput. However, Netty is a low-level library and may require more effort to use.
6.4. Precomputation
Time-sensitive calculations should be done ahead of time when possible. We can store results in memory and retrieve them during critical trading times, instead of computing them on the fly.
This needs to be carefully coded though, since it’s very easy to miss updates (e.g. timeouts) and have stale results in the precomputed data. Ideally this should be done by a framework or compiler.
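Here is a simplified sketch of the idea, with made-up class names and a made-up payload format: order payloads are built for a grid of nearby price ticks off the critical path, so the hot path only does a map lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: precompute order payloads for likely prices ahead of time.
final class PrecomputedOrders {
    private final Map<Long, String> payloadByPriceTick = new HashMap<>();

    // Done off the critical path, e.g. whenever the order book moves.
    void rebuild(long centerTick, int radius, String symbol, String qty) {
        payloadByPriceTick.clear();
        for (long tick = centerTick - radius; tick <= centerTick + radius; tick++) {
            payloadByPriceTick.put(tick,
                "{\"symbol\":\"" + symbol + "\",\"qty\":\"" + qty
                    + "\",\"price\":\"" + tick + "\"}");
        }
    }

    // Hot path: a single map lookup. Returns null if the price has moved
    // outside the precomputed range -- the "stale data" risk mentioned above.
    String payloadFor(long priceTick) {
        return payloadByPriceTick.get(priceTick);
    }
}
```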
6.5. Optimizing String Operations (especially in Logging)
We often need to convert an integer or a double to a String in Java logging code. This is surprisingly slow even with the heavily optimized system libraries, since Java's String is char-based (double-byte). While Java has already done extensive optimizations for Latin strings, writing custom methods that convert integers and doubles directly into our own Latin (byte-based) string classes can save additional time.
This also applies to many network libraries that receive and send Latin-1 or UTF-8 bytes over the network while exposing String in their APIs. This is not new, though; it reminds me of the Text class in Hadoop.
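As a sketch of the idea (the class below is hypothetical), an integer can be written directly as ASCII bytes into a reusable buffer, skipping Integer.toString() and any char-to-byte conversion:

```java
// Sketch: write a non-negative int as ASCII digits into a byte buffer.
final class AsciiInt {
    // Writes 'value' into buf starting at 'offset'; returns the new offset.
    static int write(byte[] buf, int offset, int value) {
        if (value == 0) {
            buf[offset] = '0';
            return offset + 1;
        }
        int start = offset;
        while (value > 0) {
            buf[offset++] = (byte) ('0' + value % 10);
            value /= 10;
        }
        // Digits were produced in reverse order; swap them in place.
        for (int i = start, j = offset - 1; i < j; i++, j--) {
            byte tmp = buf[i];
            buf[i] = buf[j];
            buf[j] = tmp;
        }
        return offset;
    }
}
```

The same pattern extends to doubles (with a fixed number of decimal places) and lets log lines or outgoing messages be assembled entirely in byte buffers.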
6.6. Optimizing the Use of System Libraries and Basic Data Structures
Another critical aspect of Java latency optimization is the efficient use of system libraries and data structures. Java’s standard libraries are versatile but might not be optimized for ultra-low-latency scenarios. Here are some tips:
Object Pools and Synchronization
Object allocation overhead can be reduced by using object pools, or more simply by preallocating objects (without reuse), which usually requires smaller code modifications at the cost of more GC. A pool, however, introduces synchronization when acquiring (and releasing) objects. While simple synchronization using object.notify() is common, it can be surprisingly slow.
Atomic Variables
As an alternative, we can use atomic variables such as AtomicReference and AtomicLong. They provide much better performance for concurrent updates because they leverage low-level CPU operations (such as compare-and-swap) to reduce the cost of synchronization.
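For example, a simple lock-free object pool can be built on AtomicReference with compareAndSet instead of synchronized blocks and notify(); the sketch below (class names are my own, not from any library) uses a Treiber-style stack and falls back to allocation when the pool is empty:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Sketch: lock-free object pool using an AtomicReference-based stack.
final class ObjectPool<T> {
    private static final class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();
    private final Supplier<T> factory;

    ObjectPool(Supplier<T> factory) { this.factory = factory; }

    // Acquire an object: pop from the stack with compareAndSet, or allocate.
    T acquire() {
        for (;;) {
            Node<T> h = head.get();
            if (h == null) {
                return factory.get();   // pool empty: fall back to allocation
            }
            if (head.compareAndSet(h, h.next)) {
                return h.value;
            }
        }
    }

    // Release an object back to the pool. This sketch allocates a small Node
    // per release; a production version would recycle the nodes as well.
    void release(T value) {
        Node<T> n = new Node<>(value);
        for (;;) {
            Node<T> h = head.get();
            n.next = h;
            if (head.compareAndSet(h, n)) {
                return;
            }
        }
    }
}
```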
Alternative Data Structures
The standard data structures that ship with the JDK might not be fast enough for high-frequency trading. There are alternative libraries that focus on high performance and low latency. For example, Chronicle Queue is a high-performance library for off-heap, persisted messaging and event-driven architectures.
Another alternative is the Guava libraries, which offer a range of efficient data structures and utilities. Both Chronicle Queue and Guava libraries can be highly advantageous in scenarios where performance is critical.
Conclusion
By understanding and optimizing various aspects of the Java ecosystem, it’s possible to build high-frequency trading systems that can compete effectively in the fast-paced world of cryptocurrency trading on centralized exchanges. Although Java might not be the traditional choice for this domain, with the right strategies focusing on latency reduction, such as using efficient libraries and optimizing data structures, it can be a powerful tool in the hands of skilled developers. Balancing between the rich features of Java and the need for ultra-low latency is the key to success in this competitive arena.
Quantitative analysis is easy for latency and performance alone, but not so much for productivity. Since my goal is performance for a limited amount of engineering effort, I didn't do a thorough analysis (which would itself take effort).
However, I can share that my HFT program in Java is definitely one of the fastest on those exchanges (and I suspect that I might be the only one using Java).
Thanks, Zheng, for this post and for sharing your insights with us!
As an outsider of the area of HFT (but a Java programmer for a long while), I am curious if there is some (high-level) quantitative analysis on the competitiveness of Java based HFT systems, comparing to systems using other languages.
Well, one day it could just be natural human language thanks to ChatGPT and LLMs :-)