(Coauthored with ChatGPT)
Hello Everyone! 👋. The main reason I wrote and shared this plan is to look for friends who are interested in working on these topics together. Please ping me if you would like to join the journey!
As many of you know, earlier in my career, I was heavily engaged in optimizing big data software. Initially, this endeavor was invigorating, offering a wealth of learning opportunities spanning hardware, operating systems, data structures, programming languages, and user applications. Over time, however, repetitiveness set in as the same strategies (tiered storage and caching, pre-computation and materialization, iterative measurement and optimization) were employed again and again. This led me to ponder the potential of AI in efficiently tackling these optimization challenges, and that's how Reinforcement Learning (RL) piqued my interest.
This was also mentioned in my earlier post Q1 2023: My Last 3 Months of Exploration. In this article, I would like to expand that and share my exact learning plan.
The Allure of Reinforcement Learning 🤔
Reinforcement Learning fascinates me primarily because of its ability to mimic how humans learn from experience and make decisions. It represents a unique paradigm where machines learn by interacting with their environment, trying different approaches, and optimizing their choices.
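To make this paradigm concrete, here is a minimal, self-contained sketch of that trial-and-error loop: tabular Q-learning on a toy five-state chain, where the agent discovers purely from interaction that moving right reaches the rewarding state. All names and the environment itself are illustrative inventions for this post, not from any particular library.

```python
import random

# Toy environment: states 0..4 on a chain; reaching state 4 gives reward 1.
N_STATES, ACTIONS = 5, [0, 1]          # action 0 = move left, 1 = move right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

def step(state, action):
    """Environment dynamics: returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    if nxt == N_STATES - 1:
        return nxt, 1.0, True          # reached the goal
    return nxt, 0.0, False

q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
random.seed(0)

for _ in range(500):                   # episodes of interaction
    state, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit current knowledge, sometimes explore
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        nxt, reward, done = step(state, action)
        # Q-learning update toward the bootstrapped target
        target = reward + (0.0 if done else GAMMA * max(q[(nxt, a)] for a in ACTIONS))
        q[(state, action)] += ALPHA * (target - q[(state, action)])
        state = nxt

# The greedy policy learned from experience: always move right.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1]
```

No one tells the agent the rules of the environment; the policy emerges entirely from trying actions and observing rewards, which is the essence of the paradigm.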
Harnessing RL for Algorithm Invention 🧮
A compelling aspect of RL is its applicability to the creation and optimization of algorithms. In many other ML and RL applications, the availability of high-quality data is the bottleneck. However, that problem is almost non-existent for algorithm creation and optimization, since data can be generated via “self-play”.
Verifying an algorithm and assessing its performance can be relatively straightforward (without human intervention), but devising an efficient algorithm is inherently challenging since many different strategies and experiments need to be tried. RL can be an invaluable asset in this regard, as models can be trained to invent innovative and effective algorithms.
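A toy illustration of this asymmetry (vastly simplified compared to something like AlphaDev): let candidate "algorithms" be sequences of compare-exchange steps, and let the verifier simply run each candidate on every possible input. Exhaustive search stands in for the RL agent here; the point is that the reward signal is cheap and fully automatic, so no human-labeled data is needed. The setup and names are my own invention for this sketch.

```python
from itertools import permutations, product

# Candidate algorithms: sequences of compare-exchange steps over 3 slots.
COMPARATORS = [(0, 1), (0, 2), (1, 2)]

def run(program, values):
    """Apply a sequence of compare-exchange steps to a list."""
    values = list(values)
    for i, j in program:
        if values[i] > values[j]:
            values[i], values[j] = values[j], values[i]
    return values

def score(program):
    """Automatic verifier as reward: fraction of all inputs sorted correctly."""
    inputs = list(permutations([1, 2, 3]))
    correct = sum(run(program, x) == [1, 2, 3] for x in inputs)
    return correct / len(inputs)

# Search over all length-3 programs; a real system would use a trained
# agent instead of brute force, but the reward computation is identical.
best = max(product(COMPARATORS, repeat=3), key=score)
print(best, score(best))  # a perfect 3-element sorting network, score 1.0
```

Verification is a few lines and runs in microseconds, while finding a correct program requires searching a combinatorial space; that gap is exactly where a trained agent can pay off.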
Transforming Infrastructure and System Software 🛠️
Currently, algorithm creation is predominantly a human endeavor, and experts have conceived ingenious solutions like bloom filters and cuckoo hashing. However, I firmly believe that AI holds the promise of surpassing human limitations in the long term. This notion is not only exciting but also has the potential to fundamentally transform the infrastructure and system software landscape.
Inspiration from DeepMind 📘
DeepMind has been at the forefront of advancements in RL. Their contributions, namely AlphaTensor (focusing on matrix multiplication algorithms) and AlphaDev (optimizing sorting libraries at the assembly level), are exemplary of the immense potential that RL holds in algorithm development. These works build upon DeepMind's earlier groundbreaking achievements such as MuZero, AlphaZero, AlphaGo Zero, and AlphaGo.
Setting Sail on My RL Voyage 🔧
As Q3 unfolds, I am thrilled to embark on this exciting journey into the depths of Reinforcement Learning. I'm resuming the Coursera specialization in RL and supplementing it with the book Reinforcement Learning: An Introduction.
Additionally, I am eager to experiment with RL libraries like RLlib in Ray and HorizonRobots' MuZero implementation. I plan to try out GitHub Copilot X with VS Code as the IDE. While I have been a long-time happy user of IntelliJ for Java (and recently of the GitHub Copilot plugin), VS Code is picking up steam with Copilot X, and I need to switch to Python anyway.
Other Related Interests
There has also been a lot of progress on using LLMs for code generation, although that work seems to target the “interactive coding” use case and is somewhat detached from the RL efforts.
If and when I get time, I would like to dig into Codex from OpenAI and the BigCode open-source project as well. Meta has built and started using CodeCompose internally, with very positive feedback from users.
Let’s Collaborate and Learn Together! 🤝
If you have suggestions, resources, or share an enthusiasm for RL, please don’t hesitate to reach out! Together, let’s traverse this captivating domain and unravel the possibilities that Reinforcement Learning holds.
Here’s to an enriching and invigorating Q3 2023! 🎉
Happy Learning! 📈
Haha, very nice to read your latest post! I've also started on RL from a slightly different angle of interest: real-time gradient descent for meta-learning. I have just finished all the courses from https://community.deeplearning.ai. Thanks for the pointer to https://www.coursera.org/learn/fundamentals-of-reinforcement-learning; I will start on it.
Great to see that you plan to focus on RL, during the storm of LLMs.
RL is “guaranteed” to make (academic) progress given enough resources; e.g., there is a chance to improve upon all manually designed methods and previous algorithms with RL, as with AlphaTensor, AlphaDev, and work in databases, compilers, chip design, magnetic control of tokamak plasmas, stratospheric balloons, and even learning algorithms themselves.
ICML/NeurIPS workshops on reinforcement learning for real life
https://sites.google.com/view/RL4RealLife
(Invited talks and panel discussions from top experts and great papers.)
A survey from early 2022:
Reinforcement Learning in Practice: Opportunities and Challenges
https://arxiv.org/abs/2202.11296
Recently I have spent much time on LLMs.
The following blog post is at an abstract level:
Reinforcement learning is all you need, for next generation language models.
https://yuxili.substack.com/p/reinforcement-learning-is-all-you
I am working on a perspective paper with more concrete ideas.
Hopefully I will finish it by the end of this month.
I am always glad to join discussions about RL.