The Bitter Lesson about Engineers in a ChatGPT World

This post is for Day 1 of Mercari Advent Calendar 2023, brought to you by Darren from the Mercari Data Engine team.

Tomorrow’s article will be by Pratik, about a huge cost-saving engineering initiative. Looking forward to it!


It’s been over a year since ChatGPT was released and we asked the question on every engineer’s mind: Do We Need Engineers in a ChatGPT World? In this post, we will follow up on last year’s discussion and talk about how the development of large language models (LLMs) has changed the trajectory of engineering.

First off, we’re still here! Engineers are still engineering, and there seems to be no slowdown, but rather an acceleration of activity. We need more humans than ever!

What happened? Why haven’t the LLMs taken all our jobs yet?

In order to answer this question, it is useful to look back to the distant past of … 2019. In February of that year, GPT-2 was announced: a 1.5-billion-parameter model able to generate fluent, coherent text from a simple auto-regressive completion pre-training regime. A month later, famed computer scientist Richard Sutton wrote a post titled “The Bitter Lesson” about his conclusions from looking at more than half a century of AI research. It states that “the biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin.” The reason this lesson is “bitter” is that computer scientists often feel compelled to imbue systems with their own knowledge. But over the long run, models that simply use a great deal of training data to learn the important patterns on their own almost always end up outperforming the hand-coded systems.

In 2023, there are two big competing trends in the world of LLMs: the trend towards larger, more general models such as GPT-4, PaLM, and LLaMA, and the trend towards smaller, specialized models. The Bitter Lesson tells us to prefer the more general methods. But the constraints of our current compute environment force us to use techniques such as LoRA to fine-tune large models more efficiently, or to simply go small from the beginning and train specialized models with fewer parameters.
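To make that trade-off concrete, here is a minimal sketch of the LoRA idea, assuming PyTorch; the LoRALinear class name, its rank and scaling hyperparameters, and the initialization are illustrative rather than a reference implementation. The pretrained weight matrix is frozen, and only a small low-rank correction is trained on top of it.

```python
# A minimal sketch of the LoRA idea (assumes PyTorch is installed).
# Instead of updating the full weight matrix W, we freeze it and train
# a low-rank update B @ A, where the rank r is much smaller than W's dimensions.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weights stay frozen
        # A starts as small random values, B as zeros, so the adapter is a
        # no-op at first and is learned during fine-tuning.
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Original output plus the low-rank correction; only A and B are trained.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale
```

Only the two small matrices are trained, which is what makes adapting a large model feasible on far more modest hardware than full fine-tuning would require.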

Engineering is being pulled in similarly competing directions. On the one hand, engineers are being asked to understand more and more frameworks and systems just to do their jobs. This is thanks to software eating the world, but also to the massive rise in cloud computing over the past decade and a half. As a data engineer who also manages production systems, I might find myself looking at monitoring dashboards one minute, running kubectl the next, jumping to Airflow to check on some data pipelines, followed by running a massively distributed analytics job in BigQuery. These are only a handful of the myriad tools we use on any given day, but they were all originally built by different teams with different philosophies of software engineering. The different “language” of each framework is yet another layer of context switching for the already overloaded engineer.

The other direction engineers are being pulled in is specialization, just like LLMs. Each area of software engineering is a huge discipline, and no one is expected to be an expert in everything. Many engineers choose to specialize, whether it be in networking, iOS or Android development, graphics programming, or any of the many subfields of artificial intelligence. In each individual discipline, there are still new things to discover and build, and there are a wealth of careers available.

But what do we do now that code generation LLMs are increasingly approaching human-level performance on coding tasks? Do we let LLMs specialize for us, while we stay general, or do we specialize and let the LLMs do the generic stuff?

To answer this, I would like to introduce another bitter lesson for engineers: it turns out that a lot of what we do in our jobs is intellectually not very novel. Not that it isn’t important or meaningful or creative, because it certainly can be, but rather, the vast majority of an engineer’s time is spent doing things other than making new intellectual discoveries. Likewise, autoregressive LLMs are not yet creating new intellectual discoveries, but rather have organized their training data in an extremely useful way that allows them to generate outputs similar to what they already know. How we choose to work with LLMs, since they are not going away, will define our future as engineers.

My advice is to turn this bitter lesson around and see it as a sweet relief: LLMs can handle the tasks we don’t want to do so we can focus our attention on more meaningful pursuits. For me personally, I use LLMs to wade through tons of documentation that I don’t have time to peruse. Remember all the frameworks mentioned above? The grand total documentation for all the frameworks I use must be many millions of words (or tokens, for you LLMs). I have certainly not read it all. But LLMs are great for this use case, and I often use them for help finding methods and even sample code for frameworks that I only know part of. Sure, the language models often hallucinate methods that don’t exist (but should!), but they generally at least point me in the right direction. In information retrieval terms, the recall is high, but the precision often suffers, meaning I can usually find what I want, even if there are a lot of irrelevant results. In fact, this particular use case of searching and summarizing large corpora of text has led to a whole industry around Retrieval Augmented Generation (RAG), which essentially extends an LLM with a vector database to combine generative AI and information retrieval.
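As a rough illustration, here is a minimal sketch of that RAG pattern in plain Python and numpy; embed() and the brute-force cosine search are hypothetical stand-ins for a real embedding model and a real vector database.

```python
# A minimal sketch of Retrieval Augmented Generation: embed the documents,
# retrieve the ones most similar to the query, and stuff them into the prompt.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: a pseudo-random unit vector keyed on the text.
    # A real system would call an embedding model here instead.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
    # Rank documents by cosine similarity to the query embedding.
    q = embed(query)
    scores = [float(q @ embed(d)) for d in docs]
    top = np.argsort(scores)[::-1][:k]
    return [docs[i] for i in top]

def build_prompt(query: str, docs: list[str]) -> str:
    # The retrieved passages become the context the LLM is asked to answer from.
    context = "\n\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

A production system would swap the brute-force loop for an approximate nearest-neighbor index in a vector database, but the overall flow of embed, retrieve, and prompt stays the same.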

Another way I use LLMs is to learn the fundamentals of an engineering task I haven’t done before. Rather than giving an LLM access to an entire code base and telling it to fix something for me, I would rather learn how to do it in the most basic way, and then use that knowledge to build the solution. This comes back to the probabilistic nature of LLMs – while they do a rather good job of generating human-level code, if you don’t understand what you’re reading, how will you know it’s a valid solution beyond checking whether the outputs are correct? This part of an engineer’s job is increasingly important, and its analog in the world of AI is “explainability”. Of course, as systems grow more and more complex, the ability to understand all the pieces becomes accordingly more challenging. But while it is neither necessary nor practical to memorize arcane syntax across dozens of languages, the overall structure of algorithms and system design absolutely needs to be understood. The bitter lesson is that whether we’re training the next-gen LLM or transforming billions of rows into an aggregate, at the end of the day we’re just trying to push bits through processors as fast as possible, and the basics apply just as well as they did before LLMs.

In Sutton’s “Bitter Lesson”, those who fight the trend towards larger data sets and more general methods end up overtaken by those who trust simple, general methods to discern complex patterns on their own, given enough future compute to do the training. There are important takeaways for engineers here. We, too, should not place too much emphasis on the specialized knowledge we have accumulated over the years, especially the trivia, because LLMs already surpass us in those domains. Instead, we can focus on the general methods of engineering, because the first principles never change. Or we can push harder towards domain expertise by leveraging LLMs to take over the more mundane parts of our jobs. Either way, we consciously choose to co-evolve with LLMs, using them to accelerate our role in engineering the future.

If you want to accelerate your career, check out our open positions at https://careers.mercari.com/.
