Part 2: The proliferation of Generative AI Coding Tools and how Product Engineering teams will use them

A computer-generated image of four people working at screens in red light. Image by author using Midjourney

How Generative AI will impact product engineering teams — Part 2

This is the second part of a six-part series investigating how generative AI productivity tools aimed at developers, like GitHub Copilot, ChatGPT and Amazon CodeWhisperer, might impact the structure of entire product engineering teams. In Part 1, we explored:
  1. The landscape of product engineering and the possibility that teams will need fewer human engineers with the rise of Generative AI tools.
  2. The traditional 5:1 ratio in tech teams: how roughly five engineers for every product manager is common across the industry.
  3. The roles of product managers and engineers in current product development processes and how these roles might shift with AI advancements.
  4. How past research made flawed predictions about which professions would be least impacted by AI, and how LLMs have upended these predictions, particularly for the tech and creative industries.

The explosion of AI coding tools

Automation has been a part of software engineering for almost as long as there has been software engineering. Eric Raymond’s landmark 2003 book, The Art of Unix Programming, set out 17 design rules for software engineers, including the Rule of Generation: “Avoid hand-hacking; write programs to write programs when you can”. Raymond’s advice is still relevant 20 years after it was published:
“Human beings are notoriously bad at sweating the details. Accordingly, any kind of hand-hacking of programs is a rich source of delays and errors. The simpler and more abstracted your program specification can be, the more likely it is that the human designer will have gotten it right. Generated code (at every level) is almost always cheaper and more reliable than hand-hacked.”
Since Raymond wrote those words we have developed automated test tools, linters (tools which automatically check the code we write), auto-completion for our development environments (like a spell-checker, but for code) and even frameworks (like React and Django) which automate the creation of a large amount of the basic, boilerplate code behind generic applications like websites and mobile apps. For the most part, developers have revelled in automation, while remaining quietly confident that our experience, skill and creative uniqueness would make us hard to replace. It didn’t help that McKinsey and their peers were also telling us that we’d be safe from the clutches of AI automation.
The most recent additions to the code automation space are a group of tools that offer to sit alongside developers as equals, and look eminently capable of progressing far further than we had previously imagined. Much as Raymond’s Rule of Generation advises, it’s possible that these tools will write complex software better than humans. Probably the prima donna of developer assistance tools as I write is GitHub Copilot, which unblinkingly markets itself as a pair programmer (the term given to a human coding buddy, who sits alongside you and co-authors your code). Copilot was released to developers in June 2022, and its younger and more charming sibling, Copilot X, was announced in March this year.
GitHub has made what appear to be entirely reasonable claims that Copilot increases developer productivity, with their study reporting that the tool reduces the time to complete engineering tasks by up to 55%. GitHub (don’t forget that this is a company owned by Microsoft, which also owns a rumoured 45% stake in OpenAI, the makers of ChatGPT) isn’t alone in this space. Amazon CodeWhisperer was announced last year, with similar claims of a 57% increase in developer productivity.
Like Copilot, CodeWhisperer is accessible in the engineer’s integrated development environment (IDE) and can correct, comment, explain and write code.
IDEs have long been able to provide code autocompletion, but Copilot X can write entire sections of code with very little context
In a fascinating article on how the GitHub team developed and improved Copilot, John Berryman, a senior researcher at GitHub, explained that it wasn’t only the code directly in front of the developer that was being used to prompt the AI models. As Berryman put it,
“The secret is that we don’t just have to provide the model with the original file that the GitHub Copilot user is currently editing; instead we look for additional pieces of context inside the IDE that can hint the model towards better completions.”
It’s this wider context that these developer tools can exploit — the IDE, the files available on the developer’s machine, the code repositories in git and even, potentially, the documentation for the application — that makes them so powerful. This broad context lowers the skill level required for developers to ‘prompt engineer’ solutions from the AI. Examples to improve prompts are available to the AI tools directly within the environment in which they operate, and also from the billions of lines of pre-existing code they were trained on. This gives the LLM a good chance of creating not just an appropriate output, but one that exceeds the capabilities of an average developer. With Copilot X, the code interface itself is no longer the only palette the AI has to work with, and it’s no longer restricted to auto-generating code. A user can highlight code in their development environment and simply ask for an explanation of what the code is doing:
Copilot X expands the reach of the coding assistant into a ChatGPT-like chat environment
While Copilot and CodeWhisperer are specific implementations of large language models that are enhanced for developers, even the general-purpose ChatGPT turns out to be pretty handy as an engineering companion. I’ve been using both the GPT-3.5 and GPT-4 models for a number of months across a wide variety of tasks, and continue to be bewildered by their capability — both with code and in any number of other disciplines. Recently, I was dusting off my rusty Python skills to play with the OpenAI APIs, and it seemed churlish not to ask ChatGPT for help. As you’d expect by now, the general-purpose chatbot was able to give me some entirely reasonable setup instructions that reflected the online documentation, with the added benefit that I could expand the chat into other topics (like why Python never works properly on Windows whenever I install it).
Even the general purpose ChatGPT is pretty adept at providing a variety of assistance in technical areas
Not to be left behind by Copilot and CodeWhisperer, OpenAI recently announced its own code-enhanced version: a GPT-4 model with Code Interpreter, which can both write and execute Python code. This improves on the code completion of the general-purpose ChatGPT by allowing the model to write and run code directly in the chat environment.
ChatGPT with Code Interpreter extension can not just write code, but run it as well
So far, I’ve spent a lot of time talking about the three big players in this space: GitHub, Amazon and OpenAI, but others aren’t far behind. Google has announced Duet AI for Google Cloud, a code assistant in the vein of Copilot and CodeWhisperer, as well as AI assistance in its Colab machine learning tools (implementing Codey, a family of code models built on its own PaLM 2 model). Outside of developer-specific tools, Microsoft is rolling out a brace of Copilots, in both Windows and Office (at eye-watering prices), and Salesforce has recently, it seems, rebranded every one of its products by putting ‘GPT’ after the product name. Not to be left behind, Google is expected to add Duet to Google Workspace which, like Office Copilot, will help people craft documents and create slightly less tedious presentations.
It seems that wherever you look, someone is ploughing money into AI models that make humans more productive. On the face of it, the early signs indicate that it’s working, and we should be preparing for the impact.

How will Product Engineering teams use these tools?

The first time I spent a serious amount of time generating code with ChatGPT, I was actually asking it to generate user stories. A friend had asked me for some help writing an application, and with my professional coding days now long behind me, I needed some help. My thought process was reasonably simple: “I can’t code anymore, but I can probably describe what is needed in a way that a developer can understand, so we can hire a freelancer to actually write the code”. I explained to ChatGPT that I wanted it to create some behaviour-driven development (BDD) stories. BDD is a reasonably well-adopted approach to development that serves as an intermediary between what the customer (and the product manager) wants and the code that the engineer will write. Writing a BDD story is a good test of your own understanding of a problem, so naturally, I just got ChatGPT to do it for me.
An example of a BDD User Story that ChatGPT created — because wizards vs robots. That is all.
It’s important to understand that a BDD story isn’t an approximation of code that will execute, but it is a very good way to check whether the application that is delivered matches up with what a product manager really asked for (and hopefully, what a customer really wants). Having asked for these user stories, it struck me that there was no reason I couldn’t ask ChatGPT to write code that implemented them. With a little prompt engineering, I suggested that ChatGPT create the application in JavaScript, HTML and CSS (the most basic code for a simple web application). Two minutes later, I had the building blocks of an application that represented the basics of the functionality we had written the user stories for. Once I had the application running, in its waddling, newly-hatched form, I realised that any self-respecting developer would be creating tests based on the code and the user stories. So, after a brief discussion with ChatGPT about the most suitable test runner for JavaScript, I asked for a set of more technical, test-driven development (TDD) style tests to be created, to ensure the app was meeting my original BDD stories.
ChatGPT did a credible job of recommending the Jasmine JavaScript test runner, and then writing tests
What was incredibly striking about the entire process was the ease with which some of the tasks most disliked by developers were completed. For most engineers, the beauty of the work is in the creativity of finding a solution and the thrill of code running successfully. Writing tests, creating documentation, creating user stories and even bootstrapping basic code are tedious tasks that, emotionally, are best avoided. Going through this process reminded me of a thought-provoking tweet thread by Simon Wardley, the creator of the Wardley Maps approach to strategic decision-making. Simon’s thread proposed that your suite of tests represents and explains the complex and interdependent map of relationships that your product actually consists of, and that it’s here that a huge amount of your intellectual property lies.
 
X : What’s the best way of writing a specification for a commodity? Me : Your test suite. X : Eh? Me : Every novel thing starts with a few basic tests, as it evolves it gains more, your product should be built on those test and eventually they should help define the commodity.
 
Me : Your business, hardware and software should have test driven development baked in throughout wherever possible. How do you change anything in a complicated environment without it. Every single line on a map is a relationship, an interface for which there should be tests …
 
… those tests define the operation of the interface, they are the specification you hand to another, that you check a product against, that you test a commodity with and then determine the trades you wish to make. Those test should expand and evolve with the thing itself.
Although tests aren’t as exciting to write as application code, and are often seen as cumbersome to maintain, good tests demonstrate the anticipated behaviour of the application. Even more importantly, tests don’t just describe how something should behave; they demonstrate that it really does behave that way, repeatedly. Defining the scope and the details of a really valuable test suite is part of the collaborative work that product and engineering teams do together. Simon’s tweets inspired me to reflect that, perhaps, the future of a prompt-engineered application starts with thoughtful test design.
From this perspective, we can start to envision a future for product engineering teams that embrace Generative AI tools. The jobs of eliciting customer requirements, understanding potential solutions, analysing the value that will be created by serving the customer and prioritising work will remain just as important as they are today. Product management will no doubt benefit from Generative AI tools, but I suspect the impact on the product discipline will be less shocking than it is for tech teams. Which brings us back to those five engineers.
In Part 3, you can read about:
  1. How Generative AI tools could potentially upend the longstanding ratio of 5 engineers to 1 product manager.
  2. How tools like GitHub Copilot and AWS Amplify Studio could reshape product development, shifting the engineer’s focus from hand-coding to design, architecture, and integration.
  3. How Generative AI tools can assist teams who are facing painfully outdated tech, handling complex porting and refactoring effortlessly.
  4. The possible unifying influence of AI tools on mobile and web app development, reducing duplicated efforts and bridging the skill gaps between web, Android, and iOS development.
  5. The impact of coding automation on junior developers and engineering progression.
Read Part 3 here.
P.S. If you’re enjoying these articles on teams, check out my Teamcraft podcast, where my co-host Andrew Maclaren and I talk to guests about what makes teams work.