> The number of times users are pleasantly surprised is a reliable heuristic for measuring a product’s likelihood of acquiring loyal customers. It’s especially good when these users are surprised enough to spend time writing online about their experience with the product.
I wrote the above quote in January to describe how I felt about ChatGPT and the public sentiment around the product. I think there’s something interesting and under-reflected about the aftermath of ChatGPT’s release. For example, people tagged Sam Altman and others they knew at OpenAI on Twitter after trying the tool, expressing their gratitude and sharing what they queried. It could be my lack of exposure, but I haven’t seen such public exuberance with other products.
But as I spend more time with the tool, my exuberance declines because the number of flaws I encounter increases.
I respect Sam and the team for shipping and prioritizing feedback from users, like me, who share their experiences online. I’ve even written about ChatGPT and its potential for innovation in education, particularly for lower-income students who can homeschool themselves by stitching together ChatGPT and YouTube. The intention behind this blog post is to share my experience using the tool, specifically for software engineering: where I’ve found it productive and where I haven’t. My intention is not to condescend toward a tool that is still iterating.
Where it excels:
- Writing boilerplate code: especially for well-documented, typical use cases like web scrapers (Python and bs4) or websites (HTML, CSS, JS). Boilerplate code is ubiquitous on GitHub, most obviously in web development: many developers create websites to showcase their tools, and those sites follow the same fundamental structure, which is easy for the transformer to learn and reproduce for a user. The underlying reason is that producing novel code is more probabilistic than producing boilerplate, and on the spectrum from deterministic to probabilistic, ChatGPT hallucinates more at the probabilistic end.
- Checking my intention against reality: After I write a code block in VSCode, I paste into ChatGPT an explanation of what I want the code to do along with the code I’d like it to check. I’ve found this helpful because it makes the development process more modular; I can write code in smaller chunks and have conviction that each one works.
- Correcting my explanations of open source code: I’ve started to build projects on top of open-source libraries like langchain, but I can’t convince myself to make library calls without understanding how the code works. So I navigate into the langchain Python folders and copy/paste the code into ChatGPT with “python-like comments.” We then reason through the code together: it corrects me where I’m wrong, and I ask clarifying questions until I understand. This is the student-tutor relationship I wrote about in this essay.
- Translation between programming languages: It does well turning C++ code into Python.
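The boilerplate point above is easy to see with a scraper skeleton. The post mentions Python and bs4; to keep this sketch self-contained it uses only the standard library’s `html.parser` instead, but the shape is the same well-trodden boilerplate ChatGPT reproduces reliably. The HTML snippet and the link-extraction task are my own hypothetical example, not output from the tool.

```python
from html.parser import HTMLParser

# Minimal link extractor -- the kind of boilerplate a scraper starts
# with. A bs4 version would be shorter (soup.find_all("a")), but this
# sketch sticks to the standard library so it runs anywhere.
class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Collect the href attribute of every anchor tag we encounter.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

# Hypothetical page standing in for a fetched document.
html = '<ul><li><a href="/docs">Docs</a></li><li><a href="/about">About</a></li></ul>'
parser = LinkExtractor()
parser.feed(html)
print(parser.links)  # → ['/docs', '/about']
```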
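To make the translation point concrete, here is the kind of prompt where you paste C++ and ask for idiomatic Python. Both the C++ original (shown as a comment) and the Python version are my own illustration, not output from ChatGPT.

```python
# Hypothetical C++ input you might paste into ChatGPT:
#
#   int sum_of_squares(const std::vector<int>& xs) {
#       int total = 0;
#       for (int x : xs) total += x * x;
#       return total;
#   }
#
# An idiomatic Python translation of the same function: the explicit
# accumulator loop collapses into a generator expression under sum().
def sum_of_squares(xs: list[int]) -> int:
    return sum(x * x for x in xs)

print(sum_of_squares([1, 2, 3]))  # → 14
```

The interesting part of this task for a language model is not the syntax swap but recognizing the accumulate-in-a-loop idiom and replacing it with the target language’s native equivalent.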
Where it can be improved:
- Customized use cases: This is the inverse of the boilerplate point above. Code that’s specific to an enterprise, for example, isn’t in ChatGPT’s training dataset, so it won’t know what references an Amazon developer makes inside their codebase. Until ChatGPT is trained on the company’s codebase, it can’t provide good output.
- It’s underconfident: Last week I repeatedly asked ChatGPT “are you sure?”, and each time I’d get a new code output along with an explanation of why the previous one was wrong. But it was oscillating between two different code blocks without realizing it.
- It doesn’t have good memory: I began coding with type annotations last week and asked ChatGPT to give me output without those annotations. It followed directions the first time but failed the second. I prompted it again. Success, then failure. Surprisingly, I haven’t had this problem when researching with ChatGPT; its memory is fantastic in that domain.
For experienced programmers, and even intermediate programmers like me, I’ve found coding to be faster without ChatGPT. In fact, I think ChatGPT is a distraction for programmers doing original work.