My friend Elliott and I built Cellar Door, a web app to help us identify the most beautiful word in the English language. (I won’t patronise you by explaining ‘Cellar Door’; you can always google it!) The UX still needs work (especially on mobile), but my brother-in-law Shea is working on that.
This is a brief post to explain how Cellar Door works, and to share some thoughts on AI-enabled development (more on that here).
The Corpus
We played with a few different word lists. Our first attempt used this corpus of common words found on the web. Unfortunately, it contained too many nonsense words (poopdeck!) and the app served me this one-in-a-hundred-billion comparison in an early test run:
So we abandoned this corpus in favour of one from Google Ngrams, which you can find here. I wrote a script that used OpenAI’s API to check whether a word is valid or not:
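A minimal sketch of what such a check might look like, not the original script. It assumes Node 18+ (for the global `fetch`), an API key in the `OPENAI_API_KEY` environment variable, and OpenAI's chat completions endpoint; the prompt wording and the function names `buildRequest`, `parseVerdict`, and `isValidWord` are illustrative assumptions.

```javascript
// Hypothetical sketch: ask gpt-4o-mini whether a word is a real English word.
const MODEL = 'gpt-4o-mini';

// Build the chat-completions request body for one word (prompt is an assumption).
function buildRequest(word) {
  return {
    model: MODEL,
    temperature: 0,
    messages: [
      {
        role: 'system',
        content:
          'Answer with exactly one word: VALID if the input is a real English word, otherwise INVALID.',
      },
      { role: 'user', content: word },
    ],
  };
}

// Turn the model's reply into a boolean; anything other than VALID counts as invalid.
function parseVerdict(responseJson) {
  const reply = responseJson.choices[0].message.content.trim().toUpperCase();
  return reply.startsWith('VALID');
}

// Send one request and return true or false for the given word.
async function isValidWord(word) {
  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify(buildRequest(word)),
  });
  if (!res.ok) throw new Error(`OpenAI request failed: ${res.status}`);
  return parseVerdict(await res.json());
}
```

Keeping the request building and the reply parsing separate from the network call makes it easy to check both without spending any of the daily request quota.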
Unfortunately, gpt-4o-mini has a limit of 10k requests per day, so I’m still working through the corpus to clean it up. I tried using OpenAI’s batch API, but that also imposes a limit of 2m tokens per day.
The scoring mechanism
We use a standard Elo calculation for comparing words. Basically, every time you vote between two words, we update both words’ scores based on your choice and their prior scores. If you choose a word that already has a high score over one with a low score, neither score will change by much; if you pick a generally-disliked word over a critically-acclaimed one, your choice will have a bigger impact.
We used a K factor of 36, and we based this decision on absolutely nothing.
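The update described above can be sketched as follows. This uses the conventional Elo formula with a 400-point scale and the K factor of 36 mentioned in the post; the function names and the ratings in the usage note are illustrative assumptions, not Cellar Door's actual code.

```javascript
// K factor from the post; the 400-point scale is the conventional Elo constant (assumed).
const K = 36;

// Probability that a word rated `ratingA` beats a word rated `ratingB`.
function expectedScore(ratingA, ratingB) {
  return 1 / (1 + Math.pow(10, (ratingB - ratingA) / 400));
}

// Return the new ratings after `winner` is chosen over `loser`.
function updateRatings(winner, loser, k = K) {
  // The less expected the win, the larger the adjustment.
  const delta = k * (1 - expectedScore(winner, loser));
  return { winner: winner + delta, loser: loser - delta };
}
```

With two words both rated 1000, the winner moves to 1018 and the loser to 982. If a 1400-rated word beats a 1000-rated one, each moves by only about 3 points; if the 1000-rated word wins instead, each moves by about 33.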
(As an aside: why don’t we use Elo for everything?
We like to rate things: we rate our Airbnbs and hotel rooms, our Deliveroo orders, our Uber drivers (and clients), our restaurants, our books, our films. But everyone knows these rating systems are broken (one of relatively few things that actually are broken!): most people give a default of 5 stars, and so comparing things is impossible.
Why don’t we use Elo across more things? E.g., after you place an order with Deliveroo, the app could ask you to compare the restaurant you ordered from to the one you ordered from last time; ditto with Airbnb.
And my favourite proposed application: applying Elo ratings to LinkedIn connections! That’d be fun to see, but unfortunately LinkedIn’s API doesn’t allow access to a person’s list of connections. You might be able to hack something together with web scraping though.
)
Of course, this is a fool’s errand
There’s viral and there’s viral; I’d need the latter on steroids to get a good answer to the question of which is the most beautiful word in the English language — millions of votes have to be cast. Still, it’s been a fun project.
(Determining how many votes would have to be cast to determine the best word, what other assumptions are required, and what would be an optimal strategy for doing so (instead of showing words at random, which is what Cellar Door does) is a great question if you’re screening for creative thinking and quant skills!)
Coding with AI
GPT has become much better at giving me high-quality code that works. When I coded Scholargrams, GPT’s code had bugs that I had to fix myself. This time, the scripts GPT gave me were error-free.
And OpenAI’s API is a joy to work with. Super simple to set up, transparent pricing, exceptionally easy to use all-around.
But my main thesis remains unchanged: AI should have made it possible for non-technical people to code and bring their ideas to life, but the infrastructure just isn’t there. Using GitHub or Vercel is daunting and confusing, even for people like me who have a background in computer science. And this is just for web apps; forget creating mobile apps if you don’t have engineering experience.
This is a start-up idea just begging to be funded:
MVP: start by taking GitHub and Vercel and simplifying them: remove the need to install and run packages locally, strip away the jargon, integrate domain registration, and add plug-and-play functionality (emails, user profile creation, etc.).
Phase II: smart LLM integration. The user describes their idea, and the tool walks them through implementing it (‘create a file here, copy-paste this code in it’ etc).
Phase III: the user just writes the prompt, and the tool does everything itself — domain registration, code, database, app upload to the stores, everything.
(Another aside: the pitfalls of new technologies
There’s a story that a group of men once sought out John von Neumann. They asked him to build a computer to solve a problem they were working on. Von Neumann scribbled on a pad for a few minutes, then said ‘Gentlemen, you don’t need a computer. I have the answer’.
This story is told to show how smart von Neumann was, but I think there’s a more important lesson in it: when we become too accustomed to a technology, we default to using it even when it’s not best suited to our objectives.
In the case of the scientists who asked for von Neumann’s help, they were so set on using a computer that it didn’t occur to them they could solve their problem without one. In my case, I was so elbow-deep in GPT that it didn’t occur to me there were more obvious solutions to some problems I encountered.
For example, one issue I faced was that the definitions provided by GPT often included commas. Since I’m storing these definitions in a CSV, these commas were troublesome. The solution is to enclose the definition within quotation marks, so I asked GPT to do that. This request completely stumped GPT: sometimes it would enclose a single word in quotation marks, and the new instruction caused it to forget my request that the word itself not be repeated at the beginning of the definition.
I tried to refine my prompt (and the context I provided to the API) to fix this, but nothing worked. I then asked GPT itself how to design a better prompt. To its credit, GPT realised that the best solution was not to get GPT to add the quotation marks, but for me to do it myself in JavaScript, after I had got a response back:
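A minimal sketch of that post-processing step, assuming the definition comes back from the API as a plain string. The function names are hypothetical; doubling any embedded quotation marks follows the usual CSV convention (RFC 4180).

```javascript
// Wrap a value in double quotes so commas inside it don't split the CSV column.
// Embedded double quotes are doubled, as the CSV convention requires.
function toCsvField(value) {
  const text = String(value);
  return '"' + text.replace(/"/g, '""') + '"';
}

// Join already-separated values into one CSV row.
function toCsvRow(fields) {
  return fields.map(toCsvField).join(',');
}

// Example: the comma inside the definition stays within a single quoted field.
const row = toCsvRow(['thicket', 'a dense growth of shrubs, bushes, or small trees']);
```

Here `row` holds two quoted fields, so a spreadsheet or CSV parser reads the whole definition as one column.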
Obviously that’s the right way to do it. Obviously it’s simpler and more elegant. But I had shifted my mindset so much towards ‘GPT is the answer to everything’ that it didn’t occur to me.
Using new technologies when simpler, cheaper, and more effective solutions exist is suboptimal, but sadly it’s not only the result of silliness (as in my case) but also of graft: think of the countless useless crypto projects that were solutions in search of a problem. Before you set out to design a new way of doing things (or pay for ‘AI/crypto/other fad-empowered solutions’), make sure you can’t do what you’re trying to do with legacy tech.
)
And that’s about it! Happy voting!
Found my way here from Lev's Six Things. I thought poop deck was a real thing--at least as far as a seafaring term, so not a nonsense word? No matter, I wanted to throw in a vote for "thicket" a word that makes my mouth happy to say. So beautiful!
I’d like to better understand what the definition of ‘beautiful’ is in this study. Is it meant to describe a word’s appearance in written form, its sound when spoken, its common meaning? On what basis are we to judge a word’s beauty?