More hacking on automating Wordle
Just a warning: if you are playing Wordle and don’t want any hints, this is not the Medium entry for you. Also, I am not a mathematician or statistician, so everything is just based on my knowledge of cryptanalysis and nothing more.
One other thing to note is that the word list I am working with is not Wordle’s as I intentionally wanted to avoid looking at its source code.
Okay. So I had one remaining question from my last entry on Wordle: what is the most efficient starting word overall?
“Cares” has a word score of 967.4, whereas “audio” gets 288.4 and “ports” has 796.6. The point of the game is to produce an answer in as few steps as possible and while you will knock off a lot of vowels and consonants with those two, you are likely reducing the chances of answering in the fewest attempts.
My choice of word also includes the letter “e”, which is the most common letter used in the English language. On average, you should see that letter appear about 10–13% of the time, with the letter “a” being about 7–9%, and “s” being 6–9%.
You can also order it as “scare” or “races” too if you desire, but I believe that with “s” at the end and putting “r” in the middle, you cover all initial possibilities quite effectively here.
I did not like what I wrote here, but at the time I didn’t really run numbers on outcomes. However, I was working on this because you cannot really automate Wordle without a good starting word.
Using the same logic from before, the process to find the winning word was fairly simple.
For each starting word, we then ran through each possible five letter word other than itself. It would run each guess against an implementation of Wordle in my own code with the same hints and correct guesses. Once it got to an answer, it would then record all attempts into a set and then move on to the next word.
Unlike Wordle, there was no guess limit as I wanted to know how many attempts it would take using the methodology I did before where I came up with a list based on efficiency. So the first word based on possibilities made by the hints and correct letter guesses available from the list was the word chosen.
There are flaws with this list sorting as discussed on Twitter, but I wanted to stick to my current methodology, so we’ll proceed forward.
The worst words
Defining the worst word can be achieved one of two ways. We can either look for a group of words or a single word that has the most guess attempts or we can look at them on average.
A minimum of the number of guesses is quite simple: one if you’re lucky, but if the word doesn’t match, then two. Guessing what the largest number of guesses will be took a little bit of magic.
The words that managed to hit a the maximum-seen attempts of 14 were: “agile”, “anile”, “dells”, “duels”, “dulls”, “dully”, “exile”, “gaily”, “guile”, “gulls”, “gully”, “jails”, “jello”, “jells”, “jelly”, “jinns”, “jolly”, “keels”, “kelly”, “lolls”, “lolly”, “lulls”, “nails”, “ninny”, “nulls”, “quell”, “rails”, “rials”, “villa”, “ville”, “villi”, “yella”, “yells”, and “yield”. If you are to notice a pattern here, almost all of them have the letter ‘l’ in them, but this isn’t really the reason why they performed badly.
Word scores are all over the place with these words, which leads me to believe that it is not necessarily indicative of a good start. For example, “quells” is amongst that list and had a score of 307.4, but then “rails” received 768.6. The average score for all of the words was 531.1, so I’m not confident all that much in using that as a metric for a good starting word.
So then what about averages? This gives us a better picture because Wordle limits us to six guesses. I would suggest if a word has a minimum average of 4 or higher that it should not be used at the start.
The worst performing word had an average of 4.26 attempts, which was “jujus”. This was followed by “wheee” at 4.23, and “yukky” at 4.21. There were three tied for 4.2, and they were “vexes”, “sexes”, and “esses”.
The word score “jujus” was 506.6, but interestingly, “sexes” received 888.6.
Is there a guaranteed winner?
The short answer is: no.
All words with this methodology ranged from 8 to 14 maximum attempts. No single word went below 8 attempts, meaning there are starting words which will lead to failure.
There is a silver-lining, the number of words with the smallest maximum is only slightly larger than the number of words with the largest maximum.
There are 1,764 words of which had 12 maximum attempts, 2,085 with 11, 984 with 10, and 411 with 9. However, with 8, there are only 39 words, and all of them might justifiably be the best candidates.
The words that performed the best were “blest”, “deism”, “doest”, “durst”, “feist”, “heist”, “least”, “liest”, “merse”, “midst”, “obese”, “pseud”, “sebum”, “sedan”, “sedge”, “sedgy”, “sedum”, “seize”, “sepia”, “serge”, “serif”, “serum”, “sheaf”, “sheik”, “sherd”, “siege”, “sieve”, “skein”, “skirt”, “slept”, “speck”, “spend”, “sperm”, “steal”, “suede”, “swell”, “swift”, “welsh”, and “wrist”.
Going back to word scores, we arrive upon something curious: the range of scores is between 277.8 and 526, with the average being 398.4. Perhaps the way to look at the word score is not at whether it is strong at the start, but instead if it eliminates all of the least likely? This would suggest that words such as “obese” or “pseud” could be good starting candidates.
Could we find a most-likely?
This is where I want to deviate: a maximum does not imply that it is the worst choice. And in this case, I am willing to believe this because I did find something amongst all of the words that allows me to suggest that there is either a flaw in my methodology or I am looking at this all wrong.
The words “estop”, “epsom”, and “serum” had an average attempt count of 2.74, 2.77, and 2.79 each respectively. When I looked at their word scores, “estop” had 160.4, “epsom” had 165.8, and “serum” had 418.
I decided to look at all words of which were lower than 2.9, and within the the 54 words, the minimum score was 106.6 and the maximum was 496, with an average of 336.3. I think I know where the word score’s usefulness may lie: it’s better to start with a lower than a higher one.
When I spoke about maximums earlier, “estop” found itself with a maximum of 9 attempts. In fact, “serum”, which had a slightly higher average than “estop” found itself with a maximum of 8 — “epsom” received 9.
But this is still an incorrect approach.
We’re looking at the average and what I’d like to know is what has the highest percentage of wins within the limitations imposed by the game. We have six guesses, so what word is best suited on average to produce an outcome.
With these parameters in mind, we can then use the following methodology: which word is most likely to get a guess correct within those six attempts and then not consider the steps beyond that point since they’re not indicative of what would be considered a success.
Of the 5,756 words tested, one word managed to achieve a win in three moves 69.4% of the time: “verso”. It also achieved a win in two moves 10.8% of the time, four was 13.1%, five was 4.6%, and six was 1.5% — a 99.4% successful starting word.
But this is not the best word even though it is the best word for the quickest win — fortunately we already know the best one to use in order to win.
As mentioned before, “estop” had an average of 2.74 attempts before a win, but importantly it has the highest success rate. It edged out “verso” by achieving a 99.6% win rate with 46% of all attempts being achieved in its first two guesses and then three in 40% — four was 9.4%, five was 3.2%, and six was 0.9%.
With a failure on only 19 words from the word list — compared to 30 for the other word I was testing with — starting with just “estop” should net you a win most of the time.
So in my opinion: “estop” is by far the best word.
I’m happy to share the data I have made and I welcome you to demonstrate anything you have done with it. However, I must reiterate that I am not a statistician nor do I consider myself adept with mathematics. What I will say that I was out to have fun to solve this challenge and I feel like I have.
I wanted to play this game unconventionally and I think I accomplished that goal. There are some further ideas to achieve a 100% success rate or at least get extremely close to it, but being that I am at 99.6% successful on just this algorithm alone, it does make me think that this little project can be considered “ended”.