Solving Wordle using information theory

An excuse to teach a lesson on information theory and entropy.
Help fund future projects:
Special thanks to these supporters:
An equally valuable form of support is to simply share the videos.

Note: the way I wrote the rules for coloring while doing this project differs slightly from the real Wordle when it comes to repeated letters. For example, suppose in a word like “woody” the first ‘o’ is correct, hence green; in the real Wordle the second ‘o’ would then be grey, whereas the way I wrote things, the rule is simply that any letter which is in the word somewhere, but not in the right position, will be yellow.

To be honest, even after realizing this differed from the proper rule, I stuck with it because it made the computation of the full matrix of word-combination patterns more elegant (and faster), and the normal rule has always slightly bothered me. Of course, it doesn’t make any difference for the actual lesson here on entropy, which is the primary goal, and as I’ve gone back and tried rerunning some of the models with the correct convention, it doesn’t really change the final results.
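The variant rule described above fits in a few lines. A minimal sketch (my own illustration, not the code from the project's repo): green requires the right position, and yellow only requires that the letter appear anywhere in the answer, with no accounting for repeated letters.

```python
def simple_pattern(guess: str, answer: str) -> list:
    """Color a guess with the simplified rule: green if the letter
    matches its position, yellow if it appears anywhere in the answer,
    grey otherwise.  Unlike official Wordle, a repeated letter is not
    'used up' by a green match elsewhere."""
    return [
        "green" if g == a else ("yellow" if g in answer else "grey")
        for g, a in zip(guess, answer)
    ]

# Guessing "woody" against answer "wormy": the second 'o' comes out
# yellow here, where official Wordle would mark it grey (the answer's
# single 'o' is already consumed by the green match).
print(simple_pattern("woody", "wormy"))
```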

0:00 – What is Wordle?
2:43 – Initial ideas
8:04 – Information theory basics
18:15 – Incorporating word frequencies
27:49 – Final performance

Original wordle site:

Music by Vincent Rubinetti.

Shannon and von Neumann artwork by Kurt Bruns.

Code for this video:

These animations are largely made using a custom python library, manim. See the FAQ comments here:

You can find code for specific videos and projects here:


3blue1brown is a channel about animating math, in all senses of the word animate. And you know the drill with YouTube, if you want to stay posted on new videos, subscribe:

Various social media stuffs:


  1. 7:24 Got wormy and wryly but not wordy, which is funny considering the game's called Wordle and it's about words.

  2. A masterpiece as always, thank you!

  3. 18:52 How could you not highlight the word COULD when you said it right after THESE and OTHER and before ABOUT? It was the perfect sentence. I can't tell if you just forgot to animate it or if you didn't realize you said the word COULD as well, since it was fast; I'm leaning toward the former.

  4. My fave starting words are siren, octal, dumpy. I rarely need all three before i have enough info. Still need 4 or 5 entries before i get to the correct word, so.. 🙂

  5. For words that start with w, end in y, and contain an r that is not in the fourth place, there's also "warby" and "warty" from the accepted-words list.

  6. A bit interested why slate is better than tales or stale or some other combination of the same letters.

    It has to be about the chance of getting a green or yellow answer.
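The commenter's hunch can be made precise: anagrams like SLATE and STALE reach the same letters but differ in which positions go green, so their pattern distributions — and hence expected information, E[I] = Σ p(pattern)·log₂(1/p(pattern)) — differ. A sketch of that computation (using the simplified coloring rule from the description, and a toy word list rather than the real answer set):

```python
import math
from collections import Counter

def pattern(guess: str, answer: str) -> tuple:
    # Simplified coloring: g = right spot, y = elsewhere in word, - = absent
    return tuple(
        "g" if g == a else ("y" if g in answer else "-")
        for g, a in zip(guess, answer)
    )

def expected_bits(guess: str, answers: list) -> float:
    """Expected information of a guess: sum over observed patterns of
    p * log2(1/p), where p is the fraction of answers giving that pattern."""
    n = len(answers)
    counts = Counter(pattern(guess, a) for a in answers)
    return sum(c / n * math.log2(n / c) for c in counts.values())

# Toy candidate list (hypothetical, just for illustration):
answers = ["slate", "stale", "tales", "least", "steal", "crane", "trace"]
for g in ("slate", "stale", "tales"):
    print(g, round(expected_bits(g, answers), 3))
```

The guess that splits the candidates into the most even spread of patterns scores highest, which is why one arrangement of the same five letters can beat another.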

  7. AUDIO lets you sample 4 vowels in the first go, it helped me get it in 3 easy tries today.

  8. Here's one for you: given people's daily scores, can you calculate the odds that someone is cheating?

  9. I've been using "space" and that also seems to be a good opener

  10. 9:50 logarithms were just invented because mathematicians were tired of writing zeroes.

  11. Also, please look into the word starter AUDIO. I feel like taking advantage of the fact that there are only really 5 vowels could be game-breaking. By guessing a word like AUDIO, depending on which vowels hit, you're also getting information about the probability of E, essentially giving you 6 letters you're learning about from the first guess. Then, if testing for E in the 2nd guess is the optimal option, whether or not the E hits will determine the vowels in the word, and hopefully the consonants that have been guessed already will allow you to solve it in 1 or 2 more guesses. I just feel like the vowels haven't been utilized enough by the bots, but I could be mistaken.

  12. "Dedicated players will always find a way to min-max the fun out of any game"
    Nice video!

  13. This video was sadly a bit hard to follow. At 15:21 I see that slate gives 4.49 bits of info, yet it shows we know there is an s and an a in the word, which, based on the explanation earlier in the video of what "information" means, I thought was 2 bits of information (we remove words with no s, and those with no a).
    I guess it's 4.49 and not 2 because knowing there is an s or an a doesn't divide the space of all possibilities by 2, but by a different number?
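The commenter's second guess is right: one bit corresponds to halving the candidate pool, so an observation that cuts the pool by a factor k carries log₂(k) bits, and 4.49 bits means a cut by a factor of roughly 2^4.49 ≈ 22, not 4. A quick check (the remaining-word count of 103 below is illustrative, not the actual figure from the video):

```python
import math

def bits_gained(total: int, remaining: int) -> float:
    """Information in bits when a pool of `total` candidates
    is narrowed down to `remaining`: log2(total / remaining)."""
    return math.log2(total / remaining)

print(bits_gained(16, 4))             # two halvings -> 2.0 bits
print(round(bits_gained(2315, 103), 2))  # ~22x cut -> about 4.49 bits
```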

  14. I was always starting with "Adieu" followed by "Ghost" to test all the vowels. Depending on what's ruled out, this can also get rid of ch, sh, ph, -igh, th, ed, es, combos. However I've noticed that using this results in giving me something like:

    Oh, it's "-akes". Let's try:


    and I end up losing because although I got the vowels, I didn't rule out enough.

  15. Interesting. I think many people will intuitively (and, it turns out, incorrectly) implement 'hard mode' whether they have hard mode on or not; i.e. once they learn there's a P in the answer, every subsequent guess will contain a P (either in the right place, or moved depending on green/yellow). But the optimal strategy turns out to be, at least initially, to ignore what you've matched and get more information by using as many different letters as you can. I noticed this on the ITV daytime programme 'LINGO', which is basically Wordle on 4-, 5- and (for the final) 6-letter words (and, as an aside, makes you wonder why the NYT paid 7 figures for a format that predates this guy writing any code – I guess their main interest was the URL that millions of people visit every day rather than any copyright).

    One difference the TV show has is that they give contestants the first letter, and if you make a wrong guess (a word that doesn't exist) your turn ends. Pretty much all the contestants stick with that first letter and use as many letters as they've found (at least the ones who don't make a complete hash of playing). Often in LINGO, on the 4-letter words especially, you reach a point where the word might begin 'FOO?' and there's food, fool, foot etc., but you have only 1 or 2 guesses left. Typically contestants just take a punt at 2 of the possibilities and hope for the best. It seemed to me it would be better to ignore the FOO and try a word that had D, L and T, say TOLD, and then you'd learn which of the 3 was in the answer, so you could be sure of a win. Although the show is played against the clock, so you'd do well to achieve this under the pressure of being on TV.

    In this analysis, though, I'm not really happy that you ignored the smaller known answer dictionary but used some arbitrary 'most common words' thing to improve your score. The truth is that by doing the latter you're acknowledging that you know the answer dictionary is only a subset of the full accepted list of words, whilst pretending you don't know that, which makes zero sense imo. Clearly, if you know there are only 3000 or whatever possible valid answers, either use that information or don't – but not using it should mean any of the 12000 dictionary words is as likely as any other (reduced only by the information you've gained from guessing), because it's only by looking at the Wordle source code (or using hindsight) that you could gain this information. Indeed, a cheat (looking at the code) knows that a word which has appeared before is never going to be an answer – the dictionary is shrinking by one word every day, which will massively improve your score over time, up to the last game when you'll know what word remains. Whereas an honest "no prior knowledge" player should always start with 12000 possibilities for the first guess (anything less is kidding yourself that you're not using knowledge from Wordle's source code that you really are).

  16. Don't worry about the video being too long, man.

  17. Good work, but the bit where I really oohed and aahed was the pronunciation of aahed…

  18. Best first word is "crate". "crane" is good, but t is more likely to appear than n.

    Edit: Best first three words are:
    They consist of 15/24 of the alphabet, have all the vowels (a, i, u, e, o, y), and guarantee a correct answer on the 4th or 5th try, or even the 3rd if you're lucky on the first two. And you can change POUND into BOUND if you think B is more likely, for example BRAME, BEAMS, BREAD, etc.

  19. I did some programming to try to identify the best openers to lead in to manual play, and I struck upon SLATE myself because I gave greater weight for green letters specifically; I find them much more useful for mentally considering options than yellows. (If slate doesn't give me enough information for real guesses, the second turn is CORNY. I win on 3 a pretty decent percentage of the time.)

  20. Like a chess bot, could you build the bot to do multiple future lookups? Not just checking the entropy of the current guess, but the combined best entropy of this guess and the next? Or maybe the best entropy of this guess and the worst of the following, to see which pair of guesses from this spot, done blindly, would yield the best results?

  21. Now try to break Evil Wordle :) Thanks for the video, you're very cool

  22. I like starting with MOIST. It's not rude in any way, but it still has that certain sub-tone.

  23. I start with
    1 AIRED
    2 POUTS
    All the vowels and common letters R S T D

  24. This bot only thinks one step ahead. You can do better with the same data by considering both this and the next move (and so on) in the distribution.

  25. What happens if each subsequent guess is restricted to just the possible words remaining?
    I noticed that the 'guess list' on the right side included words with letters that were already ruled out by being grey-squared on previous guesses.
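The restriction this comment asks about (essentially "hard mode") amounts to a simple filter: after each guess, keep only the candidates that would have produced the observed pattern. A toy sketch, using the simplified coloring rule from the description and a hypothetical word list:

```python
def pattern(guess: str, answer: str) -> tuple:
    # Simplified coloring: g = right spot, y = elsewhere in word, - = absent
    return tuple(
        "g" if g == a else ("y" if g in answer else "-")
        for g, a in zip(guess, answer)
    )

def consistent(candidates: list, guess: str, observed: tuple) -> list:
    """Keep only candidates that would yield `observed` for this guess."""
    return [w for w in candidates if pattern(guess, w) == observed]

words = ["slate", "crane", "stale", "caste", "crate"]  # toy list
# Suppose the hidden answer is "crate" and we open with "slate":
print(consistent(words, "slate", pattern("slate", "crate")))  # -> ['crate']
```

Guessing only from this shrinking list never wastes a turn on eliminated letters, but as the video's analysis suggests, a deliberately "inconsistent" probe word can sometimes buy more information.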

  26. I spent way too long trying to come up with a good next guess for the purely illustrative Wordle at 2:47.
    I would guess START.

  27. Thanks for the great video as always!
    Some thoughts.
    Mathematically, entropy is a good measure for estimating the next "good" guess.
    If you were going to use a lookahead anyway, you could simply use a naive expectimax search.
    This would yield the optimal result but comes with some computational cost.
    I believe your algorithm was a mixture of expectimax and Monte Carlo tree search.
    Also, probably the biggest limitation of the naive expectimax solution is that it might not make a good math video like this one. I have to say that your approach was more optimized for learning.

  28. What if we find after 5 years that our brains proceed in a similar way to calculate outcomes? LOL I just became a bio bot!!

  29. The next question surely is how good the bot could get if you weren't constrained to using actual words for your initial guesses.

  30. After watching this I played "day 652", and tried "crane" first with 3 green, and second turn "chase" for the win! I'm glad I didn't try goose first!

  31. 7:27. The tool doesn't take into account that "worry" is also a word.

  32. If Mr. John von Neumann were alive now, he would have completely established the quantum computer.
