Implementing and Optimizing a Wordle Solver in Rust - wordle.plus

Implementing and Optimizing a Wordle Solver in Rust

Jon Gjengset
Views: 115075
Like: 1978
We implement a Wordle solver in Rust based off on the excellent
3blue1brown video on the same topic:

And then we profile and optimize it to improve the runtime from our
initial naive implementation by ~13500x. You can find the code at
.

0:00:00 Introduction
0:01:00 Wordle intro
0:04:50 What we’re doing today
0:11:24 Gathering our datasets
0:27:22 Structure the solver
0:44:04 The correctness of a guess
1:14:28 Testing the play machinery
1:30:16 Outlining the algorithm
1:38:55 Does a word match a pattern?
2:21:12 Reusing correctness computation
2:26:06 Computing a word’s “goodness”
2:49:20 Running the naive implementation
2:57:59 Profiling to the rescue
3:04:44 Avoiding allocations
3:22:05 Comparing bytes, not characters
3:31:58 Correctness computing is faster
3:42:23 HashMap iteration is slow
3:47:40 Compare bytes again
3:50:20 Trying to avoid bounds checks
3:54:42 Keep words as length 5 arrays
4:07:36 Only initialize remaining once
4:21:00 Back to length 5 arrays
4:32:14 Where is compute spending time?
4:51:20 Short break
4:55:20 What if we don’t set the first word?
5:02:49 What if we start with another word?
5:07:15 Precalculating matches
5:31:20 Prefer more likely words
5:38:05 Prune known-empty patterns
5:56:24 Don’t even consider unlikely words
6:07:36 Closing thoughts

Live version with chat:

30 Comments

  1. The look up at 1:10:08 is some magic I need to learn. I'd love to be able to search specific documentation with a ! macro in the web browser like that, but no idea how that's done.

  2. As you noted, not all words are tagged with an identifier. Making the identifier pattern option in ripgrep reveals a much larger list. The pattern,

    "^[a-zA-Z]{5}(?:_[A-Z]{3,4})?t"

    leads to just over 4.2M lines.

    Curiously, using zgrep with the same pattern (and flags -E for expanded regex, -h to not print the file names) I get even more, just over 4.3M lines. Am I wrong to use the same regex for both? Commands are pasted below.

    Ripgrep: “`rg Iz "^[a-zA-Z]{5}(?:_[A-Z]{3,4})?t" 1*-of-00024.gz > 5-letters.txt“`
    zgrep: “`zgrep Eh "^[a-zA-Z]{5}(?:_[A-Z]{3,4})?t" 1*-of-00024.gz > 5-letters.txt“`

  3. 1:20:27 Isn't the lifetime error occurring because the guess function is declared after the 'w' variable?

  4. 3:56:45 I recently learned the name of this phenomenon. It's called semantic satiation!

  5. At around 1:55:00 won't it be easier to just search for an unused instance of the letter in word for the Correctness::Misplaced case

  6. Do you have your init vim available somewhere? That’s a really cool theme and status bar

  7. Hello, I am a Korean web developer who is studying while listening to your lecture.
    First of all, I want to say thank you. Thanks to you, I am learning a lot about programming itself because I can see the coding process of top programmers as well as the Rust language. Thank you.
    Can I ask you a favor?
    I want you to turn on YouTube automatic English subtitles so that users of non-English speaking countries, including myself, can easily listen to lectures.
    Recent lectures have automatic English subtitles, but the early ones you uploaded do not have automatic English subtitles on YouTube.
    Please turn on the YouTube automatic English subtitles for people who are not familiar with English like me or have difficulty hearing.
    Thank you always. from your big fan in south korea.

  8. After watching this I'm exhausted! I had almost no grey-outs this episode while previous videos felt like they were in Norwegen. 😀

  9. Is there some way to look at your vim configuration?

  10. As someone who's just discovered Rust and decided his first project would be the Wordle game, you wouldn't believe how much fun it is to see you make the same assumptions with regards to compute(). It was all so very easy until I wrote tests. 😄

  11. 1:05:15 i always wonder how after he yanked some line than he can just paste it down below of the yanked line without even going there, i really need to use this on my vim, any idea how anyone?

  12. The insight around 2:21:12 (reusing correctness computation) is that if you plug the guess as the expected answer and it yields a different mask (to the one you actually got previously) against the previous guess – then it cannot be the correct answer. If it were the correct answer, it would have to do exactly the same again as it did previously with the previous guess.

    So you can reject every remaining word that does not yield the same mask when treated as the target answer. And since you don’t know anything else about the answer except for the mask it returns, you cannot reject any word that yields the same mask as the true answer.

    That’s why you can just reuse the game logic and plug your candidate as the expected answer and compare the masks returned previously by actual game and now by the guess.

  13. Also, I’m not sure if you change that later, but so far (around 3 h in) you seem to be only considering remaining candidates as valid guesses. I believe the original algorithm considered all words accepted by Wordle (even if they’re already eliminated by the masks returned from previous guesses) – obviously you know they won’t be the right answer, but one of them still might be optimal for information gain.

    For example you might guess a word which has a letter X repeated in the same place where it already was yellow previously (thus not a correct answer), but it has such a combination of other letters that the rest of the returned mask will further narrow down the candidate list (and would do that better than any of the remaining candidates, as they might eg. give you insight only if they are the right answer, but wouldn’t narrow down anything if they’re not – so you might want to trade off the chance of winning in this step to gain more info for better chance in the next step).

  14. Some more insight relevant to 1:41:21 and 2:23:38:

    I ended up implementing the unoptimized algorithm on my own (before watching the relevant part of the video) and comparing that in retrospect. For the functionality of Guess.matches (described on 1:41:21), I simply used Correctness::check (later reached on 2:23:38), then verified that the returned mask is equal to the mask inside &self.

    Here's the logic:

    I was thinking in the lines of "I cannot possibly retain a word that, had their roles been swapped (answer and guessed word), will not provide the same mask".

    It's useful to think of the mask as a bilateral non-directional relation between two words. It does not care which is the answer, and which is the guessed word, but provides information that relates the letters of both words to each other.

  15. I guess the biggest impact would be to calculate the probability of a mask in a different way: you could generate a mask for every word against your potential guess, than increment counters on a hashmap in witch the masks are keys. At the end you divide each count by the total number of words, this would eliminate one big loop. I did this and I could run the entire thing on 100ms (including de first guess being calculated).

  16. Watching this now and tried Wordle, the word was "Rusty"….what a weird coincident 😂

  17. 40:15 That loop isn't infinite though, is it? Surely it'll end once it has ran with i == i32::MAX?

  18. What's with the different amounts of red and white squares next to your rust-analyzer(?) errors? What do they mean?

  19. 42:24 actually `n..` is an infinite loop, with the caveat that in debug builds it will panic when reaching usize::max (in release it will wrap)

  20. Admittedly not through the 6 hours, but the naive guesser could have started by marking all yellows, and then override them with green (the same as starting with incorrect is overridden by marking yellow). By switching the order you wouldn't have to keep track of the previous used, right?

  21. Good stuff! I'm following along on a Macbook from the very beginning. "awk '{print $2}' 5-letters-lc-sorted-combined.txt | paste -sd+ | bc" wouldn't run. I'm not sure why, but I skipped that single command. The rest is awesome. I appreciate your content immensely.

  22. Nitpick: I thought the same thing you did at firs but upon some thought I realized that your comment at 3:40:00 is actually wrong. If G gives mask C against A, A doesn't have to give mask C against G. Imagine: G: "awxyz", A: "bacde" -> G|A = "MWWWW", A|G = "WMWWW". However the code is still correct as explained by others here.

  23. 1:04:00 Is there really a reason to write test cases you can prove redundant by analysis? I write tests when it's non-trivial to prove a function's behaviour mathematically, or for edge cases that are non-trivial to consider.

  24. Can you implement something like libp2p2 in rust keeping blockchain in mind and create a series please?

  25. It doesn't just walk through the list of words it scrambles them in a certain predictable way so you can't just look up the day's answer without going through the indexing algorithm

  26. So awesome to watch the process end to end of identifying slow points in the code. Keep up the great work!

  27. I think I learned more from this one video than I learned in my entire first year of college. This was an amazing stream. Would love to see more of these!

  28. During the correctness check: If you loop over the answers you do not need the `let mut used = [false; 5];` array. If answer == guess you can mark it as correct and if it isn't loop over the guess array and mark the first matching one that is still set to Wrong as Misplaced (if I understood the problem correctly).
    (At minute 55)

  29. I especially appreciated your impersonation of a grumpy old man ^^
    That was pretty hilarious.
    And the coding part was of course amazing as well!

  30. 26 minutes into the video (feels like 2) and he says, "…and now this is a good place to start." I've found my perfect video! ❤

Leave a Reply

Your email address will not be published. Required fields are marked *