Implementing and Optimizing a Wordle Solver in Rust - wordle.plus

Implementing and Optimizing a Wordle Solver in Rust

Jon Gjengset
Views: 115075
Like: 1978
We implement a Wordle solver in Rust based off on the excellent
3blue1brown video on the same topic:

And then we profile and optimize it to improve the runtime from our
initial naive implementation by ~13500x. You can find the code at
.

0:00:00 Introduction
0:01:00 Wordle intro
0:04:50 What we’re doing today
0:11:24 Gathering our datasets
0:27:22 Structure the solver
0:44:04 The correctness of a guess
1:14:28 Testing the play machinery
1:30:16 Outlining the algorithm
1:38:55 Does a word match a pattern?
2:21:12 Reusing correctness computation
2:26:06 Computing a word’s “goodness”
2:49:20 Running the naive implementation
2:57:59 Profiling to the rescue
3:04:44 Avoiding allocations
3:22:05 Comparing bytes, not characters
3:31:58 Correctness computing is faster
3:42:23 HashMap iteration is slow
3:47:40 Compare bytes again
3:50:20 Trying to avoid bounds checks
3:54:42 Keep words as length 5 arrays
4:07:36 Only initialize remaining once
4:21:00 Back to length 5 arrays
4:32:14 Where is compute spending time?
4:51:20 Short break
4:55:20 What if we don’t set the first word?
5:02:49 What if we start with another word?
5:07:15 Precalculating matches
5:31:20 Prefer more likely words
5:38:05 Prune known-empty patterns
5:56:24 Don’t even consider unlikely words
6:07:36 Closing thoughts

Live version with chat:

74 Comments

  1. Shouldn't you prune the patterns only after you know that the candidate you're testing will be the actual guess? Like, an empty pattern for a candidate might not be empty for another one, no?

  2. @jonhoo, you should check out the Aur helper Paru! It's written in Rust! 🙂

  3. I was thinking that the pruning might change the results, because maybe a pruned word might give more bits of information than one that was not pruned

  4. Let's see you make a Nerdle solver now

  5. NADPH is a byproduct of a number of biological chemical processes, in case anyone else was wondering why a seemingly random string of letters was so common.

  6. 1:20:30 just cast it:
    (|_history: &[Guess]| "moved".to_string()) as (fn(history: &[Guess]) -> String)

  7. Great video, I would love to look at your vim config, what Language Server are you using for rust?

  8. I have a ray-tracing project that is running too slow compared to alternatives. I assume it has to do with Computer Science stuff and not necessarily the algorithms themselves (although they could be improved). If you want to use it as an example, I'd love to connect!

  9. Great to see collaboration between channels specially these 2 that are great!

  10. I'd suggest that you can use fzf to find your history of command line, I noted you're using fish shell but the examples of the configuration that enables your shell to do some fuzzy search with your system are mostly in bash/zsh (i didn't check if fish has examples as well), you can take a look and see if that suits you : )

  11. Just came to YouTube for some Swedish Death Metal… but here I am… to stay lol

  12. To be honest I don't understand what the popularity of words has to do with the solver. My understanding is that there are

    1. a set of words that can be guessed and
    2. a set of words that might be the answer (strict subset of 1)

    and the answer is uniformly picked from (2). Is that incorrect? Because if that's how it works I can't see how the real-world usage frequency of words has any bearing on the problem.

  13. Based on playing the game, one should not limit to the opening word. It should be opening 2 words and basically we should build a tree for the solution based on the word. I am typing this as I am watching the video

  14. how did you change the menu bar position of your firefox?

  15. I was wanting a sort of crash course on Rust and watching you work actually taught me so much more! I love the way you break down problems and it’s not too difficult to follow.

    Keep it up man. You have a new follower!

  16. Very nice video! I found it very entertaining and enlightening. I love 3blue1brown, so I thoroughly enjoyed the topic.

    I do want to note that the explanation about the correctness computation giving the answer to the matches query given around 3:40:20 is not quite accurate. If for instance the guess is abbbb for an answer of cccca, the correctness mask is [M W W W W], because the a in the first position is misplaced and all the b's are wrong, but if the situation is reversed and you guess cccca for the answer abbbb, the correctness mask is [W W W W M], because it is the a in the fifth position that is misplaced this time.

    Also, I'm not sure if I understood the pattern pruning optimization correctly. As I understand it, each time you play a game, you start with the full set of valid correctness patterns (3^5 = 243), and while deciding on each guess (after the first, which is fixed and does no computation), you iterate through all potential answers and for each answer you calculate its goodness by iterating through all remaining correctness patterns and checking how many of the potential answers would yield that pattern if you would guess the word you are evaluating. If no answers would yield that pattern, then you remove it from the list of consideration for the rest of the game. If that is indeed how it is to be understood, then I don't think this is a valid operation, since the set of valid patterns of course depends on the word that is guessed. If for instance, the set of possible answers has been so reduced that the only place a 'q' appears is at the beginning of a word (but the letter 'q' has not been confirmed or denied so far), then when a word beginning with 'q' is considered, all patterns that have yellow (misplaced) in the first position will be pruned because 'q' can't be misplaced in the first position according to the remaining answer list. But surely there are still valid patterns with yellow at the first position when guessing some different word that doesn't begin with 'q' (like those guesses beginning with the second letter of the answer).

  17. The look up at 1:10:08 is some magic I need to learn. I'd love to be able to search specific documentation with a ! macro in the web browser like that, but no idea how that's done.

  18. As you noted, not all words are tagged with an identifier. Making the identifier pattern option in ripgrep reveals a much larger list. The pattern,

    "^[a-zA-Z]{5}(?:_[A-Z]{3,4})?t"

    leads to just over 4.2M lines.

    Curiously, using zgrep with the same pattern (and flags -E for expanded regex, -h to not print the file names) I get even more, just over 4.3M lines. Am I wrong to use the same regex for both? Commands are pasted below.

    Ripgrep: “`rg Iz "^[a-zA-Z]{5}(?:_[A-Z]{3,4})?t" 1*-of-00024.gz > 5-letters.txt“`
    zgrep: “`zgrep Eh "^[a-zA-Z]{5}(?:_[A-Z]{3,4})?t" 1*-of-00024.gz > 5-letters.txt“`

  19. 1:20:27 Isn't the lifetime error occurring because the guess function is declared after the 'w' variable?

  20. 3:56:45 I recently learned the name of this phenomenon. It's called semantic satiation!

  21. At around 1:55:00 won't it be easier to just search for an unused instance of the letter in word for the Correctness::Misplaced case

  22. Do you have your init vim available somewhere? That’s a really cool theme and status bar

  23. Hello, I am a Korean web developer who is studying while listening to your lecture.
    First of all, I want to say thank you. Thanks to you, I am learning a lot about programming itself because I can see the coding process of top programmers as well as the Rust language. Thank you.
    Can I ask you a favor?
    I want you to turn on YouTube automatic English subtitles so that users of non-English speaking countries, including myself, can easily listen to lectures.
    Recent lectures have automatic English subtitles, but the early ones you uploaded do not have automatic English subtitles on YouTube.
    Please turn on the YouTube automatic English subtitles for people who are not familiar with English like me or have difficulty hearing.
    Thank you always. from your big fan in south korea.

  24. After watching this I'm exhausted! I had almost no grey-outs this episode while previous videos felt like they were in Norwegen. 😀

  25. Is there some way to look at your vim configuration?

  26. As someone who's just discovered Rust and decided his first project would be the Wordle game, you wouldn't believe how much fun it is to see you make the same assumptions with regards to compute(). It was all so very easy until I wrote tests. 😄

  27. 1:05:15 i always wonder how after he yanked some line than he can just paste it down below of the yanked line without even going there, i really need to use this on my vim, any idea how anyone?

  28. The insight around 2:21:12 (reusing correctness computation) is that if you plug the guess as the expected answer and it yields a different mask (to the one you actually got previously) against the previous guess – then it cannot be the correct answer. If it were the correct answer, it would have to do exactly the same again as it did previously with the previous guess.

    So you can reject every remaining word that does not yield the same mask when treated as the target answer. And since you don’t know anything else about the answer except for the mask it returns, you cannot reject any word that yields the same mask as the true answer.

    That’s why you can just reuse the game logic and plug your candidate as the expected answer and compare the masks returned previously by actual game and now by the guess.

  29. Also, I’m not sure if you change that later, but so far (around 3 h in) you seem to be only considering remaining candidates as valid guesses. I believe the original algorithm considered all words accepted by Wordle (even if they’re already eliminated by the masks returned from previous guesses) – obviously you know they won’t be the right answer, but one of them still might be optimal for information gain.

    For example you might guess a word which has a letter X repeated in the same place where it already was yellow previously (thus not a correct answer), but it has such a combination of other letters that the rest of the returned mask will further narrow down the candidate list (and would do that better than any of the remaining candidates, as they might eg. give you insight only if they are the right answer, but wouldn’t narrow down anything if they’re not – so you might want to trade off the chance of winning in this step to gain more info for better chance in the next step).

  30. Some more insight relevant to 1:41:21 and 2:23:38:

    I ended up implementing the unoptimized algorithm on my own (before watching the relevant part of the video) and comparing that in retrospect. For the functionality of Guess.matches (described on 1:41:21), I simply used Correctness::check (later reached on 2:23:38), then verified that the returned mask is equal to the mask inside &self.

    Here's the logic:

    I was thinking in the lines of "I cannot possibly retain a word that, had their roles been swapped (answer and guessed word), will not provide the same mask".

    It's useful to think of the mask as a bilateral non-directional relation between two words. It does not care which is the answer, and which is the guessed word, but provides information that relates the letters of both words to each other.

  31. I guess the biggest impact would be to calculate the probability of a mask in a different way: you could generate a mask for every word against your potential guess, than increment counters on a hashmap in witch the masks are keys. At the end you divide each count by the total number of words, this would eliminate one big loop. I did this and I could run the entire thing on 100ms (including de first guess being calculated).

  32. Watching this now and tried Wordle, the word was "Rusty"….what a weird coincident 😂

  33. 40:15 That loop isn't infinite though, is it? Surely it'll end once it has ran with i == i32::MAX?

  34. What's with the different amounts of red and white squares next to your rust-analyzer(?) errors? What do they mean?

  35. 42:24 actually `n..` is an infinite loop, with the caveat that in debug builds it will panic when reaching usize::max (in release it will wrap)

  36. Admittedly not through the 6 hours, but the naive guesser could have started by marking all yellows, and then override them with green (the same as starting with incorrect is overridden by marking yellow). By switching the order you wouldn't have to keep track of the previous used, right?

  37. Good stuff! I'm following along on a Macbook from the very beginning. "awk '{print $2}' 5-letters-lc-sorted-combined.txt | paste -sd+ | bc" wouldn't run. I'm not sure why, but I skipped that single command. The rest is awesome. I appreciate your content immensely.

  38. Nitpick: I thought the same thing you did at firs but upon some thought I realized that your comment at 3:40:00 is actually wrong. If G gives mask C against A, A doesn't have to give mask C against G. Imagine: G: "awxyz", A: "bacde" -> G|A = "MWWWW", A|G = "WMWWW". However the code is still correct as explained by others here.

  39. 1:04:00 Is there really a reason to write test cases you can prove redundant by analysis? I write tests when it's non-trivial to prove a function's behaviour mathematically, or for edge cases that are non-trivial to consider.

  40. Can you implement something like libp2p2 in rust keeping blockchain in mind and create a series please?

  41. It doesn't just walk through the list of words it scrambles them in a certain predictable way so you can't just look up the day's answer without going through the indexing algorithm

  42. So awesome to watch the process end to end of identifying slow points in the code. Keep up the great work!

  43. I think I learned more from this one video than I learned in my entire first year of college. This was an amazing stream. Would love to see more of these!

  44. During the correctness check: If you loop over the answers you do not need the `let mut used = [false; 5];` array. If answer == guess you can mark it as correct and if it isn't loop over the guess array and mark the first matching one that is still set to Wrong as Misplaced (if I understood the problem correctly).
    (At minute 55)

  45. I especially appreciated your impersonation of a grumpy old man ^^
    That was pretty hilarious.
    And the coding part was of course amazing as well!

  46. 26 minutes into the video (feels like 2) and he says, "…and now this is a good place to start." I've found my perfect video! ❤

Leave a Reply

Your email address will not be published. Required fields are marked *