Exploring Wordle, by George V. Reilly (Tue, 26 Sep 2023 07:00:00 GMT): https://www.georgevreilly.com/2023/09/26/ExploringWordle.html
<p>Unless YOUVE LIVED UNDER ROCKS, you've heard of <a class="reference external" href="https://en.wikipedia.org/wiki/Wordle">Wordle</a>, the online word game that has become wildly popular since late 2021. You've probably seen people posting their Wordle games as grids of little green, yellow, and black (or white) emojis on social media.</p> <div class="line-block"> <div class="line">Wordle 797 4/6</div> <div class="line"><br /></div> <div class="line">⬛ ⬛ ⬛ ⬛ 🟨</div> <div class="line">🟨 ⬛ 🟩 ⬛ ⬛</div> <div class="line">⬛ ⬛ 🟩 🟨 ⬛</div> <div class="line">🟩 🟩 🟩 🟩 🟩</div> </div> <p>The problem that I want to address in this post is:</p> <blockquote> Given some <tt class="docutils literal">GUESS=SCORE</tt> pairs for Wordle and a word list, programmatically find all the words from the list that are eligible as answers.</blockquote> <p>Let's look at this four-round game for Wordle 797:</p> <table class="wordle"> <tr><td class="absent">J</td> <td class="absent">U</td> <td class="absent">D</td> <td class="absent">G</td> <td class="present">E</td> <td class="gs">JUDGE=....e</td></tr> <tr><td class="present">C</td> <td class="absent">H</td> <td class="correct">E</td> <td class="absent">S</td> <td class="absent">T</td> <td class="gs">CHEST=c.E..</td></tr> <tr><td class="absent">W</td> <td class="absent">R</td> <td class="correct">E</td> <td class="present">C</td> <td class="absent">K</td> <td class="gs">WRECK=..Ec.</td></tr> <tr><td class="correct">O</td> <td class="correct">C</td> <td class="correct">E</td> <td class="correct">A</td> <td class="correct">N</td> <td class="gs">OCEAN=OCEAN</td></tr> </table><p>The letters of each guess are colored Green, Yellow, or Black (dark-gray).</p> <ul class="simple"> <li>A Green tile 🟩 means that the letter is 
<strong>correct</strong>: <tt class="docutils literal">E</tt> is the third letter of the answer.</li> <li>A Yellow tile 🟨 means that the letter is <strong>present</strong> <em>elsewhere</em> in the answer. There is a <tt class="docutils literal">C</tt> in the answer; it's not in columns 1 or 4, but it is correct in column 2. Likewise, an <tt class="docutils literal">E</tt> is present in the answer; it's not in column 5, but it's correct in column 3.</li> <li>A Black tile ⬛ means that the letter is <strong>absent</strong> from the answer: <tt class="docutils literal">J</tt>, <tt class="docutils literal">U</tt>, <tt class="docutils literal">D</tt>, <tt class="docutils literal">G</tt>, <tt class="docutils literal">H</tt>, <tt class="docutils literal">S</tt>, <tt class="docutils literal">T</tt>, <tt class="docutils literal">W</tt>, <tt class="docutils literal">R</tt>, and <tt class="docutils literal">K</tt> do not appear anywhere in <tt class="docutils literal">OCEAN</tt>.</li> </ul> <p>(This definition of “absent” turns out to be inadequate, as you will discover later.)</p> <p>The <tt class="docutils literal">GUESS=SCORE</tt> notation is intended to be clear to read and also easier to write than Greens and Yellows. 
For example:</p> <div style="text-align: center; font-family: &#x27;Source Code Pro&#x27;, monospace; font-size: 48px;"> <div><i>GUESS=SCORE</i></div> <div>CHEST=c.E..</div> </div> <table class="wordle"> <tr><td class="present">C</td> <td class="absent">H</td> <td class="correct">E</td> <td class="absent">S</td> <td class="absent">T</td></tr> </table><ul class="simple"> <li>the <em>uppercase</em> <tt class="docutils literal">E</tt> at position 3 in the score denotes that <tt class="docutils literal">E</tt> is in the <strong>correct</strong> position (i.e., green 🟩);</li> <li>the <em>lowercase</em> <tt class="docutils literal">c</tt> at position 1 in the score denotes that <tt class="docutils literal">C</tt> is <strong>present</strong> somewhere in the answer, but it is in the wrong position (yellow 🟨);</li> <li>the <tt class="docutils literal">.</tt>s in the score at positions 2, 4, and 5 denote that the corresponding letters in the guess (<tt class="docutils literal">H</tt>, <tt class="docutils literal">S</tt>, and <tt class="docutils literal">T</tt>, respectively) are <strong>absent</strong> from the answer (black ⬛).</li> </ul> <div class="section" id="deducing-constraints"> <h3>Deducing Constraints</h3> <p>What can we deduce from the first three rows of guesses, <tt class="docutils literal"><span class="pre">JUDGE=....e</span> CHEST=c.E.. <span class="pre">WRECK=..Ec.</span></tt>?</p> <p>There is a set of <em>valid</em> letters, <tt class="docutils literal">C</tt> and <tt class="docutils literal">E</tt>, that are either <em>present</em> (yellow 🟨) or <em>correct</em> (green 🟩). 
Both <tt class="docutils literal">E</tt> and <tt class="docutils literal">C</tt> start out as present, but <tt class="docutils literal">E</tt> later finds its correct position, while <tt class="docutils literal">C</tt> does not.</p> <p>There is a set of <em>invalid</em> letters that are known to be <em>absent</em> from the answer (black ⬛): <tt class="docutils literal">J</tt>, <tt class="docutils literal">U</tt>, <tt class="docutils literal">D</tt>, <tt class="docutils literal">G</tt>, <tt class="docutils literal">H</tt>, <tt class="docutils literal">S</tt>, <tt class="docutils literal">T</tt>, <tt class="docutils literal">W</tt>, <tt class="docutils literal">R</tt>, and <tt class="docutils literal">K</tt>.</p> <p>The remaining letters of the alphabet are currently <em>unknown</em>. When they are played, they will turn into <em>valid</em> or <em>invalid</em> letters. Unless we already have all five correct letters, we will draw candidate letters from the unknown pool.</p> <p>Furthermore, we know something about <em>letter positions</em>. The <em>correct</em> letters are in the correct positions, while the <em>present</em> letters are in the wrong positions.</p> <p>A candidate word <em>must</em>:</p> <ol class="arabic simple"> <li>include all valid letters — <tt class="docutils literal">C</tt> and <tt class="docutils literal">E</tt></li> <li>exclude all invalid letters — <tt class="docutils literal">JUDGHSTWRK</tt></li> <li>match all “correct” positions — <tt class="docutils literal">3:E</tt></li> <li>not match any “present” positions — <tt class="docutils literal">1:C</tt>, <tt class="docutils literal">4:C</tt>, or <tt class="docutils literal">5:E</tt></li> </ol> <p>These constraints narrow the possible choices from the word list.</p> <p>The obvious way to solve this with a computer is to codify the constraints provided by previous guess–score pairs and run through the entire list of words to find eligible words. 
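</p>
<p>As a minimal sketch of that idea, the four constraints for the <tt class="docutils literal">OCEAN</tt> example can be hard-coded in Python and applied to a word list. This is only an illustration: the tiny <tt class="docutils literal">WORDS</tt> list, the constant names, and the <tt class="docutils literal">fits</tt> helper are all made up for this sketch, and a real run would load thousands of words from a file.</p>

```python
# Hypothetical mini word list, chosen only for illustration.
WORDS = ["OCEAN", "OLEIC", "CHEST", "PECAN", "CLEAN", "FENCE", "ICENI"]

# Constraints deduced from JUDGE=....e CHEST=c.E.. WRECK=..Ec.
VALID = set("CE")                      # yellow or green letters: all must appear
INVALID = set("JUDGHSTWRK")            # black letters: none may appear
CORRECT = {2: "E"}                     # zero-based index 2 is column 3
WRONG_SPOT = [(0, "C"), (3, "C"), (4, "E")]  # (index, letter) pairs ruled out by yellows


def fits(word):
    """Return True if word satisfies all four constraints."""
    letters = set(word)
    return (
        VALID.issubset(letters)                              # 1. include valid letters
        and letters.isdisjoint(INVALID)                      # 2. exclude invalid letters
        and all(word[i] == c for i, c in CORRECT.items())    # 3. match correct positions
        and all(word[i] != c for i, c in WRONG_SPOT)         # 4. avoid present positions
    )


candidates = [w for w in WORDS if fits(w)]
print(candidates)  # ['OCEAN', 'OLEIC', 'ICENI']
```

<p>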
But no human solves Wordle by methodically examining thousands of words. Instead, you rack your brain for “what ends in <tt class="docutils literal">SE</tt> and has an <tt class="docutils literal">M</tt>?” or “I've tried <tt class="docutils literal">A</tt>, <tt class="docutils literal">E</tt>, and <tt class="docutils literal">I</tt>; will <tt class="docutils literal">O</tt> or <tt class="docutils literal">U</tt> work?” or “What are the most likely letters left on the keyboard at the bottom?”</p> <p>This article will show you how to solve Wordle programmatically. It won't help you much in playing Wordle by hand, though you may understand more about the game when you're finished reading.</p> </div> <div class="section" id="prototyping-with-pipes"> <h3>Prototyping with Pipes</h3> <p>Let's prototype the above constraints with a series of <a class="reference external" href="https://www.cyberciti.biz/faq/howto-use-grep-command-in-linux-unix/">greps</a> in a <a class="reference external" href="https://en.wikipedia.org/wiki/Pipeline_(Unix)">Unix pipeline</a> tailored to this <tt class="docutils literal">OCEAN</tt> example:</p> <pre class="code bash literal-block">
<span class="c1"># JUDGE=....e CHEST=c.E.. WRECK=..Ec.
</span><span class="w">
</span>grep<span class="w"> </span><span class="s1">'^.....$'</span><span class="w"> </span>/usr/share/dict/words<span class="w"> </span><span class="p">|</span><span class="w">  </span><span class="c1"># Extract five-letter words
</span><span class="w">    </span>tr<span class="w"> </span><span class="s1">'a-z'</span><span class="w"> </span><span class="s1">'A-Z'</span><span class="w"> </span><span class="p">|</span><span class="w">                    </span><span class="c1"># Translate each word to uppercase
</span><span class="w">    </span>grep<span class="w"> </span><span class="s1">'^..E..$'</span><span class="w"> </span><span class="p">|</span><span class="w">                    </span><span class="c1"># Match CORRECT positions
</span><span class="w">    </span>awk<span class="w"> </span><span class="s1">'/C/ &amp;&amp; /E/'</span><span class="w"> </span><span class="p">|</span><span class="w">                 </span><span class="c1"># Match ALL of VALID set, CORRECT|PRESENT
</span><span class="w">    </span>grep<span class="w"> </span>-v<span class="w"> </span><span class="s1">'[JUDGHSTWRK]'</span><span class="w"> </span><span class="p">|</span><span class="w">            </span><span class="c1"># Exclude INVALID set
</span><span class="w">    </span>grep<span class="w"> </span><span class="s1">'^[^C]..[^C][^E]$'</span><span class="w">             </span><span class="c1"># Exclude PRESENT positions</span>
</pre> <p>gives:</p> <pre class="literal-block">
ICENI
ILEAC
OCEAN
OLEIC
</pre> <p>(This was in Bash, on macOS 13.6. Zsh doesn't like the comments in the middle of the multi-line pipeline, so you may have to omit them. 
Other operating systems will have different versions of <tt class="docutils literal">/usr/share/dict/words</tt> that may not have all of these obscure words.)</p> <p>We can accomplish this with only the simplest features of regular expressions: the <a class="reference external" href="https://www.regular-expressions.info/dot.html">dot metacharacter</a> (<tt class="docutils literal">.</tt>), <a class="reference external" href="https://www.regular-expressions.info/charclass.html">character classes</a> (<tt class="docutils literal"><span class="pre">[JUD...]</span></tt>) and negated character classes (<tt class="docutils literal">[^E]</tt>), and the <tt class="docutils literal">^</tt> and <tt class="docutils literal">$</tt> <a class="reference external" href="https://www.regular-expressions.info/anchors.html">anchors</a>. Awk gives us <a class="reference external" href="https://www.georgevreilly.com/blog/2023/09/05/RegexConjunctions.html">regex conjunctions</a>, allowing us to match <em>all</em> of the chars.</p> <p>The above regular expressions are a simple mechanical transformation of the guess–score pairs. They could be simplified. For example, after <tt class="docutils literal">grep <span class="pre">'^..E..$'</span></tt>, the <tt class="docutils literal">E</tt> in <tt class="docutils literal">awk '/C/ &amp;&amp; /E/'</tt> is redundant. We're not going to optimize the regexes, however.</p> <p>Three of the four answers—<tt class="docutils literal">ICENI</tt>, <tt class="docutils literal">ILEAC</tt>, and <tt class="docutils literal">OLEIC</tt>—are far too obscure to be Wordle answers. Actual Wordle answers also exclude simple plurals (<tt class="docutils literal">YARDS</tt>) and simple past tense (<tt class="docutils literal">LIKED</tt>), but allow more complex plurals (<tt class="docutils literal">BOXES</tt>) and irregular past tense (<tt class="docutils literal">DWELT</tt>, <tt class="docutils literal">BROKE</tt>). 
We make no attempt to judge whether an eligible word is <em>likely</em> as a Wordle answer, merely whether it fits.</p> <p>Let's make a pipeline for Wordle 787 (<tt class="docutils literal">INDEX</tt>):</p> <pre class="code bash literal-block">
<span class="c1"># VOUCH=..... GRIPE=..i.e DENIM=deni. WIDEN=.iDEn
</span><span class="w">
</span>grep<span class="w"> </span><span class="s1">'^.....$'</span><span class="w"> </span>/usr/share/dict/words<span class="w"> </span><span class="p">|</span><span class="w"> </span>tr<span class="w"> </span><span class="s1">'a-z'</span><span class="w"> </span><span class="s1">'A-Z'</span><span class="w"> </span><span class="p">|</span><span class="w">
    </span>grep<span class="w"> </span><span class="s1">'^..DE.$'</span><span class="w"> </span><span class="p">|</span><span class="w">                  </span><span class="c1"># CORRECT pos
</span><span class="w">    </span>awk<span class="w"> </span><span class="s1">'/D/ &amp;&amp; /E/ &amp;&amp; /I/ &amp;&amp; /N/'</span><span class="w"> </span><span class="p">|</span><span class="w">  </span><span class="c1"># VALID set
</span><span class="w">    </span>grep<span class="w"> </span>-v<span class="w"> </span><span class="s1">'[VOUCHGRPMW]'</span><span class="w"> </span><span class="p">|</span><span class="w">          </span><span class="c1"># INVALID set
</span><span class="w">    </span>grep<span class="w"> </span><span class="s1">'^[^D][^EI][^IN][^I][^EN]$'</span><span class="w">  </span><span class="c1"># PRESENT pos</span>
</pre> <p>yields:</p> <pre class="literal-block">
INDEX
</pre> <p>This approach is promising, but constructing those regexes by hand is not maintainable.</p> </div> <div class="section" id="word-lists"> <h3>Word Lists</h3> <p>There are several sources of five-letter words.</p> <ul class="simple"> <li>Filtering <tt class="docutils literal">/usr/share/dict/words</tt> or similar lists.</li> <li><a class="reference external" href="https://github.com/georgevreilly/wordle/blob/main/wordle.txt">wordle.txt</a>: 
The nearly 15,000 words that Wordle accepts as entries. Many of these words are obscure.</li> <li><a class="reference external" href="https://github.com/georgevreilly/wordle/blob/main/answers.txt">answers.txt</a>: The 2,309 words that Wordle uses as answers. These words are fairly recognizable. They are a subset of the other list.</li> </ul> <p>The latter two lists were extracted from the source code of the game. In the various examples below, I use the larger 15,000-word list.</p> </div> <div class="section" id="initial-python-solution"> <h3>Initial Python Solution</h3> <p>Let's attempt to solve this in Python. The first piece is to parse a list of <tt class="docutils literal">GUESS=SCORE</tt> pairs.</p> <!-- wordle1 --> <pre class="code python literal-block">
<span class="k">def</span> <span class="nf">parse_guesses</span><span class="p">(</span><span class="n">guess_scores</span><span class="p">):</span><span class="w">
</span>    <span class="n">invalid</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>  <span class="c1"># Black/Absent</span><span class="w">
</span>    <span class="n">valid</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>  <span class="c1"># Green/Correct or Yellow/Present</span><span class="w">
</span>    <span class="n">mask</span> <span class="o">=</span> <span class="p">[</span><span class="kc">None</span><span class="p">]</span> <span class="o">*</span> <span class="mi">5</span>  <span class="c1"># Exact match for pos (Green/Correct)</span><span class="w">
</span>    <span class="n">wrong_spot</span> <span class="o">=</span> <span class="p">[</span><span class="nb">set</span><span class="p">()</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">)]</span>  <span class="c1"># Wrong spot (Yellow/Present)</span><span class="w">
</span>    <span class="k">for</span> <span class="n">gs</span> <span class="ow">in</span> <span class="n">guess_scores</span><span class="p">:</span><span class="w">
</span>        <span class="n">guess</span><span class="p">,</span> <span class="n">score</span> <span class="o">=</span> <span class="n">gs</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">&quot;=&quot;</span><span class="p">)</span><span class="w">
</span>        <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="p">(</span><span class="n">g</span><span class="p">,</span> <span class="n">s</span><span class="p">)</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">guess</span><span class="p">,</span> <span class="n">score</span><span class="p">)):</span><span class="w">
</span>            <span class="k">assert</span> <span class="s2">&quot;A&quot;</span> <span class="o">&lt;=</span> <span class="n">g</span> <span class="o">&lt;=</span> <span class="s2">&quot;Z&quot;</span><span class="p">,</span> <span class="s2">&quot;GUESS should be uppercase&quot;</span><span class="w">
</span>            <span class="k">if</span> <span class="s2">&quot;A&quot;</span> <span class="o">&lt;=</span> <span class="n">s</span> <span class="o">&lt;=</span> <span class="s2">&quot;Z&quot;</span><span class="p">:</span><span class="w">
</span>                <span class="k">assert</span> <span class="n">g</span> <span class="o">==</span> <span class="n">s</span><span class="w">
</span>                <span class="n">valid</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">g</span><span class="p">)</span><span class="w">
</span>                <span class="n">mask</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">g</span><span class="w">
</span>            <span class="k">elif</span> <span class="s2">&quot;a&quot;</span> <span class="o">&lt;=</span> <span class="n">s</span> <span class="o">&lt;=</span> <span class="s2">&quot;z&quot;</span><span class="p">:</span><span class="w">
</span>                <span class="k">assert</span> <span class="n">g</span> <span class="o">==</span> <span class="n">s</span><span class="o">.</span><span class="n">upper</span><span class="p">()</span><span class="w">
</span>                <span class="n">valid</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">g</span><span class="p">)</span><span class="w">
</span>                <span class="n">wrong_spot</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">g</span><span class="p">)</span><span class="w">
</span>            <span class="k">elif</span> <span class="n">s</span> <span class="o">==</span> <span class="s2">&quot;.&quot;</span><span class="p">:</span><span class="w">
</span>                <span class="n">invalid</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">g</span><span class="p">)</span><span class="w">
</span>            <span class="k">else</span><span class="p">:</span><span class="w">
</span>                <span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;Unexpected </span><span class="si">{</span><span class="n">s</span><span class="si">}</span><span class="s2"> for </span><span class="si">{</span><span class="n">g</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span><span class="w">
</span>    <span class="k">return</span> <span class="p">(</span><span class="n">invalid</span><span class="p">,</span> <span class="n">valid</span><span class="p">,</span> <span class="n">mask</span><span class="p">,</span> <span class="n">wrong_spot</span><span class="p">)</span>
</pre> <p>Let's try it for the <tt class="docutils literal">OCEAN</tt> guesses:</p> <pre class="code 
pycon literal-block">
<span class="gp">&gt;&gt;&gt; </span><span class="n">invalid</span><span class="p">,</span> <span class="n">valid</span><span class="p">,</span> <span class="n">mask</span><span class="p">,</span> <span class="n">wrong_spot</span> <span class="o">=</span> <span class="n">parse_guesses</span><span class="p">(</span><span class="w">
</span><span class="gp">... </span>    <span class="p">[</span><span class="s2">&quot;JUDGE=....e&quot;</span><span class="p">,</span> <span class="s2">&quot;CHEST=c.E..&quot;</span><span class="p">,</span> <span class="s2">&quot;WRECK=..Ec.&quot;</span><span class="p">])</span><span class="w">
</span><span class="go">
</span><span class="gp">&gt;&gt;&gt; </span><span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">invalid</span><span class="si">=}</span><span class="se">\n</span><span class="si">{</span><span class="n">valid</span><span class="si">=}</span><span class="se">\n</span><span class="si">{</span><span class="n">mask</span><span class="si">=}</span><span class="se">\n</span><span class="si">{</span><span class="n">wrong_spot</span><span class="si">=}</span><span class="s2">&quot;</span><span class="p">)</span><span class="w">
</span><span class="go">invalid={'H', 'K', 'D', 'G', 'T', 'R', 'U', 'W', 'J', 'S'}
valid={'E', 'C'}
mask=[None, None, 'E', None, None]
wrong_spot=[{'C'}, set(), set(), {'C'}, {'E'}]

</span><span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">w</span> <span class="ow">in</span> <span class="n">vocab</span><span class="p">:</span><span class="w">
</span><span class="gp">... </span>    <span class="k">if</span> <span class="n">is_eligible</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="n">invalid</span><span class="p">,</span> <span class="n">valid</span><span class="p">,</span> <span class="n">mask</span><span class="p">,</span> <span class="n">wrong_spot</span><span class="p">):</span><span class="w">
</span><span class="gp">... </span>        <span class="nb">print</span><span class="p">(</span><span class="n">w</span><span class="p">)</span><span class="w">
</span><span class="gp">...</span><span class="w">
</span><span class="go">ICENI
ILEAC
OCEAN
OLEIC</span>
</pre> <p>Here's the <tt class="docutils literal">is_eligible</tt> function. We <a class="reference external" href="https://www.geeksforgeeks.org/short-circuiting-techniques-python/#">short-circuit the evaluation</a> and return as soon as any condition is <tt class="docutils literal">False</tt>.</p> <!-- wordle1 --> <pre class="code python literal-block">
<span class="kn">import</span> <span class="nn">logging</span><span class="w">


</span><span class="k">def</span> <span class="nf">is_eligible</span><span class="p">(</span><span class="n">word</span><span class="p">,</span> <span class="n">invalid</span><span class="p">,</span> <span class="n">valid</span><span class="p">,</span> <span class="n">mask</span><span class="p">,</span> <span class="n">wrong_spot</span><span class="p">):</span><span class="w">
</span>    <span class="n">letters</span> <span class="o">=</span> <span class="p">{</span><span class="n">c</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">word</span><span class="p">}</span><span class="w">
</span>    <span class="k">if</span> <span class="n">letters</span> <span class="o">&amp;</span> <span class="n">valid</span> <span class="o">!=</span> <span class="n">valid</span><span class="p">:</span><span class="w">
</span>        <span class="c1"># Missing some 'valid' letters from the word;</span><span class="w">
</span>        <span class="c1"># all Green/Correct and Yellow/Present letters are required</span><span class="w">
</span>        <span class="n">logging</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s2">&quot;!Valid: </span><span class="si">%s</span><span class="s2">&quot;</span><span class="p">,</span> <span class="n">word</span><span class="p">)</span><span class="w">
</span>        <span class="k">return</span> <span class="kc">False</span><span class="w">
</span>    <span class="k">elif</span> <span class="nb">any</span><span class="p">(</span><span class="n">m</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span> <span class="ow">and</span> <span class="n">c</span> <span class="o">!=</span> <span class="n">m</span> <span class="k">for</span> <span class="n">c</span><span class="p">,</span> <span class="n">m</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">word</span><span class="p">,</span> <span class="n">mask</span><span class="p">)):</span><span class="w">
</span>        <span class="c1"># Some of the Green/Correct letters are not at their positions</span><span class="w">
</span>        <span class="n">logging</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s2">&quot;!Mask: </span><span class="si">%s</span><span class="s2">&quot;</span><span class="p">,</span> <span class="n">word</span><span class="p">)</span><span class="w">
</span>        <span class="k">return</span> <span class="kc">False</span><span class="w">
</span>    <span class="k">elif</span> <span class="n">letters</span> <span class="o">&amp;</span> <span class="n">invalid</span><span class="p">:</span><span class="w">
</span>        <span class="c1"># Some invalid (Black/Absent) letters are in the word</span><span class="w">
</span>        <span class="n">logging</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s2">&quot;Invalid: </span><span class="si">%s</span><span class="s2">&quot;</span><span class="p">,</span> <span class="n">word</span><span class="p">)</span><span class="w">
</span>        <span class="k">return</span> <span class="kc">False</span><span class="w">
</span>    <span class="k">elif</span> <span class="nb">any</span><span class="p">(</span><span class="n">c</span> <span class="ow">in</span> <span class="n">ws</span> <span class="k">for</span> <span class="n">c</span><span class="p">,</span> <span class="n">ws</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">word</span><span class="p">,</span> <span class="n">wrong_spot</span><span class="p">)):</span><span class="w">
</span>        <span class="c1"># We have valid letters in the wrong position (Yellow/Present)</span><span class="w">
</span>        <span class="n">logging</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s2">&quot;WrongSpot: </span><span class="si">%s</span><span class="s2">&quot;</span><span class="p">,</span> <span class="n">word</span><span class="p">)</span><span class="w">
</span>        <span class="k">return</span> <span class="kc">False</span><span class="w">
</span>    <span class="k">else</span><span class="p">:</span><span class="w">
</span>        <span class="n">logging</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s2">&quot;Got: </span><span class="si">%s</span><span class="s2">&quot;</span><span class="p">,</span> <span class="n">word</span><span class="p">)</span><span class="w">
</span>        <span class="k">return</span> <span class="kc">True</span>
</pre> </div> <div class="section" id="converting-to-classes"> <h3>Converting to Classes</h3> <p>Returning four parallel collections from a function is a <a class="reference external" href="https://pragmaticways.com/31-code-smells-you-must-know/">code smell</a>. 
Let's refactor these functions into a <tt class="docutils literal">WordleGuesses</tt> class.</p> <p>First, we'll need some helper classes:</p> <ul class="simple"> <li><tt class="docutils literal">WordleError</tt>: an exception class;</li> <li><tt class="docutils literal">TileState</tt>: a <a class="reference external" href="https://www.georgevreilly.com/blog/2023/09/02/PythonEnumsWithAttributes.html">multi-attribute enumeration</a>;</li> <li><tt class="docutils literal">GuessScore</tt>: a <a class="reference external" href="https://realpython.com/python-data-classes/">dataclass</a> that manages a guess–score pair and the associated <tt class="docutils literal">TileState</tt>s.</li> <li>We'll also use <a class="reference external" href="https://bernat.tech/posts/the-state-of-type-hints-in-python/">type annotations</a> because it's 2023.</li> </ul> <!-- wordle2 --> <pre class="code python literal-block">
<span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">namedtuple</span><span class="w">
</span><span class="kn">from</span> <span class="nn">dataclasses</span> <span class="kn">import</span> <span class="n">dataclass</span><span class="w">
</span><span class="kn">from</span> <span class="nn">enum</span> <span class="kn">import</span> <span class="n">Enum</span><span class="w">

</span><span class="n">WORDLE_LEN</span> <span class="o">=</span> <span class="mi">5</span><span class="w">


</span><span class="k">class</span> <span class="nc">WordleError</span><span class="p">(</span><span class="ne">Exception</span><span class="p">):</span><span class="w">
    </span><span class="sd">&quot;&quot;&quot;Base exception class&quot;&quot;&quot;</span><span class="w">


</span><span class="k">class</span> <span class="nc">TileState</span><span class="p">(</span><span class="n">namedtuple</span><span class="p">(</span><span class="s2">&quot;TileState&quot;</span><span class="p">,</span> <span class="s2">&quot;value emoji color css_color&quot;</span><span class="p">),</span> <span class="n">Enum</span><span class="p">):</span><span class="w">
</span>    <span class="n">CORRECT</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="s2">&quot;</span><span class="se">\U0001F7E9</span><span class="s2">&quot;</span><span class="p">,</span> <span class="s2">&quot;Green&quot;</span><span class="p">,</span> <span class="s2">&quot;#6aaa64&quot;</span><span class="w">
</span>    <span class="n">PRESENT</span> <span class="o">=</span> <span class="mi">2</span><span class="p">,</span> <span class="s2">&quot;</span><span class="se">\U0001F7E8</span><span class="s2">&quot;</span><span class="p">,</span> <span class="s2">&quot;Yellow&quot;</span><span class="p">,</span> <span class="s2">&quot;#c9b458&quot;</span><span class="w">
</span>    <span class="n">ABSENT</span> <span class="o">=</span> <span class="mi">3</span><span class="p">,</span> <span class="s2">&quot;</span><span class="se">\U00002B1B</span><span class="s2">&quot;</span><span class="p">,</span> <span class="s2">&quot;Black&quot;</span><span class="p">,</span> <span class="s2">&quot;#838184&quot;</span><span class="w">


</span><span class="nd">&#64;dataclass</span><span class="w">
</span><span class="k">class</span> <span class="nc">GuessScore</span><span class="p">:</span><span class="w">
</span>    <span class="n">guess</span><span class="p">:</span> <span class="nb">str</span><span class="w">
</span>    <span class="n">score</span><span class="p">:</span> <span class="nb">str</span><span class="w">
</span>    <span class="n">tiles</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="n">TileState</span><span class="p">]</span><span class="w">

</span>    <span class="nd">&#64;classmethod</span><span class="w">
</span>    <span class="k">def</span> <span class="nf">make</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">guess_score</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="s2">&quot;GuessScore&quot;</span><span class="p">:</span><span class="w">
</span>        <span class="n">guess</span><span class="p">,</span> <span class="n">score</span> <span class="o">=</span> <span class="n">guess_score</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">&quot;=&quot;</span><span class="p">)</span><span class="w">
</span>        <span class="n">tiles</span> <span class="o">=</span> <span class="p">[</span><span class="bp">cls</span><span class="o">.</span><span class="n">tile_state</span><span class="p">(</span><span class="n">s</span><span class="p">)</span> <span class="k">for</span> <span class="n">s</span> <span class="ow">in</span> <span class="n">score</span><span class="p">]</span><span class="w">
</span>        <span class="k">return</span> <span class="bp">cls</span><span class="p">(</span><span class="n">guess</span><span class="p">,</span> <span class="n">score</span><span class="p">,</span> <span class="n">tiles</span><span class="p">)</span><span class="w">

</span>    <span class="nd">&#64;classmethod</span><span class="w">
</span>    <span class="k">def</span> <span class="nf">tile_state</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">score_tile</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">TileState</span><span class="p">:</span><span class="w">
</span>        <span class="k">if</span> <span class="s2">&quot;A&quot;</span> <span class="o">&lt;=</span> <span class="n">score_tile</span> <span class="o">&lt;=</span> <span class="s2">&quot;Z&quot;</span><span class="p">:</span><span class="w">
</span>            <span class="k">return</span> <span class="n">TileState</span><span class="o">.</span><span class="n">CORRECT</span><span class="w">
</span>        <span class="k">elif</span> <span class="s2">&quot;a&quot;</span> <span class="o">&lt;=</span> <span class="n">score_tile</span> <span class="o">&lt;=</span> <span class="s2">&quot;z&quot;</span><span class="p">:</span><span class="w">
</span>            <span class="k">return</span> <span class="n">TileState</span><span class="o">.</span><span class="n">PRESENT</span><span class="w">
</span>        <span class="k">elif</span> <span class="n">score_tile</span> <span class="o">==</span> <span class="s2">&quot;.&quot;</span><span class="p">:</span><span class="w">
</span>            <span class="k">return</span> <span class="n">TileState</span><span class="o">.</span><span class="n">ABSENT</span><span class="w">
</span>        <span class="k">else</span><span class="p">:</span><span class="w">
</span>            <span class="k">raise</span> <span class="n">WordleError</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;Invalid score: </span><span class="si">{</span><span class="n">score_tile</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span><span class="w">

</span>    <span class="k">def</span> <span class="fm">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span><span class="w">
</span>        <span class="k">return</span> <span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">guess</span><span class="si">}</span><span class="s2">=</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">score</span><span class="si">}</span><span class="s2">&quot;</span><span class="w">

</span>    <span class="k">def</span> <span class="nf">emojis</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">separator</span><span class="o">=</span><span class="s2">&quot;&quot;</span><span class="p">):</span><span class="w">
</span>        <span class="k">return</span> <span class="n">separator</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">t</span><span class="o">.</span><span class="n">emoji</span> <span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">tiles</span><span class="p">)</span>
</pre> <p>For brevity, I presented a minimal version of <tt 
class="docutils literal">GuessScore.make</tt> above. The version in my <a class="reference external" href="https://github.com/georgevreilly/wordle">Wordle repository</a> has robust validation.</p> <p>Let's add the main class, <tt class="docutils literal">WordleGuesses</tt>:</p> <!-- wordle2 --> <pre class="code python literal-block"> <span class="nd">&#64;dataclass</span><span class="w"> </span><span class="k">class</span> <span class="nc">WordleGuesses</span><span class="p">:</span><span class="w"> </span> <span class="n">mask</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">str</span> <span class="o">|</span> <span class="kc">None</span><span class="p">]</span> <span class="c1"># Exact match for position (Green/Correct)</span><span class="w"> </span> <span class="n">valid</span><span class="p">:</span> <span class="nb">set</span><span class="p">[</span><span class="nb">str</span><span class="p">]</span> <span class="c1"># Green/Correct or Yellow/Present</span><span class="w"> </span> <span class="n">invalid</span><span class="p">:</span> <span class="nb">set</span><span class="p">[</span><span class="nb">str</span><span class="p">]</span> <span class="c1"># Black/Absent</span><span class="w"> </span> <span class="n">wrong_spot</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">set</span><span class="p">[</span><span class="nb">str</span><span class="p">]]</span> <span class="c1"># Wrong spot (Yellow/Present)</span><span class="w"> </span> <span class="n">guess_scores</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="n">GuessScore</span><span class="p">]</span><span class="w"> </span> <span class="nd">&#64;classmethod</span><span class="w"> </span> <span class="k">def</span> <span class="nf">parse</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">guess_scores</span><span 
class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="n">GuessScore</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="s2">&quot;WordleGuesses&quot;</span><span class="p">:</span><span class="w"> </span> <span class="n">mask</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">str</span> <span class="o">|</span> <span class="kc">None</span><span class="p">]</span> <span class="o">=</span> <span class="p">[</span><span class="kc">None</span><span class="p">]</span> <span class="o">*</span> <span class="n">WORDLE_LEN</span><span class="w"> </span> <span class="n">valid</span><span class="p">:</span> <span class="nb">set</span><span class="p">[</span><span class="nb">str</span><span class="p">]</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span><span class="w"> </span> <span class="n">invalid</span><span class="p">:</span> <span class="nb">set</span><span class="p">[</span><span class="nb">str</span><span class="p">]</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span><span class="w"> </span> <span class="n">wrong_spot</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">set</span><span class="p">[</span><span class="nb">str</span><span class="p">]]</span> <span class="o">=</span> <span class="p">[</span><span class="nb">set</span><span class="p">()</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">WORDLE_LEN</span><span class="p">)]</span><span class="w"> </span> <span class="k">for</span> <span class="n">gs</span> <span class="ow">in</span> <span class="n">guess_scores</span><span class="p">:</span><span class="w"> </span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="p">(</span><span class="n">t</span><span 
class="p">,</span> <span class="n">g</span><span class="p">)</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">gs</span><span class="o">.</span><span class="n">tiles</span><span class="p">,</span> <span class="n">gs</span><span class="o">.</span><span class="n">guess</span><span class="p">)):</span><span class="w"> </span> <span class="k">if</span> <span class="n">t</span> <span class="ow">is</span> <span class="n">TileState</span><span class="o">.</span><span class="n">CORRECT</span><span class="p">:</span><span class="w"> </span> <span class="n">mask</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">g</span><span class="w"> </span> <span class="n">valid</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">g</span><span class="p">)</span><span class="w"> </span> <span class="k">elif</span> <span class="n">t</span> <span class="ow">is</span> <span class="n">TileState</span><span class="o">.</span><span class="n">PRESENT</span><span class="p">:</span><span class="w"> </span> <span class="n">wrong_spot</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">g</span><span class="p">)</span><span class="w"> </span> <span class="n">valid</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">g</span><span class="p">)</span><span class="w"> </span> <span class="k">elif</span> <span class="n">t</span> <span class="ow">is</span> <span class="n">TileState</span><span class="o">.</span><span class="n">ABSENT</span><span class="p">:</span><span class="w"> </span> <span class="n">invalid</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">g</span><span 
class="p">)</span><span class="w"> </span> <span class="k">return</span> <span class="bp">cls</span><span class="p">(</span><span class="n">mask</span><span class="p">,</span> <span class="n">valid</span><span class="p">,</span> <span class="n">invalid</span><span class="p">,</span> <span class="n">wrong_spot</span><span class="p">,</span> <span class="n">guess_scores</span><span class="p">)</span> </pre> <p><tt class="docutils literal">WordleGuesses.parse</tt> is a bit shorter and clearer than <tt class="docutils literal">parse_guesses</tt>. It uses <tt class="docutils literal">TileState</tt> at each position to classify the current tile and accumulate state in the four member collections. Since <tt class="docutils literal">GuessScore.make</tt> has validated the input, <tt class="docutils literal">parse</tt> doesn't need to do any further validation.</p> <p>The <tt class="docutils literal">is_eligible</tt> method is essentially the same as its predecessor:</p> <!-- wordle2 --> <pre class="code python literal-block"> <span class="k">class</span> <span class="nc">WordleGuesses</span><span class="p">:</span><span class="w"> </span> <span class="k">def</span> <span class="nf">is_eligible</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">word</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">bool</span><span class="p">:</span><span class="w"> </span> <span class="n">letters</span> <span class="o">=</span> <span class="p">{</span><span class="n">c</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">word</span><span class="p">}</span><span class="w"> </span> <span class="k">if</span> <span class="n">letters</span> <span class="o">&amp;</span> <span class="bp">self</span><span class="o">.</span><span class="n">valid</span> <span class="o">!=</span> <span class="bp">self</span><span 
class="o">.</span><span class="n">valid</span><span class="p">:</span><span class="w"> </span> <span class="c1"># Did not have the full set of green+yellow letters known to be valid</span><span class="w"> </span> <span class="n">logging</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s2">&quot;!Valid: </span><span class="si">%s</span><span class="s2">&quot;</span><span class="p">,</span> <span class="n">word</span><span class="p">)</span><span class="w"> </span> <span class="k">return</span> <span class="kc">False</span><span class="w"> </span> <span class="k">elif</span> <span class="nb">any</span><span class="p">(</span><span class="n">m</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span> <span class="ow">and</span> <span class="n">c</span> <span class="o">!=</span> <span class="n">m</span> <span class="k">for</span> <span class="n">c</span><span class="p">,</span> <span class="n">m</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">word</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">mask</span><span class="p">)):</span><span class="w"> </span> <span class="c1"># Couldn't find all the green/correct letters</span><span class="w"> </span> <span class="n">logging</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s2">&quot;!Mask: </span><span class="si">%s</span><span class="s2">&quot;</span><span class="p">,</span> <span class="n">word</span><span class="p">)</span><span class="w"> </span> <span class="k">return</span> <span class="kc">False</span><span class="w"> </span> <span class="k">elif</span> <span class="n">letters</span> <span class="o">&amp;</span> <span class="bp">self</span><span class="o">.</span><span class="n">invalid</span><span class="p">:</span><span class="w"> </span> <span class="c1"># Invalid (black) letters are in 
the word</span><span class="w"> </span> <span class="n">logging</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s2">&quot;Invalid: </span><span class="si">%s</span><span class="s2">&quot;</span><span class="p">,</span> <span class="n">word</span><span class="p">)</span><span class="w"> </span> <span class="k">return</span> <span class="kc">False</span><span class="w"> </span> <span class="k">elif</span> <span class="nb">any</span><span class="p">(</span><span class="n">c</span> <span class="ow">in</span> <span class="n">ws</span> <span class="k">for</span> <span class="n">c</span><span class="p">,</span> <span class="n">ws</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">word</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">wrong_spot</span><span class="p">)):</span><span class="w"> </span> <span class="c1"># Found some yellow letters: valid letters in wrong position</span><span class="w"> </span> <span class="n">logging</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s2">&quot;WrongSpot: </span><span class="si">%s</span><span class="s2">&quot;</span><span class="p">,</span> <span class="n">word</span><span class="p">)</span><span class="w"> </span> <span class="k">return</span> <span class="kc">False</span><span class="w"> </span> <span class="k">else</span><span class="p">:</span><span class="w"> </span> <span class="c1"># Potentially valid</span><span class="w"> </span> <span class="n">logging</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">&quot;Got: </span><span class="si">%s</span><span class="s2">&quot;</span><span class="p">,</span> <span class="n">word</span><span class="p">)</span><span class="w"> </span> <span class="k">return</span> <span class="kc">True</span><span class="w"> </span> <span class="k">def</span> 
<span class="nf">find_eligible</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">vocabulary</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">str</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="nb">list</span><span class="p">[</span><span class="nb">str</span><span class="p">]:</span><span class="w"> </span> <span class="k">return</span> <span class="p">[</span><span class="n">w</span> <span class="k">for</span> <span class="n">w</span> <span class="ow">in</span> <span class="n">vocabulary</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">is_eligible</span><span class="p">(</span><span class="n">w</span><span class="p">)]</span> </pre> <p>There's a <a class="reference external" href="https://www.spinellis.gr/blog/20200225/">famous story</a> where Donald Knuth was asked by Jon Bentley to demonstrate <a class="reference external" href="http://www.literateprogramming.com/">literate programming</a> by finding the <em>K</em> most common words from a text file. Knuth turned in an eight-page gem of WEB, which was reviewed by Doug McIlroy, who demonstrated that the task could also be accomplished in a six-line pipeline.</p> <p>Wordle can also be solved with a six-line pipeline, but the regexes are quite difficult to type correctly and they have to be carefully hand-tailored for each set of guess–score pairs. There is no single, general six-line pipeline.</p> <p>I know that I'd much rather work with these Python classes. 
As we'll see below, they are a solid foundation that can be built upon in many ways.</p> </div> <div class="section" id="does-it-work"> <h3>Does it Work?</h3> <p>Let's try it:</p> <pre class="code bash literal-block"> <span class="c1"># answer: ARBOR </span>$<span class="w"> </span>./wordle.py<span class="w"> </span><span class="nv">HARES</span><span class="o">=</span>.ar..<span class="w"> </span><span class="nv">GUILT</span><span class="o">=</span>.....<span class="w"> </span><span class="nv">CROAK</span><span class="o">=</span>.Roa.<span class="w"> </span><span class="nv">BRAVO</span><span class="o">=</span>bRa.o<span class="w"> </span>ARBOR<span class="w"> </span><span class="c1"># answer: CACHE </span>$<span class="w"> </span>./wordle.py<span class="w"> </span><span class="nv">CHAIR</span><span class="o">=</span>Cha..<span class="w"> </span><span class="nv">CLASH</span><span class="o">=</span>C.a.h<span class="w"> </span><span class="nv">CATCH</span><span class="o">=</span>CA.ch<span class="w"> </span>CACHE<span class="w"> </span>CAHOW<span class="w"> </span><span class="c1"># answer: TOXIC </span>$<span class="w"> </span>./wordle.py<span class="w"> </span><span class="nv">LEAKS</span><span class="o">=</span>.....<span class="w"> </span><span class="nv">MIGHT</span><span class="o">=</span>.i..t<span class="w"> </span><span class="nv">BLITZ</span><span class="o">=</span>..it.<span class="w"> </span><span class="nv">OPTIC</span><span class="o">=</span>o.tIC<span class="w"> </span><span class="nv">TONIC</span><span class="o">=</span>TO.IC<span class="w"> </span>TORIC<span class="w"> </span>TOXIC </pre> <p>This looks right, but there are some subtle bugs in the code.</p> </div> <div class="section" id="fifty-is-the-new-witty"> <h3>Fifty is the new Witty</h3> <p>Here we expect to find <tt class="docutils literal">FIFTY</tt>, but no words match:</p> <pre class="code bash literal-block"> <span class="c1"># answer: FIFTY </span>$<span class="w"> </span>./wordle.py<span 
class="w"> </span><span class="nv">HARES</span><span class="o">=</span>.....<span class="w"> </span><span class="nv">BUILT</span><span class="o">=</span>..i.t<span class="w"> </span><span class="nv">TIMID</span><span class="o">=</span>tI...<span class="w"> </span><span class="nv">PINTO</span><span class="o">=</span>.I.T.<span class="w"> </span><span class="nv">WITTY</span><span class="o">=</span>.I.TY<span class="w"> </span>--None-- </pre> <p>Let's take a look at the state of the <tt class="docutils literal">WordleGuesses</tt> instance:</p> <pre class="code pycon literal-block"> <span class="gp">&gt;&gt;&gt; </span><span class="n">guess_scores</span> <span class="o">=</span> <span class="p">[</span><span class="n">GuessScore</span><span class="o">.</span><span class="n">make</span><span class="p">(</span><span class="n">gs</span><span class="p">)</span> <span class="k">for</span> <span class="n">gs</span> <span class="ow">in</span><span class="w"> </span><span class="go"> &quot;HARES=..... BUILT=..i.t TIMID=tI... PINTO=.I.T. WITTY=.I.TY&quot;.split()] </span><span class="gp">&gt;&gt;&gt; </span><span class="n">wg</span> <span class="o">=</span> <span class="n">WordleGuesses</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="n">guess_scores</span><span class="p">)</span><span class="w"> </span><span class="gp">&gt;&gt;&gt; </span><span class="n">wg</span><span class="w"> </span><span class="go">WordleGuesses(mask=[None, 'I', None, 'T', 'Y'], valid={'T', 'I', 'Y'}, invalid={ 'A', 'E', 'D', 'M', 'U', 'H', 'I', 'B', 'L', 'T', 'P', 'O', 'R', 'W', 'N', 'S'}, wrong_spot=[{'T'}, set(), {'I'}, set(), {'T'}], guess_scores=[GuessScore(guess='HARES', score='.....', tiles=[&lt;TileState.ABSENT: TileState(value=3, emoji='⬛', color='Black', css_color='#838184')&gt;, &lt;TileState.ABSENT: TileState(value=3, emoji='⬛', color='Black', css_color='#838184')&gt;, ... 
much snipped ...</span> </pre> <p>That's ugly.</p> </div> <div class="section" id="better-string-representation"> <h3>Better String Representation</h3> <p>Let's write a few helper functions to improve the <tt class="docutils literal">__repr__</tt>:</p> <!-- wordle3 --> <pre class="code python literal-block"> <span class="k">def</span> <span class="nf">letter_set</span><span class="p">(</span><span class="n">s</span><span class="p">:</span> <span class="nb">set</span><span class="p">[</span><span class="nb">str</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span><span class="w"> </span> <span class="k">return</span> <span class="s2">&quot;&quot;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="nb">sorted</span><span class="p">(</span><span class="n">s</span><span class="p">))</span><span class="w"> </span><span class="k">def</span> <span class="nf">letter_sets</span><span class="p">(</span><span class="n">ls</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">set</span><span class="p">[</span><span class="nb">str</span><span class="p">]])</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span><span class="w"> </span> <span class="k">return</span> <span class="s2">&quot;[&quot;</span> <span class="o">+</span> <span class="s2">&quot;,&quot;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">letter_set</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="ow">or</span> <span class="s2">&quot;-&quot;</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">ls</span><span class="p">)</span> <span class="o">+</span> <span class="s2">&quot;]&quot;</span><span class="w"> </span><span class="k">def</span> <span class="nf">dash_mask</span><span class="p">(</span><span 
class="n">mask</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">str</span> <span class="o">|</span> <span class="kc">None</span><span class="p">]):</span><span class="w"> </span> <span class="k">return</span> <span class="s2">&quot;&quot;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">m</span> <span class="ow">or</span> <span class="s2">&quot;-&quot;</span> <span class="k">for</span> <span class="n">m</span> <span class="ow">in</span> <span class="n">mask</span><span class="p">)</span><span class="w"> </span><span class="k">class</span> <span class="nc">WordleGuesses</span><span class="p">:</span><span class="w"> </span> <span class="k">def</span> <span class="fm">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span><span class="w"> </span> <span class="n">mask</span> <span class="o">=</span> <span class="n">dash_mask</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">mask</span><span class="p">)</span><span class="w"> </span> <span class="n">valid</span> <span class="o">=</span> <span class="n">letter_set</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">valid</span><span class="p">)</span><span class="w"> </span> <span class="n">invalid</span> <span class="o">=</span> <span class="n">letter_set</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">invalid</span><span class="p">)</span><span class="w"> </span> <span class="n">wrong_spot</span> <span class="o">=</span> <span class="n">letter_sets</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">wrong_spot</span><span class="p">)</span><span class="w"> </span> <span class="n">unused</span> <span class="o">=</span> 
<span class="n">letter_set</span><span class="p">(</span><span class="w"> </span> <span class="nb">set</span><span class="p">(</span><span class="n">string</span><span class="o">.</span><span class="n">ascii_uppercase</span><span class="p">)</span> <span class="o">-</span> <span class="bp">self</span><span class="o">.</span><span class="n">valid</span> <span class="o">-</span> <span class="bp">self</span><span class="o">.</span><span class="n">invalid</span><span class="p">)</span><span class="w"> </span> <span class="n">_guess_scores</span> <span class="o">=</span> <span class="p">[</span><span class="s2">&quot;, &quot;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">gs</span><span class="si">}</span><span class="s2">|</span><span class="si">{</span><span class="n">gs</span><span class="o">.</span><span class="n">emojis</span><span class="p">()</span><span class="si">}</span><span class="s2">&quot;</span><span class="w"> </span> <span class="k">for</span> <span class="n">gs</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">guess_scores</span><span class="p">)]</span><span class="w"> </span> <span class="k">return</span> <span class="p">(</span><span class="w"> </span> <span class="sa">f</span><span class="s2">&quot;WordleGuesses(</span><span class="si">{</span><span class="n">mask</span><span class="si">=}</span><span class="s2">, </span><span class="si">{</span><span class="n">valid</span><span class="si">=}</span><span class="s2">, </span><span class="si">{</span><span class="n">invalid</span><span class="si">=}</span><span class="s2">,</span><span class="se">\n</span><span class="s2">&quot;</span><span class="w"> </span> <span class="sa">f</span><span class="s2">&quot; </span><span class="si">{</span><span class="n">wrong_spot</span><span class="si">=}</span><span class="s2">, 
</span><span class="si">{</span><span class="n">unused</span><span class="si">=}</span><span class="s2">)&quot;</span><span class="w"> </span> <span class="p">)</span> </pre> <p>Let's run it again, printing out the instance:</p> <pre class="code bash literal-block"> <span class="c1"># answer: FIFTY </span>$<span class="w"> </span>./wordle.py<span class="w"> </span>-v<span class="w"> </span><span class="nv">HARES</span><span class="o">=</span>.....<span class="w"> </span><span class="nv">BUILT</span><span class="o">=</span>..i.t<span class="w"> </span><span class="nv">TIMID</span><span class="o">=</span>tI...<span class="w"> </span><span class="nv">PINTO</span><span class="o">=</span>.I.T.<span class="w"> </span><span class="nv">WITTY</span><span class="o">=</span>.I.TY<span class="w"> </span>WordleGuesses<span class="o">(</span><span class="nv">mask</span><span class="o">=</span><span class="s1">'-I-TY'</span>,<span class="w"> </span><span class="nv">valid</span><span class="o">=</span><span class="s1">'ITY'</span>,<span class="w"> </span><span class="nv">invalid</span><span class="o">=</span><span class="s1">'ABDEHILMNOPRSTUW'</span>,<span class="w"> </span><span class="nv">wrong_spot</span><span class="o">=</span><span class="s1">'[T,-,I,-,T]'</span>,<span class="w"> </span><span class="nv">unused</span><span class="o">=</span><span class="s1">'CFGJKQVXZ'</span><span class="o">)</span><span class="w"> </span><span class="nv">guess_scores</span><span class="o">=</span><span class="w"> </span><span class="o">[</span><span class="s1">'HARES=.....|⬛⬛⬛⬛⬛, BUILT=..i.t|⬛⬛🟨⬛🟨, TIMID=tI...|🟨🟩⬛⬛⬛, PINTO=.I.T.|⬛🟩⬛🟩⬛, WITTY=.I.TY|⬛🟩⬛🟩🟩'</span><span class="o">]</span><span class="w"> </span>--None-- </pre> <p>That's a huge improvement in legibility over the default string representation!</p> <p>There's a <tt class="docutils literal">T</tt> in both <tt class="docutils literal">valid</tt> and <tt class="docutils literal">invalid</tt>—two sets that should be mutually exclusive. 
The first “absent” <tt class="docutils literal">T</tt> at position 3 in <tt class="docutils literal">WITTY</tt> has poisoned the second <tt class="docutils literal">T</tt> at position 4, which is “correct”. The <tt class="docutils literal">T</tt> at position 1 in <tt class="docutils literal">TIMID</tt> and the <tt class="docutils literal">T</tt> at position 5 in <tt class="docutils literal">BUILT</tt> are “present” because they are the only <tt class="docutils literal">T</tt> in those guesses.</p> <p>When there are two <tt class="docutils literal">T</tt>s in a guess, but only one <tt class="docutils literal">T</tt> in the answer, one of the <tt class="docutils literal">T</tt>s will either be “correct” or “present”. The second, superfluous <tt class="docutils literal">T</tt> will be “absent”.</p> </div> <div class="section" id="first-attempt-at-fixing-the-bug"> <h3>First Attempt at Fixing the Bug</h3> <p>Let's modify <tt class="docutils literal">WordleGuesses.parse</tt> slightly to address that. 
When we get an <tt class="docutils literal">ABSENT</tt> tile, we should add that letter to <tt class="docutils literal">invalid</tt> only if it's not already in <tt class="docutils literal">valid</tt>.</p> <!-- wordle4 --> <pre class="code python literal-block"> <span class="k">class</span> <span class="nc">WordleGuesses</span><span class="p">:</span><span class="w"> </span> <span class="nd">&#64;classmethod</span><span class="w"> </span> <span class="k">def</span> <span class="nf">parse</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">guess_scores</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="n">GuessScore</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="s2">&quot;WordleGuesses&quot;</span><span class="p">:</span><span class="w"> </span> <span class="n">mask</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">str</span> <span class="o">|</span> <span class="kc">None</span><span class="p">]</span> <span class="o">=</span> <span class="p">[</span><span class="kc">None</span><span class="p">]</span> <span class="o">*</span> <span class="n">WORDLE_LEN</span><span class="w"> </span> <span class="n">valid</span><span class="p">:</span> <span class="nb">set</span><span class="p">[</span><span class="nb">str</span><span class="p">]</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span><span class="w"> </span> <span class="n">invalid</span><span class="p">:</span> <span class="nb">set</span><span class="p">[</span><span class="nb">str</span><span class="p">]</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span><span class="w"> </span> <span class="n">wrong_spot</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">set</span><span class="p">[</span><span class="nb">str</span><span class="p">]]</span> 
<span class="o">=</span> <span class="p">[</span><span class="nb">set</span><span class="p">()</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">WORDLE_LEN</span><span class="p">)]</span><span class="w"> </span> <span class="k">for</span> <span class="n">gs</span> <span class="ow">in</span> <span class="n">guess_scores</span><span class="p">:</span><span class="w"> </span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="p">(</span><span class="n">t</span><span class="p">,</span> <span class="n">g</span><span class="p">)</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">gs</span><span class="o">.</span><span class="n">tiles</span><span class="p">,</span> <span class="n">gs</span><span class="o">.</span><span class="n">guess</span><span class="p">)):</span><span class="w"> </span> <span class="k">if</span> <span class="n">t</span> <span class="ow">is</span> <span class="n">TileState</span><span class="o">.</span><span class="n">CORRECT</span><span class="p">:</span><span class="w"> </span> <span class="n">mask</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">g</span><span class="w"> </span> <span class="n">valid</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">g</span><span class="p">)</span><span class="w"> </span> <span class="k">elif</span> <span class="n">t</span> <span class="ow">is</span> <span class="n">TileState</span><span class="o">.</span><span class="n">PRESENT</span><span class="p">:</span><span class="w"> </span> <span class="n">wrong_spot</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">add</span><span 
class="p">(</span><span class="n">g</span><span class="p">)</span><span class="w"> </span> <span class="n">valid</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">g</span><span class="p">)</span><span class="w"> </span> <span class="k">elif</span> <span class="n">t</span> <span class="ow">is</span> <span class="n">TileState</span><span class="o">.</span><span class="n">ABSENT</span><span class="p">:</span><span class="w"> </span> <span class="k">if</span> <span class="n">g</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">valid</span><span class="p">:</span> <span class="c1"># &lt;&lt;&lt; new</span><span class="w"> </span> <span class="n">invalid</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">g</span><span class="p">)</span><span class="w"> </span> <span class="k">return</span> <span class="bp">cls</span><span class="p">(</span><span class="n">mask</span><span class="p">,</span> <span class="n">valid</span><span class="p">,</span> <span class="n">invalid</span><span class="p">,</span> <span class="n">wrong_spot</span><span class="p">,</span> <span class="n">guess_scores</span><span class="p">)</span> </pre> <p>Does it work? Yes! 
Now we have <tt class="docutils literal">FIFTY</tt>.</p> <pre class="code bash literal-block"> <span class="c1"># answer: FIFTY </span>$<span class="w"> </span>./wordle.py<span class="w"> </span>-v<span class="w"> </span><span class="nv">HARES</span><span class="o">=</span>.....<span class="w"> </span><span class="nv">BUILT</span><span class="o">=</span>..i.t<span class="w"> </span><span class="nv">TIMID</span><span class="o">=</span>tI...<span class="w"> </span><span class="nv">PINTO</span><span class="o">=</span>.I.T.<span class="w"> </span><span class="nv">WITTY</span><span class="o">=</span>.I.TY<span class="w"> </span>WordleGuesses<span class="o">(</span><span class="nv">mask</span><span class="o">=</span><span class="s1">'-I-TY'</span>,<span class="w"> </span><span class="nv">valid</span><span class="o">=</span><span class="s1">'ITY'</span>,<span class="w"> </span><span class="nv">invalid</span><span class="o">=</span><span class="s1">'ABDEHLMNOPRSUW'</span>,<span class="w"> </span><span class="nv">wrong_spot</span><span class="o">=</span><span class="s1">'[T,-,I,-,T]'</span>,<span class="w"> </span><span class="nv">unused</span><span class="o">=</span><span class="s1">'CFGJKQVXZ'</span><span class="o">)</span><span class="w"> </span>FIFTY<span class="w"> </span>JITTY<span class="w"> </span>KITTY<span class="w"> </span>ZITTY </pre> <p>But we also have <tt class="docutils literal">JITTY</tt>, <tt class="docutils literal">KITTY</tt>, and <tt class="docutils literal">ZITTY</tt>, which should not have been considered eligible, since <tt class="docutils literal">WITTY</tt> was eliminated for the <tt class="docutils literal">T</tt> at position 3. We'll come back to this soon.</p> </div> <div class="section" id="the-problem-of-repeated-letters"> <h3>The Problem of Repeated Letters</h3> <p>There's a problem that we haven't grappled with properly yet: <em>repeated letters</em> in a guess or in an answer. 
We've made an implicit assumption that there are five distinct letters in each guess and in the answer.</p> <p>Here's an example that fails with the original <tt class="docutils literal">parse</tt>:</p> <pre class="code bash literal-block"> <span class="c1"># answer: EMPTY </span>$<span class="w"> </span>./wordle.py<span class="w"> </span>-v<span class="w"> </span><span class="nv">LODGE</span><span class="o">=</span>....e<span class="w"> </span><span class="nv">WIPER</span><span class="o">=</span>..Pe.<span class="w"> </span><span class="nv">TEPEE</span><span class="o">=</span>teP..<span class="w"> </span><span class="nv">EXPAT</span><span class="o">=</span>E.P.t<span class="w"> </span>WordleGuesses<span class="o">(</span><span class="nv">mask</span><span class="o">=</span><span class="s1">'E-P--'</span>,<span class="w"> </span><span class="nv">valid</span><span class="o">=</span><span class="s1">'EPT'</span>,<span class="w"> </span><span class="nv">invalid</span><span class="o">=</span><span class="s1">'ADEGILORWX'</span>,<span class="w"> </span><span class="nv">wrong_spot</span><span class="o">=</span><span class="s1">'[T,E,-,E,ET]'</span>,<span class="w"> </span><span class="nv">unused</span><span class="o">=</span><span class="s1">'BCFHJKMNQSUVYZ'</span><span class="o">)</span><span class="w"> </span>--None-- </pre> <p>but works with the current <tt class="docutils literal">parse</tt>:</p> <pre class="code bash literal-block"> <span class="c1"># answer: EMPTY </span>$<span class="w"> </span>./wordle.py<span class="w"> </span>-v<span class="w"> </span><span class="nv">LODGE</span><span class="o">=</span>....e<span class="w"> </span><span class="nv">WIPER</span><span class="o">=</span>..Pe.<span class="w"> </span><span class="nv">TEPEE</span><span class="o">=</span>teP..<span class="w"> </span><span class="nv">EXPAT</span><span class="o">=</span>E.P.t<span class="w"> </span>WordleGuesses<span class="o">(</span><span class="nv">mask</span><span 
class="o">=</span><span class="s1">'E-P--'</span>,<span class="w"> </span><span class="nv">valid</span><span class="o">=</span><span class="s1">'EPT'</span>,<span class="w"> </span><span class="nv">invalid</span><span class="o">=</span><span class="s1">'ADGILORWX'</span>,<span class="w"> </span><span class="nv">wrong_spot</span><span class="o">=</span><span class="s1">'[T,E,-,E,ET]'</span>,<span class="w"> </span><span class="nv">unused</span><span class="o">=</span><span class="s1">'BCFHJKMNQSUVYZ'</span><span class="o">)</span><span class="w"> </span>EMPTS<span class="w"> </span>EMPTY </pre> <p>Note that there is no longer an <tt class="docutils literal">E</tt> in <tt class="docutils literal">invalid</tt>. In <tt class="docutils literal">TEPEE=teP..</tt>, the <tt class="docutils literal">E</tt> in position 2 is considered “present”, while the two <tt class="docutils literal">E</tt>s in positions 4 and 5 are marked “absent”. This tells us that there is only one <tt class="docutils literal">E</tt> in the answer. Since <tt class="docutils literal">P</tt> is correct in position 3 of <tt class="docutils literal">TEPEE</tt>, the <tt class="docutils literal">E</tt> must be in position 1. This is confirmed by the subsequent <tt class="docutils literal">EXPAT=E.P.t</tt>, where the initial <tt class="docutils literal">E</tt> is marked “correct”.</p> <p>Our previous understanding of “absent” was too simple. An “absent” tile can mean one of two things:</p> <ol class="arabic simple"> <li>This letter is not in the answer at all—the usual case.</li> <li>If another copy of this letter is “correct” or “present” elsewhere in the same guess (i.e., <em>valid</em>), the letter is superfluous at this position. 
The guess has more instances of this letter than the answer does.</li> </ol> <p>Consider the results here:</p> <pre class="code bash literal-block"> <span class="c1"># answer: STYLE </span>$<span class="w"> </span>./wordle.py<span class="w"> </span>-v<span class="w"> </span><span class="nv">GROAN</span><span class="o">=</span>.....<span class="w"> </span><span class="nv">WHILE</span><span class="o">=</span>...LE<span class="w"> </span><span class="nv">BELLE</span><span class="o">=</span>...LE<span class="w"> </span><span class="nv">TUPLE</span><span class="o">=</span>t..LE<span class="w"> </span><span class="nv">STELE</span><span class="o">=</span>ST.LE<span class="w"> </span>WordleGuesses<span class="o">(</span><span class="nv">mask</span><span class="o">=</span><span class="s1">'ST-LE'</span>,<span class="w"> </span><span class="nv">valid</span><span class="o">=</span><span class="s1">'ELST'</span>,<span class="w"> </span><span class="nv">invalid</span><span class="o">=</span><span class="s1">'ABGHINOPRUW'</span>,<span class="w"> </span><span class="nv">wrong_spot</span><span class="o">=</span><span class="s1">'[T,-,-,-,-]'</span>,<span class="w"> </span><span class="nv">unused</span><span class="o">=</span><span class="s1">'CDFJKMQVXYZ'</span><span class="o">)</span><span class="w"> </span>STELE<span class="w"> </span>STYLE </pre> <p><tt class="docutils literal">STELE</tt> was an incorrect guess, so it should not have been offered as an eligible word. 
<tt class="docutils literal">E</tt>&nbsp;is valid in position 5, but wrong in position 3.</p> <p>Another example:</p> <pre class="code bash literal-block"> <span class="c1"># answer: WRITE </span>$<span class="w"> </span>./wordle.py<span class="w"> </span>-v<span class="w"> </span><span class="nv">SABER</span><span class="o">=</span>...er<span class="w"> </span><span class="nv">REFIT</span><span class="o">=</span>re.it<span class="w"> </span><span class="nv">TRITE</span><span class="o">=</span>.RITE<span class="w"> </span>WordleGuesses<span class="o">(</span><span class="nv">mask</span><span class="o">=</span><span class="s1">'-RITE'</span>,<span class="w"> </span><span class="nv">valid</span><span class="o">=</span><span class="s1">'EIRT'</span>,<span class="w"> </span><span class="nv">invalid</span><span class="o">=</span><span class="s1">'ABFS'</span>,<span class="w"> </span><span class="nv">wrong_spot</span><span class="o">=</span><span class="s1">'[R,E,-,EI,RT]'</span>,<span class="w"> </span><span class="nv">unused</span><span class="o">=</span><span class="s1">'CDGHJKLMNOPQUVWXYZ'</span><span class="o">)</span><span class="w"> </span>TRITE<span class="w"> </span>URITE<span class="w"> </span>WRITE </pre> <p><tt class="docutils literal">TRITE</tt> was an incorrect guess, so it should not have been offered. 
<tt class="docutils literal">4:T</tt> is valid, <tt class="docutils literal">1:T</tt> is wrong.</p> </div> <div class="section" id="fixing-repeated-absent-letters"> <h3>Fixing Repeated Absent Letters</h3> <p>We can fix this by making two passes through the tiles for each guess–score pair.</p> <ol class="arabic simple"> <li>Handle “correct” and “present” tiles as before.</li> <li>Add “absent” tiles to either <tt class="docutils literal">invalid</tt> or <tt class="docutils literal">wrong_spot</tt>.</li> </ol> <p>We need the second pass to handle a case like <tt class="docutils literal"><span class="pre">WITTY=.I.TY</span></tt>, where the “absent” <tt class="docutils literal">3:T</tt> precedes the “correct” <tt class="docutils literal">4:T</tt>: the <tt class="docutils literal">valid</tt> set must be fully updated before we process “absent” tiles.</p> <!-- wordle5 --> <pre class="code python literal-block"> <span class="k">class</span> <span class="nc">WordleGuesses</span><span class="p">:</span><span class="w"> </span> <span class="nd">&#64;classmethod</span><span class="w"> </span> <span class="k">def</span> <span class="nf">parse</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">guess_scores</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="n">GuessScore</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="s2">&quot;WordleGuesses&quot;</span><span class="p">:</span><span class="w"> </span> <span class="n">mask</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">str</span> <span class="o">|</span> <span class="kc">None</span><span class="p">]</span> <span class="o">=</span> <span class="p">[</span><span class="kc">None</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">WORDLE_LEN</span><span 
class="p">)]</span><span class="w"> </span> <span class="n">valid</span><span class="p">:</span> <span class="nb">set</span><span class="p">[</span><span class="nb">str</span><span class="p">]</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span><span class="w"> </span> <span class="n">invalid</span><span class="p">:</span> <span class="nb">set</span><span class="p">[</span><span class="nb">str</span><span class="p">]</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span><span class="w"> </span> <span class="n">wrong_spot</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">set</span><span class="p">[</span><span class="nb">str</span><span class="p">]]</span> <span class="o">=</span> <span class="p">[</span><span class="nb">set</span><span class="p">()</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">WORDLE_LEN</span><span class="p">)]</span><span class="w"> </span> <span class="k">for</span> <span class="n">gs</span> <span class="ow">in</span> <span class="n">guess_scores</span><span class="p">:</span><span class="w"> </span> <span class="c1"># First pass for correct and present</span><span class="w"> </span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="p">(</span><span class="n">t</span><span class="p">,</span> <span class="n">g</span><span class="p">)</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">gs</span><span class="o">.</span><span class="n">tiles</span><span class="p">,</span> <span class="n">gs</span><span class="o">.</span><span class="n">guess</span><span class="p">)):</span><span class="w"> </span> <span class="k">if</span> <span class="n">t</span> <span class="ow">is</span> <span 
class="n">TileState</span><span class="o">.</span><span class="n">CORRECT</span><span class="p">:</span><span class="w"> </span> <span class="n">mask</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">g</span><span class="w"> </span> <span class="n">valid</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">g</span><span class="p">)</span><span class="w"> </span> <span class="k">elif</span> <span class="n">t</span> <span class="ow">is</span> <span class="n">TileState</span><span class="o">.</span><span class="n">PRESENT</span><span class="p">:</span><span class="w"> </span> <span class="n">wrong_spot</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">g</span><span class="p">)</span><span class="w"> </span> <span class="n">valid</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">g</span><span class="p">)</span><span class="w"> </span> <span class="c1"># Second pass for absent letters</span><span class="w"> </span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="p">(</span><span class="n">t</span><span class="p">,</span> <span class="n">g</span><span class="p">)</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">gs</span><span class="o">.</span><span class="n">tiles</span><span class="p">,</span> <span class="n">gs</span><span class="o">.</span><span class="n">guess</span><span class="p">)):</span><span class="w"> </span> <span class="k">if</span> <span class="n">t</span> <span class="ow">is</span> <span class="n">TileState</span><span class="o">.</span><span class="n">ABSENT</span><span class="p">:</span><span class="w"> </span> <span 
class="k">if</span> <span class="n">g</span> <span class="ow">in</span> <span class="n">valid</span><span class="p">:</span><span class="w"> </span> <span class="c1"># There are more instances of `g` in `gs.guess`</span><span class="w"> </span> <span class="c1"># than in the answer</span><span class="w"> </span> <span class="n">wrong_spot</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">g</span><span class="p">)</span><span class="w"> </span> <span class="k">else</span><span class="p">:</span><span class="w"> </span> <span class="n">invalid</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">g</span><span class="p">)</span><span class="w"> </span> <span class="k">return</span> <span class="bp">cls</span><span class="p">(</span><span class="n">mask</span><span class="p">,</span> <span class="n">valid</span><span class="p">,</span> <span class="n">invalid</span><span class="p">,</span> <span class="n">wrong_spot</span><span class="p">,</span> <span class="n">guess_scores</span><span class="p">)</span> </pre> <p>We can see that <tt class="docutils literal">valid</tt> and <tt class="docutils literal">invalid</tt> are disjoint. 
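</p> <p>To see the two-pass logic in isolation, here is a minimal standalone sketch. It is not the <tt class="docutils literal">wordle.py</tt> from this post: the function name <tt class="docutils literal">parse_pairs</tt> is hypothetical, it takes raw <tt class="docutils literal">GUESS=SCORE</tt> strings rather than <tt class="docutils literal">GuessScore</tt> objects, and it returns a plain tuple. It assumes uppercase guesses, with an uppercase score letter for “correct”, a lowercase letter for “present”, and a dot for “absent”.</p>

```python
WORDLE_LEN = 5


def parse_pairs(guess_scores: list[str]) -> tuple[list, set, set, list]:
    """Two-pass parse of GUESS=SCORE strings (hypothetical helper, not wordle.py)."""
    mask: list = [None] * WORDLE_LEN
    valid: set = set()
    invalid: set = set()
    wrong_spot: list = [set() for _ in range(WORDLE_LEN)]

    for pair in guess_scores:
        guess, score = pair.split("=")

        # First pass: record correct (uppercase) and present (lowercase) tiles,
        # so that `valid` is complete before any absent tile is examined.
        for i, (g, s) in enumerate(zip(guess, score)):
            if s == g:
                mask[i] = g
                valid.add(g)
            elif s == g.lower():
                wrong_spot[i].add(g)
                valid.add(g)

        # Second pass: an absent tile for a letter already known to be valid
        # means "not at this position"; otherwise the letter is not in the answer.
        for i, (g, s) in enumerate(zip(guess, score)):
            if s == ".":
                if g in valid:
                    wrong_spot[i].add(g)
                else:
                    invalid.add(g)

    return mask, valid, invalid, wrong_spot


# Example call with the FIFTY guesses used in this post
fifty = ["HARES=.....", "BUILT=..i.t", "TIMID=tI...", "PINTO=.I.T.", "WITTY=.I.TY"]
mask, valid, invalid, wrong_spot = parse_pairs(fifty)
print(mask, sorted(valid), sorted(invalid), wrong_spot)
```

<p>With the <tt class="docutils literal">FIFTY</tt> guesses, the “absent” <tt class="docutils literal">3:T</tt> of <tt class="docutils literal">WITTY</tt> lands in the third <tt class="docutils literal">wrong_spot</tt> entry rather than in <tt class="docutils literal">invalid</tt>, because the first pass has already added <tt class="docutils literal">T</tt> to <tt class="docutils literal">valid</tt>.</p> <p>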
The <tt class="docutils literal">is_eligible</tt> method needs no changes.</p> <p>Let's try the <tt class="docutils literal">WRITE</tt> example again:</p> <pre class="code bash literal-block"> <span class="c1"># answer: WRITE </span>$<span class="w"> </span>./wordle.py<span class="w"> </span>-v<span class="w"> </span><span class="nv">SABER</span><span class="o">=</span>...er<span class="w"> </span><span class="nv">REFIT</span><span class="o">=</span>re.it<span class="w"> </span><span class="nv">TRITE</span><span class="o">=</span>.RITE<span class="w"> </span>WordleGuesses<span class="o">(</span><span class="nv">mask</span><span class="o">=</span><span class="s1">'-RITE'</span>,<span class="w"> </span><span class="nv">valid</span><span class="o">=</span><span class="s1">'EIRT'</span>,<span class="w"> </span><span class="nv">invalid</span><span class="o">=</span><span class="s1">'ABFS'</span>,<span class="w"> </span><span class="nv">wrong_spot</span><span class="o">=</span><span class="s1">'[RT,E,-,EI,RT]'</span>,<span class="w"> </span><span class="nv">unused</span><span class="o">=</span><span class="s1">'CDGHJKLMNOPQUVWXYZ'</span><span class="o">)</span><span class="w"> </span>URITE<span class="w"> </span>WRITE </pre> <p>There is now a <tt class="docutils literal">T</tt> in the first <tt class="docutils literal">wrong_spot</tt> entry.</p> <p>And <tt class="docutils literal">STYLE</tt>?</p> <pre class="code bash literal-block"> <span class="c1"># answer: STYLE </span>$<span class="w"> </span>./wordle.py<span class="w"> </span>-v<span class="w"> </span><span class="nv">GROAN</span><span class="o">=</span>.....<span class="w"> </span><span class="nv">WHILE</span><span class="o">=</span>...LE<span class="w"> </span><span class="nv">BELLE</span><span class="o">=</span>...LE<span class="w"> </span><span class="nv">TUPLE</span><span class="o">=</span>t..LE<span class="w"> </span><span class="nv">STELE</span><span class="o">=</span>ST.LE<span class="w"> 
</span>WordleGuesses<span class="o">(</span><span class="nv">mask</span><span class="o">=</span><span class="s1">'ST-LE'</span>,<span class="w"> </span><span class="nv">valid</span><span class="o">=</span><span class="s1">'ELST'</span>,<span class="w"> </span><span class="nv">invalid</span><span class="o">=</span><span class="s1">'ABGHINOPRUW'</span>,<span class="w"> </span><span class="nv">wrong_spot</span><span class="o">=</span><span class="s1">'[T,E,EL,-,-]'</span>,<span class="w"> </span><span class="nv">unused</span><span class="o">=</span><span class="s1">'CDFJKMQVXYZ'</span><span class="o">)</span><span class="w"> </span>STYLE </pre> <p>Both the second and third <tt class="docutils literal">wrong_spot</tt>s now have an <tt class="docutils literal">E</tt>. The “absent” <tt class="docutils literal">3:L</tt> from <tt class="docutils literal">BELLE</tt> is also in the third <tt class="docutils literal">wrong_spot</tt>.</p> <p>What about some other examples?</p> <p>In our previous attempt at fixing the bug, neither <tt class="docutils literal">QUICK</tt> nor <tt class="docutils literal">SPICK</tt> were found because the first <tt class="docutils literal">C</tt> in <tt class="docutils literal">CHICK</tt> was “absent” and thus marked invalid. 
Now, the <tt class="docutils literal">valid</tt> and <tt class="docutils literal">invalid</tt> sets are disjoint, there's a <tt class="docutils literal">C</tt> in the first element of <tt class="docutils literal">wrong_spot</tt>, and both words are found:</p> <pre class="code bash literal-block"> <span class="c1"># answer: QUICK </span>$<span class="w"> </span>./wordle.py<span class="w"> </span>-v<span class="w"> </span><span class="nv">MORAL</span><span class="o">=</span>.....<span class="w"> </span><span class="nv">TWINE</span><span class="o">=</span>..I..<span class="w"> </span><span class="nv">CHICK</span><span class="o">=</span>..ICK<span class="w"> </span>WordleGuesses<span class="o">(</span><span class="nv">mask</span><span class="o">=</span><span class="s1">'--ICK'</span>,<span class="w"> </span><span class="nv">valid</span><span class="o">=</span><span class="s1">'CIK'</span>,<span class="w"> </span><span class="nv">invalid</span><span class="o">=</span><span class="s1">'AEHLMNORTW'</span>,<span class="w"> </span><span class="nv">wrong_spot</span><span class="o">=</span><span class="s1">'[C,-,-,-,-]'</span>,<span class="w"> </span><span class="nv">unused</span><span class="o">=</span><span class="s1">'BDFGJPQSUVXYZ'</span><span class="o">)</span><span class="w"> </span>QUICK<span class="w"> </span>SPICK </pre> <p>As expected, we find only one answer for <tt class="docutils literal">FIFTY</tt> now:</p> <pre class="code bash literal-block"> <span class="c1"># answer: FIFTY </span>$<span class="w"> </span>./wordle.py<span class="w"> </span>-v<span class="w"> </span><span class="nv">HARES</span><span class="o">=</span>.....<span class="w"> </span><span class="nv">BUILT</span><span class="o">=</span>..i.t<span class="w"> </span><span class="nv">TIMID</span><span class="o">=</span>tI...<span class="w"> </span><span class="nv">PINTO</span><span class="o">=</span>.I.T.<span class="w"> </span><span class="nv">WITTY</span><span class="o">=</span>.I.TY<span 
class="w"> </span>WordleGuesses<span class="o">(</span><span class="nv">mask</span><span class="o">=</span><span class="s1">'-I-TY'</span>,<span class="w"> </span><span class="nv">valid</span><span class="o">=</span><span class="s1">'ITY'</span>,<span class="w"> </span><span class="nv">invalid</span><span class="o">=</span><span class="s1">'ABDEHLMNOPRSUW'</span>,<span class="w"> </span><span class="nv">wrong_spot</span><span class="o">=</span><span class="s1">'[T,-,IT,I,T]'</span>,<span class="w"> </span><span class="nv">unused</span><span class="o">=</span><span class="s1">'CFGJKQVXZ'</span><span class="o">)</span><span class="w"> </span>FIFTY </pre> <p>The new <tt class="docutils literal">T</tt> in the third element of <tt class="docutils literal">wrong_spot</tt> blocks the rhymes for <tt class="docutils literal">WITTY</tt>.</p> </div> <div class="section" id="further-optimization-of-the-mask"> <h3>Further Optimization of the Mask</h3> <p>There's still room for improvement. If you guess <tt class="docutils literal">ANGLE=ANGle</tt>, it's immediately obvious (to a human player) that you should swap the <tt class="docutils literal">L</tt> and <tt class="docutils literal">E</tt> to guess <tt class="docutils literal">ANGEL</tt> on your next turn. 
Or swap the <tt class="docutils literal">P</tt> and <tt class="docutils literal">T</tt> in <tt class="docutils literal">SPRAT=SpRAt</tt> to guess <tt class="docutils literal">STRAP</tt>.</p> <p>Similarly, <tt class="docutils literal">TENET=TEN.t</tt> tells you that the fourth letter of the answer must be <tt class="docutils literal">T</tt>, while <tt class="docutils literal">CHORE=C.OrE</tt> must have <tt class="docutils literal">2:R</tt>.</p> <p>A more complex example:</p> <pre class="code bash literal-block"> <span class="c1"># answer: BURLY </span>$<span class="w"> </span>./wordle.py<span class="w"> </span>-v<span class="w"> </span><span class="nv">LOWER</span><span class="o">=</span>l...r<span class="w"> </span><span class="nv">FRAIL</span><span class="o">=</span>.r..l<span class="w"> </span><span class="nv">BLURT</span><span class="o">=</span>Blur.<span class="w"> </span>WordleGuesses<span class="o">(</span><span class="nv">mask</span><span class="o">=</span><span class="s1">'B----'</span>,<span class="w"> </span><span class="nv">valid</span><span class="o">=</span><span class="s1">'BLRU'</span>,<span class="w"> </span><span class="nv">invalid</span><span class="o">=</span><span class="s1">'AEFIOTW'</span>,<span class="w"> </span><span class="nv">wrong_spot</span><span class="o">=</span><span class="s1">'[L,LR,U,R,LR]'</span>,<span class="w"> </span><span class="nv">unused</span><span class="o">=</span><span class="s1">'CDGHJKMNPQSVXYZ'</span><span class="o">)</span> </pre> <p>The <tt class="docutils literal">R</tt> is in the wrong spot in positions 5 (<tt class="docutils literal"><span class="pre">l...r</span></tt>), 2 (<tt class="docutils literal"><span class="pre">.r..l</span></tt>), and 4 (<tt class="docutils literal">Blur.</tt>). The <tt class="docutils literal">B</tt> is correct in position 1, so <tt class="docutils literal">R</tt> must be in position 3.</p> <p>The <tt class="docutils literal">L</tt> is in the wrong spot in positions 1, 5, and 2. 
<tt class="docutils literal">B</tt> is in position 1, <tt class="docutils literal">R</tt> is now in 3, so that leaves only position 4 for <tt class="docutils literal">L</tt>.</p> <p>There remain two possibilities for <tt class="docutils literal">U</tt>—positions 2 and 5; the information contained in <tt class="docutils literal">mask</tt> and <tt class="docutils literal">wrong_spot</tt> is not enough to determine where <tt class="docutils literal">U</tt> should go.</p> <p>The original mask, <tt class="docutils literal"><span class="pre">B----</span></tt>, was due to having only one “correct” letter. Using the cumulative information in the guesses and scores, we can infer a mask of <tt class="docutils literal"><span class="pre">B-RL-</span></tt>.</p> <p>In all of these cases, we can find exactly one remaining position where a “present” letter can be placed. In the <tt class="docutils literal">BURLY</tt> example, it takes two passes: we couldn't uniquely determine a place for <tt class="docutils literal">L</tt> until we had already placed <tt class="docutils literal">R</tt>.</p> <p>Up to now, we've been treating each tile in almost complete isolation. Let's optimize the mask programmatically.</p> <p>To account for repeated letters, such as the two <tt class="docutils literal">T</tt>s in <tt class="docutils literal">TENET=TEN.t</tt>, we use Python's <tt class="docutils literal">collections.Counter</tt> as a <a class="reference external" href="https://dbader.org/blog/sets-and-multiset-in-python">multiset</a>. <tt class="docutils literal">Counter</tt>'s union operation, <tt class="docutils literal">|=</tt>, computes the maximum of corresponding counts.</p> <p>First, we loop through <em>all</em> the guess–score pairs, building a <tt class="docutils literal">valid</tt> multiset of the “correct” and “present” letters. 
Then we subtract a multiset of the “correct” letters, yielding a multiset of the “present” letters.</p> <p>Second, we loop over <tt class="docutils literal">present</tt>, trying to find, for each letter, a single empty position where it can be placed in the mask. If exactly one such position exists, we update <tt class="docutils literal">mask2</tt>, remove the letter from <tt class="docutils literal">present</tt>, and break out of the inner loop so that the remaining letters are rescanned. If no letter has a unique position (as with the two possibilities for <tt class="docutils literal">U</tt> in <tt class="docutils literal">BURLY</tt>), the inner loop finishes without a <tt class="docutils literal">break</tt>, and we use the little-known <a class="reference external" href="https://python-notes.curiousefficiency.org/en/latest/python_concepts/break_else.html">break-else</a> construct (the <tt class="docutils literal">else</tt> clause of a <tt class="docutils literal">for</tt> loop) to exit from the outer loop.</p> <p>Finally, we merge <tt class="docutils literal">mask2</tt> into <tt class="docutils literal">self.mask</tt>. This <tt class="docutils literal">optimize</tt> method is called from the end of <tt class="docutils literal">WordleGuesses.parse</tt>.</p> <!-- wordle --> <pre class="code python literal-block"> <span class="k">class</span> <span class="nc">WordleGuesses</span><span class="p">:</span><span class="w"> </span> <span class="k">def</span> <span class="nf">optimize</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">list</span><span class="p">[</span><span class="nb">str</span> <span class="o">|</span> <span class="kc">None</span><span class="p">]:</span><span class="w"> </span><span class="sd">&quot;&quot;&quot;Use PRESENT tiles to improve `mask`.&quot;&quot;&quot;</span><span class="w"> </span> <span class="n">mask1</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">str</span> <span class="o">|</span> <span class="kc">None</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">mask</span><span 
class="w"> </span> <span class="n">mask2</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">str</span> <span class="o">|</span> <span class="kc">None</span><span class="p">]</span> <span class="o">=</span> <span class="p">[</span><span class="kc">None</span><span class="p">]</span> <span class="o">*</span> <span class="n">WORDLE_LEN</span><span class="w"> </span> <span class="c1"># Compute `valid`, a multiset of the correct and present letters in all guesses</span><span class="w"> </span> <span class="n">valid</span><span class="p">:</span> <span class="n">Counter</span><span class="p">[</span><span class="nb">str</span><span class="p">]</span> <span class="o">=</span> <span class="n">Counter</span><span class="p">()</span><span class="w"> </span> <span class="k">for</span> <span class="n">gs</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">guess_scores</span><span class="p">:</span><span class="w"> </span> <span class="n">valid</span> <span class="o">|=</span> <span class="n">Counter</span><span class="p">(</span><span class="w"> </span> <span class="n">g</span> <span class="k">for</span> <span class="n">g</span><span class="p">,</span> <span class="n">t</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">gs</span><span class="o">.</span><span class="n">guess</span><span class="p">,</span> <span class="n">gs</span><span class="o">.</span><span class="n">tiles</span><span class="p">)</span> <span class="k">if</span> <span class="n">t</span> <span class="ow">is</span> <span class="ow">not</span> <span class="n">TileState</span><span class="o">.</span><span class="n">ABSENT</span><span class="w"> </span> <span class="p">)</span><span class="w"> </span> <span class="n">correct</span> <span class="o">=</span> <span class="n">Counter</span><span class="p">(</span><span class="n">c</span> <span class="k">for</span> <span 
class="n">c</span> <span class="ow">in</span> <span class="n">mask1</span> <span class="k">if</span> <span class="n">c</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">)</span><span class="w"> </span> <span class="c1"># Compute `present`, a multiset of the valid letters</span><span class="w"> </span> <span class="c1"># whose correct position is not yet known; i.e., PRESENT in any row.</span><span class="w"> </span> <span class="n">present</span> <span class="o">=</span> <span class="n">valid</span> <span class="o">-</span> <span class="n">correct</span><span class="w"> </span> <span class="n">logging</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">valid</span><span class="si">=}</span><span class="s2"> </span><span class="si">{</span><span class="n">correct</span><span class="si">=}</span><span class="s2"> </span><span class="si">{</span><span class="n">present</span><span class="si">=}</span><span class="s2">&quot;</span><span class="p">)</span><span class="w"> </span> <span class="k">def</span> <span class="nf">available</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">i</span><span class="p">):</span><span class="w"> </span> <span class="s2">&quot;Can `c` be placed in slot `i` of `mask2`?&quot;</span><span class="w"> </span> <span class="k">return</span> <span class="n">mask1</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">and</span> <span class="n">mask2</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">and</span> <span class="n">c</span> <span class="ow">not</span> <span class="ow">in</span> <span class="bp">self</span><span 
class="o">.</span><span class="n">wrong_spot</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span> <span class="k">while</span> <span class="n">present</span><span class="p">:</span><span class="w"> </span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">present</span><span class="p">:</span><span class="w"> </span> <span class="n">positions</span> <span class="o">=</span> <span class="p">[</span><span class="n">i</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">WORDLE_LEN</span><span class="p">)</span> <span class="k">if</span> <span class="n">available</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">i</span><span class="p">)]</span><span class="w"> </span> <span class="c1"># Is there only one position where `c` can be placed?</span><span class="w"> </span> <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">positions</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span><span class="w"> </span> <span class="n">i</span> <span class="o">=</span> <span class="n">positions</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="w"> </span> <span class="n">mask2</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">c</span><span class="w"> </span> <span class="n">present</span> <span class="o">-=</span> <span class="n">Counter</span><span class="p">(</span><span class="n">c</span><span class="p">)</span><span class="w"> </span> <span class="n">logging</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span 
class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="si">}</span><span class="s2"> -&gt; </span><span class="si">{</span><span class="n">c</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span><span class="w"> </span> <span class="k">break</span><span class="w"> </span> <span class="k">else</span><span class="p">:</span><span class="w"> </span> <span class="c1"># We reach this for-else only if there was no `break` in the for-loop;</span><span class="w"> </span> <span class="c1"># i.e., no one-element `positions` was found in `present`.</span><span class="w"> </span> <span class="c1"># We must abandon the outer loop, even though `present` is not empty.</span><span class="w"> </span> <span class="k">break</span><span class="w"> </span> <span class="n">logging</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">present</span><span class="si">=}</span><span class="s2"> </span><span class="si">{</span><span class="n">mask2</span><span class="si">=}</span><span class="s2">&quot;</span><span class="p">)</span><span class="w"> </span> <span class="bp">self</span><span class="o">.</span><span class="n">mask</span> <span class="o">=</span> <span class="p">[</span><span class="n">m1</span> <span class="ow">or</span> <span class="n">m2</span> <span class="k">for</span> <span class="n">m1</span><span class="p">,</span> <span class="n">m2</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">mask1</span><span class="p">,</span> <span class="n">mask2</span><span class="p">)]</span><span class="w"> </span> <span class="n">logging</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="w"> </span> <span class="sa">f</span><span class="s2">&quot;</span><span class="se">\t</span><span class="s2">optimize: </span><span 
class="si">{</span><span class="n">dash_mask</span><span class="p">(</span><span class="n">mask1</span><span class="p">)</span><span class="si">}</span><span class="s2"> | </span><span class="si">{</span><span class="n">dash_mask</span><span class="p">(</span><span class="n">mask2</span><span class="p">)</span><span class="si">}</span><span class="s2">&quot;</span><span class="w"> </span> <span class="sa">f</span><span class="s2">&quot; =&gt; </span><span class="si">{</span><span class="n">dash_mask</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">mask</span><span class="p">)</span><span class="si">}</span><span class="s2">&quot;</span><span class="w"> </span> <span class="p">)</span><span class="w"> </span> <span class="k">return</span> <span class="n">mask2</span> </pre> <p>Here are some examples of it in action. Going from <tt class="docutils literal"><span class="pre">---ET</span></tt> to <tt class="docutils literal"><span class="pre">-ESET</span></tt>:</p> <pre class="code bash literal-block"> <span class="c1"># answer: BESET </span>$<span class="w"> </span>./wordle.py<span class="w"> </span>-vv<span class="w"> </span><span class="nv">CIVET</span><span class="o">=</span>...ET<span class="w"> </span><span class="nv">EGRET</span><span class="o">=</span>e..ET<span class="w"> </span><span class="nv">SLEET</span><span class="o">=</span>s.eET<span class="w"> </span>WordleGuesses<span class="o">(</span><span class="nv">mask</span><span class="o">=</span>---ET,<span class="w"> </span><span class="nv">valid</span><span class="o">=</span>EST,<span class="w"> </span><span class="nv">invalid</span><span class="o">=</span>CGILRV,<span class="w"> </span><span class="nv">wrong_spot</span><span class="o">=[</span>ES,-,E,-,-<span class="o">]</span>,<span class="w"> </span><span class="nv">unused</span><span class="o">=</span>ABDFHJKMNOPQUWXYZ<span class="o">)</span><span class="w"> </span><span class="nv">valid</span><span 
class="o">=</span>Counter<span class="o">({</span><span class="s1">'E'</span>:<span class="w"> </span><span class="m">2</span>,<span class="w"> </span><span class="s1">'T'</span>:<span class="w"> </span><span class="m">1</span>,<span class="w"> </span><span class="s1">'S'</span>:<span class="w"> </span><span class="m">1</span><span class="o">})</span><span class="w"> </span><span class="nv">correct</span><span class="o">=</span>Counter<span class="o">({</span><span class="s1">'E'</span>:<span class="w"> </span><span class="m">1</span>,<span class="w"> </span><span class="s1">'T'</span>:<span class="w"> </span><span class="m">1</span><span class="o">})</span><span class="w"> </span><span class="nv">present</span><span class="o">=</span>Counter<span class="o">({</span><span class="s1">'E'</span>:<span class="w"> </span><span class="m">1</span>,<span class="w"> </span><span class="s1">'S'</span>:<span class="w"> </span><span class="m">1</span><span class="o">})</span><span class="w"> </span><span class="m">2</span><span class="w"> </span>-&gt;<span class="w"> </span>E<span class="w"> </span><span class="m">3</span><span class="w"> </span>-&gt;<span class="w"> </span>S<span class="w"> </span><span class="nv">present</span><span class="o">=</span>Counter<span class="o">()</span><span class="w"> </span><span class="nv">mask2</span><span class="o">=[</span>None,<span class="w"> </span><span class="s1">'E'</span>,<span class="w"> </span><span class="s1">'S'</span>,<span class="w"> </span>None,<span class="w"> </span>None<span class="o">]</span><span class="w"> </span>optimize:<span class="w"> </span>---ET<span class="w"> </span><span class="p">|</span><span class="w"> </span>-ES--<span class="w"> </span><span class="o">=</span>&gt;<span class="w"> </span>-ESET </pre> <p>And from <tt class="docutils literal"><span class="pre">C----</span></tt> to <tt class="docutils literal">CLER-</tt>:</p> <pre class="code bash literal-block"> <span class="c1"># answer: CLERK </span>$<span 
class="w"> </span>./wordle.py<span class="w"> </span>-vv<span class="w"> </span><span class="nv">SINCE</span><span class="o">=</span>...ce<span class="w"> </span><span class="nv">CEDAR</span><span class="o">=</span>Ce..r<span class="w"> </span><span class="nv">CRUEL</span><span class="o">=</span>Cr.el<span class="w"> </span>WordleGuesses<span class="o">(</span><span class="nv">mask</span><span class="o">=</span>C----,<span class="w"> </span><span class="nv">valid</span><span class="o">=</span>CELR,<span class="w"> </span><span class="nv">invalid</span><span class="o">=</span>ADINSU,<span class="w"> </span><span class="nv">wrong_spot</span><span class="o">=[</span>-,ER,-,CE,ELR<span class="o">]</span>,<span class="w"> </span><span class="nv">unused</span><span class="o">=</span>BFGHJKMOPQTVWXYZ<span class="o">)</span><span class="w"> </span><span class="nv">valid</span><span class="o">=</span>Counter<span class="o">({</span><span class="s1">'C'</span>:<span class="w"> </span><span class="m">1</span>,<span class="w"> </span><span class="s1">'E'</span>:<span class="w"> </span><span class="m">1</span>,<span class="w"> </span><span class="s1">'R'</span>:<span class="w"> </span><span class="m">1</span>,<span class="w"> </span><span class="s1">'L'</span>:<span class="w"> </span><span class="m">1</span><span class="o">})</span><span class="w"> </span><span class="nv">correct</span><span class="o">=</span>Counter<span class="o">({</span><span class="s1">'C'</span>:<span class="w"> </span><span class="m">1</span><span class="o">})</span><span class="w"> </span><span class="nv">present</span><span class="o">=</span>Counter<span class="o">({</span><span class="s1">'E'</span>:<span class="w"> </span><span class="m">1</span>,<span class="w"> </span><span class="s1">'R'</span>:<span class="w"> </span><span class="m">1</span>,<span class="w"> </span><span class="s1">'L'</span>:<span class="w"> </span><span class="m">1</span><span class="o">})</span><span class="w"> </span><span 
class="m">3</span><span class="w"> </span>-&gt;<span class="w"> </span>E<span class="w"> </span><span class="m">4</span><span class="w"> </span>-&gt;<span class="w"> </span>R<span class="w"> </span><span class="m">2</span><span class="w"> </span>-&gt;<span class="w"> </span>L<span class="w"> </span><span class="nv">present</span><span class="o">=</span>Counter<span class="o">()</span><span class="w"> </span><span class="nv">mask2</span><span class="o">=[</span>None,<span class="w"> </span><span class="s1">'L'</span>,<span class="w"> </span><span class="s1">'E'</span>,<span class="w"> </span><span class="s1">'R'</span>,<span class="w"> </span>None<span class="o">]</span><span class="w"> </span>optimize:<span class="w"> </span>C----<span class="w"> </span><span class="p">|</span><span class="w"> </span>-LER-<span class="w"> </span><span class="o">=</span>&gt;<span class="w"> </span>CLER- </pre> </div> <div class="section" id="demanding-an-explanation"> <h3>Demanding an Explanation</h3> <p>Would you like to know <em>why</em> a guess is ineligible? 
We can do that too.</p> <pre class="code bash literal-block"> <span class="c1"># answer: ROUSE </span>$<span class="w"> </span>./wordle.py<span class="w"> </span><span class="nv">THIEF</span><span class="o">=</span>...e.<span class="w"> </span><span class="nv">BLADE</span><span class="o">=</span>....E<span class="w"> </span><span class="nv">GROVE</span><span class="o">=</span>.ro.E<span class="w"> </span><span class="se">\ </span><span class="w"> </span>--words<span class="w"> </span>ROMEO<span class="w"> </span>PROSE<span class="w"> </span>STORE<span class="w"> </span>MURAL<span class="w"> </span>ROUSE<span class="w"> </span>--explain<span class="w"> </span>WordleGuesses<span class="o">(</span><span class="nv">mask</span><span class="o">=</span>----E,<span class="w"> </span><span class="nv">valid</span><span class="o">=</span>EOR,<span class="w"> </span><span class="nv">invalid</span><span class="o">=</span>ABDFGHILTV,<span class="w"> </span><span class="nv">wrong_spot</span><span class="o">=[</span>-,R,O,E,-<span class="o">]</span>,<span class="w"> </span><span class="nv">unused</span><span class="o">=</span>CJKMNPQSUWXYZ<span class="o">)</span><span class="w"> </span>guess_scores:<span class="w"> </span><span class="o">[</span><span class="s1">'THIEF=...e.|⬛⬛⬛🟨⬛, BLADE=....E|⬛⬛⬛⬛🟩, GROVE=.ro.E|⬛🟨🟨⬛🟩'</span><span class="o">]</span><span class="w"> </span>ROMEO<span class="w"> </span>Mask:<span class="w"> </span>needs<span class="w"> </span>----E<span class="p">;</span><span class="w"> </span>WrongSpot:<span class="w"> </span>has<span class="w"> </span>---E-<span class="w"> </span>PROSE<span class="w"> </span>WrongSpot:<span class="w"> </span>has<span class="w"> </span>-RO--<span class="w"> </span>STORE<span class="w"> </span>Invalid:<span class="w"> </span>has<span class="w"> </span>-T---<span class="p">;</span><span class="w"> </span>WrongSpot:<span class="w"> </span>has<span class="w"> </span>--O--<span class="w"> </span>MURAL<span class="w"> </span>Valid:<span 
class="w"> </span>missing<span class="w"> </span>EO<span class="p">;</span><span class="w"> </span>Mask:<span class="w"> </span>needs<span class="w"> </span>----E<span class="p">;</span><span class="w"> </span>Invalid:<span class="w"> </span>has<span class="w"> </span>---AL<span class="w"> </span>ROUSE<span class="w"> </span>Eligible </pre> <pre class="code bash literal-block"> <span class="c1"># answer: BIRCH </span>$<span class="w"> </span>./wordle.py<span class="w"> </span><span class="nv">CLAIM</span><span class="o">=</span>c..i.<span class="w"> </span><span class="nv">TRICE</span><span class="o">=</span>.riC.<span class="w"> </span><span class="se">\ </span><span class="w"> </span>--words<span class="w"> </span>INCUR<span class="w"> </span>TAXIS<span class="w"> </span>PRICY<span class="w"> </span>ERICA<span class="w"> </span>BIRCH<span class="w"> </span>--explain<span class="w"> </span>WordleGuesses<span class="o">(</span><span class="nv">mask</span><span class="o">=</span>---C-,<span class="w"> </span><span class="nv">valid</span><span class="o">=</span>CIR,<span class="w"> </span><span class="nv">invalid</span><span class="o">=</span>AELMT,<span class="w"> </span><span class="nv">wrong_spot</span><span class="o">=[</span>C,R,I,I,-<span class="o">]</span>,<span class="w"> </span><span class="nv">unused</span><span class="o">=</span>BDFGHJKNOPQSUVWXYZ<span class="o">)</span><span class="w"> </span>guess_scores:<span class="w"> </span><span class="o">[</span><span class="s1">'CLAIM=c..i.|🟨⬛⬛🟨⬛, TRICE=.riC.|⬛🟨🟨🟩⬛'</span><span class="o">]</span><span class="w"> </span>INCUR<span class="w"> </span>Mask:<span class="w"> </span>needs<span class="w"> </span>---C-<span class="w"> </span>TAXIS<span class="w"> </span>Valid:<span class="w"> </span>missing<span class="w"> </span>CR<span class="p">;</span><span class="w"> </span>Mask:<span class="w"> </span>needs<span class="w"> </span>---C-<span class="p">;</span><span class="w"> </span>Invalid:<span class="w"> 
</span>has<span class="w"> </span>TA---<span class="p">;</span><span class="w"> </span>WrongSpot:<span class="w"> </span>has<span class="w"> </span>---I-<span class="w"> </span>PRICY<span class="w"> </span>WrongSpot:<span class="w"> </span>has<span class="w"> </span>-RI--<span class="w"> </span>ERICA<span class="w"> </span>Invalid:<span class="w"> </span>has<span class="w"> </span>E---A<span class="p">;</span><span class="w"> </span>WrongSpot:<span class="w"> </span>has<span class="w"> </span>-RI--<span class="w"> </span>BIRCH<span class="w"> </span>Eligible </pre> <p>Here's how those explanations were computed, using a variation on <tt class="docutils literal">is_eligible</tt>:</p> <!-- wordle --> <pre class="code python literal-block"> <span class="k">class</span> <span class="nc">WordleGuesses</span><span class="p">:</span><span class="w"> </span> <span class="k">def</span> <span class="nf">is_ineligible</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">word</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">str</span><span class="p">]:</span><span class="w"> </span> <span class="n">reasons</span> <span class="o">=</span> <span class="p">{}</span><span class="w"> </span> <span class="n">letters</span> <span class="o">=</span> <span class="p">{</span><span class="n">c</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">word</span><span class="p">}</span><span class="w"> </span> <span class="k">if</span> <span class="n">missing</span> <span class="o">:=</span> <span class="bp">self</span><span class="o">.</span><span class="n">valid</span> <span class="o">-</span> <span class="p">(</span><span class="n">letters</span> <span class="o">&amp;</span> <span class="bp">self</span><span 
class="o">.</span><span class="n">valid</span><span class="p">):</span><span class="w"> </span> <span class="c1"># Did not have the full set of green+yellow letters known to be valid</span><span class="w"> </span> <span class="n">reasons</span><span class="p">[</span><span class="s2">&quot;Valid&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="sa">f</span><span class="s2">&quot;missing </span><span class="si">{</span><span class="n">letter_set</span><span class="p">(</span><span class="n">missing</span><span class="p">)</span><span class="si">}</span><span class="s2">&quot;</span><span class="w"> </span> <span class="n">mask</span> <span class="o">=</span> <span class="p">[(</span><span class="n">m</span> <span class="k">if</span> <span class="n">c</span> <span class="o">!=</span> <span class="n">m</span> <span class="k">else</span> <span class="kc">None</span><span class="p">)</span> <span class="k">for</span> <span class="n">c</span><span class="p">,</span> <span class="n">m</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">word</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">mask</span><span class="p">)]</span><span class="w"> </span> <span class="k">if</span> <span class="nb">any</span><span class="p">(</span><span class="n">mask</span><span class="p">):</span><span class="w"> </span> <span class="c1"># Couldn't find all the green/correct letters</span><span class="w"> </span> <span class="n">reasons</span><span class="p">[</span><span class="s2">&quot;Mask&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="sa">f</span><span class="s2">&quot;needs </span><span class="si">{</span><span class="n">dash_mask</span><span class="p">(</span><span class="n">mask</span><span class="p">)</span><span class="si">}</span><span class="s2">&quot;</span><span class="w"> </span> <span class="n">invalid</span> <span 
class="o">=</span> <span class="p">[(</span><span class="n">c</span> <span class="k">if</span> <span class="n">c</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">invalid</span> <span class="k">else</span> <span class="kc">None</span><span class="p">)</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">word</span><span class="p">]</span><span class="w"> </span> <span class="k">if</span> <span class="nb">any</span><span class="p">(</span><span class="n">invalid</span><span class="p">):</span><span class="w"> </span> <span class="c1"># Invalid (black) letters present at specific positions</span><span class="w"> </span> <span class="n">reasons</span><span class="p">[</span><span class="s2">&quot;Invalid&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="sa">f</span><span class="s2">&quot;has </span><span class="si">{</span><span class="n">dash_mask</span><span class="p">(</span><span class="n">invalid</span><span class="p">)</span><span class="si">}</span><span class="s2">&quot;</span><span class="w"> </span> <span class="n">wrong</span> <span class="o">=</span> <span class="p">[(</span><span class="n">c</span> <span class="k">if</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">ws</span> <span class="k">else</span> <span class="kc">None</span><span class="p">)</span> <span class="k">for</span> <span class="n">c</span><span class="p">,</span> <span class="n">ws</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">word</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">wrong_spot</span><span class="p">)]</span><span class="w"> </span> <span class="k">if</span> <span class="nb">any</span><span class="p">(</span><span class="n">wrong</span><span class="p">):</span><span class="w"> </span> <span class="c1"># Found some yellow 
letters: valid letters in wrong position</span><span class="w"> </span> <span class="n">reasons</span><span class="p">[</span><span class="s2">&quot;WrongSpot&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="sa">f</span><span class="s2">&quot;has </span><span class="si">{</span><span class="n">dash_mask</span><span class="p">(</span><span class="n">wrong</span><span class="p">)</span><span class="si">}</span><span class="s2">&quot;</span><span class="w"> </span> <span class="k">return</span> <span class="n">reasons</span><span class="w"> </span> <span class="k">def</span> <span class="nf">find_explanations</span><span class="p">(</span><span class="w"> </span> <span class="bp">self</span><span class="p">,</span> <span class="n">vocabulary</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">str</span><span class="p">]</span><span class="w"> </span> <span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">list</span><span class="p">[</span><span class="nb">tuple</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">str</span> <span class="o">|</span> <span class="kc">None</span><span class="p">]]:</span><span class="w"> </span> <span class="n">explanations</span> <span class="o">=</span> <span class="p">[]</span><span class="w"> </span> <span class="k">for</span> <span class="n">word</span> <span class="ow">in</span> <span class="n">vocabulary</span><span class="p">:</span><span class="w"> </span> <span class="n">reasons</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">is_ineligible</span><span class="p">(</span><span class="n">word</span><span class="p">)</span><span class="w"> </span> <span class="n">why</span> <span class="o">=</span> <span class="kc">None</span><span class="w"> </span> <span class="k">if</span> <span class="n">reasons</span><span class="p">:</span><span class="w"> 
</span> <span class="n">why</span> <span class="o">=</span> <span class="s2">&quot;; &quot;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="w"> </span> <span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">k</span><span class="si">}</span><span class="s2">: </span><span class="si">{</span><span class="n">v</span><span class="si">}</span><span class="s2">&quot;</span> <span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span> <span class="ow">in</span> <span class="n">reasons</span><span class="o">.</span><span class="n">items</span><span class="p">())</span><span class="w"> </span> <span class="n">explanations</span><span class="o">.</span><span class="n">append</span><span class="p">((</span><span class="n">word</span><span class="p">,</span> <span class="n">why</span><span class="p">))</span><span class="w"> </span> <span class="k">return</span> <span class="n">explanations</span> </pre> <p>This approach is slower than <tt class="docutils literal">is_eligible</tt>, though it's not noticeable when running <tt class="docutils literal">wordle.py</tt> for one set of guess–scores. I have a test tool (<tt class="docutils literal">score.py</tt>) that runs through the 200+ games that I've recorded. Using <tt class="docutils literal">find_explanations</tt>, it took about 10 seconds to run. Switching to <tt class="docutils literal">find_eligible</tt>, it dropped to 2 seconds (5x improvement). 
By prefiltering the word list with a regex made from the mask, the time drops to half a second (further 4x improvement).</p> <pre class="code python literal-block"> <span class="n">pattern</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="s2">&quot;&quot;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">m</span> <span class="ow">or</span> <span class="s2">&quot;.&quot;</span> <span class="k">for</span> <span class="n">m</span> <span class="ow">in</span> <span class="n">parsed_guesses</span><span class="o">.</span><span class="n">mask</span><span class="p">))</span><span class="w"> </span><span class="n">word_list</span> <span class="o">=</span> <span class="p">[</span><span class="n">w</span> <span class="k">for</span> <span class="n">w</span> <span class="ow">in</span> <span class="n">vocabulary</span> <span class="k">if</span> <span class="n">pattern</span><span class="o">.</span><span class="n">fullmatch</span><span class="p">(</span><span class="n">w</span><span class="p">)]</span><span class="w"> </span><span class="n">eligible</span> <span class="o">=</span> <span class="n">parsed_guesses</span><span class="o">.</span><span class="n">find_eligible</span><span class="p">(</span><span class="n">word_list</span><span class="p">)</span> </pre> </div> <div class="section" id="finally"> <h3>Finally</h3> <p>I thought I knew a lot about solving Wordle programmatically when I started this long post a month ago. 
As I wrote this, I realized that I could use a few ugly greps to accomplish the same thing as my Python code; wrote a tool to render games as HTML and emojis; spun off a couple of blog posts on <a class="reference external" href="https://www.georgevreilly.com/blog/2023/09/02/PythonEnumsWithAttributes.html">multi-attribute enumeration</a> and <a class="reference external" href="https://www.georgevreilly.com/blog/2023/09/05/RegexConjunctions.html">regex conjunctions</a>; found and fixed several bugs with repeated letters, greatly refining my understanding of the nuances; rewrote the sections on repeated letters repeatedly; added a means to explain ineligibility; and had a minor epiphany about optimizing the mask programmatically.</p> <p>The full code can be found in my <a class="reference external" href="https://github.com/georgevreilly/wordle">Wordle repository</a>.</p> </div> <div class="section" id="other-work"> <h3>Other Work</h3> <p>I found these articles after I completed the final draft of this post.</p> <ul class="simple"> <li>Bertsimas and Paskov used <a class="reference external" href="https://mitsloan.mit.edu/ideas-made-to-matter/how-algorithm-solves-wordle">Exact Dynamic Programming</a> to find <a class="reference external" href="http://wordle-page.s3-website-us-east-1.amazonaws.com/assets/Wordle_Paper_Final.pdf">An Exact and Interpretable Solution to Wordle</a>.</li> <li><a class="reference external" href="https://yannlandry.photography/blog/wordle-intelligent-solver">Yann Landry's Solver</a> is a little JavaScript and HTML tool that tries to pick the best next word using a scoring system.</li> <li><a class="reference external" href="https://www.inspiredpython.com/article/solving-wordle-puzzles-with-basic-python">Solving with Basic Python</a> makes suggestions for each round based on word commonality.</li> <li>Some <a class="reference external" href="https://mashable.com/article/wordle-tips-tricks">Tips and Tricks</a> for playing the game.</li> </ul> 
<!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -_ --> <!-- Sticking the Wordle stylesheet at the end out of the way --> <link rel="stylesheet" href="/wordle.css"></div> Tue, 26 Sep 2023 07:00:00 GMT tag:www.georgevreilly.com,2023-09-26:/2023/09/26/ExploringWordle.html Regex Conjunctions https://www.georgevreilly.com/blog/2023/09/05/RegexConjunctions.html <p>Most regular expression engines make it easy to match <a class="reference external" href="https://www.regular-expressions.info/alternation.html">alternations</a> (or disjunctions) with the <tt class="docutils literal">|</tt> operator: to match <em>either</em> <tt class="docutils literal">foo</tt> <em>or</em> <tt class="docutils literal">bar</tt>, use <tt class="docutils literal">foo|bar</tt>.</p> <p>Few regex engines have any provisions for <a class="reference external" href="https://unix.stackexchange.com/a/55391/4060">conjunctions</a>, and the syntax is often horrible. Awk makes it easy to match <tt class="docutils literal">/pat1/ &amp;&amp; /pat2/ &amp;&amp; /pat3/</tt>.</p> <pre class="code bash literal-block"> $ cat <span class="s">&lt;&lt;EOF | awk '/bar/ &amp;&amp; /foo/' &gt; foo bar &gt; bar &gt; barfy food &gt; barfly &gt; EOF</span> foo bar barfy food </pre> <p>In the case of a Unix pipeline, the conjunction could also be expressed as a series of pipes: <tt class="docutils literal">... 
| grep pat1 | grep pat2 | grep pat3 | ...</tt>.</p> <p>The <a class="reference external" href="https://www.georgevreilly.com/blog/2020/04/23/regex-32-problems.html">longest regex</a> that I ever encountered was an enormous alternation—a true horror that shouldn't have been a regex at all.</p> Tue, 05 Sep 2023 07:00:00 GMT tag:www.georgevreilly.com,2023-09-05:/blog/2023/09/05/RegexConjunctions.html Python Enums with Attributes https://www.georgevreilly.com/blog/2023/09/02/PythonEnumsWithAttributes.html <p>Python <a class="reference external" href="https://realpython.com/python-enum/">enumerations</a> are useful for grouping related constants in a namespace. You can add additional behaviors to an enum class, but there isn't an easy and obvious way to add attributes to enum members.</p> <pre class="code python literal-block"> <span class="k">class</span> <span class="nc">TileState</span><span class="p">(</span><span class="n">Enum</span><span class="p">):</span> <span class="n">CORRECT</span> <span class="o">=</span> <span class="mi">1</span> <span class="n">PRESENT</span> <span class="o">=</span> <span class="mi">2</span> <span class="n">ABSENT</span> <span class="o">=</span> <span class="mi">3</span> <span class="k">def</span> <span class="nf">color</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">if</span> <span class="bp">self</span> <span class="ow">is</span> <span class="bp">self</span><span class="o">.</span><span class="n">CORRECT</span><span class="p">:</span> <span class="k">return</span> <span class="s2">&quot;Green&quot;</span> <span class="k">elif</span> <span class="bp">self</span> <span class="ow">is</span> <span class="bp">self</span><span class="o">.</span><span class="n">PRESENT</span><span class="p">:</span> <span class="k">return</span> <span class="s2">&quot;Yellow&quot;</span> <span class="k">elif</span> <span class="bp">self</span> <span class="ow">is</span> <span class="bp">self</span><span 
class="o">.</span><span class="n">ABSENT</span><span class="p">:</span> <span class="k">return</span> <span class="s2">&quot;Black&quot;</span> <span class="k">def</span> <span class="nf">emoji</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">return</span> <span class="p">{</span> <span class="bp">self</span><span class="o">.</span><span class="n">CORRECT</span><span class="p">:</span> <span class="s2">&quot;</span><span class="se">\U0001F7E9</span><span class="s2">&quot;</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">PRESENT</span><span class="p">:</span> <span class="s2">&quot;</span><span class="se">\U0001F7E8</span><span class="s2">&quot;</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">ABSENT</span><span class="p">:</span> <span class="s2">&quot;</span><span class="se">\U00002B1B</span><span class="s2">&quot;</span><span class="p">,</span> <span class="p">}[</span><span class="bp">self</span><span class="p">]</span> </pre> <p>Accessing the members and the methods:</p> <pre class="code pycon literal-block"> <span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">ts</span> <span class="ow">in</span> <span class="n">TileState</span><span class="p">:</span> <span class="gp">... 
</span> <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">ts</span><span class="o">.</span><span class="n">name</span><span class="si">:</span><span class="s2">&lt;7</span><span class="si">}</span><span class="s2">: </span><span class="si">{</span><span class="n">ts</span><span class="o">.</span><span class="n">value</span><span class="si">}</span><span class="s2"> </span><span class="si">{</span><span class="n">ts</span><span class="o">.</span><span class="n">color</span><span class="p">()</span><span class="si">:</span><span class="s2">&lt;6</span><span class="si">}</span><span class="s2"> </span><span class="si">{</span><span class="n">ts</span><span class="o">.</span><span class="n">emoji</span><span class="p">()</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span> <span class="gp">...</span> <span class="go">CORRECT: 1 Green 🟩 PRESENT: 2 Yellow 🟨 ABSENT : 3 Black ⬛</span> </pre> <p>You can add methods like <tt class="docutils literal">color()</tt> and <tt class="docutils literal">emoji()</tt> above—you can even decorate them with <tt class="docutils literal">&#64;property</tt> so that you don't need parentheses—but you have to remember to update <em>every</em> method when you add or remove members from the enumeration.</p> <div class="section" id="namedtuples-to-the-rescue"> <h3>Namedtuples to the rescue</h3> <p>It <a class="reference external" href="https://stackoverflow.com/a/62601113/6364">turns out</a> that you can build a <a class="reference external" href="https://www.georgevreilly.com/blog/2016/01/14/PythonBaseClassOrder.html">mixin</a> enumeration from <a class="reference external" href="https://realpython.com/python-namedtuple/">namedtuple</a> and <tt class="docutils literal">Enum</tt> that gives terse construction syntax:</p> <pre class="code python literal-block"> <span class="k">class</span> <span 
class="nc">TileState</span><span class="p">(</span><span class="n">namedtuple</span><span class="p">(</span><span class="s2">&quot;TileState&quot;</span><span class="p">,</span> <span class="s2">&quot;value emoji color css_color&quot;</span><span class="p">),</span> <span class="n">Enum</span><span class="p">):</span> <span class="n">CORRECT</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="s2">&quot;</span><span class="se">\U0001F7E9</span><span class="s2">&quot;</span><span class="p">,</span> <span class="s2">&quot;Green&quot;</span><span class="p">,</span> <span class="s2">&quot;#6aaa64&quot;</span> <span class="n">PRESENT</span> <span class="o">=</span> <span class="mi">2</span><span class="p">,</span> <span class="s2">&quot;</span><span class="se">\U0001F7E8</span><span class="s2">&quot;</span><span class="p">,</span> <span class="s2">&quot;Yellow&quot;</span><span class="p">,</span> <span class="s2">&quot;#c9b458&quot;</span> <span class="n">ABSENT</span> <span class="o">=</span> <span class="mi">3</span><span class="p">,</span> <span class="s2">&quot;</span><span class="se">\U00002B1B</span><span class="s2">&quot;</span><span class="p">,</span> <span class="s2">&quot;Black&quot;</span><span class="p">,</span> <span class="s2">&quot;#838184&quot;</span> </pre> <p>Each member now has multiple read-only attributes, like <tt class="docutils literal">emoji</tt> and <tt class="docutils literal">css_color</tt>:</p> <pre class="code pycon literal-block"> <span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">ts</span> <span class="ow">in</span> <span class="n">TileState</span><span class="p">:</span> <span class="gp">... 
</span> <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">ts</span><span class="o">.</span><span class="n">name</span><span class="si">:</span><span class="s2">&lt;7</span><span class="si">}</span><span class="s2">: </span><span class="si">{</span><span class="n">ts</span><span class="o">.</span><span class="n">value</span><span class="si">}</span><span class="s2"> </span><span class="si">{</span><span class="n">ts</span><span class="o">.</span><span class="n">emoji</span><span class="si">}</span><span class="s2"> U+</span><span class="si">{</span><span class="nb">ord</span><span class="p">(</span><span class="n">ts</span><span class="o">.</span><span class="n">emoji</span><span class="p">)</span><span class="si">:</span><span class="s2">05x</span><span class="si">}</span><span class="s2"> &quot;</span> <span class="gp">... </span> <span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">ts</span><span class="o">.</span><span class="n">color</span><span class="si">:</span><span class="s2">&lt;6</span><span class="si">}</span><span class="s2"> </span><span class="si">{</span><span class="n">ts</span><span class="o">.</span><span class="n">css_color</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span> <span class="gp">...</span> <span class="go">CORRECT: 1 🟩 U+1f7e9 Green #6aaa64 PRESENT: 2 🟨 U+1f7e8 Yellow #c9b458 ABSENT : 3 ⬛ U+02b1b Black #838184</span> </pre> </div> Sat, 02 Sep 2023 07:00:00 GMT tag:www.georgevreilly.com,2023-09-02:/blog/2023/09/02/PythonEnumsWithAttributes.html Patching a Python Wheel https://www.georgevreilly.com/blog/2023/08/10/PatchingAPythonWheel.html <p>Recently, I had to create a new <a class="reference external" href="https://realpython.com/python-wheels/">Python wheel</a> for <a class="reference external" href="https://pytorch.org/">PyTorch</a>. 
There is a <a class="reference external" href="https://github.com/pytorch/pytorch/issues/99622">cyclic dependency</a> between PyTorch 2.0.1 and Triton 2.0.0: Torch depends upon Triton, but Triton also depends on Torch. <a class="reference external" href="https://pip.pypa.io/en/latest/">Pip</a> is okay with installing packages where there's a cyclic dependency. <a class="reference external" href="https://bazel.build/">Bazel</a>, however, <a class="reference external" href="https://github.com/bazelbuild/rules_python/issues/1076">does not handle</a> cyclic dependencies between packages. We use Bazel extensively at Stripe and this cyclic dependency prevented us from using the latest version of Torch.</p> <p>I spent a few days trying to build the PyTorch wheel from source. It was a <em>nightmare!</em> I ran out of disk space on the root partition on my EC2 devbox trying to install system packages, so I had to bring up a custom instance. Then I ran out of space on the main partition, trying to compile, so I had to bring up another custom instance. Then I realized I had installed CUDA 12.1 and couldn't install CUDA 11.8 over it, so yet another instance. Then a long list of other problems. I was eventually able to get <tt class="docutils literal">python setup.py develop</tt> to execute, but it took three hours! And I had little confidence that I was building the same thing that was in the official wheels.</p> <p>Then I had a brainwave: what if I <a class="reference external" href="https://en.wikipedia.org/wiki/Patch_(computing)">patch</a> the official Torch wheel and simply remove the requirement on Triton? All the officially built code would remain untouched. 
That worked!</p> <p>This post is adapted from my <a class="reference external" href="https://github.com/pytorch/pytorch/issues/99622#issuecomment-1604812054">writeup on the issue</a>.</p> <div class="section" id="what-is-a-wheel"> <h3>What is a Wheel?</h3> <p>A Python <a class="reference external" href="https://packaging.python.org/en/latest/specifications/binary-distribution-format/">wheel</a> is a ready-to-install Python package that requires no compilation at installation time. Unlike older formats such as source distributions or eggs, a wheel does not run <tt class="docutils literal">setup.py</tt> during installation. The older formats conflated building and installing, and they required arbitrary code to run at install time.</p> <p>A wheel is a <a class="reference external" href="https://en.wikipedia.org/wiki/ZIP_(file_format)">Zip archive</a> with a specially formatted filename and a <tt class="docutils literal">.whl</tt> extension. The wheel contains a <tt class="docutils literal"><span class="pre">dist-info</span></tt> metadata directory and the installable payload. 
A wheel is either pure Python, which can be installed on any platform, or a platform (binary) wheel, which usually contains compiled Python extension code.</p> <p>Java JARs, Android APKs, Mozilla XPIs, and many other file types are also structured Zip archives.</p> </div> <div class="section" id="manual-patching"> <h3>Manual Patching</h3> <p>The wheel file's <a class="reference external" href="https://packaging.python.org/en/latest/specifications/binary-distribution-format/#file-contents">contents</a> include the <tt class="docutils literal"><span class="pre">{distribution}-{version}.dist-info/</span></tt> directory, which contains metadata about the wheel.</p> <p>In the case of PyTorch 2.0.1, I had <tt class="docutils literal"><span class="pre">torch-2.0.1-cp38-cp38-manylinux1_x86_64.whl</span></tt>, a Linux <tt class="docutils literal">x86_64</tt> wheel for Python 3.8.</p> <p>I used <tt class="docutils literal">unzip</tt> to extract the wheel's contents into a directory, <tt class="docutils literal">torch201.2</tt>. (The <tt class="docutils literal">.2</tt> denoted my second attempt.) In the <tt class="docutils literal">torch201.2</tt> directory was the entire content of the wheel, including the <tt class="docutils literal"><span class="pre">torch-2.0.1.dist-info/</span></tt> subdirectory.</p> <pre class="code bash literal-block">
unzip -d torch201.2 torch-2.0.1-cp38-cp38-manylinux1_x86_64.whl
<span class="nb">cd</span> torch201.2
<span class="c1"># Rename the `dist-info` directory to include '+stripe.2' as a suffix for `2.0.1`
</span>mv torch-2.0.1<span class="o">{</span>,+stripe.2<span class="o">}</span>.dist-info/
<span class="nb">cd</span> torch-2.0.1+stripe.2.dist-info/
</pre> <p>Normally, when we build wheels for forked versions of Python packages at Stripe, we append <tt class="docutils literal"><span class="pre">+stripe.{major}.{commits}.{revision}</span></tt> to the version number. 
Both <tt class="docutils literal">commits</tt> and <tt class="docutils literal">revision</tt> come from the output of <tt class="docutils literal">git describe <span class="pre">--tags</span> HEAD</tt>, which <a class="reference external" href="https://git-scm.com/docs/git-describe#_examples">looks like</a> <tt class="docutils literal"><span class="pre">{tag}-{commits}-g{revision}</span></tt>; <tt class="docutils literal">major</tt> is currently hardcoded to <tt class="docutils literal">1</tt>. This suffix helps distinguish a forked wheel's version from the upstream version number.</p> <p>Since I wasn't forking, I used a simplified scheme, <tt class="docutils literal"><span class="pre">+stripe.{attempt}</span></tt>.</p> <p>Then I updated some <a class="reference external" href="https://packaging.python.org/en/latest/specifications/core-metadata/">fields</a> in <tt class="docutils literal"><span class="pre">torch-2.0.1+stripe.2.dist-info/METADATA</span></tt>:</p> <ul class="simple"> <li>Updated <tt class="docutils literal">Version</tt> to include <tt class="docutils literal">+stripe.2</tt></li> <li>Removed the <tt class="docutils literal"><span class="pre">Requires-Dist</span></tt> line for <tt class="docutils literal">triton</tt>. This is the crucial step to fix the cyclic dependency problem.</li> </ul> <p>Now I had to update <tt class="docutils literal"><span class="pre">torch-2.0.1+stripe.2.dist-info/RECORD</span></tt>, which contains signatures for all the files in the wheel, in the form <tt class="docutils literal"><span class="pre">{filename},sha256={safe_hash},{filesize}</span></tt>. 
Of course, <tt class="docutils literal">RECORD</tt> does not have an entry for itself.</p> <p>The paths to all the <tt class="docutils literal"><span class="pre">dist-info</span></tt> files needed to be updated in <tt class="docutils literal">RECORD</tt> to include the <tt class="docutils literal">+stripe.2</tt> suffix.</p> <p>In Vim terms:</p> <pre class="code vim literal-block"> <span class="p">:</span>%s<span class="sr">/^\(torch-2.0.1\)\(\.dist-info\)/</span>\<span class="m">1</span><span class="p">+</span>stripe.<span class="m">2</span>\<span class="m">2</span>/ </pre> <p>You can use this <tt class="docutils literal">record_hash.py</tt> script to compute the entry for a file:</p> <pre class="code python literal-block"> <span class="ch">#!/usr/bin/env python3</span> <span class="kn">import</span> <span class="nn">base64</span> <span class="kn">import</span> <span class="nn">hashlib</span> <span class="kn">import</span> <span class="nn">os</span> <span class="kn">import</span> <span class="nn">sys</span> <span class="n">filename</span> <span class="o">=</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="s2">&quot;rb&quot;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span> <span class="n">digest</span> <span class="o">=</span> <span class="n">hashlib</span><span class="o">.</span><span class="n">sha256</span><span class="p">(</span><span class="n">f</span><span class="o">.</span><span class="n">read</span><span class="p">())</span> <span class="n">safe_hash</span> <span class="o">=</span> <span class="n">base64</span><span class="o">.</span><span class="n">urlsafe_b64encode</span><span class="p">(</span><span class="n">digest</span><span 
class="o">.</span><span class="n">digest</span><span class="p">())</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="s2">&quot;us-ascii&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">rstrip</span><span class="p">(</span><span class="s2">&quot;=&quot;</span><span class="p">)</span> <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">filename</span><span class="si">}</span><span class="s2">,sha256=</span><span class="si">{</span><span class="n">safe_hash</span><span class="si">}</span><span class="s2">,</span><span class="si">{</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">getsize</span><span class="p">(</span><span class="n">filename</span><span class="p">)</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span> </pre> <p>The output will look like this:</p> <pre class="code bash literal-block"> $ ../record_hash.py torch-2.0.1+stripe.2.dist-info/METADATA torch-2.0.1+stripe.2.dist-info/METADATA,sha256<span class="o">=</span>StmZkVzCWlHIxaIGVJocXv7JsDnlrSaNXwtuIlE_PKc,24703 </pre> <p>Replace the <tt class="docutils literal">METADATA</tt> entry in <tt class="docutils literal">RECORD</tt> with the output from <tt class="docutils literal">record_hash.py</tt>.</p> <p>Finally, you can <tt class="docutils literal">zip</tt> up everything into a new wheel. Note the <tt class="docutils literal">+stripe.2</tt> in the new wheel's filename:</p> <pre class="literal-block"> zip ../torch-2.0.1+stripe.2-cp38-cp38-manylinux1_x86_64.whl -r . </pre> <p>At this point, you can upload the wheel to a private repository.</p> <p>To install the wheel:</p> <pre class="literal-block"> pip install torch==2.0.1+stripe.2 </pre> <p>You will not see <tt class="docutils literal">triton</tt> being installed, unlike before. 
However, if you do install <tt class="docutils literal">triton</tt>, its requirement on <tt class="docutils literal">torch</tt> will be satisfied by this patched version.</p> </div> <div class="section" id="summary"> <h3>Summary</h3> <p>If you have to manually patch a Python wheel:</p> <ul class="simple"> <li>Decide upon a suffix, such as <tt class="docutils literal">+stripe.2</tt>.</li> <li>Unzip the wheel.</li> <li>Rename the <tt class="docutils literal"><span class="pre">dist-info</span></tt> directory to include the suffix.</li> <li>Update <tt class="docutils literal">Version</tt> in <tt class="docutils literal">METADATA</tt> to include the suffix.</li> <li><strong>Make other modifications.</strong></li> <li>Append the suffix to the <tt class="docutils literal"><span class="pre">dist-info</span></tt> entries in <tt class="docutils literal">RECORD</tt>.</li> <li>Use <tt class="docutils literal">record_hash.py</tt> to compute new entries for all modified files. Update <tt class="docutils literal">RECORD</tt> accordingly.</li> <li>Zip up the new wheel. Include the suffix in the filename.</li> <li><tt class="docutils literal">pip install</tt> the new wheel.</li> </ul> </div> Thu, 10 Aug 2023 07:00:00 GMT tag:www.georgevreilly.com,2023-08-10:/blog/2023/08/10/PatchingAPythonWheel.html Bram Moolenaar RIP https://www.georgevreilly.com/blog/2023/08/07/BramMoolenaarRIP.html <a class="reference external image-reference" href="https://www.vim.org/"><img alt="Vim" src="https://www.georgevreilly.com/content/binary/vim-logo-png-transparent.png" style="width: 250px;"/></a> <p>I woke up on Saturday to read an <a class="reference external" href="https://www.facebook.com/bram.moolenaar/posts/pfbid0d7rBdoVZu7Ww2yvmpEjmjJ1B3WYVFf86nFrFXczmRcYzjUxChq3xcjH84zURsZYjl">announcement</a> of Bram Moolenaar's death on his Facebook page. I knew Bram online for nearly 30 years and I was one of his relatively small number of Facebook friends, but we never met in real life. 
I knew that he had retired from Google Zurich to Tenerife, but I hadn't been aware that he had been ill.</p> <p>Bram was known to the world for his signature creation, the <a class="reference external" href="https://www.vim.org">Vim text editor</a>, used by millions of developers on Linux, macOS, and Windows. Vim stands for Vi IMproved, but it outgrew the original <tt class="docutils literal">vi</tt> long ago.</p> <p>I was an <a class="reference external" href="https://www.georgevreilly.com/blog/2005/12/30/20YearsOfVi.html">active contributor</a> to Vim in the 1990s: I wrote a lot of the Win32 console mode code as well as the alpha version of Windows gVim; my name is at the top of the <a class="reference external" href="https://vimhelp.org/os_win32.txt.html">page</a> if you do <tt class="docutils literal">:help win32</tt>. In the 00s, I ported Vim to Win64. I drifted away from active participation more than a decade ago, but I still lurk on the <a class="reference external" href="https://groups.google.com/g/vim_dev/">vim_dev</a> mailing list.</p> <p>Vim has been the thoroughly dominant flavor of <tt class="docutils literal">vi</tt> for a number of years, but that wasn’t the case in the 90s. There were Elvis, vile, xvi, and other things I no longer recall. Bram built a better <tt class="docutils literal">vi</tt> and he built a solid community of developers and users. I never saw the toxic behavior that’s prevalent in some tech communities. Bram was always a patient and reasonable leader. He poured countless hours into making Vim an ever better editor and he answered so many questions on the various mailing lists. Vim would not have succeeded half so well without the community that he built. 
I didn’t always agree with Bram's technical decisions (and neither did the Neovim people), but I have enormous respect for what he accomplished, technically and socially.</p> <p>The other remarkable thing about Vim is that it’s <a class="reference external" href="https://vimdoc.sourceforge.net/htmldoc/uganda.html#license">charityware</a>. Vim users were strongly encouraged to donate to <a class="reference external" href="https://iccf-holland.org/">ICCF Holland</a>, which supports children in Kibaale, Uganda. Bram was the treasurer of ICCF and was involved with the work for many years. When I was at Microsoft, I got a bunch of Vim-loving engineers to donate; Microsoft matched our donations. I made another donation to ICCF today in his memory.</p> <p>It’s clear that work on Vim will continue. Although Bram was the benevolent dictator for life of Vim, a <a class="reference external" href="https://github.com/orgs/vim/people">handful of others</a> have commit rights and are planning the <a class="reference external" href="https://groups.google.com/g/vim_dev/c/dq9Wu5jqVTw/m/puYIETTwAAAJ">future of the Vim project</a>. They have big shoes to fill. I don’t know enough about ICCF to say how severely this will affect them.</p> <p><strong>ETA</strong>: The upcoming Vim 9.1 release will be <a class="reference external" href="https://github.com/vim/vim/pull/12749">dedicated to Bram</a>, just as the <a class="reference external" href="https://groups.google.com/g/vim_announce/c/MJBKVd-xrEE/m/joVNaDgAAgAJ">9.0 release was dedicated to Sven Guckes</a>, who died last year. Sven was one of Vim's greatest ambassadors, endlessly helpful to users in the newsgroups. 
We stayed for a week with Sven in August 2014 in his Berlin apartment, and he was the most wonderful host, spending many hours showing us around his beloved city.</p> <p>The best articles that I’ve seen about Bram so far:</p> <ul class="simple"> <li><a class="reference external" href="https://www.theregister.com/2023/08/07/bram_moolenaar_obituary/">The Reg's obituary</a></li> <li><a class="reference external" href="https://j11g.com/2023/08/07/the-legacy-of-bram-moolenaar/">The Legacy of Bram Moolenaar</a>, Jan van den Berg</li> <li><a class="reference external" href="https://neovim.io/news/2023/08">Vim Boss</a>, Justin M. Keyes</li> </ul> <p><tt class="docutils literal">:wq!</tt></p> Mon, 07 Aug 2023 07:00:00 GMT tag:www.georgevreilly.com,2023-08-07:/blog/2023/08/07/BramMoolenaarRIP.html Cold Brew Coffee Recipe https://www.georgevreilly.com/2023/07/24/ColdBrewCoffeeRecipe.html <a class="reference external image-reference" href="https://www.amazon.com/dp/B00JVSVM36/?tag=georgvreill-20"><img alt="Oxo Cold Brew Coffee Maker" class="right-float" src="https://images-na.ssl-images-amazon.com/images/P/B00JVSVM36.01.LZZZZZZZ.jpg" style="width: 300px;"/></a> <p>I often enjoy cold brew coffee in summer. I bought an <a class="reference external" href="https://www.oxo.com/cold-brew-coffee-maker.html">Oxo Cold Brew Coffee Maker</a> one winter when it was on sale at Bed, Bath &amp; Beyond. Before that, I used a <a class="reference external" href="https://www.organiccottonmart.com/blogs/sustainable-lifestyle/nut-milk-bag-vs-cheesecloth">nut milk bag</a> in a jar. I like the Oxo and it gets high marks in many reviews, such as <a class="reference external" href="https://www.homegrounds.co/oxo-cold-brew-coffee-maker-review/">HomeGrounds</a> or <a class="reference external" href="https://www.nytimes.com/wirecutter/reviews/best-cold-brew-coffee-maker/">Wirecutter</a>. 
It's easy to use, easy to clean, and makes a good brew.</p> <p>The only downside to making your own cold brew coffee is that you must plan ahead. You can make hot coffee in a few minutes, but cold brew takes hours.</p> <p>I have used this recipe for a number of years. It makes a smooth, less acidic coffee.</p> <div class="section" id="ingredients"> <h3>Ingredients</h3> <ul class="simple"> <li>24 fl oz water</li> <li>6 oz fresh <em>coarsely ground</em> coffee. Store-bought pre-ground coffee is too fine.</li> </ul> <p>This will half-fill the Oxo jar. It <strong>yields about 16 fl oz</strong> (1 pt) of cold brew coffee. You can double the quantities in the Oxo, if you like.</p> </div> <div class="section" id="instructions"> <h3>Instructions</h3> <ul class="simple"> <li>Grind the coffee beans coarsely.</li> <li>Place the ground coffee in the Oxo jar.</li> <li>Pour the water through the rain sprinkler top.</li> <li>Swirl gently to ensure that all coffee grounds are wet. (I once followed a recipe that called for vigorous stirring. Never again! The grounds absorbed so much more water that I only got half the yield.)</li> <li>Some people put the jar in the fridge at this point. I don't bother.</li> <li>Wait for the coffee to brew! Some instructions say 12 to 24 hours. I usually wait 6–8 hours.</li> <li>Make sure that the switch is <em>closed</em>, then put the jar on the stand.</li> <li>Place the flask under the stand. Push the switch down to release the brew.</li> <li>Let the cold brew coffee drain. 
This will take 10–15 minutes.</li> <li>Refrigerate the cold brew.</li> <li>Use the coffee grounds to <a class="reference external" href="https://www.southernliving.com/garden/coffee-grounds-for-hydrangeas">turn hydrangeas blue</a> or <a class="reference external" href="https://www.healthline.com/nutrition/uses-for-coffee-grounds">exfoliate your skin</a>.</li> </ul> </div> <div class="section" id="adjustments"> <h3>Adjustments</h3> <p>You can vary the ratio of coffee to water and how long you steep the mixture.</p> </div> <div class="section" id="drinks"> <h3>Drinks</h3> <ul class="simple"> <li>2 fl oz of cold brew</li> <li>6 fl oz milk</li> </ul> <p><a class="reference external" href="https://www.forkinthekitchen.com/how-to-make-cold-brew-coffee/">Fork in the Kitchen</a> has some suggestions if you don't have a coffee grinder or an Oxo maker.</p> </div> Mon, 24 Jul 2023 07:00:00 GMT tag:www.georgevreilly.com,2023-07-24:/2023/07/24/ColdBrewCoffeeRecipe.html Compressing Tar Files in Parallel https://www.georgevreilly.com/blog/2023/02/21/CompressingTarFilesInParallel.html <p>TL;DR: use <tt class="docutils literal">tar <span class="pre">-I</span> pigz</tt> or <tt class="docutils literal">tar <span class="pre">-I</span> lbzip2</tt> to compress large tar files much more quickly.</p> <p>I investigated various ways of compressing a 7GiB tar file.</p> <p>The built-in <tt class="docutils literal"><span class="pre">--gzip</span></tt> and <tt class="docutils literal"><span class="pre">--bzip2</span></tt> compression methods in GNU <tt class="docutils literal">tar</tt> are single-threaded. 
If you invoke an external compressor with <tt class="docutils literal"><span class="pre">--use-compress-program</span></tt>, you can get some huge reductions in compression time, with slightly worse compression ratios.</p> <p>You can use <a class="reference external" href="https://zlib.net/pigz/">pigz</a> as a parallel replacement for <tt class="docutils literal">gzip</tt> and <a class="reference external" href="https://linux.die.net/man/1/lbzip2">lbzip2</a> as a parallel version of <tt class="docutils literal">bzip2</tt>. Both of them will make heavy use of all the cores in your system, greatly reducing the <em>real</em> time relative to the <em>user</em> time.</p> <p>Single-threaded compression timing: <tt class="docutils literal">gzip</tt> is a lot faster than <tt class="docutils literal">bzip2</tt>:</p> <pre class="literal-block">
$ time tar --bzip2 -cf huge-bzip2.tar.bz2 hugedir
real    13m15.352s
user    12m53.972s
sys     0m16.029s

$ time tar --gzip -cf huge-gzip.tar.gz hugedir
real    5m56.489s
user    5m30.271s
sys     0m14.633s
</pre> <p><tt class="docutils literal">fast</tt> parallel compression timing: <tt class="docutils literal">pigz</tt> is the clear winner:</p> <pre class="literal-block">
$ time tar --use-compress-program='lbzip2 --fast' \
    -cf huge-lbzip2-fast.tar.bz2 hugedir
real    2m35.967s
user    11m38.865s
sys     0m26.981s

$ time tar --use-compress-program='pigz --fast' \
    -cf huge-pigz-fast.tar.gz hugedir
real    0m58.222s
user    3m22.134s
sys     0m17.357s
</pre> <p><tt class="docutils literal">best</tt> parallel compression timing: <tt class="docutils literal">lbzip2</tt> is much quicker than <tt class="docutils literal">pigz</tt>:</p> <pre class="literal-block">
$ time tar --use-compress-program='lbzip2 --best' \
    -cf huge-lbzip2-best.tar.bz2 hugedir
real    1m44.365s
user    11m38.277s
sys     0m13.551s

$ time tar --use-compress-program='pigz --best' \
    -cf huge-pigz-best.tar.gz hugedir
real    2m27.694s
user    16m20.441s
sys     0m16.092s
</pre> <p>Compressed file sizes: the <tt class="docutils literal">bzip2</tt> family compresses better than the <tt class="docutils literal">gzip</tt> family; <tt class="docutils literal">best</tt> is smaller than the default compression level, which is smaller than <tt class="docutils literal">fast</tt>:</p> <pre class="literal-block">
$ ls -lSr
-rw-r--r-- 1 user group 2460438578 Feb 22 03:03 huge-lbzip2-best.tar.bz2
-rw-r--r-- 1 user group 2461172874 Feb 22 03:19 huge-bzip2.tar.bz2
-rw-r--r-- 1 user group 2689784220 Feb 22 03:00 huge-lbzip2-fast.tar.bz2
-rw-r--r-- 1 user group 2691286852 Feb 22 03:06 huge-pigz-best.tar.gz
-rw-r--r-- 1 user group 2704591997 Feb 22 03:25 huge-gzip.tar.gz
-rw-r--r-- 1 user group 2950547862 Feb 22 03:01 huge-pigz-fast.tar.gz
-rw-r--r-- 1 user group 7365222400 Feb 22 03:00 huge.tar
</pre> Tue, 21 Feb 2023 07:00:00 GMT tag:www.georgevreilly.com,2023-02-21:/blog/2023/02/21/CompressingTarFilesInParallel.html Implementing the Tree command in Rust, part 2: Printing Trees https://www.georgevreilly.com/blog/2023/01/24/TreeInRust2PrintingTrees.html <p>In <a class="reference external" href="https://www.georgevreilly.com/blog/2023/01/23/TreeInRust1WalkDirectories.html">Part 1</a>, we saw how to walk directory trees, recursively using <tt class="docutils literal"><span class="pre">fs::read_dir</span></tt> to construct an in-memory tree of <tt class="docutils literal">FileNode</tt>s. In Part 2, we'll implement the rest of the core of the <a class="reference external" href="https://en.wikipedia.org/wiki/Tree_(command)">tree command</a>: printing the directory tree with <a class="reference external" href="https://www.compart.com/en/unicode/block/U+2500">Box Drawing</a> characters.</p> <p>Let's take a look at some output from <tt class="docutils literal">tree</tt>:</p> <pre class="literal-block">
.
├── alloc.rs ├── ascii.rs ├── os │&nbsp;&nbsp; ├── wasi │&nbsp;&nbsp; │&nbsp;&nbsp; ├── ffi.rs │&nbsp;&nbsp; │&nbsp;&nbsp; ├── mod.rs ➊ │&nbsp;&nbsp; │&nbsp;&nbsp; └── net ➋ │&nbsp;&nbsp; │&nbsp;&nbsp; └── mod.rs │&nbsp;&nbsp; └── windows │&nbsp;&nbsp; ├── ffi.rs ➌ │&nbsp;&nbsp; ├── fs.rs │&nbsp;&nbsp; ├── io │&nbsp;&nbsp; │&nbsp;&nbsp; └── tests.rs │&nbsp;&nbsp; ├── mod.rs │&nbsp;&nbsp; └── thread.rs ├── personality │&nbsp;&nbsp; ├── dwarf │&nbsp;&nbsp; │&nbsp;&nbsp; ├── eh.rs │&nbsp;&nbsp; │&nbsp;&nbsp; ├── mod.rs │&nbsp;&nbsp; │&nbsp;&nbsp; └── tests.rs │&nbsp;&nbsp; ├── emcc.rs │&nbsp;&nbsp; └── gcc.rs └── personality.rs </pre> <p>The first thing that we notice is that most entries at any level, such as ➊, are preceded by <tt class="docutils literal">├──</tt>, while the last entry, ➋, is preceded by <tt class="docutils literal">└──</tt>. This <a class="reference external" href="https://realpython.com/directory-tree-generator-python/">article</a> about building a directory tree generator in Python calls them the <em>tee</em> and <em>elbow</em> connectors, and I'm going to use that terminology.</p> <p>The second thing we notice is that there are multiple <em>prefixes</em> before the connectors, either <tt class="docutils literal">│&nbsp;&nbsp;</tt>&nbsp;(<em>pipe</em>) or <tt class="docutils literal">&nbsp;&nbsp; </tt>&nbsp;(<em>space</em>), one prefix for each level. 
The rule is that children of a last entry, such as <tt class="docutils literal">os/windows</tt> ➌, get the space prefix, while children of other entries, such as <tt class="docutils literal">os/wasi</tt> or <tt class="docutils literal">personality</tt>, get the pipe prefix.</p> <p>For both connectors and prefixes, the last entry at a particular level gets special treatment.</p> <div class="section" id="the-print-tree-function"> <h3>The <tt class="docutils literal">print_tree</tt> function</h3> <p>A classic technique with recursion is to create a pair of functions: an outer public function that calls a private helper function with the initial set of parameters to visit recursively.</p> <p>Our <tt class="docutils literal">print_tree</tt> function uses an inner <tt class="docutils literal">visit</tt> function to recursively do almost all of the work.</p> <pre class="code rust literal-block"> <span class="k">pub</span><span class="w"> </span><span class="k">fn</span> <span class="nf">print_tree</span><span class="p">(</span><span class="n">root</span>: <span class="kp">&amp;</span><span class="kt">str</span><span class="p">,</span><span class="w"> </span><span class="n">dir</span>: <span class="kp">&amp;</span><span class="nc">Directory</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">OTHER_CHILD</span>: <span class="kp">&amp;</span><span class="kt">str</span> <span class="o">=</span><span class="w"> </span><span class="s">&quot;│ &quot;</span><span class="p">;</span><span class="w"> </span><span class="c1">// prefix: pipe </span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">OTHER_ENTRY</span>: <span class="kp">&amp;</span><span class="kt">str</span> <span class="o">=</span><span class="w"> </span><span class="s">&quot;├── &quot;</span><span class="p">;</span><span class="w"> </span><span class="c1">// 
connector: tee </span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">FINAL_CHILD</span>: <span class="kp">&amp;</span><span class="kt">str</span> <span class="o">=</span><span class="w"> </span><span class="s">&quot; &quot;</span><span class="p">;</span><span class="w"> </span><span class="c1">// prefix: no more siblings </span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">FINAL_ENTRY</span>: <span class="kp">&amp;</span><span class="kt">str</span> <span class="o">=</span><span class="w"> </span><span class="s">&quot;└── &quot;</span><span class="p">;</span><span class="w"> </span><span class="c1">// connector: elbow </span><span class="w"> </span><span class="fm">println!</span><span class="p">(</span><span class="s">&quot;{}&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">root</span><span class="p">);</span><span class="w"> </span><span class="err">➊</span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="p">(</span><span class="n">d</span><span class="p">,</span><span class="w"> </span><span class="n">f</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">visit</span><span class="p">(</span><span class="n">dir</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">);</span><span class="w"> </span><span class="fm">println!</span><span class="p">(</span><span class="s">&quot;</span><span class="se">\n</span><span class="s">{} directories, {} files&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">d</span><span class="p">,</span><span class="w"> </span><span class="n">f</span><span class="p">);</span><span class="w"> </span><span class="k">fn</span> <span class="nf">visit</span><span class="p">(</span><span class="n">node</span>: <span class="kp">&amp;</span><span 
class="nc">Directory</span><span class="p">,</span><span class="w"> </span><span class="n">prefix</span>: <span class="kp">&amp;</span><span class="kt">str</span><span class="p">)</span><span class="w"> </span>-&gt; <span class="p">(</span><span class="kt">usize</span><span class="p">,</span><span class="w"> </span><span class="kt">usize</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="err">➋</span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">dirs</span>: <span class="kt">usize</span> <span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// counting this directory ➌ </span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">files</span>: <span class="kt">usize</span> <span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">count</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">node</span><span class="p">.</span><span class="n">entries</span><span class="p">.</span><span class="n">len</span><span class="p">();</span><span class="w"> </span><span class="err">➍</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">entry</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="o">&amp;</span><span class="n">node</span><span class="p">.</span><span class="n">entries</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">count</span><span class="w"> </span><span class="o">-=</span><span class="w"> 
</span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">connector</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">count</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">FINAL_ENTRY</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">OTHER_ENTRY</span><span class="w"> </span><span class="p">};</span><span class="w"> </span><span class="err">➎</span><span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="n">entry</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">FileTree</span>::<span class="n">DirNode</span><span class="p">(</span><span class="n">sub_dir</span><span class="p">)</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="err">➏</span><span class="w"> </span><span class="fm">println!</span><span class="p">(</span><span class="s">&quot;{}{}{}&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">prefix</span><span class="p">,</span><span class="w"> </span><span class="n">connector</span><span class="p">,</span><span class="w"> </span><span class="n">sub_dir</span><span class="p">.</span><span class="n">name</span><span class="p">);</span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">new_prefix</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="fm">format!</span><span class="p">(</span><span class="w"> </span><span 
class="err">➐</span><span class="w"> </span><span class="s">&quot;{}{}&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">prefix</span><span class="p">,</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">count</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">FINAL_CHILD</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">OTHER_CHILD</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">);</span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="p">(</span><span class="n">d</span><span class="p">,</span><span class="w"> </span><span class="n">f</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">visit</span><span class="p">(</span><span class="o">&amp;</span><span class="n">sub_dir</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">new_prefix</span><span class="p">);</span><span class="w"> </span><span class="err">➑</span><span class="w"> </span><span class="n">dirs</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">d</span><span class="p">;</span><span class="w"> </span><span class="n">files</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">f</span><span class="p">;</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="n">FileTree</span>::<span class="n">LinkNode</span><span class="p">(</span><span class="n">symlink</span><span class="p">)</span><span class="w"> </span><span class="o">=&gt;</span><span 
class="w"> </span><span class="p">{</span><span class="w"> </span><span class="fm">println!</span><span class="p">(</span><span class="w"> </span><span class="s">&quot;{}{}{} -&gt; {}&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">prefix</span><span class="p">,</span><span class="w"> </span><span class="n">connector</span><span class="p">,</span><span class="w"> </span><span class="n">symlink</span><span class="p">.</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">symlink</span><span class="p">.</span><span class="n">target</span><span class="p">);</span><span class="w"> </span><span class="n">files</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="n">FileTree</span>::<span class="n">FileNode</span><span class="p">(</span><span class="n">file</span><span class="p">)</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="fm">println!</span><span class="p">(</span><span class="s">&quot;{}{}{}&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">prefix</span><span class="p">,</span><span class="w"> </span><span class="n">connector</span><span class="p">,</span><span class="w"> </span><span class="n">file</span><span class="p">.</span><span class="n">name</span><span class="p">);</span><span class="w"> </span><span class="n">files</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">(</span><span class="n">dirs</span><span class="p">,</span><span class="w"> </span><span 
class="n">files</span><span class="p">)</span><span class="w"> </span><span class="err">➒</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">}</span> </pre> <ol class="arabic simple"> <li>The outer function, <tt class="docutils literal">print_tree</tt>, simply prints the name of the root node on a line by itself; calls the inner <tt class="docutils literal">visit</tt> function with the <tt class="docutils literal">dir</tt> node and an empty prefix; and finally prints the number of directories and files visited. This is for compatibility with the output of <tt class="docutils literal">tree</tt>.</li> <li>The inner <tt class="docutils literal">visit</tt> function takes two parameters: <tt class="docutils literal">node</tt>, a <tt class="docutils literal">Directory</tt>, and <tt class="docutils literal">prefix</tt>, a string which is initially empty.</li> <li>Keep track of the number of <tt class="docutils literal">dirs</tt> and <tt class="docutils literal">files</tt> seen at this level and in sub-directories.</li> <li>We count downwards from the number of entries in this directory to zero. When <tt class="docutils literal">count</tt> is zero, we are on the last entry, which gets special treatment.</li> <li>Compute the connector, <tt class="docutils literal">└──</tt> (<em>elbow</em>) for the last entry; <tt class="docutils literal">├──</tt> (<em>tee</em>) otherwise.</li> <li>Match the <tt class="docutils literal"><span class="pre">FileTree::DirNode</span></tt> variant and <a class="reference external" href="https://doc.rust-lang.org/reference/patterns.html#destructuring">destructure</a> the value into <tt class="docutils literal">sub_dir</tt>, a <tt class="docutils literal">&amp;Directory</tt>.</li> <li>Before recursively visiting a sub-directory, we compute a new prefix, by appending the appropriate sub-prefix to the current prefix. 
If there are further entries (<tt class="docutils literal">count &gt; 0</tt>), the sub-prefix for the current level is <tt class="docutils literal">│&nbsp;&nbsp;</tt>&nbsp;(<em>pipe</em>); otherwise, it's <tt class="docutils literal">&nbsp;&nbsp; </tt>&nbsp;(<em>space</em>).</li> <li>Call <tt class="docutils literal">visit</tt> recursively, then add to the running totals of <tt class="docutils literal">dirs</tt> and <tt class="docutils literal">files</tt>.</li> <li><tt class="docutils literal">visit</tt> returns a tuple of the counts of directories and files that were recursively visited.</li> </ol> <p>One subtlety that is not obvious from the above is that <tt class="docutils literal">OTHER_CHILD</tt> actually contains two <a class="reference external" href="https://en.wikipedia.org/wiki/Non-breaking_space">non-breaking spaces</a>:</p> <pre class="code rust literal-block"> <span class="k">const</span><span class="w"> </span><span class="n">OTHER_CHILD</span>: <span class="kp">&amp;</span><span class="kt">str</span> <span class="o">=</span><span class="w"> </span><span class="s">&quot;│</span><span class="se">\u{00A0}\u{00A0}</span><span class="s"> &quot;</span><span class="p">;</span><span class="w"> </span><span class="c1">// prefix: pipe</span> </pre> <p>This is for compatibility with the output of <tt class="docutils literal">tree</tt>:</p> <pre class="code bash literal-block"> $ diff &lt;<span class="o">(</span>cargo run -q -- ./tests<span class="o">)</span> &lt;<span class="o">(</span>tree ./tests<span class="o">)</span> <span class="o">&amp;&amp;</span> <span class="nb">echo</span> <span class="s2">&quot;no difference&quot;</span> no difference </pre> <p>The command uses <a class="reference external" href="https://www.georgevreilly.com/blog/2022/01/31/DiffFileFragment.html">process substitution</a> to feed the output of both programs to <tt class="docutils literal">diff</tt> as its two inputs.</p> </div> <div class="section" id="the-main-function"> <h3>The <tt class="docutils 
literal">main</tt> function</h3> <p>Let's tie it all together.</p> <pre class="code rust literal-block"> <span class="k">fn</span> <span class="nf">main</span><span class="p">()</span><span class="w"> </span>-&gt; <span class="nc">io</span>::<span class="nb">Result</span><span class="o">&lt;</span><span class="p">()</span><span class="o">&gt;</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">root</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">env</span>::<span class="n">args</span><span class="p">().</span><span class="n">nth</span><span class="p">(</span><span class="mi">1</span><span class="p">).</span><span class="n">unwrap_or</span><span class="p">(</span><span class="s">&quot;.&quot;</span><span class="p">.</span><span class="n">to_string</span><span class="p">());</span><span class="w"> </span><span class="err">➊</span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">dir</span>: <span class="nc">Directory</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">dir_walk</span><span class="p">(</span><span class="w"> </span><span class="err">➋</span><span class="w"> </span><span class="o">&amp;</span><span class="n">PathBuf</span>::<span class="n">from</span><span class="p">(</span><span class="n">root</span><span class="p">.</span><span class="n">clone</span><span class="p">()),</span><span class="w"> </span><span class="err">➌</span><span class="w"> </span><span class="n">is_not_hidden</span><span class="p">,</span><span class="w"> </span><span class="n">sort_by_name</span><span class="p">)</span><span class="o">?</span><span class="p">;</span><span class="w"> </span><span class="err">➍</span><span class="w"> </span><span class="n">print_tree</span><span class="p">(</span><span class="o">&amp;</span><span class="n">root</span><span 
class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">dir</span><span class="p">);</span><span class="w"> </span><span class="err">➎</span><span class="w"> </span><span class="nb">Ok</span><span class="p">(())</span><span class="w"> </span><span class="err">➏</span><span class="w"> </span><span class="p">}</span> </pre> <ol class="arabic simple"> <li>The simplest possible way to get a single, optional command-line argument. If omitted, we default to <tt class="docutils literal">.</tt>, the current directory. For more sophisticated argument parsing, we could use <a class="reference external" href="https://docs.rs/clap/latest/clap/">Clap</a>.</li> <li>Use <tt class="docutils literal">dir_walk</tt> from <a class="reference external" href="https://www.georgevreilly.com/blog/2023/01/23/TreeInRust1WalkDirectories.html">Part 1</a> to recursively build a directory of <tt class="docutils literal">FileTree</tt> nodes.</li> <li>Create a <tt class="docutils literal">PathBuf</tt> from <tt class="docutils literal">root</tt>, a string; <tt class="docutils literal">clone</tt> is needed because <tt class="docutils literal"><span class="pre">PathBuf::from</span></tt> takes ownership of the string buffer. 
Use the <tt class="docutils literal">is_not_hidden</tt> filter and the <tt class="docutils literal">sort_by_name</tt> comparator from <a class="reference external" href="https://www.georgevreilly.com/blog/2023/01/23/TreeInRust1WalkDirectories.html">Part 1</a>.</li> <li>The <a class="reference external" href="https://doc.rust-lang.org/reference/expressions/operator-expr.html#the-question-mark-operator">postfix question mark operator</a>, <tt class="docutils literal">?</tt>, is used to propagate errors.</li> <li>Let <tt class="docutils literal">print_tree</tt> draw the diagram.</li> <li>Return the <tt class="docutils literal">Ok</tt> <a class="reference external" href="https://doc.rust-lang.org/std/primitive.unit.html">unit</a> result to indicate success.</li> </ol> </div> <div class="section" id="baum"> <h3>Baum</h3> <p>You can find the <a class="reference external" href="https://github.com/georgevreilly/baum">Baum</a> source code on GitHub.</p> <p>In Part 3, we'll discuss testing.</p> </div> <div class="section" id="resources"> <h3>Resources</h3> <ul class="simple"> <li><a class="reference external" href="https://github.com/Old-Man-Programmer/tree/">Official tree source</a>: The actual source for <tt class="docutils literal">tree</tt>, written in old-school C.</li> <li><a class="reference external" href="https://two-wrongs.com/draw-a-tree-structure-with-only-css.html">Draw a Tree Structure With Only CSS</a>: Use CSS to draw links in nested, unordered lists.</li> <li><a class="reference external" href="https://realpython.com/directory-tree-generator-python/">Build a Python Directory Tree Generator for the Command Line</a>.</li> <li>Kevin Newton has implemented <a class="reference external" href="https://github.com/kddnewton/tree">Tree in Multiple Languages</a>.</li> <li><a class="reference external" href="https://github.com/dduan/tre">Tre</a> is a modern alternative to <tt class="docutils literal">tree</tt> in Rust.</li> </ul> </div> Tue, 24 Jan 2023 07:00:00 GMT 
tag:www.georgevreilly.com,2023-01-24:/blog/2023/01/24/TreeInRust2PrintingTrees.html Implementing the Tree command in Rust, part 1: Walking Directories https://www.georgevreilly.com/blog/2023/01/23/TreeInRust1WalkDirectories.html <img alt="tree tree core/src/num for Rust" class="right-float" src="https://www.georgevreilly.com/content/binary/rust-core-src-num-tree.png" style="width: 160px;"/> <p>I've been learning Rust lately. I started by reading several books, including <a class="reference external" href="https://www.manning.com/books/rust-in-action">Rust in Action</a>, <a class="reference external" href="https://www.manning.com/books/code-like-a-pro-in-rust">Code Like a Pro in Rust</a>, and most of <a class="reference external" href="https://learning.oreilly.com/library/view/programming-rust-2nd/9781492052586/">Programming Rust</a>. Now, I'm starting to actually write code.</p> <p>I read the <a class="reference external" href="https://www.goodreads.com/review/show/5183138397">Command-Line Rust</a> book last month, which challenges readers to write their own implementations of the <a class="reference external" href="https://en.wikipedia.org/wiki/Tree_(command)">tree command</a>.</p> <p>I decided to accept the challenge.</p> <p>At its simplest, <tt class="docutils literal">tree</tt> prints a directory tree, using some of the Unicode <a class="reference external" href="https://www.compart.com/en/unicode/block/U+2500">Box Drawing</a> characters to show the hierarchical relationship, as in the image at right.</p> <p>I've split the code into two phases, which will be covered in two blog posts.</p> <ol class="arabic simple"> <li>Walking the directory tree on disk to build an in-memory tree.</li> <li>Pretty-printing the in-memory tree.</li> </ol> <p>While it's certainly possible to print a subtree as it's being read, separating the two phases yields code that is cleaner, simpler, and more testable.</p> <p>In future, I will insert a third phase, <em>processing</em>, 
between the reading and writing phases, by a weak analogy with Extract-Transform-Load (<a class="reference external" href="https://en.wikipedia.org/wiki/Extract,_transform,_load">ETL</a>).</p> <div class="section" id="walking-the-directory-tree"> <h3>Walking the Directory Tree</h3> <p>There are three kinds of file tree node that I care about: <tt class="docutils literal">File</tt>, <tt class="docutils literal">Directory</tt>, and <tt class="docutils literal">Symlink</tt>. These are the three kinds that Rust's <a class="reference external" href="https://doc.rust-lang.org/std/fs/struct.FileType.html">FileType</a> struct distinguishes, through its <tt class="docutils literal">is_file</tt>, <tt class="docutils literal">is_dir</tt>, and <tt class="docutils literal">is_symlink</tt> methods.</p> <ul class="simple"> <li><tt class="docutils literal">File</tt> has a name and file system metadata;</li> <li><tt class="docutils literal">Symlink</tt> has a name, a target, and metadata;</li> <li><tt class="docutils literal">Directory</tt> has a name and a list of child file tree nodes.</li> </ul> <p>Here, <em>name</em> refers to the last component of a path; e.g., the <tt class="docutils literal">gamma</tt> in <tt class="docutils literal">alpha/beta/gamma</tt>. 
The file system metadata is not currently used, but will be in future.</p> <pre class="code rust literal-block"> <span class="cp">#[derive(Debug)]</span><span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="k">struct</span> <span class="nc">File</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">name</span>: <span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">metadata</span>: <span class="nc">fs</span>::<span class="n">Metadata</span><span class="p">,</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="cp">#[derive(Debug)]</span><span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="k">struct</span> <span class="nc">Symlink</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">name</span>: <span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">target</span>: <span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">metadata</span>: <span class="nc">fs</span>::<span class="n">Metadata</span><span class="p">,</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="cp">#[derive(Debug)]</span><span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="k">struct</span> <span class="nc">Directory</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">name</span>: <span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="k">pub</span><span 
class="w"> </span><span class="n">entries</span>: <span class="nb">Vec</span><span class="o">&lt;</span><span class="n">FileTree</span><span class="o">&gt;</span><span class="p">,</span><span class="w"> </span><span class="p">}</span> </pre> <p>File and directory paths are not guaranteed to be UTF-8. Indeed, Unix file paths are an arbitrary sequence of bytes, while Windows file paths are an opaque sequence of 16-bit integers. You might think that I should be using <tt class="docutils literal">OsString</tt> here, since it holds a <a class="reference external" href="https://doc.rust-lang.org/std/ffi/struct.OsString.html">platform-native string</a>. <tt class="docutils literal">String</tt> has to be valid UTF-8; <tt class="docutils literal">OsString</tt> doesn't. Unfortunately, it's not easy to look at the actual data in an <tt class="docutils literal">OsString</tt>, unless you convert it (possibly lossily) to a <tt class="docutils literal">String</tt>. See <a class="reference external" href="https://docs.rs/bstr/0.2.8/bstr/#file-paths-and-os-strings">File paths and OS strings</a> for more.</p> <p>The obvious way to represent a file tree node in Rust is as an <a class="reference external" href="https://hashrust.com/blog/why-rust-enums-are-so-cool/">enum</a> with three tuple-like variants.</p> <pre class="code rust literal-block"> <span class="cp">#[derive(Debug)]</span><span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="k">enum</span> <span class="nc">FileTree</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">DirNode</span><span class="p">(</span><span class="n">Directory</span><span class="p">),</span><span class="w"> </span><span class="n">FileNode</span><span class="p">(</span><span class="n">File</span><span class="p">),</span><span class="w"> </span><span class="n">LinkNode</span><span class="p">(</span><span class="n">Symlink</span><span class="p">),</span><span class="w"> 
</span><span class="p">}</span> </pre> <p>Here, each variant in the enum holds a similarly named struct. We will be able to take advantage of Rust's <a class="reference external" href="https://doc.rust-lang.org/book/ch18-03-pattern-syntax.html#destructuring-enums">pattern matching</a> to handle each variant.</p> <p>We'll use <tt class="docutils literal"><span class="pre">fs::read_dir</span></tt> to read each directory in the hierarchy. The <a class="reference external" href="https://doc.rust-lang.org/std/fs/struct.ReadDir.html">read_dir</a> function returns, on success, an iterator that yields instances of <tt class="docutils literal"><span class="pre">io::Result&lt;DirEntry&gt;</span></tt>. If a <tt class="docutils literal">DirEntry</tt> is a directory, we can recursively invoke our <tt class="docutils literal">dir_walk</tt> function to read the child directory and add its contents to our in-memory tree.</p> <p>The <a class="reference external" href="https://docs.rs/walkdir/latest/walkdir/">walkdir</a> crate also walks through a directory tree, but it hides the recursion from you. It's an excellent choice otherwise.</p> <div class="section" id="skipping-and-sorting"> <h4>Skipping and Sorting</h4> <p>In each directory that we read, we need to consider two factors.</p> <ol class="arabic simple"> <li>Which entries to skip, such as hidden files.</li> <li>How to sort the entries.</li> </ol> <p>We almost always want to skip <a class="reference external" href="https://en.wikipedia.org/wiki/Hidden_file_and_hidden_directory">hidden files and directories</a>. On Unix, those are the entries whose names start with the <tt class="docutils literal">.</tt> character. 
Every directory includes entries for <tt class="docutils literal">.</tt> (itself) and <tt class="docutils literal">..</tt> (parent directory), and may include other hidden files or directories, such as <tt class="docutils literal">.vimrc</tt> or <tt class="docutils literal">.git</tt>.</p> <p>On Windows, hidden files are controlled by an <a class="reference external" href="https://www.raymond.cc/blog/reset-system-and-hidden-attributes-for-files-or-folders-caused-by-virus/">attribute</a>, not by their name.</p> <p>For more complicated usage, we might want to skip <a class="reference external" href="https://git-scm.com/docs/gitignore">ignored files</a>, as specified in <tt class="docutils literal">.gitignore</tt>.</p> <p>The simplest useful filter for entry names is one that rejects hidden files and directories.</p> <pre class="code rust literal-block"> <span class="k">pub</span><span class="w"> </span><span class="k">fn</span> <span class="nf">is_not_hidden</span><span class="p">(</span><span class="n">name</span>: <span class="kp">&amp;</span><span class="kt">str</span><span class="p">)</span><span class="w"> </span>-&gt; <span class="kt">bool</span> <span class="p">{</span><span class="w"> </span><span class="o">!</span><span class="n">name</span><span class="p">.</span><span class="n">starts_with</span><span class="p">(</span><span class="sc">'.'</span><span class="p">)</span><span class="w"> </span><span class="p">}</span> </pre> <p>Disk I/O is <a class="reference external" href="https://louwrentius.com/understanding-storage-performance-iops-and-latency.html">costly and slow</a> compared to memory access. It's far more efficient to not read a directory at all than it is to eliminate a subtree at a later stage. 
Even if the OS has cached the relevant directory contents, there's still a <a class="reference external" href="https://gms.tf/on-the-costs-of-syscalls.html">cost to the syscall</a> to retrieve that data from the kernel.</p> <p>There is <a class="reference external" href="https://stackoverflow.com/a/8977490/6364">no specific order</a> to entries in a directory or to the results returned by low-level APIs like <tt class="docutils literal"><span class="pre">fs::read_dir</span></tt>. By default, <tt class="docutils literal">ls</tt> sorts entries alphabetically, but it can also sort by creation time, modification time, or size, in ascending or descending order.</p> <p>Unix filesystems are case-sensitive, but Mac filesystems (APFS and HFS+) are case-insensitive by default, although they preserve the case of the original filename. Windows' filesystems (NTFS, exFAT, and FAT32) are <a class="reference external" href="https://learn.microsoft.com/en-us/windows/win32/fileio/filesystem-functionality-comparison">likewise</a> case-preserving and case-insensitive.</p> <p>Here is a case-sensitive <a class="reference external" href="https://doc.rust-lang.org/std/vec/struct.Vec.html#method.sort_by">comparator</a> for use with <tt class="docutils literal">sort_by</tt>:</p> <pre class="code rust literal-block"> <span class="k">pub</span><span class="w"> </span><span class="k">fn</span> <span class="nf">sort_by_name</span><span class="p">(</span><span class="n">a</span>: <span class="kp">&amp;</span><span class="nc">fs</span>::<span class="n">DirEntry</span><span class="p">,</span><span class="w"> </span><span class="n">b</span>: <span class="kp">&amp;</span><span class="nc">fs</span>::<span class="n">DirEntry</span><span class="p">)</span><span class="w"> </span>-&gt; <span class="nc">Ordering</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">a_name</span>: <span class="nb">String</span> 
<span class="o">=</span><span class="w"> </span><span class="n">a</span><span class="p">.</span><span class="n">path</span><span class="p">().</span><span class="n">file_name</span><span class="p">().</span><span class="n">unwrap</span><span class="p">().</span><span class="n">to_str</span><span class="p">().</span><span class="n">unwrap</span><span class="p">().</span><span class="n">into</span><span class="p">();</span><span class="w"> </span><span class="err">➊</span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">b_name</span>: <span class="nb">String</span> <span class="o">=</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">path</span><span class="p">().</span><span class="n">file_name</span><span class="p">().</span><span class="n">unwrap</span><span class="p">().</span><span class="n">to_str</span><span class="p">().</span><span class="n">unwrap</span><span class="p">().</span><span class="n">into</span><span class="p">();</span><span class="w"> </span><span class="n">a_name</span><span class="p">.</span><span class="n">cmp</span><span class="p">(</span><span class="o">&amp;</span><span class="n">b_name</span><span class="p">)</span><span class="w"> </span><span class="err">➋</span><span class="w"> </span><span class="p">}</span> </pre> <ol class="arabic simple"> <li>This messy expression is necessary to get the <em>name</em> as a <tt class="docutils literal">String</tt>.</li> <li><tt class="docutils literal">cmp</tt> returns <tt class="docutils literal">Less</tt>, <tt class="docutils literal">Equal</tt>, or <tt class="docutils literal">Greater</tt> from the <tt class="docutils literal">Ordering</tt> enum.</li> </ol> <p>More on <tt class="docutils literal">Ordering</tt> <a class="reference external" href="https://www.philipdaniels.com/blog/2019/rust-equality-and-ordering/">here</a>.</p> </div> </div> <div class="section" id="the-dir-walk-function"> <h3>The <tt 
class="docutils literal">dir_walk</tt> function</h3> <p>Finally, the recursive <tt class="docutils literal">dir_walk</tt> function that creates the tree of <tt class="docutils literal">FileTree</tt> nodes.</p> <pre class="code rust literal-block"> <span class="k">pub</span><span class="w"> </span><span class="k">fn</span> <span class="nf">dir_walk</span><span class="p">(</span><span class="w"> </span><span class="n">root</span>: <span class="kp">&amp;</span><span class="nc">PathBuf</span><span class="p">,</span><span class="w"> </span><span class="n">filter</span>: <span class="nc">fn</span><span class="p">(</span><span class="n">name</span>: <span class="kp">&amp;</span><span class="kt">str</span><span class="p">)</span><span class="w"> </span>-&gt; <span class="kt">bool</span><span class="p">,</span><span class="w"> </span><span class="err">➊</span><span class="w"> </span><span class="n">compare</span>: <span class="nc">fn</span><span class="p">(</span><span class="n">a</span>: <span class="kp">&amp;</span><span class="nc">fs</span>::<span class="n">DirEntry</span><span class="p">,</span><span class="w"> </span><span class="n">b</span>: <span class="kp">&amp;</span><span class="nc">fs</span>::<span class="n">DirEntry</span><span class="p">)</span><span class="w"> </span>-&gt; <span class="nc">Ordering</span><span class="p">,</span><span class="w"> </span><span class="p">)</span><span class="w"> </span>-&gt; <span class="nc">io</span>::<span class="nb">Result</span><span class="o">&lt;</span><span class="n">Directory</span><span class="o">&gt;</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">entries</span>: <span class="nb">Vec</span><span class="o">&lt;</span><span class="n">fs</span>::<span class="n">DirEntry</span><span class="o">&gt;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span 
class="n">fs</span>::<span class="n">read_dir</span><span class="p">(</span><span class="n">root</span><span class="p">)</span><span class="o">?</span><span class="w"> </span><span class="p">.</span><span class="n">filter_map</span><span class="p">(</span><span class="o">|</span><span class="n">result</span><span class="o">|</span><span class="w"> </span><span class="n">result</span><span class="p">.</span><span class="n">ok</span><span class="p">())</span><span class="w"> </span><span class="p">.</span><span class="n">collect</span><span class="p">();</span><span class="w"> </span><span class="err">➋</span><span class="w"> </span><span class="n">entries</span><span class="p">.</span><span class="n">sort_by</span><span class="p">(</span><span class="n">compare</span><span class="p">);</span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">directory</span>: <span class="nb">Vec</span><span class="o">&lt;</span><span class="n">FileTree</span><span class="o">&gt;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">Vec</span>::<span class="n">with_capacity</span><span class="p">(</span><span class="n">entries</span><span class="p">.</span><span class="n">len</span><span class="p">());</span><span class="w"> </span><span class="err">➌</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">e</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="n">entries</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">path</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">e</span><span class="p">.</span><span class="n">path</span><span class="p">();</span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span 
class="n">name</span>: <span class="nb">String</span> <span class="o">=</span><span class="w"> </span><span class="n">path</span><span class="p">.</span><span class="n">file_name</span><span class="p">().</span><span class="n">unwrap</span><span class="p">().</span><span class="n">to_str</span><span class="p">().</span><span class="n">unwrap</span><span class="p">().</span><span class="n">into</span><span class="p">();</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">filter</span><span class="p">(</span><span class="o">&amp;</span><span class="n">name</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="err">➍</span><span class="w"> </span><span class="k">continue</span><span class="p">;</span><span class="w"> </span><span class="p">};</span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">metadata</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fs</span>::<span class="n">metadata</span><span class="p">(</span><span class="o">&amp;</span><span class="n">path</span><span class="p">).</span><span class="n">unwrap</span><span class="p">();</span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">node</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="n">path</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="err">➎</span><span class="w"> </span><span class="n">path</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">path</span><span class="p">.</span><span class="n">is_dir</span><span class="p">()</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span 
class="n">FileTree</span>::<span class="n">DirNode</span><span class="p">(</span><span class="w"> </span><span class="err">➏</span><span class="w"> </span><span class="n">dir_walk</span><span class="p">(</span><span class="o">&amp;</span><span class="n">root</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">name</span><span class="p">),</span><span class="w"> </span><span class="n">filter</span><span class="p">,</span><span class="w"> </span><span class="n">compare</span><span class="p">)</span><span class="o">?</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="n">path</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">path</span><span class="p">.</span><span class="n">is_symlink</span><span class="p">()</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">FileTree</span>::<span class="n">LinkNode</span><span class="p">(</span><span class="n">Symlink</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">name</span>: <span class="nc">name</span><span class="p">.</span><span class="n">into</span><span class="p">(),</span><span class="w"> </span><span class="n">target</span>: <span class="nc">fs</span>::<span class="n">read_link</span><span class="p">(</span><span class="n">path</span><span class="p">).</span><span class="n">unwrap</span><span class="p">().</span><span class="n">to_string_lossy</span><span class="p">().</span><span class="n">to_string</span><span class="p">(),</span><span class="w"> </span><span class="n">metadata</span>: <span class="nc">metadata</span><span class="p">,</span><span class="w"> </span><span class="p">}),</span><span class="w"> </span><span class="n">path</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">path</span><span class="p">.</span><span 
class="n">is_file</span><span class="p">()</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">FileTree</span>::<span class="n">FileNode</span><span class="p">(</span><span class="n">File</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">name</span>: <span class="nc">name</span><span class="p">.</span><span class="n">into</span><span class="p">(),</span><span class="w"> </span><span class="n">metadata</span>: <span class="nc">metadata</span><span class="p">,</span><span class="w"> </span><span class="p">}),</span><span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="fm">unreachable!</span><span class="p">(),</span><span class="w"> </span><span class="p">};</span><span class="w"> </span><span class="n">directory</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">node</span><span class="p">);</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">root</span><span class="w"> </span><span class="p">.</span><span class="n">file_name</span><span class="p">()</span><span class="w"> </span><span class="p">.</span><span class="n">unwrap_or</span><span class="p">(</span><span class="n">OsStr</span>::<span class="n">new</span><span class="p">(</span><span class="s">&quot;.&quot;</span><span class="p">))</span><span class="w"> </span><span class="err">➐</span><span class="w"> </span><span class="p">.</span><span class="n">to_str</span><span class="p">()</span><span class="w"> </span><span class="p">.</span><span class="n">unwrap</span><span class="p">()</span><span class="w"> </span><span class="p">.</span><span class="n">into</span><span class="p">();</span><span class="w"> 
</span><span class="nb">Ok</span><span class="p">(</span><span class="n">Directory</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="err">➑</span><span class="w"> </span><span class="n">name</span>: <span class="nc">name</span><span class="p">,</span><span class="w"> </span><span class="n">entries</span>: <span class="nc">directory</span><span class="p">,</span><span class="w"> </span><span class="p">})</span><span class="w"> </span><span class="p">}</span> </pre> <ol class="arabic simple"> <li>Currently, the <tt class="docutils literal">filter</tt> and <tt class="docutils literal">compare</tt> parameters are <tt class="docutils literal">fn</tt>s. They could probably be <tt class="docutils literal">FnMut</tt> traits.</li> <li>Read directory. Discard any <tt class="docutils literal">Error</tt> results. Collect into a <tt class="docutils literal">Vec</tt>.</li> <li>We'll need at most this many entries.</li> <li>Use <tt class="docutils literal">filter</tt> to discard names that won't be visited.</li> <li>Match the path as a <tt class="docutils literal">DirNode</tt>, <tt class="docutils literal">LinkNode</tt>, or <tt class="docutils literal">FileNode</tt>, by using <a class="reference external" href="https://doc.rust-lang.org/book/ch18-03-pattern-syntax.html#extra-conditionals-with-match-guards">match guards</a>.</li> <li>Visit the subdirectory recursively.</li> <li>If <tt class="docutils literal">root</tt> was <tt class="docutils literal">&quot;.&quot;</tt>, the <tt class="docutils literal">file_name()</tt> will be <tt class="docutils literal">None</tt>.</li> <li>Return a <tt class="docutils literal">Directory</tt> for this directory, wrapped in an <tt class="docutils literal"><span class="pre">io::Result</span></tt>.</li> </ol> <p>In <a class="reference external" href="https://www.georgevreilly.com/blog/2023/01/24/TreeInRust2PrintingTrees.html">Part 2</a>, we'll print the directory tree.</p> </div> Mon, 23 Jan 2023 07:00:00 
GMT tag:www.georgevreilly.com,2023-01-23:/blog/2023/01/23/TreeInRust1WalkDirectories.html fsymbols for Unicode weirdness https://www.georgevreilly.com/blog/2022/12/31/FSymbolsForUnicodeWeirdness.html <p>My display name on Twitter currently looks like @ɢᴇᴏʀɢᴇᴠʀᴇɪʟʟʏ@ᴛᴇᴄʜ.ʟɢʙᴛ, an attempt to route around Twitter's apparent censorship of Mastodon information.</p> <p>I used the <a href="https://fsymbols.com/generators/">FSymbols Generators</a> to produce several variants.</p> <div class="codehilite"><pre><span></span>@𝕘𝕖𝕠𝕣𝕘𝕖𝕧𝕣𝕖𝕚𝕝𝕝𝕪@𝕥𝕖𝕔𝕙.𝕝𝕘𝕓𝕥 ʇqƃʅ.ɥɔǝʇ@ʎʅʅᴉǝɹʌǝƃɹoǝƃ@ @𝗀𝖾𝗈𝗋𝗀𝖾𝗏𝗋𝖾𝗂𝗅𝗅𝗒@𝗍𝖾𝖼𝗁.𝗅𝗀𝖻𝗍 @𝘨𝘦𝘰𝘳𝘨𝘦𝘷𝘳𝘦𝘪𝘭𝘭𝘺@𝘵𝘦𝘤𝘩.𝘭𝘨𝘣𝘵 @𝑔𝑒𝑜𝑟𝑔𝑒𝑣𝑟𝑒𝑖𝑙𝑙𝑦@𝑡𝑒𝑐ℎ.𝑙𝑔𝑏𝑡 @𝙜𝙚𝙤𝙧𝙜𝙚𝙫𝙧𝙚𝙞𝙡𝙡𝙮@𝙩𝙚𝙘𝙝.𝙡𝙜𝙗𝙩 @𝚐𝚎𝚘𝚛𝚐𝚎𝚟𝚛𝚎𝚒𝚕𝚕𝚢@𝚝𝚎𝚌𝚑.𝚕𝚐𝚋𝚝 @𝔤𝔢𝔬𝔯𝔤𝔢𝔳𝔯𝔢𝔦𝔩𝔩𝔶@𝔱𝔢𝔠𝔥.𝔩𝔤𝔟𝔱 </pre></div> <p>Many of these variants come from <a href="https://www.compart.com/en/unicode/block/U+1D400">Unicode Block "Mathematical Alphanumeric Symbols"</a>.</p> <p>There are a lot more things you can do with Unicode than just <a href="https://www.georgevreilly.com/blog/2016/02/12/UnicodeUpsideDownMappingPart2.html">upside-down text</a>.</p> Sun, 01 Jan 2023 03:35:00 GMT tag:www.georgevreilly.com,2022-12-31:/blog/2022/12/31/FSymbolsForUnicodeWeirdness.html Backwards Ranges in Python https://www.georgevreilly.com/blog/2022/12/19/BackwardsRangesInPython.html <p>In Python, if you want to specify a sequence of numbers from <code>a</code> up to (but excluding) <code>b</code>, you can write <code>range(a, b)</code>. This generates the sequence <code>a, a+1, a+2, ..., b-1</code>. 
You start at <code>a</code> and keep going until the next number would be <code>b</code>.</p> <p>In Python 3, <code>range</code> is <em>lazy</em> and the values in the sequence do not materialize until you consume the range.</p> <div class="codehilite"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="nb">range</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span><span class="mi">12</span><span class="p">)</span> <span class="go">range(3, 12)</span> <span class="gp">&gt;&gt;&gt; </span><span class="nb">list</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span><span class="mi">12</span><span class="p">))</span> <span class="go">[3, 4, 5, 6, 7, 8, 9, 10, 11]</span> </pre></div> <p>Trey Hunner makes the point that <a href="https://treyhunner.com/2018/02/python-range-is-not-an-iterator/">range is a lazy iterable</a> rather than an iterator.</p> <p>You can also <em>step</em> by an increment other than one: <code>range(a, b, s)</code>. This generates <code>a, a+s, a+2*s, ..., b-s</code> (assuming that <code>(b - a) % s == 0</code>; i.e., <code>a</code> and <code>b</code> are separated by an exact multiple of <code>s</code>).</p> <div class="codehilite"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="nb">list</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">12</span><span class="p">,</span> <span class="mi">3</span><span class="p">))</span> <span class="go">[3, 6, 9]</span> </pre></div> <p>What if you want to count down? 
<code>range(b, a, -s)</code> won't do what you want.</p> <div class="codehilite"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="nb">list</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span> <span class="o">-</span><span class="mi">3</span><span class="p">))</span> <span class="go">[12, 9, 6]</span> </pre></div> <p>Why? Because you're starting at <code>b</code>, a value that doesn't appear in the forward range, and you're ending before you reach <code>a</code>, a value that is certainly in the forward range. You have to subtract <code>s</code> from both <code>b</code> and <code>a</code>:</p> <p>When you use <code>range(b-s, a-s, -s)</code>, you get <code>b-s, b-2*s, ..., a+s, a</code>.</p> <div class="codehilite"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="nb">list</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">12</span><span class="o">-</span><span class="mi">3</span><span class="p">,</span><span class="mi">3</span><span class="o">-</span><span class="mi">3</span><span class="p">,</span> <span class="o">-</span><span class="mi">3</span><span class="p">))</span> <span class="go">[9, 6, 3]</span> <span class="gp">&gt;&gt;&gt; </span><span class="nb">list</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">12</span><span class="o">-</span><span class="mi">3</span><span class="p">,</span><span class="mi">3</span><span class="o">-</span><span class="mi">3</span><span class="p">,</span> <span class="o">-</span><span class="mi">3</span><span class="p">)),</span> <span class="nb">list</span><span class="p">(</span><span class="nb">reversed</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">12</span><span 
class="p">,</span> <span class="mi">3</span><span class="p">)))</span> <span class="go">([9, 6, 3], [9, 6, 3])</span> </pre></div> Mon, 19 Dec 2022 22:20:00 GMT tag:www.georgevreilly.com,2022-12-19:/blog/2022/12/19/BackwardsRangesInPython.html Ulysses at 100 https://www.georgevreilly.com/blog/2022/02/02/UlyssesAt100.html <a class="reference external image-reference" href="https://www.irishtimes.com/news/ireland/irish-news/an-post-launches-new-stamps-to-celebrate-centenary-of-ulysses-1.4787040"><img alt="new stamps celebrating the centenary of Ulysses" src="https://www.irishtimes.com/polopoly_fs/1.4787039.1643276940!/image/image.jpg_gen/derivatives/box_620_330/image.jpg"/></a> <p>On 2nd February 1882, in the Dublin suburb of Rathgar, a son was given unto John and May Joyce. James Joyce celebrated his 40th birthday in Paris on 2nd February 1922 by receiving the first printed copy of his novel <em>Ulysses</em>. Parts of it had already been published in literary magazines and the book was eagerly received by the cognoscenti. It took more than a decade for <em>Ulysses</em> to be published in Britain and the United States. Censors had considered the book obscene, but the courts established that it had legitimate literary merit.</p> <p>For decades, <em>Ulysses</em> was poorly received in Ireland. The book was considered <a class="reference external" href="https://www.irishtimes.com/news/ireland/irish-news/the-year-of-ulysses-2022-marks-centenary-of-joyce-s-experimental-masterpiece-1.4766055">blasphemous</a> and obscene by many. Worse, Joyce had freely borrowed from life, populating the pages of <em>Ulysses</em> with people he had known in Dublin.</p> <p>By the time of the centenary of Joyce's birth in 1982, attitudes had changed in Ireland. <em>Ulysses</em> was now celebrated, if not widely read. 
RTÉ Radio broadcast a <a class="reference external" href="https://www.rte.ie/archives/exhibitions/681-history-of-rte/706-rte-1980s/327476-ulysses-broadcast/">25-hour reading</a> of the entire book.</p> <p>I was a schoolboy of almost seventeen in Dublin in February 1982. I did not, alas, listen to the <a class="reference external" href="https://archive.org/details/Ulysses-Audiobook-Merged">RTÉ recording</a> at the time, but at some point that year, I started reading <em>Ulysses</em> for myself. And like so many would-be readers before and since, I hit Episode 3, &quot;Proteus&quot;, which opens with &quot;Ineluctable modality of the visible&quot; and promptly dives into Stephen Dedalus's impenetrable thoughts. If I could give some advice to myself 40 years ago, it would be to &quot;skip over the hard bits&quot;. I'm in good company on that recommendation. Daniel Mulhall, Ireland's Ambassador to the US and author of the recent <a class="reference external" href="https://www.amazon.com/dp/1848408293/?tag=georgvreill-20">Ulysses: A Reader's Odyssey</a>, gives the same advice. Episode 4, &quot;Calypso&quot;, introduces us to Leopold Bloom and is far more enjoyable.</p> <p>Unfortunately, I did not have the benefit of that advice then, and I had little to do with <em>Ulysses</em> for the next two decades. In <a class="reference external" href="https://www.georgevreilly.com/blog/2003/06/11/Bloomsday.html">2003</a>, I took part in the <a class="reference external" href="https://www.wildgeeseseattle.org/">Wild Geese Players of Seattle</a>'s staged reading of Episodes 8 and 9, &quot;Lestrygonians&quot; and &quot;Scylla and Charybdis&quot;. In 2004, I helped adapt the next episode, &quot;Wandering Rocks&quot;, for that year's staged reading. When Kieran O'Malley, the group's original founder, moved back to Ireland in 2005 or 2006, I took over as dramaturg. 
I've led the Geese for many years now, and I've <a class="reference external" href="https://github.com/WildGeeseSeattle/Ulysses">adapted scripts</a> for the entire book.</p> <p>Why? The connection with Dublin that I share with Joyce is certainly part of it. I've come to love the book. (Most of it; there are certainly parts that I find tedious.) It's a book in which very little happens, and yet it encompasses everything. We get an extremely rounded picture of Bloom and his inner life. It's funny and sad and erudite and annoying and wise. Joyce has distilled the human condition into one summer's day in Dublin.</p> <p>And now it is the centenary of the publication of <em>Ulysses</em>. I posted some <a class="reference external" href="https://www.wildgeeseseattle.org/ulysses-at-100.html">centenary material</a> at the Wild Geese website.</p> <p><em>Ulysses</em> has become an ineluctable part of my life.</p> Wed, 02 Feb 2022 07:00:00 GMT tag:www.georgevreilly.com,2022-02-02:/blog/2022/02/02/UlyssesAt100.html Diffing a fragment of a file https://www.georgevreilly.com/blog/2022/01/31/DiffFileFragment.html <p>A while back, I had extracted some code out of a large file into a separate file and made some modifications. I wanted to check that the differences were minimal. Let's say that the extracted code had been between lines 123 and 456 of <tt class="docutils literal">large_old_file</tt>.</p> <pre class="code bash literal-block"> diff -u &lt;<span class="o">(</span>sed -n <span class="s1">'123,456p;457q'</span> large_old_file<span class="o">)</span> new_file </pre> <p>What's happening here?</p> <ul class="simple"> <li><tt class="docutils literal">sed <span class="pre">-n</span> '123,456p'</tt> is printing lines 123–456 of <tt class="docutils literal">large_old_file</tt>.</li> <li>The <tt class="docutils literal">457q</tt> tells sed to abandon the file at line 457. 
Otherwise, it will keep reading all the way to the end.</li> <li>The <tt class="docutils literal">&lt;(sed <span class="pre">...)</span></tt> is an example of <a class="reference external" href="https://tldp.org/LDP/abs/html/process-sub.html">process substitution</a>. The <em>output</em> of the <tt class="docutils literal">sed</tt> invocation becomes the first <em>input</em> of the <tt class="docutils literal">diff</tt> command.</li> </ul> <p>A similar example: <a class="reference external" href="https://www.georgevreilly.com/blog/2017/01/11/DiffTransformedFile.html">Diff a Transformed File</a>.</p> <p>BTW, these days, I usually use <a class="reference external" href="https://github.com/dandavison/delta">delta</a> for diffing at the command line, especially with Git.</p> Mon, 31 Jan 2022 07:00:00 GMT tag:www.georgevreilly.com,2022-01-31:/blog/2022/01/31/DiffFileFragment.html 40 Years of Programming https://www.georgevreilly.com/blog/2022/01/31/40YearsOfProgramming.html <p>40 years ago this month, I sat down at a computer and wrote a program. (Or &quot;programme&quot;, as I spelled it then.) It was the first time I had ever used a computer. Very few people had used computers in 1982, in Ireland or elsewhere.</p> <p>What was the program? No idea. Just a few lines of AppleSoft Basic. But it was enough to get me hooked and change my life.</p> <p>I still get a hit when a little bit of code unlocks in my brain. It's quite addictive. 
There's always more to learn and to see.</p> <p>I wrote more about this in 2012: <a class="reference external" href="https://www.georgevreilly.com/blog/2012/01/26/30YearsOfProgramming.html">30 Years of Programming</a>.</p> Mon, 31 Jan 2022 07:00:00 GMT tag:www.georgevreilly.com,2022-01-31:/blog/2022/01/31/40YearsOfProgramming.html On Circumnavigating the Aubreyiad Again https://www.georgevreilly.com/blog/2021/12/30/CircumnavigatingAubreyiad.html <p>At the beginning of 2021, prompted by Russell Crowe's defense of <em>Master and Commander</em>, I began yet another re-read of the twenty Aubrey-Maturin novels. Or, as the fandom would have it, another circumnavigation. It's probably my fifth or sixth circumnavigation, since I bought the complete boxed set as a Christmas present to myself in the early aughts.</p> <p>I completed the twentieth book, <em>Blue at the Mizzen</em>, yesterday, and also the few pages of the final, unfinished novel, <em>21</em>. (I also read about <a class="reference external" href="https://www.goodreads.com/user/year_in_books/2021/3723742">120 other books</a> in 2021, down from a stupendous <a class="reference external" href="https://www.goodreads.com/user/year_in_books/2020/3723742">200 books in 2020</a>, but that's neither here nor there.)</p> <blockquote class="twitter-tweet"> <p lang="en" dir="ltr">I think I&#39;m due for another re-read of Patrick O&#39;Brian&#39;s Aubrey/Maturin novels (all 6,500 pages) and a rewatch of Master and Commander. <a href="https://t.co/gVf9IBan7e">pic.twitter.com/gVf9IBan7e</a></p>&mdash; George V. Reilly (@georgevreilly) <a href="https://twitter.com/georgevreilly/status/1350913122345783297?ref_src=twsrc%5Etfw"> January 17, 2021</a> </blockquote> <p>Why did I put myself through re-reading 6,500 pages of a dense <em>roman-fleuve</em> yet again? 
For the sheer pleasure of joining up once more with my old friends, Captain Jack Aubrey and Dr Stephen Maturin, in their 15-year fight against Napoleon.</p> <p>They are an unlikely pair of friends. Jack Aubrey, a big, hearty English naval officer, is utterly competent in his domain, magnificent at sea but naïve and easily duped on land. Stephen Maturin, the illegitimate son of an Irish officer and a Catalan lady, is a renowned physician and naturalist, a former <a class="reference external" href="https://en.wikipedia.org/wiki/Society_of_United_Irishmen">United Irishman</a> turned British intelligence agent, a Catholic in a Protestant service, and a perpetual landlubber and sloven. They have little in common, save a shared love of music and of natural philosophy. Both are Fellows of the <a class="reference external" href="https://royalsociety.org/about-us/history/">Royal Society</a>—Jack, to the surprise of many, is a mathematician and astronomer.</p> <p>And yet, they are fast friends and Stephen follows Jack from ship to ship. A captain must hold himself aloof from his crew and his officers. He is the sole authority, often months of sailing away from his superiors. He dines alone, save when invited to the officers' wardroom or when he invites them to join him. Stephen, as Jack's particular friend, is exempt from the normal strictures, allowing Jack to retain his humanity on the long voyages.</p> <p>It is the friendship and the two main characters that hold me, along with the adventure and the travel. O'Brian immersed himself in the eighteenth and early nineteenth centuries, and his encyclopaedic knowledge helped him bring the era to life with incredible verisimilitude. O'Brian was an accomplished storyteller and often <a class="reference external" href="https://quotingobrian.tumblr.com/">very funny</a>.</p> <p>The characters sound and act like people of the time, not like transplanted twentieth century Americans. 
Jack, Stephen, and the other characters would be at home in the pages of Jane Austen (sister to two Royal Navy officers).</p> <p><a class="reference external" href="https://www.tor.com/series/re-reading-patrick-obrians-aubrey-maturin-series/">Jo Walton's re-read</a> will give you a taste of the books.</p> Thu, 30 Dec 2021 07:00:00 GMT tag:www.georgevreilly.com,2021-12-30:/blog/2021/12/30/CircumnavigatingAubreyiad.html Review: Crafting Interpreters https://www.georgevreilly.com/blog/2021/12/28/ReviewCraftingInterpreters.html <a class="reference external image-reference" href="https://www.amazon.com/dp/0990582930/?tag=georgvreill-20"><img alt="Crafting Interpreters" class="right-float" src="https://images-na.ssl-images-amazon.com/images/I/41-7uSeOyCL._SX398_BO1,204,203,200_.jpg"/></a> <div class="line-block"> <div class="line">Title: <a class="reference external" href="https://craftinginterpreters.com/">Crafting Interpreters</a></div> <div class="line">Author: Robert Nystrom</div> <div class="line">Rating: ★ ★ ★ ★ ★</div> <div class="line">Publisher: Genever Benning</div> <div class="line">Copyright: 2021</div> <div class="line">ISBN: <a class="reference external" href="https://www.amazon.com/dp/0990582930/?tag=georgvreill-20">978-0990582939</a></div> <div class="line">Pages: 640</div> <div class="line">Keywords: programming, interpreters</div> <div class="line">Reading period: 10–28 December, 2021</div> </div> <p>I've read hundreds of technical books over the last 40 years. <em>Crafting Interpreters</em> is an instant classic, and far more readable and fun than many of the classics.</p> <p>Nystrom covers a lot of ground in this book, building two very different interpreters for Lox, a small dynamic language of his own design. 
He takes us through <em>every line</em> of jlox, a Java-based tree-walk interpreter, and of clox, a bytecode virtual machine written in C.</p> <p>For the first implementation, jlox, he covers such topics as scanning, parsing expressions with recursive descent, evaluating expressions, control flow, functions and closures, classes, and inheritance.</p> <p>Starting with an empty slate, Nystrom adds just enough code to implement the topic of each chapter, having a working albeit incomplete implementation of the interpreter by the end of the chapter. He adds new code as he goes, inserting an extra <tt class="docutils literal">case</tt> into a <tt class="docutils literal">switch</tt> here or writing a new function there, or replacing a few lines of an earlier implementation with something that's just been explained. Knuth's <a class="reference external" href="https://en.wikipedia.org/wiki/Literate_programming">Literate Programming</a> explains a finished implementation, broken into separate pieces for exposition. Nystrom's continual, ever-evolving exposition is slower to get to the point, but it's excellent pedagogy. I would be remiss if I didn't mention the hundreds of hand-drawn illustrations, which add a quirky flavor to the tone of the book. He has a blog post on how he <a class="reference external" href="http://journal.stuffwithstuff.com/2020/04/05/crafting-crafting-interpreters/">pulled this organization off</a> and another on how he created a <a class="reference external" href="http://journal.stuffwithstuff.com/2021/07/29/640-pages-in-15-months/">physical book</a> from the text.</p> <p>clox is a very different second implementation of a Lox interpreter. Instead of a slow interpreter walking an abstract syntax tree, he develops a stack-based virtual machine, compiles Lox into bytecode, and interprets the bytecode. 
He covers theory and practical considerations for creating a bytecode virtual machine, makes use of Pratt’s “top-down operator precedence parsing”, and implements closures and classes in C. In jlox, he used Java's <tt class="docutils literal">HashMap</tt> to manage identifiers and relied on Java's garbage collection for memory management. For clox, he implements a hash table and a mark-and-sweep garbage collector. Although he has to cover similar topics (parsing, local variables, closures) each time, he finds a fresh perspective for the second implementation.</p> <p>I read the entire book for free at <a class="reference external" href="https://craftinginterpreters.com/">https://craftinginterpreters.com/</a>, but I liked it so much that I've ordered a physical copy. In fact, I actually read much of the book on the website in 2020, but life intervened and I didn't finish it, so this month, I read it again from the start.</p> <p>This book is not a textbook and you don't get an exhaustive introduction to building interpreters, much less compilers. In the final year of my Computer Science degree at Trinity College Dublin in 1986–87, I studied the <a class="reference external" href="https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools">Dragon Book</a> when the first edition was brand new. 
<em>Crafting Interpreters</em> is a lot more fun than the Dragon Book.</p> <p>Highly recommended!</p> Tue, 28 Dec 2021 07:00:00 GMT tag:www.georgevreilly.com,2021-12-28:/blog/2021/12/28/ReviewCraftingInterpreters.html Path Traversal Attacks https://www.georgevreilly.com/blog/2021/10/05/PathTraversalAttacks.html <p>I was surprised to read this evening that the Apache Web Server just fixed an actively exploited path traversal flaw.</p> <blockquote class="twitter-tweet"> <p lang="en" dir="ltr"> 🚨 Apache has disclosed an *actively exploited* Path traversal flaw in the <a href="https://twitter.com/hashtag/opensource?src=hash&amp;ref_src=twsrc%5Etfw">#opensource</a> &quot;httpd&quot; server. Over 112,000 exposed Apache servers run version 2.4.49, and should be upgraded now!<br> New fix checks for encoded path traversal characters e.g. /../.%2E/<a href="https://t.co/1tLNc3LAul">https://t.co/1tLNc3LAul</a> <a href="https://t.co/mDHLEU3k9N">pic.twitter.com/mDHLEU3k9N</a> </p>&mdash; Ax Sharma (@Ax_Sharma) <a href="https://twitter.com/Ax_Sharma/status/1445391350053183500?ref_src=twsrc%5Etfw">October 5, 2021</a> </blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script><p>Apparently, it was <a class="reference external" href="https://github.com/apache/httpd/commit/4c79fd280dfa3eede5a6f3baebc7ef2e55b3eb6a">introduced over a year ago</a>.</p> <p>I'm gobsmacked that Apache didn't have a robust suite of tests for this.</p> <p>Directory Traversal attacks have been a problem for web servers since the beginning. <a class="reference external" href="https://owasp.org/www-community/attacks/Path_Traversal">OWASP</a>, <a class="reference external" href="https://portswigger.net/web-security/file-path-traversal">PortSwigger</a>, and <a class="reference external" href="https://spanning.com/blog/directory-traversal-web-based-application-security-part-8/">Spanning</a> all have explanations that you can read. 
The essence is that you make a request to a URL that looks like <tt class="docutils literal"><span class="pre">http://example.com/cgi-bin/../../../../etc/passwd</span></tt> and, voilà, you get access to something that you shouldn't. Each of the <tt class="docutils literal">..</tt> path segments climbs up a level of the file system. Even the simplest web server knows better than to blindly allow a sequence of <tt class="docutils literal">..</tt> path segments, so you have to be a little clever about how you express them.</p> <div class="section" id="iis-unicode-exploit"> <h3>IIS Unicode Exploit</h3> <p>I remember when I worked on the IIS development team at Microsoft in 1997–2004, we got hit by <a class="reference external" href="https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2000-0884">CVE-2000-0884</a> in 2000, which made use of an <a class="reference external" href="https://en.wikipedia.org/wiki/UTF-8#Overlong_encodings">overlong UTF-8 encoding</a>.</p> <p>URLs allow <a class="reference external" href="https://developer.mozilla.org/en-US/docs/Glossary/percent-encoding">percent encoding</a> for characters that can't be sent literally. For example, <tt class="docutils literal">%3D</tt> encodes an <tt class="docutils literal">=</tt> as the two-digit hexadecimal value of <tt class="docutils literal">=</tt>’s ASCII code. UTF-8 characters beyond U+007F require two or more bytes of storage, each of which can be percent encoded; e.g., U+00C1 (<tt class="docutils literal">Á</tt>) is encoded as the <tt class="docutils literal">C3 81</tt> byte pair in UTF-8, and as <tt class="docutils literal">%C3%81</tt> in percent encoding.</p> <p>The slash character, <tt class="docutils literal">/</tt> or U+002F, can be percent encoded as <tt class="docutils literal">%2F</tt>. IIS 4 and 5 were smart enough to treat <tt class="docutils literal">%2F</tt> as a slash and to defend against sequences like <tt class="docutils literal"><span class="pre">..%2F..%2F</span></tt>. 
However, the attackers encoded a slash as <tt class="docutils literal">%C0%AF</tt>—a sequence that is burned into my brain. This two-byte UTF-8 sequence can be decoded as U+002F, though it should not be treated as valid because it is <a class="reference external" href="https://en.wikipedia.org/wiki/UTF-8#Overlong_encodings">overlong</a>: the five payload bits in the leading byte are all zero.</p> <p>The <a class="reference external" href="https://www.giac.org/paper/gcih/115/iis-unicode-exploit/101163">GIAC paper</a> explains in some detail how this could be exploited.</p> </div> <div class="section" id="windows-security-push"> <h3>Windows Security Push</h3> <p>Windows XP went on sale in late 2001, touted as the most secure version of Windows ever. (It was, at that time.)</p> <p>Right around Christmas 2001, the <a class="reference external" href="https://www.giac.org/paper/gcih/274/windows-xp-upnp-exploits/102906">UPnP vulnerability</a> was disclosed. Brian Valentine, the Senior VP who ran Windows, threw a shitfit. It was announced that <em>all</em> of Windows would spend the month of February 2002 undergoing security training, so that we could <a class="reference external" href="https://owasp.org/www-community/Threat_Modeling">threat model</a> and review our code.</p> <p>We had fundamentally rearchitected IIS 6, which would be released in Windows Server 2003, with a new worker process model (inspired by Apache's), and we had rewritten much of it. There was a new kernel mode driver, http.sys, that terminated all requests and routed them to the appropriate handler in kernel or user mode. I was part of the http.sys dev team at that point.</p> <p>IIS had already gotten serious about security by then. We had to, after <a class="reference external" href="https://en.wikipedia.org/wiki/Code_Red_(computer_worm)">Code Red</a>, <a class="reference external" href="https://en.wikipedia.org/wiki/Nimda">Nimda</a>, the Unicode exploit, and others. 
<a class="reference external" href="https://www.linkedin.com/in/mikehow/">Mike Howard</a> had been the IIS Security Program Manager before he went on to bigger responsibilities. A lot of the first edition of his <a class="reference external" href="https://www.amazon.com/Writing-Secure-Second-Developer-Practices/dp/0735617228">Writing Secure Code</a> book was based on his experience with securing IIS, and a lot of the second edition benefited from the Security Push experience.</p> <p>Since http.sys was new and an obvious target, our team actually spent two months carefully reviewing everything. It turned out that we had done a good job over the previous couple of years and we didn't find much to worry about.</p> <p>We did identify that the URL canonicalization in http.sys was overly complicated. I rewrote that component and I created a ton of unit tests for it. Developers writing unit tests was not common at Microsoft back in 2002: we had a separate caste of testers to write tests.</p> <p>I've been out of the loop since I left IIS in 2004, but to my knowledge, there were no further vulnerabilities in URL handling.</p> <p>I'm surprised and disappointed that Apache would mess up path traversal in the 2020s.</p> </div> Tue, 05 Oct 2021 07:00:00 GMT tag:www.georgevreilly.com,2021-10-05:/blog/2021/10/05/PathTraversalAttacks.html Accidentally Quadratic: Python List Membership https://www.georgevreilly.com/blog/2021/10/04/AccidentallyQuadraticPythonListMembership.html <p>We had a performance regression in a test suite recently when the median test time jumped by two minutes.</p> <a class="reference external image-reference" href="https://www.bigocheatsheet.com/"><img alt="Big O Cheat Sheet" src="https://www.georgevreilly.com/content/binary/bigochart.gif"/></a> <p>We tracked it down to this (simplified) code fragment:</p> <pre class="code python literal-block"> <span class="n">task_inclusions</span> <span class="o">=</span> <span class="p">[</span> <span 
class="n">some_collection_of_tasks</span><span class="p">()</span> <span class="p">]</span> <span class="n">invalid_tasks</span> <span class="o">=</span> <span class="p">[</span><span class="n">t</span><span class="o">.</span><span class="n">task_id</span><span class="p">()</span> <span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">airflow_tasks</span> <span class="k">if</span> <span class="n">t</span><span class="o">.</span><span class="n">task_id</span><span class="p">()</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">task_inclusions</span><span class="p">]</span> </pre> <p>This looks fairly innocuous—and it was—until the size of the result returned from <tt class="docutils literal">some_collection_of_tasks()</tt> jumped from a few hundred to a few thousand.</p> <p>The <a class="reference external" href="https://docs.python.org/3/reference/expressions.html#membership-test-operations">in comparison operator</a> conveniently works with all of Python's standard sequences and collections, but its efficiency varies. For a <tt class="docutils literal">list</tt> and other sequences, <tt class="docutils literal">in</tt> must search linearly through all the elements until it finds a matching element <em>or</em> the list is exhausted. In other words, <tt class="docutils literal">x in some_list</tt> takes <span class="formula"><i>O</i>(<i>n</i>)</span> time. For a <tt class="docutils literal">set</tt> or a <tt class="docutils literal">dict</tt>, however, <tt class="docutils literal">x in container</tt> takes, on average, only <span class="formula"><i>O</i>(1)</span> time. 
See <a class="reference external" href="https://wiki.python.org/moin/TimeComplexity">Time Complexity</a> for more.</p> <p>The <tt class="docutils literal">invalid_tasks</tt> list comprehension was explicitly looping through one list, <tt class="docutils literal">airflow_tasks</tt>, and implicitly doing a linear search through <tt class="docutils literal">task_inclusions</tt> for each value of <tt class="docutils literal">t</tt>. The nested loop was hidden and its effect only became apparent when <tt class="docutils literal">task_inclusions</tt> grew large.</p> <p>The list comprehension was actually taking <span class="formula"><i>O</i>(<i>n</i><sup>2</sup>)</span> time. When <span class="formula"><i>n</i></span> was comparatively small (a few hundred), this wasn't a problem. When <span class="formula"><i>n</i></span> grew to several thousand, it became a big problem.</p> <p>This is a classic example of an <a class="reference external" href="https://accidentallyquadratic.tumblr.com/">accidentally quadratic</a> algorithm. Indeed, Nelson describes a very similar problem with <a class="reference external" href="https://accidentallyquadratic.tumblr.com/post/161243900944/mercurial-changegroup-application">Mercurial changegroups</a>.</p> <p>This performance regression was compounded because this fragment of code was being called thousands of times—I believe once for each task— making the overall cost cubic, <span class="formula"><i>O</i>(<i>n</i><sup>3</sup>)</span>.</p> <p>The fix here is similar: Use a <tt class="docutils literal">set</tt> instead of a <tt class="docutils literal">list</tt> and get <span class="formula"><i>O</i>(1)</span> membership testing. 
The <tt class="docutils literal">invalid_tasks</tt> list comprehension now takes the expected <span class="formula"><i>O</i>(<i>n</i>)</span> time.</p> <pre class="code python literal-block"> <span class="n">task_inclusions</span> <span class="o">=</span> <span class="nb">set</span><span class="p">(</span> <span class="n">some_collection_of_tasks</span><span class="p">()</span> <span class="p">)</span> <span class="n">invalid_tasks</span> <span class="o">=</span> <span class="p">[</span><span class="n">t</span><span class="o">.</span><span class="n">task_id</span><span class="p">()</span> <span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">airflow_tasks</span> <span class="k">if</span> <span class="n">t</span><span class="o">.</span><span class="n">task_id</span><span class="p">()</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">task_inclusions</span><span class="p">]</span> </pre> <p>More at <a class="reference external" href="https://www.coengoedegebure.com/understanding-big-o-notation/">Understanding Big-O Notation</a> and the <a class="reference external" href="https://www.bigocheatsheet.com/">Big-O Cheat Sheet</a>.</p> Mon, 04 Oct 2021 07:00:00 GMT tag:www.georgevreilly.com,2021-10-04:/blog/2021/10/04/AccidentallyQuadraticPythonListMembership.html Passphrase Generators https://www.georgevreilly.com/blog/2021/05/10/PassphraseGenerators.html <a class="reference external image-reference" href="https://xkcd.com/936/"><img alt="Password Strength" src="https://imgs.xkcd.com/comics/password_strength.png"/></a> <p>I've been using <a class="reference external" href="https://en.wikipedia.org/wiki/Password_manager">password managers</a> for at least 15 years to keep track of all my passwords. 
I have separate, distinct, strong passwords for hundreds of sites, and I've only memorized the handful that I need to actually type regularly.</p> <p>I started out with the <a class="reference external" href="https://www.georgevreilly.com/blog/2006/02/06/200KeePassEntries.html">KeePass</a> desktop app originally, but I switched to the online <a class="reference external" href="https://www.georgevreilly.com/blog/2016/01/07/DicewareAndLastpass.html">LastPass</a> app about a decade ago. At work, we use <a class="reference external" href="https://1password.com/">1Password</a>.</p> <p>When I register for a site, LastPass generates a random password for me, such as:</p> <pre class="literal-block"> tV%5joS$U6^uY5xU T2oEUY!g70Iv1b&amp;I 8kNHg9*A5GMR9%8D </pre> <p>LastPass securely syncs my passwords between machines and devices. Its browser integration and its Android and iPhone apps mean that I rarely ever have to actually type any of those ugly messes in.</p> <p>But when I do have to type in such a password, it's unpleasant in a browser. It doesn't help that LastPass in some cases displays passwords in a sans-serif font that makes it easy to <a class="reference external" href="https://typography.guru/journal/letters-symbols-misrecognition/">misrecognize</a> letters such as <tt class="docutils literal">Il</tt>, <tt class="docutils literal">0O</tt>, <tt class="docutils literal">5S</tt>, or <tt class="docutils literal">8B</tt>. It's far more painful in an Android app, where you have to switch the keyboard in and out of symbol mode. It's usually even worse in iPhone apps, which rarely offer you an option to see your password in the clear as you're laboriously typing it, so it's easy to make a mistake. 
When I tried to use a remote control to enter my Netflix and Amazon Prime passwords into a new set-top box, I got so annoyed that I brought down a real keyboard and plugged it into the USB port.</p> <p><a class="reference external" href="https://theintercept.com/2015/03/26/passphrases-can-memorize-attackers-cant-guess/">Passphrases</a> have nice properties compared to random passwords: they're human readable, they're much easier—if longer—to type, and you can actually remember them if you have to. A passphrase of at least five words (chosen by a secure random generator) is computationally infeasible to crack.</p> <p>The ur-example of random passphrase generators is <a class="reference external" href="https://en.wikipedia.org/wiki/Diceware">Diceware</a> from 1995. There are various problems with the Diceware wordlist, which are rectified by more modern lists, such as the <a class="reference external" href="https://www.eff.org/deeplinks/2016/07/new-wordlists-random-passphrases">EFF Wordlists</a>.</p> <p>Which would you rather type? The <a class="reference external" href="http://www.catb.org/jargon/html/L/line-noise.html">line noise</a> above or one of these passphrases?:</p> <pre class="literal-block"> confident starfish aftermost elsewhere jasmine shun baggage chaps reward cuddle avenue rut pardon skating earlobe latter blissful snippet jolt corroding upstage-divinely-ninth-unfilled-skeleton SkimmingMachinistBlessHesitancyKissableRink </pre> <p>When I want to generate a random passphrase, I tend to use either the <a class="reference external" href="https://github.com/ulif/diceware">Python diceware</a> command-line tool or Glenn Rempe's JavaScript-based <a class="reference external" href="https://www.rempe.us/diceware/#eff">Diceware website</a>. 
Both use cryptographic random number generators to generate excellent passphrases.</p> <p>The <a class="reference external" href="https://1password.com/password-generator/">1Password Online Generator</a> (in Memorable Password mode) also generates passphrases, as do the desktop and browser versions of 1Password.</p> <p>My master password for LastPass is a passphrase, as is my laptop password. I'm also using <a class="reference external" href="https://authy.com/">Authy</a> for 2FA, but that's a post for another time.</p> <div class="admonition tip"> <p class="first admonition-title">Tip</p> <p>If you have to supply answers for one of those misbegotten <a class="reference external" href="https://www.okta.com/blog/2021/03/security-questions/">security questions</a>, such as your favorite movie or your first car, <em>do not answer truthfully</em>. Truthful answers increase your risk of identity theft. The answers are often guessable, can frequently be learned easily about you, and may be obtained through a password breach on another site.</p> <p>Instead, generate a passphrase as the &quot;answer&quot; <em>and store it and the question in the Notes field of your password manager</em>. If you have to supply the answer to a security question over the phone to a customer service rep, you'll be thankful that you chose something that you can clearly say aloud.</p> <p class="last">Also <a class="reference external" href="https://www.mentalfloss.com/article/522136/taking-facebook-quizzes-could-put-you-risk-identity-theft">Facebook quizzes</a> and memes like &quot;Your porn name is your middle name and the first car you had&quot; are trying to obtain your answers to common security questions. 
Don't answer them.</p> </div> Mon, 10 May 2021 07:00:00 GMT tag:www.georgevreilly.com,2021-05-10:/blog/2021/05/10/PassphraseGenerators.html Punctuating James Joyce https://www.georgevreilly.com/blog/2021/05/08/PunctuatingJamesJoyce.html <a class="reference external image-reference" href="https://www.writermag.com/improve-your-writing/revision-grammar/punctuation-bootcamp/"><img alt="Punctuation Boot Camp: Our ultimate grammar guide" src="https://cdn.writermag.com/2018/07/punctuationbootcamp_news-e1540567976133.jpg"/></a> <p>In <a class="reference external" href="https://lithub.com/the-punctuation-marks-loved-and-hated-by-famous-writers/">The Punctuation Marks Loved (and Hated) by Famous Writers</a>, Emily Temple relays a range of opinions from writers such as Tom Wolfe, Elmore Leonard, and Ursula K. Le Guin on periods, semicolons, hyphens and more.</p> <p>On commas:</p> <blockquote> <p>Listens to the sound of the sentence, and is always right, Bob: Toni Morrison</p> <blockquote> [On her editor, Bob Gottlieb, who famously “was always inserting commas into Morrison’s sentences and she was always taking them out”] We read the same way. We think the same way. He is overwhelmingly aggressive about commas and all sorts of things. He does not understand that commas are for pauses and breath. He thinks commas are for grammatical things. We have come to an understanding, but it is still a fight.</blockquote> </blockquote> <p>On periods:</p> <blockquote> <p>Tolerates it, if he must: Cormac McCarthy</p> <blockquote> <p>I believe in periods, in capitals, in the occasional comma, and that’s it.</p> <p>James Joyce is a good model for punctuation. He keeps it to an absolute minimum. There’s no reason to blot the page up with weird little marks. 
I mean, if you write properly you shouldn’t have to punctuate.</p> </blockquote> </blockquote> <p>My own prose tends towards longer sentences, often sprinkled with dashes, parentheses, and semicolons.</p> <p>Since 2004, I've adapted all of James Joyce's <em>Ulysses</em> for staged readings by the <a class="reference external" href="https://www.wildgeeseseattle.org/">Wild Geese Players of Seattle</a>, and I'm in the Morrison camp, not the McCarthy–Joyce one.</p> <p>Paragraphs like these work on the printed page. (More or less.)</p> <blockquote> <p>The tear is bloody near your eye. Talking through his bloody hat. Fitter for him go home to the little sleepwalking bitch he married, Mooney, the bumbailiff's daughter, mother kept a kip in Hardwicke street, that used to be stravaging about the landings Bantam Lyons told me that was stopping there at two in the morning without a stitch on her, exposing her person, open to all comers, fair field and no favour.</p> <p class="attribution">&mdash;Anonymous narrator, Episode 12, “Cyclops”, L400</p> </blockquote> <p></p> <blockquote> <p>Martin Cunningham forgot to give us his <a class="reference external" href="http://www.jjon.org/joyce-s-allusions/spellingbee-conundrum">spellingbee conundrum</a> this morning. It is amusing to view the unpar one ar alleled embarra two ars is it? double ess ment of a harassed pedlar while gauging au the symmetry with a y of a peeled pear under a cemetery wall. Silly, isn't it? Cemetery put in of course on account of the symmetry.</p> <p class="attribution">&mdash;Mr Bloom, Episode 7, “Aeolus”, L170</p> </blockquote> <p>But imagine trying to read those sentences <em>aloud</em> during a performance and bring the sense of the text to the audience.</p> <p>As an aid to my performers, I've introduced “cadence bars” (denoted by ‘≀’) to the scripts to augment Joyce's sparse punctuation and to bring out the individual fragments.</p> <blockquote> The tear is bloody near your eye. 
Talking through his bloody hat. Fitter for him go home ≀ to the little sleepwalking bitch he married, Mooney, the bum·bailiff's daughter, mother kept a kip in Hardwicke street, that used to be stravaging about the landings ≀ Bantam Lyons told me ≀ that was stopping there at two in the morning ≀ without a stitch on her, exposing her person, open to all comers, fair field and no favour.</blockquote> <p></p> <blockquote> Martin Cunningham forgot to give us his spelling·bee conundrum this morning. It is amusing to view the ≀ unpar ≀ one ar ≀ alleled ≀ embarra ≀ two ars is it? ≀ double ess ≀ ment ≀ of a harassed pedlar ≀ while gauging ≀ au ≀ the symmetry ≀ with a y ≀ of a peeled pear ≀ under a cemetery wall. Silly, isn't it? Cemetery put in of course ≀ on account of the symmetry.</blockquote> <p>I've also added some pseudo-hyphens (bum·bailiff, spelling·bee, what·do·you·call·him) to counteract Joyce's Germanic habit of stringing several words into one.</p> <p>This seems to help, though some of our readers have to fight a tendency to pause too much when they encounter a ‘≀’ symbol.</p> Sat, 08 May 2021 07:00:00 GMT tag:www.georgevreilly.com,2021-05-08:/blog/2021/05/08/PunctuatingJamesJoyce.html Now You Have 32 Problems https://www.georgevreilly.com/blog/2020/04/23/regex-32-problems.html <p></p> <blockquote> <p>Some people, when confronted with a problem, think “I know, I'll use regular expressions.” <a class="reference external" href="http://regex.info/blog/2006-09-15/247">Now they have two problems</a>.</p> <blockquote> — Jamie Zawinski</blockquote> </blockquote> <p>A Twitter thread about <a class="reference external" href="https://twitter.com/nbashaw/status/1253186961482715136">very long regexes</a> reminded me of the <a class="reference external" href="https://www.georgevreilly.com/blog/2009/07/11/64bitWindows7.html">longest regex</a> that I ever ran afoul of, a particularly horrible multilevel mess that had worked acceptably on the 32-bit .NET CLR, but brought the 
64-bit CLR to its knees.</p> <blockquote> <p>Whenever I ran our ASP.NET web application [on Win64], it would go berserk, eat up all 4GB of my physical RAM, push the working set of IIS's w3wp.exe to <em>12GB</em>, and max out one of my 4&nbsp;cores! The only way to maintain any sanity was to run <tt class="docutils literal">iisreset</tt> every 20&nbsp;minutes to gently kill the process.</p> <p>WinDbg and Process Explorer showed that the rogue thread was stuck in a loop in <tt class="docutils literal">mscorjit!LifetimesListInteriorBlocksHelperIterative&lt;GCInfoLiveRecordManipulator&gt;</tt>. I passed a minidump on to my former colleagues in IIS, who sent it to the CLR team. They said:</p> <blockquote> The only thing I can tell is that it is Regex, and some regex expression compiled down to a method with 456KB of IL. That is <em>huge</em>, and yes 12GB of RAM consumed for something like that is expected.</blockquote> <p>With that clue, I was able to track down the problem, a particularly foul regex, built from a 10KB string, with 32&nbsp;alternating expressions, each of which contains dozens of alternated subexpressions. The string is built from many smaller strings, so it's not obvious in the source just how ugly it is.</p> </blockquote> <p>I never wrote a followup post explaining how I dealt with this beast.</p> <p>The regex was used on the <a class="reference external" href="https://www.cozi.com/calendar/">Cozi calendar</a> to parse appointments in everyday language, such as “Ann/John Dinner out Friday at 8pm” or “John's birthday every Dec. 7”. 
These would get translated into (possibly recurring) <a class="reference external" href="https://tools.ietf.org/html/rfc5545">iCalendar</a> appointments.</p> <p>Some of the subexpressions mentioned above looked like:</p> <ul class="simple"> <li><tt class="docutils literal">ordinals = <span class="pre">&quot;1st|2nd|...|31st&quot;</span></tt></li> <li><tt class="docutils literal">short_days = <span class="pre">&quot;Sun|Mon|...|Sat&quot;</span></tt></li> <li><tt class="docutils literal">full_days = <span class="pre">&quot;Sunday|Monday|...|Saturday&quot;</span></tt></li> <li><tt class="docutils literal">short_months = <span class="pre">&quot;Jan|Feb|...|Dec&quot;</span></tt></li> <li><tt class="docutils literal">full_months = <span class="pre">&quot;January|February|...|December&quot;</span></tt></li> <li><tt class="docutils literal">recurrence = <span class="pre">&quot;((every|each)?</span> (first|second|third|fourth|fifth|last)? &quot; + &quot;(&quot; + short_days + &quot;|&quot; + full_days + &quot;)&quot; + ...</tt></li> </ul> <p>I've elided the intermediate values but they were spelled out in the original. Some of the simpler subexpressions were repeated several times, nested inside others.</p> <p>This all screamed that a <em>grammar</em> and a <em>real parser</em> were needed, but the test suite also screamed <em>here be dragons!</em></p> <p>I resisted the temptation to rewrite the appointment parser from scratch with a proper grammar, or to experiment with a real natural language parser, though it remained on my personal todo list for the rest of my time at Cozi. We were migrating from C# to Python at that point, and the legacy appointment parser was one of the few remaining pieces that prevented us from shutting down the .NET servers.</p> <p>Instead, I changed the appointment parser code so that it didn't attempt to match the entire 10KB monster in one go. I looped through each of the 32 top-level disjunctions, manually performing the alternation. 
If any one of those matched, then I had what I needed. Reducing the regexes to a few hundred characters each tamed the combinatorial explosion of backtracking state.</p> <p>Regexes definitely have a place, but do not try to implement a full grammar as a single regular expression.</p> Thu, 23 Apr 2020 07:00:00 GMT tag:www.georgevreilly.com,2020-04-23:/blog/2020/04/23/regex-32-problems.html Weirdest Birthday Ever https://www.georgevreilly.com/blog/2020/03/15/WeirdestBirthdayEver.html <p>When I said that Emma and I would be spending <a class="reference external" href="https://www.georgevreilly.com/blog/2019/11/22/Dublin2020.html">2020 in Dublin</a>, I could not possibly have anticipated what would be happening in Seattle while we were gone.</p> <p>Today is my 55th birthday and it's the weirdest birthday ever, in what must be the weirdest week that most of us have lived through. (So far.)</p> <p>COVID-19 is all that anyone can talk about: where it's spreading, how it's being handled, what comes next.</p> <p>I started working from home on Tuesday, March 10th. Emma's general health and immune system are not good. My parents, who live nearby, are now both 80 years old and neither is in great health. It seemed prudent to minimize my risk of passing something on to any of them. Since then, Stripe has closed most offices, as have many other companies.</p> <p>Ireland has closed schools, banned large gatherings, and is generally trying not to become like Italy. St Patrick's Day parades are cancelled. Most pubs have not yet closed and I wish they would, since they are now a public health risk.</p> <p>If Ireland institutes a full lockdown, we'll move in with my parents for the duration—assuming that none of us are showing signs of COVID-19.</p> <p>Seattle has been ground zero for coronavirus in the US. All of the initial deaths were there. Some hospitals are already overwhelmed, and I'm sure others will be as the number of cases rises exponentially. 
Governor Inslee and other state and city leaders have been doing a good job of managing the crisis.</p> <p>I wish I could say the same about the United States as a whole. It was <em>always</em> obvious that Trump was wholly unfit to be president, but it's never been clearer than over the past month. I fear for my adopted country.</p> <p>Anyway, we're heading off to my parents' soon, for a subdued birthday celebration.</p> Sun, 15 Mar 2020 07:00:00 GMT tag:www.georgevreilly.com,2020-03-15:/blog/2020/03/15/WeirdestBirthdayEver.html Dublin for 2020 https://www.georgevreilly.com/blog/2019/11/22/Dublin2020.html <a class="reference external image-reference" href="https://www.irishcalendars.ie/products/dublin-calendar"><img alt="Dublin Calendar 2020" src="https://www.georgevreilly.com/content/binary/dublin-calendar-2020.jpg"/></a> <p>I left in the Eighties; I'm going back in the Twenties.</p> <p>I am transferring to a Dublin-based team at <a class="reference external" href="https://stripe.com/">Stripe</a> for a one-year rotation. Emma and I will be moving to Dublin just before Christmas. Emma has never lived in Ireland and I haven't lived there since January 1989. After 30 years in the US, I'm about to spend a year in my hometown.</p> <p>I grew up in Dublin, earned a Bachelor's degree in Computer Science at Trinity College Dublin in 1987, and moved to the US in 1989 to get a Master's degree in Comp Sci at Brown University in Providence, RI. Microsoft moved me to Seattle, WA in 1992, where I've lived ever since. Between 1992 and 2005, I worked at Microsoft three times for a total of ten years. I joined Stripe in Seattle in mid-2018, after eleven years at two startups, Cozi and Cookbrite/MetaBrite.</p> <p>Emma and I met in 1997, and married and bought a house in 2000. We are now frantically trying to get that house into shape to rent out during our year's absence. We've been working all-out for several weeks and now there are only four weeks to go. 
She carried the burden by herself last week, as I spent the week in Dublin, meeting my new team. I had done so much painting of interior rooms before I went to Dublin two weeks ago that the fingerprint readers on my laptop and phones were rejecting my fingerprint. My fingerprint works again, but there's still a little painting to be done.</p> <p>We're excited and nervous. It's not easy to pick up and move after spending decades in one city. We have plenty of storage in our Seattle house and had little impetus—until now—to shed stuff. And there's lots and lots of stuff. Thousands upon thousands of books, the impedimenta of various hobbies, and twenty years of odds and ends. We've purged a lot but the end is not yet in sight. I'll be spending the second week of December in San Jose and San Francisco, so I'll need to spend Thanksgiving working on the house.</p> <p>Stripe has been great in providing relocation assistance to us, providing services in both Seattle and Dublin. I lined up an apartment in Dublin the day before I returned. We'll be living at the top of a Georgian house on Adelaide Road. It's not too small, but it'll feel cramped compared to our Seattle house.</p> <p>A few weeks ago, I moved from an infrastructure team in the Stripe Seattle office to a new security team (Anti-Abuse) in the Stripe Dublin office. Not only is the team new, all of the team members are new to Stripe: most joined in August and the longest-tenured joined in March. They've already accomplished quite a bit, but they've left a few things for me to do.</p> <p>My family are excited too. I have two siblings living in Ireland, but they're both over two hours' drive from my parents' home in Dublin. My sister's in rural Cork and one of my brothers is in rural Mayo. 
We're all going down to Cork for a few days at Christmas.</p> <p>Ireland, here we come!</p> Fri, 22 Nov 2019 07:00:00 GMT tag:www.georgevreilly.com,2019-11-22:/blog/2019/11/22/Dublin2020.html A Use for Octal: Calculating Modulo 36 from Modulo 9 https://www.georgevreilly.com/blog/2019/09/15/use-for-octal.html <!-- --> <blockquote> (I posted an <a class="reference external" href="https://weblogs.asp.net/george_v_reilly/284388">earlier version</a> of this in December 2004 on my old technical blog. A discussion at work last week about 36-bit computers at the <a class="reference external" href="https://livingcomputers.org/">Living Computers Museum</a> prompted me to write an updated post with improved explanations and much better typography.)</blockquote> <p>I've been programming in C since 1985 and C++ since 1991, but I've never found a use for <a class="reference external" href="https://en.wikipedia.org/wiki/Octal">octal</a> representation until [2004], aside from the permissions argument for <a class="reference external" href="http://en.wikipedia.org/wiki/Chmod">chmod</a>. Octal has always seemed as vestigial as a human appendix, a leftover from the early days of computers, when <a class="reference external" href="https://en.wikipedia.org/wiki/Word_(computer_architecture)">word sizes</a> were often a multiple of three: 6-, 12-, 24-, or 36-bits wide. All modern computers use word sizes that are powers of two—16-, 32-, or 64-bits wide—with 8-bit bytes, so octal is less useful than hex, which evenly subdivides bytes and words. I've done a lot of bit twiddling and hexadecimal has always been indispensable, while octal has remained a curiosity.</p> <p>The other day [in 2004], a mathematician friend described to me a problem that he had solved at a previous company. 
They were designing hardware that emulated some old <a class="reference external" href="https://retrocomputing.stackexchange.com/questions/11801/what-was-the-rationale-behind-36-bit-computer-architectures">36-bit computers</a>. For backward compatibility, the various shift instructions had to accept an arbitrarily large shift count, <span class="formula"><i>k</i></span>, and shift left or right by <span class="formula">(<i>k</i><span class="textrm"> mod </span>36)</span>. Now, divisions are not cheap to implement in hardware, so they needed to come up with an alternate approach to calculate the modulus.</p> <p>My friend tried to do something with the factors of 36: <span class="formula">4×9</span>. Four and nine are <a class="reference external" href="https://artofproblemsolving.com/wiki/index.php/Relatively_prime">relatively prime</a>: they have no common factors other than one. By the <a class="reference external" href="https://medium.com/@astartekraus/the-chinese-remainder-theorem-ea110f48248c">Chinese Remainder Theorem</a> therefore, the combination of <span class="formula"><i>k</i><span class="textrm"> mod </span>4</span> and <span class="formula"><i>k</i><span class="textrm"> mod </span>9</span> is enough to uniquely determine <span class="formula"><i>k</i><span class="textrm"> mod </span>36</span>. By inspection, this is true for the following table of “residues”. 
All the integers in the range <span class="formula">[0, 36)</span> appear exactly once.</p> <table border="1" class="docutils align-right"> <colgroup> <col width="18%"/> <col width="9%"/> <col width="9%"/> <col width="9%"/> <col width="9%"/> <col width="9%"/> <col width="9%"/> <col width="9%"/> <col width="9%"/> <col width="9%"/> </colgroup> <thead valign="bottom"> <tr><th class="head">4 \ 9</th> <th class="head">0</th> <th class="head">1</th> <th class="head">2</th> <th class="head">3</th> <th class="head">4</th> <th class="head">5</th> <th class="head">6</th> <th class="head">7</th> <th class="head">8</th> </tr> </thead> <tbody valign="top"> <tr><td>0</td> <td>0</td> <td>28</td> <td>20</td> <td>12</td> <td>4</td> <td>32</td> <td>24</td> <td>16</td> <td>8</td> </tr> <tr><td>1</td> <td>9</td> <td>1</td> <td>29</td> <td>21</td> <td>13</td> <td>5</td> <td>33</td> <td>25</td> <td>17</td> </tr> <tr><td>2</td> <td>18</td> <td>10</td> <td>2</td> <td>30</td> <td>22</td> <td>14</td> <td>6</td> <td>34</td> <td>26</td> </tr> <tr><td>3</td> <td>27</td> <td>19</td> <td>11</td> <td>3</td> <td>31</td> <td>23</td> <td>15</td> <td>7</td> <td>35</td> </tr> </tbody> </table> <p>Calculating <span class="formula"><i>k</i><span class="textrm"> mod </span>4</span> is easy in hardware: it's the two least-significant bits.</p> <p>How to calculate <span class="formula"><i>k</i><span class="textrm"> mod </span>9</span> in hardware is not so obvious.</p> <div class="section" id="shifting-and-masking"> <h3>Shifting and Masking</h3> <p>Several programming languages now provide a <tt class="docutils literal">0b</tt> prefix for binary literals to go along with the <tt class="docutils literal">0x</tt> prefix for hex literals and the <tt class="docutils literal">0o</tt> prefix for octal literals. (Older languages, such as C, use a <tt class="docutils literal">0</tt> prefix for octal and have no <tt class="docutils literal">0b</tt> prefix.) 
See the discussion in <a class="reference external" href="https://github.com/golang/proposal/blob/master/design/19308-number-literals.md">Go number literals</a> for more detail on <tt class="docutils literal">0b</tt>, including a list of languages that now support this notation.</p> <p><span class="formula">2<sup><i>n</i></sup></span>, written in binary, looks like <tt class="docutils literal">1</tt> followed by <span class="formula"><i>n</i></span>&nbsp;<tt class="docutils literal">0</tt>s. For example, <span class="formula">2<sup>3</sup> = 1000<sub>2</sub></span>. In C-like languages, <span class="formula">2<sup><i>n</i></sup></span> can be written as <tt class="docutils literal">1 &lt;&lt; n</tt>.</p> <p>Similarly, <span class="formula">2<sup><i>n</i></sup> − 1</span>, <tt class="docutils literal">(1 &lt;&lt; n) - 1</tt>, written in binary, looks like <span class="formula"><i>n</i></span>&nbsp;<tt class="docutils literal">1</tt>s. For example, <span class="formula">2<sup>5</sup> − 1 = 31<sub>10</sub> = 11111<sub>2</sub></span>.</p> <p>We can <strong>multiply</strong> an unsigned integer, <tt class="docutils literal">u</tt>, by <span class="formula">2<sup><i>n</i></sup></span> by <strong>shifting</strong> <tt class="docutils literal">u</tt> <strong>left</strong> by <span class="formula"><i>n</i></span> bits, <tt class="docutils literal">u &lt;&lt; n</tt>, introducing <span class="formula"><i>n</i></span>&nbsp;zeroes as the low-order bits. 
For example, using 8-bit numbers without loss of generality, written as modern Go/Rust number literals:</p> <pre class="code rust literal-block"> <span class="mb">0b_0001_0101</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mb">0b_1010_1000</span> </pre> <p>Similarly, we can <strong>divide</strong> <tt class="docutils literal">u</tt> by <span class="formula">2<sup><i>n</i></sup></span> by <strong>shifting</strong> <tt class="docutils literal">u</tt> <strong>right</strong> by <span class="formula"><i>n</i></span>&nbsp;bits, <tt class="docutils literal">u &gt;&gt;&gt; n</tt>, which drops the <span class="formula"><i>n</i></span>&nbsp;low-order bits and introduces <span class="formula"><i>n</i></span>&nbsp;zeroes as the high-order bits.</p> <pre class="code rust literal-block"> <span class="mb">0b_0101_0110</span><span class="w"> </span><span class="o">&gt;&gt;&gt;</span><span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mb">0b_0000_1010</span> </pre> <p>A sign-extending or arithmetic right shift introduces <span class="formula"><i>n</i></span>&nbsp;copies of the sign bit as the high-order bits. In some languages, such as Java and JavaScript, <tt class="docutils literal">&gt;&gt;</tt>&nbsp;means an arithmetic right shift and <tt class="docutils literal">&gt;&gt;&gt;</tt>&nbsp;means a zero-extending right shift. In other languages, including C, C++, and Go, there is only a <tt class="docutils literal">&gt;&gt;</tt> operator and sign-extension generally depends upon the type of the left operand, <tt class="docutils literal">signed</tt> or <tt class="docutils literal">unsigned</tt>. 
However, sign extension is not guaranteed in C/C++: right-shifting a negative signed value is implementation-defined in C, and was in C++ before C++20.</p> <pre class="code rust literal-block">
<span class="mb">0b_0101_0110</span><span class="w"> </span><span class="o">&gt;&gt;</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mb">0b_0001_0101</span><span class="w">
</span><span class="mb">0b_1001_0110</span><span class="w"> </span><span class="o">&gt;&gt;</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mb">0b_1110_0101</span>
</pre> <p>Finally, we can find the <strong>remainder</strong> of dividing <tt class="docutils literal">u</tt> by <span class="formula">2<sup><i>n</i></sup></span> by <strong>masking</strong> <tt class="docutils literal">u</tt> with <span class="formula">2<sup><i>n</i></sup> − 1</span>, that is, <strong>bitwise-and</strong> with <tt class="docutils literal">(1 &lt;&lt; n) - 1</tt>, to extract the <span class="formula"><i>n</i></span>&nbsp;<strong>low-order bits</strong>:</p> <pre class="code rust literal-block">
<span class="mb">0b_0101_0110</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="mb">0b_0000_0111</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mb">0b_0000_0110</span>
</pre> <p>In other words, <tt class="docutils literal">u % 8 == u &amp; 7</tt> and <tt class="docutils literal">u / 8 == u &gt;&gt; 3</tt>.</p> <p>Read <a class="reference external" href="https://en.wikipedia.org/wiki/Bitwise_operation">bitwise operations</a> for more background.</p> </div> <div class="section" id="casting-out-nines"> <h3>Casting Out Nines</h3> <p>There's an old trick for checking the results of arithmetic operations, known as <a class="reference external" href="http://mathworld.wolfram.com/CastingOutNines.html">casting out nines</a> or <a class="reference external"
href="http://web.archive.org/web/20060101140519/http://web.mit.edu/mwpstr/www/dropnine.htm">dropping nines</a>.</p> <p>Add up the decimal digits of each number. Apply the arithmetic operation to these digit sums. They should be <a class="reference external" href="https://en.wikipedia.org/wiki/Congruence_relation">congruent</a>, modulo 9.</p> <p>For example, <span class="formula">12,345 × 8,765 = 108,203,925</span>.</p> <p>To check the multiplication, compute the <a class="reference external" href="https://en.wikipedia.org/wiki/Digit_sum">digit sum</a> of each number, by adding up each decimal digit:</p> <div class="line-block"> <div class="line"><span class="formula">1 + 2 + 3 + 4 + 5 = 15 ≡ 6 (<span class="textrm">mod </span>9)</span></div> <div class="line">Note: <span class="formula">12,345<span class="textrm"> mod </span>9 = 6</span></div> </div> <p>and</p> <div class="line-block"> <div class="line"><span class="formula">8 + 7 + 6 + 5 = 26 ≡ 8 (<span class="textrm">mod </span>9)</span></div> <div class="line">Note: <span class="formula">8,765<span class="textrm"> mod </span>9 = 8</span></div> </div> <p>Take the first two digit sums, modulo 9, and multiply them:</p> <div class="line-block"> <div class="line"><span class="formula">6×8 = 48 ≡ 3 (<span class="textrm">mod </span>9)</span></div> <div class="line">Note: <span class="formula">15×26 = 390 ≡ 3 (<span class="textrm">mod </span>9)</span></div> </div> <p>Check against the sum of the digits of the product:</p> <div class="line-block"> <div class="line"><span class="formula">1 + 0 + 8 + 2 + 0 + 3 + 9 + 2 + 5 = 30 ≡ 3 (<span class="textrm">mod </span>9)</span></div> <div class="line">Note: <span class="formula">108,203,925<span class="textrm"> mod </span>9 = 3</span></div> </div> <p>This works because <span class="formula">10<sup><i>n</i></sup> ≡ 1 (<span class="textrm">mod </span>9)</span>.</p> <p>Consider 758:</p> <div class="formula"> 758 = 7×100 + 5×10 + 8 </div> <div class="formula"> 758 = 
7×(9 + 1)×(9 + 1) + 5×(9 + 1) + 8 </div> <div class="formula"> 758 = 7×(9<sup>2</sup> + 2×9 + 1) + 5×(9 + 1) + 8 </div> <p>Dropping the nines from each term leaves the digit sum, which is <em>congruent</em> to the original number modulo nine:</p> <div class="formula"> 7×1 + 5×1 + 8 = 7 + 5 + 8 = 20 ≡ 2 (<span class="textrm">mod </span>9) </div> <p>Checking: <span class="formula">758<span class="textrm"> mod </span>9 = 2</span>.</p> <p><a class="reference external" href="https://www.math.nyu.edu/faculty/hausner/congruence.pdf">Congruences</a> have a number of useful properties.</p> </div> <div class="section" id="casting-out-elevens"> <h3>Casting Out Elevens</h3> <p>Let's use 11, instead of 9. Since <span class="formula">10 = 11 − 1</span>, then <span class="formula">10<sup><i>n</i></sup> ≡ (−1)<sup><i>n</i></sup> (<span class="textrm">mod </span>11)</span>.</p> <p>Consider 5234:</p> <div class="formula"> 5234 = 5×10<sup>3</sup> + 2×10<sup>2</sup> + 3×10<sup>1</sup> + 4×10<sup>0</sup> </div> <div class="formula"> 5234 = 5×(11 − 1)×(11 − 1)×(11 − 1) + 2×(11 − 1)×(11 − 1) + 3×(11 − 1) + 4 </div> <div class="formula"> 5234 = 5×(11<sup>3</sup> − 3×11<sup>2</sup>×1 + 3×11×1<sup>2</sup> − 1<sup>3</sup>) + 2×(11<sup>2</sup> − 2×11×1 + 1<sup>2</sup>) + 3×(11 − 1) + 4 </div> <p>Dropping the elevens from each term leaves the alternating digit sum:</p> <div class="formula"> 5×(−1) + 2×1 + 3×(−1) + 4 = −5 + 2 − 3 + 4 = −2 ≡ 9 (<span class="textrm">mod </span>11) </div> <p>It's more convenient to proceed leftwards from the least significant digit, <span class="formula">4 − 3 + 2 − 5</span>.</p> <p>Checking: <span class="formula">5234<span class="textrm"> mod </span>11 = 9</span>.</p> <p>To cast out elevens, we calculate the <a class="reference external" href="https://en.wikipedia.org/wiki/Alternating_sum">alternating sum</a> <em>from right to left</em>.</p> <p>Casting out elevens catches some <a class="reference external"
href="http://mathyear2013.blogspot.com/2013/01/casting-out-elevens.html">transposition errors</a>, unlike casting out nines. For more, see <a class="reference external" href="https://artofproblemsolving.com/wiki/index.php/Divisibility_rules/Rule_for_11_proof">divisibility rule for 11</a> and <a class="reference external" href="https://en.wikipedia.org/wiki/Divisibility_rule#Proof_using_basic_algebra">proof for alternating sum</a>.</p> </div> <div class="section" id="modulo-9"> <h3>Modulo 9</h3> <p>At last, we turn to base 8, octal. Nine bears the same relationship to eight in octal as eleven does to ten in decimal: <span class="formula">9<sub>10</sub> = 11<sub>8</sub></span>, base plus one, and <span class="formula">8<sup><i>n</i></sup> ≡ (−1)<sup><i>n</i></sup> (<span class="textrm">mod </span>9)</span>.</p> <p>We can calculate <span class="formula"><i>k</i><span class="textrm"> mod </span>9</span> in base 8 by alternately adding and subtracting the octal digits, from right to left. For example, <span class="formula">1234<sub>8</sub><span class="textrm"> mod </span>9 = 4 − 3 + 2 − 1 = 2</span>. This gives the right answer: <span class="formula">1234<sub>8</sub> = 668<sub>10</sub></span> and <span class="formula">668<span class="textrm"> mod </span>9 = 2</span>.</p> <p>Here's a simple, albeit incomplete, algorithm in Go. 
We're masking and shifting three bits at a time, which is tantamount to working with the octal representation of <tt class="docutils literal">k</tt>.</p> <pre class="code go literal-block">
<span class="kd">func</span><span class="w"> </span><span class="nx">Mod9</span><span class="p">(</span><span class="nx">k</span><span class="w"> </span><span class="kt">uint</span><span class="p">)</span><span class="w"> </span><span class="kt">uint</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="kd">var</span><span class="w"> </span><span class="nx">m</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">0</span><span class="w">
    </span><span class="nx">sign</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">+</span><span class="mi">1</span><span class="w">
    </span><span class="k">for</span><span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">k</span><span class="p">;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">t</span><span class="p">;</span><span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">&gt;&gt;=</span><span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nx">r</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">int</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="mi">7</span><span class="p">)</span><span class="w">
        </span><span class="nx">m</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">sign</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nx">r</span><span class="w">
        </span><span class="nx">sign</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">-</span><span class="nx">sign</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nx">m</span><span class="p">)</span><span class="w">
</span><span class="p">}</span>
</pre> <p>What about <span class="formula">617<sub>8</sub></span>?</p> <div class="formula"> 7 − 1 + 6 = 12 ≡ 3 (<span class="textrm">mod </span>9) </div> <div class="formula"> 617<sub>8</sub><span class="textrm"> mod </span>9 = 3 </div> <p>And <span class="formula">6172<sub>8</sub></span>?</p> <div class="formula"> 2 − 7 + 1 − 6 = −10 ≡ 8 (<span class="textrm">mod </span>9) </div> <div class="formula"> 6172<sub>8</sub><span class="textrm"> mod </span>9 = 8 </div> <p>Almost there!</p> <blockquote> Casting out “octal-elevens” (<span class="formula">11<sub>8</sub> = 9<sub>10</sub></span>) in octal, by an alternating sum of the base-eight digits, computes a small number <em>congruent</em> to the original number modulo nine.</blockquote> <p>The algorithm above is calculating numbers that are congruent to the correct answer modulo nine, but which may be outside the desired range. 
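</p> <p>To make the out-of-range problem concrete, here is a minimal sketch that returns the raw signed alternating sum instead of casting it to <tt class="docutils literal">uint</tt>. The helper name <tt class="docutils literal">altOctalSum</tt> is my own label for illustration and is not part of the original code.</p>

```go
package main

import "fmt"

// altOctalSum is a hypothetical helper: it returns the raw alternating sum
// of the octal digits of k, taken from right to left, with no range fix-up.
func altOctalSum(k uint) int {
	m, sign := 0, +1
	for t := k; t != 0; t >>= 3 {
		m += sign * int(t&7) // low three bits are one octal digit
		sign = -sign
	}
	return m
}

func main() {
	for _, k := range []uint{0o1234, 0o617, 0o6172} {
		// Each raw sum is congruent to k modulo 9, but need not lie in [0, 9).
		fmt.Printf("%#o: raw sum %d, k %% 9 = %d\n", k, altOctalSum(k), k%9)
	}
}
```

<p>For the two worked examples, the raw sums are 12 and −10. Converting a negative <tt class="docutils literal">int</tt> to <tt class="docutils literal">uint</tt>, as the simple version does, would wrap around to a very large number.</p> <p>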
If the intermediate sum dips below zero or rises above eight, we have to add nine or subtract nine respectively to keep the running total in the range <span class="formula">[0, 9)</span>.</p> <p>Here's a complete algorithm for Modulo 9 in Go, computing the alternating sum of the octal digits:</p> <pre class="code go literal-block">
<span class="kd">func</span><span class="w"> </span><span class="nx">Mod9</span><span class="p">(</span><span class="nx">k</span><span class="w"> </span><span class="kt">uint</span><span class="p">)</span><span class="w"> </span><span class="kt">uint</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="kd">var</span><span class="w"> </span><span class="nx">m</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">0</span><span class="w">
    </span><span class="kd">var</span><span class="w"> </span><span class="nx">negative</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">false</span><span class="w">
    </span><span class="k">for</span><span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">k</span><span class="p">;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">t</span><span class="p">;</span><span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">&gt;&gt;=</span><span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nx">r</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">int</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="mi">7</span><span class="p">)</span><span class="w">
        </span><span class="k">if</span><span class="w"> </span><span class="nx">negative</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="nx">m</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="nx">r</span><span class="w">
            </span><span class="k">if</span><span class="w"> </span><span class="nx">m</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span><span class="w">
                </span><span class="nx">m</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">9</span><span class="w">
            </span><span class="p">}</span><span class="w">
        </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="nx">m</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">r</span><span class="w">
            </span><span class="k">if</span><span class="w"> </span><span class="nx">m</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="mi">9</span><span class="w"> </span><span class="p">{</span><span class="w">
                </span><span class="nx">m</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">9</span><span class="w">
            </span><span class="p">}</span><span class="w">
        </span><span class="p">}</span><span class="w">
        </span><span class="c1">// assert(0 &lt;= m &amp;&amp; m &lt; 9)</span><span class="w">
        </span><span class="nx">negative</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="p">!</span><span class="nx">negative</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nx">m</span><span class="p">)</span><span class="w">
</span><span class="p">}</span>
</pre> <p>Clearly, this algorithm can be implemented in much simpler circuitry than that required to compute a remainder through full-blown division.</p> </div> <div class="section" id="modulo-36"> <h3>Modulo 36</h3> <p>We now have enough to calculate <span class="formula"><i>k</i><span class="textrm"> mod </span>36</span> from <tt class="docutils literal">Mod9</tt> and the Chinese Remainder Theorem:</p> <pre class="code go literal-block">
<span class="kd">func</span><span class="w"> </span><span class="nx">Mod36</span><span class="p">(</span><span class="nx">k</span><span class="w"> </span><span class="kt">uint</span><span class="p">)</span><span class="w"> </span><span class="kt">uint</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nx">Residues</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[</span><span class="mi">4</span><span class="p">][</span><span class="mi">9</span><span class="p">]</span><span class="kt">uint</span><span class="p">{</span><span class="w">
        </span><span class="p">{</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">28</span><span class="p">,</span><span class="w"> </span><span class="mi">20</span><span class="p">,</span><span class="w"> </span><span class="mi">12</span><span class="p">,</span><span class="w"> </span><span class="mi">4</span><span class="p">,</span><span class="w"> </span><span class="mi">32</span><span class="p">,</span><span class="w"> </span><span class="mi">24</span><span class="p">,</span><span class="w"> </span><span class="mi">16</span><span class="p">,</span><span class="w"> </span><span class="mi">8</span><span class="p">},</span><span class="w">
        </span><span class="p">{</span><span class="w"> </span><span class="mi">9</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">29</span><span class="p">,</span><span class="w"> </span><span class="mi">21</span><span class="p">,</span><span class="w"> </span><span class="mi">13</span><span class="p">,</span><span class="w"> </span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="mi">33</span><span class="p">,</span><span class="w"> </span><span class="mi">25</span><span class="p">,</span><span class="w"> </span><span class="mi">17</span><span class="p">},</span><span class="w">
        </span><span class="p">{</span><span class="mi">18</span><span class="p">,</span><span class="w"> </span><span class="mi">10</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="mi">30</span><span class="p">,</span><span class="w"> </span><span class="mi">22</span><span class="p">,</span><span class="w"> </span><span class="mi">14</span><span class="p">,</span><span class="w"> </span><span class="mi">6</span><span class="p">,</span><span class="w"> </span><span class="mi">34</span><span class="p">,</span><span class="w"> </span><span class="mi">26</span><span class="p">},</span><span class="w">
        </span><span class="p">{</span><span class="mi">27</span><span class="p">,</span><span class="w"> </span><span class="mi">19</span><span class="p">,</span><span class="w"> </span><span class="mi">11</span><span class="p">,</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="mi">31</span><span class="p">,</span><span class="w"> </span><span class="mi">23</span><span class="p">,</span><span class="w"> </span><span class="mi">15</span><span class="p">,</span><span class="w"> </span><span class="mi">7</span><span class="p">,</span><span class="w"> </span><span class="mi">35</span><span class="p">},</span><span class="w">
    </span><span class="p">}</span><span class="w">
    </span><span class="k">return</span><span class="w"> </span><span class="nx">Residues</span><span class="p">[</span><span class="nx">k</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="mi">3</span><span class="p">][</span><span class="nx">Mod9</span><span class="p">(</span><span class="nx">k</span><span class="p">)]</span><span class="w">
</span><span class="p">}</span>
</pre> <p>My friend says that he later learned that similar tricks were used in classic 36-bit hardware.</p> <p>I looked everywhere I could think of for a description of this algorithm for calculating modulo 9. I found something that hinted at it in Knuth's <a class="reference external" href="http://www-cs-faculty.stanford.edu/~knuth/taocp.html">Seminumerical Algorithms</a>, §4.4.C, discussing <a class="reference external" href="https://books.google.com/books?id=Zu-HAwAAQBAJ&amp;pg=PT532&amp;lpg=PT532&amp;dq=octal+cast+out+nines+modulo+36&amp;source=bl&amp;ots=9nglVlTuaU&amp;sig=ACfU3U0_RR51okwrvfY3WwC0xBudfLGhuw&amp;hl=en&amp;sa=X&amp;ved=2ahUKEwih44eUxc_kAhVVo54KHcgKDeEQ6AEwDXoECAgQAg#v=onepage&amp;q=octal%20cast%20out%20nines%20modulo%2036&amp;f=false">converting octal integers to decimal</a> by hand, where he mentions using casting out nines in octal and in decimal to check the result. There was no mention of it in Warren's marvelous <a class="reference external" href="http://www.informit.com/articles/article.asp?p=28678">Hacker's Delight</a> or in <a class="reference external" href="http://home.pipeline.com/~hbaker1/hakmem/hakmem.html">HAKMEM</a>.</p> <p>I tried to come up with an analytic way to calculate the elements of the <span class="formula">4×9</span> table. 
The best that I found is <span class="formula">(72 − 8×(<i>k</i><span class="textrm"> mod </span>9) + 9×(<i>k</i><span class="textrm"> mod </span>4))<span class="textrm"> mod </span>36</span>. The inner expression yields a number in the range <span class="formula">[8, 99]</span>, which can be reduced to <span class="formula">[0, 36)</span> by subtracting 36 at most twice. From <a class="reference external" href="http://www-cs-faculty.stanford.edu/~knuth/gkp.html">Concrete Mathematics</a>, mod 36 can be derived from mod 4 and mod 9 by looking at the [0][1] and [1][0] elements of the table: <span class="formula">(9×(<i>k</i><span class="textrm"> mod </span>4) + 28×(<i>k</i><span class="textrm"> mod </span>9))<span class="textrm"> mod </span>36</span>. It works, but it's even worse: the inner expression can be as large as 251. A table lookup is clearly more efficient.</p> <p>Most, if not all, of the computer architectures designed in the last forty years use a word size that is a power of two. Useful relationships like shifting and masking are one big reason why non-power-of-two word sizes have gone out of fashion.</p> <p>Another big reason is the success of C and Unix, which have a bias towards 8-bit bytes. <a class="reference external" href="http://www.parashift.com/c++-faq-lite/intrinsic-types.html">C doesn't require 8-bit bytes</a>, but there's a lot of software which tacitly assumes that <tt class="docutils literal">char</tt> has exactly 8 bits.</p> <p>On systems with 9-bit bytes, like the 36-bit computers, octal is useful, since a 9-bit byte can hold all values up to <span class="formula">777<sub>8</sub></span> and the word size is a multiple of three.</p> <p>And there you have it: an unexpected use for octal notation. 
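</p> <p>As a sanity check on the residue table and the two closed-form expressions in this section, here is a brute-force sketch in Go. The lower-case names <tt class="docutils literal">mod9</tt>, <tt class="docutils literal">residues</tt>, and <tt class="docutils literal">checkMod36</tt> are my own for this illustration; <tt class="docutils literal">mod9</tt> simply restates the complete <tt class="docutils literal">Mod9</tt> so that the block stands alone.</p>

```go
package main

import "fmt"

// mod9 restates the complete Mod9: an alternating sum of octal digits,
// corrected after every step so that it stays in [0, 9).
func mod9(k uint) uint {
	m, negative := 0, false
	for t := k; t != 0; t >>= 3 {
		r := int(t & 7)
		if negative {
			m -= r
			if m < 0 {
				m += 9
			}
		} else {
			m += r
			if m >= 9 {
				m -= 9
			}
		}
		negative = !negative
	}
	return uint(m)
}

// residues[k%4][k%9] == k%36, copied from the table in the text.
var residues = [4][9]uint{
	{0, 28, 20, 12, 4, 32, 24, 16, 8},
	{9, 1, 29, 21, 13, 5, 33, 25, 17},
	{18, 10, 2, 30, 22, 14, 6, 34, 26},
	{27, 19, 11, 3, 31, 23, 15, 7, 35},
}

// checkMod36 is a hypothetical test helper: for every k below limit it
// compares the table lookup and both analytic formulas against k % 36.
func checkMod36(limit uint) error {
	for k := uint(0); k < limit; k++ {
		want := k % 36
		r4, r9 := k&3, mod9(k)
		if got := residues[r4][r9]; got != want {
			return fmt.Errorf("table: k=%d got %d want %d", k, got, want)
		}
		// 72 - 8*r9 is at least 8, so the unsigned arithmetic cannot underflow.
		if got := (72 - 8*r9 + 9*r4) % 36; got != want {
			return fmt.Errorf("first formula: k=%d got %d want %d", k, got, want)
		}
		if got := (9*r4 + 28*r9) % 36; got != want {
			return fmt.Errorf("second formula: k=%d got %d want %d", k, got, want)
		}
	}
	return nil
}

func main() {
	if err := checkMod36(1 << 20); err != nil {
		fmt.Println("mismatch:", err)
		return
	}
	fmt.Println("table and both formulas agree with k % 36")
}
```

<p>The <tt class="docutils literal">%</tt> operator appears here only to check the answers; the table lookup itself needs nothing beyond masking, shifting, and small additions.</p> <p>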
It's not exactly an important use, but then 36-bit computers aren't exactly important any more either.</p> </div> Sun, 15 Sep 2019 07:00:00 GMT tag:www.georgevreilly.com,2019-09-15:/blog/2019/09/15/use-for-octal.html Decrypting Blackbox secrets at build time with Paperkey https://www.georgevreilly.com/blog/2019/09/02/gpg-blackbox-paperkey.html <blockquote> “Security is 1% technology plus 99% following the procedures correctly” — Tom Limoncelli</blockquote> <p>Having dealt with GPG last week at work, I remembered that I had intended to write a blog post about how we used <a class="reference external" href="https://gnupg.org/">GPG</a>, <a class="reference external" href="https://github.com/StackExchange/blackbox">Blackbox</a>, and <a class="reference external" href="http://www.jabberwocky.com/software/paperkey/">Paperkey</a> to store secrets in Git at my <a class="reference external" href="https://www.georgevreilly.com/blog/2018/12/31/2018-review.html">previous job</a>.</p> <p>We used Blackbox to manage secrets that were needed during development, build, deployment, and runtime. These secrets included AWS credentials, Docker registry credentials, our private PyPI credentials, database credentials, and certificates. 
We wanted these secrets to be under version control, but also to be secure.</p> <p>For example, we had a <tt class="docutils literal">credentials.sh</tt> that exported environment variables, which was managed by Blackbox:</p> <pre class="code bash literal-block"> <span class="c1"># Save current value of xtrace option from $-; disable echoing of executed commands </span><span class="o">{</span> <span class="k">if</span> <span class="nb">echo</span> <span class="nv">$-</span> <span class="p">|</span> grep -q <span class="s2">&quot;x&quot;</span><span class="p">;</span> <span class="k">then</span> <span class="nv">XT</span><span class="o">=</span><span class="s2">&quot;-x&quot;</span><span class="p">;</span> <span class="k">else</span> <span class="nv">XT</span><span class="o">=</span><span class="s2">&quot;+x&quot;</span><span class="p">;</span> <span class="k">fi</span><span class="p">;</span> <span class="nb">set</span> +x<span class="p">;</span> <span class="o">}</span> <span class="m">2</span>&gt;/dev/null <span class="c1"># export environment variables, many containing secrets </span><span class="nb">export</span> <span class="nv">AWS_ACCESS_KEY_ID</span><span class="o">=</span><span class="s1">'...'</span> <span class="nb">export</span> <span class="nv">AWS_SECRET_ACCESS_KEY</span><span class="o">=</span><span class="s1">'...'</span> <span class="nb">export</span> <span class="nv">PYPI_USER</span><span class="o">=</span><span class="s1">'build'</span> <span class="nb">export</span> <span class="nv">PYPI_PASSWD</span><span class="o">=</span><span class="s1">'...'</span> <span class="nb">export</span> <span class="nv">PIP_INDEX_URL</span><span class="o">=</span><span class="s2">&quot;https://</span><span class="nv">$PYPI_USER</span><span class="s2">:</span><span class="nv">$PYPI_PASSWD</span><span class="s2">&#64;pypi.example.com/pypi/&quot;</span> <span class="c1"># Restore previous value of xtrace option </span><span class="nb">set</span> <span 
class="nv">$XT</span> </pre> <p>The <tt class="docutils literal">XT</tt> prologue ensures that, even if this script is <tt class="docutils literal">source</tt>’d with <a class="reference external" href="https://renenyffenegger.ch/notes/Linux/shell/bash/built-in/set/x">set -x</a> (debug tracing) enabled, executing it will not leak secrets into build logs. The epilogue turns the <tt class="docutils literal">xtrace</tt> option back on if it was on at the start.</p> <p>We used <a class="reference external" href="https://www.lastpass.com/">LastPass</a> to manage personal credentials that were needed in a browser, but it wasn't suitable for automated use in CI.</p> <div class="section" id="how-blackbox-works"> <h3>How Blackbox Works</h3> <p>Blackbox builds on top of <a class="reference external" href="https://gnupg.org/">GNU Privacy Guard</a> (aka GnuPG aka GPG) to automate the secure management of a set of files containing secrets that are “encrypted at rest” and stored in a Version Control System (VCS), such as Git. These registered files are owned collectively by a set of administrators, each of whom has their own separate keypair (a public key and a private key) stored in their own keyrings. The administrators' public keys are also present in Blackbox's keyring, which is stored in the VCS. Using Blackbox's commands, any administrator can decrypt a file containing secrets, update the secrets in the file, encrypt the updated secrets file, and commit that encrypted file into the VCS. Administrators can be removed from a Blackbox installation, after which they will not be able to decrypt the updated secrets files<a class="footnote-reference" href="https://www.georgevreilly.com/blog/2019/09/02/gpg-blackbox-paperkey.html/#revocation" id="footnote-reference-1">[1]</a>.</p> <p>How does Blackbox encrypt a file so that any administrator can decrypt it? 
It uses GPG to encrypt the file for multiple recipients, say, <a class="reference external" href="https://en.wikipedia.org/wiki/Alice_and_Bob#Cast_of_characters">Alice, Bob, and Carol</a>.</p> <p>When GPG encrypts a file, it:</p> <ul class="simple"> <li>creates a random <em>session key</em> for <a class="reference external" href="https://www.ssl2buy.com/wiki/symmetric-vs-asymmetric-encryption-what-are-differences">symmetric encryption</a></li> <li>writes a header for <em>each recipient</em>, containing:<ul> <li>the ID of the recipient's public key</li> <li>the result of asymmetrically encrypting the session key with the recipient's public key</li> </ul> </li> <li>possibly signs the data</li> <li>compresses the (signed) data</li> <li>symmetrically encrypts the compressed data with the session key</li> <li>writes the encrypted, compressed data</li> </ul> <p>Only the recipients have the private keys (in theory, at least). Therefore, only a recipient can decrypt the encrypted file.</p> <p>To decrypt the file for a recipient, GPG:</p> <ul class="simple"> <li>finds the encrypted session key packet whose keyID matches the recipient's public key</li> <li>decrypts the session key using the recipient's private key</li> <li>decrypts the encrypted, compressed data using the session key</li> <li>decompresses the decrypted data</li> <li>verifies the signature, if present</li> <li>writes the cleartext</li> </ul> <p>This is a hybrid scheme. Symmetric encryption is a lot <a class="reference external" href="https://www.ssl2buy.com/wiki/symmetric-vs-asymmetric-encryption-what-are-differences">faster and more compact</a> than public key/private key asymmetric encryption, so it's used to encrypt the actual data. Furthermore, if the data were entirely encrypted with a recipient's public key, then encrypting for <em>N</em> recipients would mean that the size of the result would be proportional to the (number of recipients) × (the length of the original data). 
With the hybrid scheme, the header grows a <a class="reference external" href="https://security.stackexchange.com/questions/8245/gpg-file-size-with-multiple-recipients">few hundred bytes</a> for each recipient but the data is encrypted only once, with faster encryption.</p> <p>Blackbox encrypts a registered file with all of the administrators as the recipients, so any administrator can decrypt the file.</p> <div class="figure"> <a class="reference external image-reference" href="http://www.cse.tkk.fi/fi/opinnot/T-110.5240/2009/luennot-files/Lecture%202.pdf"><img alt="Typical PGP Message" src="https://www.georgevreilly.com/content/binary/typical-pgp-message.jpg"/></a> <p class="caption">Typical PGP Message</p> <div class="legend"> (Figure from <a class="reference external" href="http://www.cse.tkk.fi/fi/opinnot/T-110.5240/2009/luennot-files/Lecture%202.pdf">Network Security: Email Security, PKI</a>, Tuomas Aura)</div> </div> <p>You can use <tt class="docutils literal">gpg <span class="pre">--list-packets</span></tt> to dump the contents of any GPG message. <a class="reference external" href="https://begriffs.com/posts/2016-11-05-advanced-intro-gnupg.html">An Advanced Intro to GnuPG</a> dives into the message format in more detail.</p> <p>Going back to my original example, <tt class="docutils literal">credentials.sh</tt> is a file registered in <tt class="docutils literal"><span class="pre">blackbox-files.txt</span></tt>. This file should never be committed to the VCS—add it to <a class="reference external" href="https://git-scm.com/docs/gitignore">gitignore</a> to prevent accidentally committing it. Instead, <tt class="docutils literal">credentials.sh.gpg</tt> is committed. 
Since the latter is a binary file, comparing two versions in cleartext is tricky.</p> <table class="docutils footnote" frame="void" id="revocation" rules="none"> <colgroup><col class="label"/><col /></colgroup> <tbody valign="top"> <tr><td class="label"><a class="fn-backref" href="https://www.georgevreilly.com/blog/2019/09/02/gpg-blackbox-paperkey.html/#footnote-reference-1">[1]</a></td><td>If they have a snapshot of the VCS before their access was revoked, they will still be able to decrypt the secrets as they were then. In principle, you should be changing passwords and certificates every time someone's access is revoked.</td></tr> </tbody> </table> </div> <div class="section" id="private-keys-and-paperkey"> <h3>Private Keys and Paperkey</h3> <p>Administrators can encrypt and decrypt Blackbox'd files because they have their private key on a local keyring.</p> <p>Getting a private key onto other hosts can be tricky. We developed this technique when we were using Atlassian's hosted Bamboo CI service. We later used it with hosted Jenkins at Cloudbees. Because we were using a hosted Continuous Integration (CI) service, we had limited control over what we could install. If I remember correctly, Bamboo had support for secret environment variables, but did not provide a way to store a keyring file. There was also a limit on the length of the environment variables, I believe.</p> <p>We were able to get past this by using <a class="reference external" href="http://www.jabberwocky.com/software/paperkey/">Paperkey</a> to (de)serialize the secret key. Paperkey can extract just the secret part of a secret key: ‘Due to metadata and redundancy, OpenPGP secret keys are significantly larger than just the &quot;secret bits&quot;. In fact, the secret key contains a complete copy of the public key.’</p> <p>We created a keypair for the CI on a secure host, serialized the secret with Paperkey, and pasted the secret into the CI's UI to become an environment variable. 
At build time, we used Paperkey on the CI box to deserialize the secret key from the environment variable, before decrypting the secrets needed with Blackbox.</p> <p>To create the CI keypair, follow the portion of the <a class="reference external" href="https://github.com/StackExchange/blackbox#set-up-automated-users-or-role-accounts">Blackbox &quot;role accounts&quot; instructions</a> that creates a sub-key with no password for <tt class="docutils literal">ci&#64;example.com</tt>.</p> <p>Then, serialize the public key and the secret with Paperkey:</p> <pre class="code bash literal-block"> <span class="nb">cd</span> /tmp/NEWMASTER gpg --homedir . --export ci&#64;example.com <span class="se">\ </span> <span class="p">|</span> base64 &gt; public_key.txt gpg --homedir . --export-secret-keys ci&#64;example.com <span class="se">\ </span> <span class="p">|</span> paperkey --output-type<span class="o">=</span>raw <span class="se">\ </span> <span class="p">|</span> base64 &gt; secret.txt </pre> <p>Copy and paste the contents of <tt class="docutils literal">public_key.txt</tt> to the <tt class="docutils literal">GPG_PUBLIC_KEY</tt> environment variable in the CI. Similarly, copy the contents of <tt class="docutils literal">secret.txt</tt> to <tt class="docutils literal">GPG_SECRET</tt>.</p> <p>Securely delete everything in <tt class="docutils literal">/tmp/NEWMASTER</tt>.</p> <p>We used a script like this on the CI to reconstitute the keypair and to decrypt the other secrets from Blackbox:</p> <pre class="code bash literal-block"> <span class="ch">#!/usr/bin/env bash </span> <span class="c1"># Run during a CI build to decrypt all Blackbox-encrypted files in this repo. # Can also be used interactively. 
</span> <span class="nb">set</span> -ex <span class="c1"># Root of Git working tree </span><span class="nv">SERVICES_DIR</span><span class="o">=</span><span class="s2">&quot;</span><span class="k">$(</span><span class="nb">cd</span> <span class="s2">&quot;</span><span class="k">$(</span>dirname <span class="s2">&quot;</span><span class="nv">$0</span><span class="s2">&quot;</span><span class="k">)</span><span class="s2">/..&quot;</span><span class="p">;</span> <span class="nb">pwd</span><span class="k">)</span><span class="s2">&quot;</span> <span class="k">if</span> <span class="o">[</span> <span class="s2">&quot;</span><span class="nv">$CI_BUILD</span><span class="s2">&quot;</span> <span class="o">=</span> <span class="s2">&quot;true&quot;</span> <span class="o">]</span><span class="p">;</span> <span class="k">then</span> <span class="nv">GPG_HOMEDIR</span><span class="o">=</span><span class="s2">&quot;</span><span class="k">$(</span>mktemp -d -t gnupg.XXX<span class="k">)</span><span class="s2">&quot;</span> <span class="nv">SECRET_KEY_FILE</span><span class="o">=</span><span class="s2">&quot;</span><span class="nv">$GPG_HOMEDIR</span><span class="s2">/secret.key&quot;</span> <span class="nv">PUBLIC_KEY_FILE</span><span class="o">=</span><span class="s2">&quot;</span><span class="nv">$GPG_HOMEDIR</span><span class="s2">/public_key.gpg&quot;</span> <span class="c1"># this variable is how you can customize how GPG is used in Blackbox </span> <span class="nv">GPG</span><span class="o">=</span><span class="s2">&quot;gpg --homedir=</span><span class="nv">$GPG_HOMEDIR</span><span class="s2">&quot;</span> <span class="c1"># Remove secrets from filesystem on exit. 
</span> <span class="k">function</span> clean_up <span class="o">{</span> <span class="c1"># TODO: use shred, if available </span> rm -rf <span class="s2">&quot;</span><span class="nv">$GPG_HOMEDIR</span><span class="s2">&quot;</span> <span class="o">}</span> <span class="nb">trap</span> clean_up EXIT<span class="p">;</span> <span class="nb">echo</span> <span class="s2">&quot;Unpacking keys; exiting debug mode to redact...&quot;</span> <span class="nb">set</span> +x <span class="k">if</span> <span class="o">[</span> -z <span class="s2">&quot;</span><span class="nv">$GPG_PUBLIC_KEY</span><span class="s2">&quot;</span> -o -z <span class="s2">&quot;</span><span class="nv">$GPG_SECRET</span><span class="s2">&quot;</span> <span class="o">]</span><span class="p">;</span> <span class="k">then</span> <span class="nb">echo</span> <span class="s2">&quot;Missing CI credential env vars for GPG key and secret&quot;</span> <span class="nb">exit</span> <span class="m">1</span> <span class="k">fi</span> <span class="c1"># unpack public key </span> <span class="nb">echo</span> <span class="s2">&quot;</span><span class="nv">$GPG_PUBLIC_KEY</span><span class="s2">&quot;</span> <span class="p">|</span> base64 --decode &gt; <span class="s2">&quot;</span><span class="nv">$PUBLIC_KEY_FILE</span><span class="s2">&quot;</span> <span class="c1"># unpack secret key </span> <span class="nb">echo</span> <span class="s2">&quot;</span><span class="nv">$GPG_SECRET</span><span class="s2">&quot;</span> <span class="p">|</span> base64 --decode &gt; <span class="s2">&quot;</span><span class="nv">$SECRET_KEY_FILE</span><span class="s2">&quot;</span> <span class="nb">echo</span> <span class="s2">&quot;Secrets unpacked...&quot;</span> <span class="nb">set</span> -x <span class="c1"># reconstitute and import full key into $GPG_HOMEDIR </span> paperkey --pubring <span class="s2">&quot;</span><span class="nv">$PUBLIC_KEY_FILE</span><span class="s2">&quot;</span> --secrets <span 
class="s2">&quot;</span><span class="nv">$SECRET_KEY_FILE</span><span class="s2">&quot;</span> <span class="se">\ </span> <span class="p">|</span> <span class="nv">$GPG</span> --import <span class="c1"># TODO: vendor Blackbox </span> <span class="nv">BLACKBOX_DIR</span><span class="o">=</span><span class="s2">&quot;</span><span class="k">$(</span>mktemp -d -t blackbox.XXX<span class="k">)</span><span class="s2">&quot;</span> <span class="nv">BLACKBOX_BIN</span><span class="o">=</span><span class="nv">$BLACKBOX_DIR</span>/bin <span class="c1"># Shallow clone of Blackbox with most-recent commit only </span> git clone --depth <span class="m">1</span> https://github.com/StackExchange/blackbox.git <span class="nv">$BLACKBOX_DIR</span> <span class="k">else</span> <span class="c1"># So that you only have to enter your password once when running interactively </span> <span class="nb">eval</span> <span class="s2">&quot;</span><span class="k">$(</span>gpg-agent --daemon<span class="k">)</span><span class="s2">&quot;</span> <span class="c1"># No custom GPG_HOMEDIR needed </span> <span class="nv">GPG</span><span class="o">=</span><span class="s2">&quot;gpg&quot;</span> <span class="nv">BLACKBOX_POSTDEPLOY</span><span class="o">=</span><span class="s2">&quot;</span><span class="k">$(</span><span class="nb">command</span> -v blackbox_postdeploy<span class="k">)</span><span class="s2">&quot;</span> <span class="o">||</span> <span class="nv">ret</span><span class="o">=</span><span class="nv">$?</span> <span class="k">if</span> <span class="o">[</span> -n <span class="s2">&quot;</span><span class="nv">$BLACKBOX_POSTDEPLOY</span><span class="s2">&quot;</span> <span class="o">]</span><span class="p">;</span> <span class="k">then</span> <span class="c1"># Use the Blackbox that's on the path </span> <span class="nv">BLACKBOX_BIN</span><span class="o">=</span><span class="s2">&quot;</span><span class="k">$(</span>dirname <span class="nv">$BLACKBOX_POSTDEPLOY</span><span 
class="k">)</span><span class="s2">&quot;</span> <span class="k">else</span> <span class="c1"># Assume Blackbox is checked out in a sibling dir to $SERVICES_DIR </span> <span class="nv">BLACKBOX_BIN</span><span class="o">=</span><span class="s2">&quot;</span><span class="k">$(</span><span class="nb">cd</span> <span class="s2">&quot;</span><span class="nv">$SERVICES_DIR</span><span class="s2">/..&quot;</span><span class="p">;</span> <span class="nb">pwd</span><span class="k">)</span><span class="s2">&quot;</span>/blackbox/bin <span class="k">if</span> <span class="o">[</span> ! -f <span class="s2">&quot;</span><span class="nv">$BLACKBOX_BIN</span><span class="s2">/blackbox_postdeploy&quot;</span> <span class="o">]</span><span class="p">;</span> <span class="k">then</span> <span class="nb">echo</span> <span class="s2">&quot;Can't find Blackbox binaries&quot;</span> <span class="nb">exit</span> <span class="m">1</span> <span class="k">fi</span> <span class="k">fi</span> <span class="k">fi</span> <span class="c1"># decrypt secrets in $SERVICES_DIR using custom GPG_HOMEDIR </span><span class="nv">GPG</span><span class="o">=</span><span class="s2">&quot;</span><span class="nv">$GPG</span><span class="s2">&quot;</span> <span class="nv">$BLACKBOX_BIN</span>/blackbox_postdeploy <span class="c1"># test that decryption worked </span>grep <span class="s1">'congrats!'</span> test_secret.txt </pre> <p>At the end of the build, run <tt class="docutils literal">blackbox_shred_all_files</tt> to destroy any decrypted files.</p> </div> <div class="section" id="more-reading"> <h3>More Reading</h3> <ul class="simple"> <li><a class="reference external" href="https://github.com/StackExchange/blackbox">Blackbox</a></li> <li><a class="reference external" href="http://www.jabberwocky.com/software/paperkey/">Paperkey</a></li> <li><a class="reference external" href="http://www.linux-magazine.com/Online/Features/Protect-your-Documents-with-GPG">Protect your documents with GPG</a></li> <li><a 
class="reference external" href="https://davesteele.github.io/gpg/2014/09/20/anatomy-of-a-gpg-key/">Anatomy of a GPG Key</a></li> <li><a class="reference external" href="https://alexcabal.com/creating-the-perfect-gpg-keypair">Creating the perfect GPG keypair</a></li> <li><a class="reference external" href="https://www.darkcoding.net/software/how-gpg-works-encrypt/">How GPG works: Encrypt</a></li> <li><a class="reference external" href="https://gist.github.com/chrisroos/1205934">GPG import and export</a></li> <li><a class="reference external" href="https://begriffs.com/posts/2016-11-05-advanced-intro-gnupg.html">An Advanced Intro to GnuPG</a></li> <li><a class="reference external" href="http://www.cse.tkk.fi/fi/opinnot/T-110.5240/2009/luennot-files/Lecture%202.pdf">Network Security: Email Security, PKI</a></li> </ul> </div> Mon, 02 Sep 2019 07:00:00 GMT tag:www.georgevreilly.com,2019-09-02:/blog/2019/09/02/gpg-blackbox-paperkey.html