Infinite Ascent.

by CJ Quineson

Some stories from developing puzlink.js

i spent two weeks coding and all i got was—

I wrote a new puzzlehunt tool called puzlink.js, which I’ve hosted on my website. I thought about doing a post similar to my cromulence post, where I’d talk about all the experimentation and implementation, and promote it and stuff. But I was lazy, so I didn’t. Instead, here are a few development stories.

Save me from my standard library. I picked up this phrase from “Write code that is easy to delete” and have loved it since. Things I have written that I wish I didn’t have to: DefaultMap, Bitset, range, product, windows, accumulate, memoize. The largest and most-used util has to be LogNum, which reimplements math in the log semiring. I spent a decent amount of the last two weeks just writing stuff and then extracting the utils I kept inlining.
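
Here’s a minimal sketch of what a LogNum-style utility might look like, assuming values are stored as their natural logs so that products become sums; the class shape and method names are mine, not necessarily what puzlink.js uses.

```ts
// Hypothetical sketch of a LogNum-style utility: a nonnegative number is
// stored as its natural log, so multiplication is addition of logs and
// addition uses log-sum-exp to avoid overflow/underflow.
class LogNum {
  // log of the represented value; -Infinity represents zero
  constructor(readonly logValue: number) {}

  static fromNumber(x: number): LogNum {
    return new LogNum(x > 0 ? Math.log(x) : -Infinity);
  }

  // multiplication in the log semiring is addition of logs
  times(other: LogNum): LogNum {
    return new LogNum(this.logValue + other.logValue);
  }

  // addition via log-sum-exp: log(e^a + e^b) = a + log1p(e^(b - a)) for a >= b
  plus(other: LogNum): LogNum {
    const [hi, lo] =
      this.logValue >= other.logValue
        ? [this.logValue, other.logValue]
        : [other.logValue, this.logValue];
    if (lo === -Infinity) return new LogNum(hi);
    return new LogNum(hi + Math.log1p(Math.exp(lo - hi)));
  }

  toNumber(): number {
    return Math.exp(this.logValue);
  }
}

// Example: multiply many tiny probabilities without underflowing to zero.
const tiny = LogNum.fromNumber(1e-300);
const product = tiny.times(tiny); // toNumber() would underflow, but logValue is fine
console.log(product.logValue); // ≈ -1381.6
```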

The chi-squared test. When I was in high school, studying statistics was about learning all these different tests and when you’d apply each one. That was when I first learned about the chi-squared test, where you do an elaborate series of procedures to divine some sort of number to compare with another number. I didn’t understand what any of the numbers were, at the time. I never thought about the chi-squared test again until I read the puzlink source code, where it’s used to detect unusual letter distributions. So I had to relearn what it was.

The chi-squared test made a lot more sense once I read about the math. Let’s say that, in the English letter distribution, the letter E has relative frequency $p$. We then observe $n$ letters, drawn iid from this distribution. The number of times E appears is a random variable, say $X \sim B(n, p)$, with expected value $np$. Then $\frac{(X - np)^2}{np}$ is, in the limit, distributed like $N(0, 1)^2$. Summing over all letters gives a sum of squares of standard normals, which is the chi-squared distribution.
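
As a concrete sketch (not puzlink.js’s actual code), here’s one way to compute the Pearson statistic for a string against expected English letter frequencies; the frequency table below is abbreviated and purely illustrative.

```ts
// Sketch: Pearson chi-squared statistic for observed letter counts against
// an expected English distribution. Frequencies are illustrative only.
const ENGLISH_FREQ: Record<string, number> = {
  e: 0.127, t: 0.091, a: 0.082, o: 0.075, i: 0.070, n: 0.067,
  // ... remaining letters omitted for brevity
};

function chiSquared(text: string): number {
  const letters = text.toLowerCase().replace(/[^a-z]/g, "");
  const n = letters.length;
  if (n === 0) return 0;

  const counts: Record<string, number> = {};
  for (const ch of letters) counts[ch] = (counts[ch] ?? 0) + 1;

  let stat = 0;
  for (const [letter, p] of Object.entries(ENGLISH_FREQ)) {
    const expected = n * p;                        // E = n * p
    const observed = counts[letter] ?? 0;          // O = count of this letter
    stat += (observed - expected) ** 2 / expected; // (O - E)^2 / E
  }
  return stat; // larger values suggest a more unusual letter distribution
}
```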

Frequentist on the weekdays. Look, Bayesian statistics is cool and all, but I think I’m only Bayesian on the weekends. Why calculate three probabilities when one will do the trick? Instead of having to think about the likelihood, prior, and marginal, I only have to compute the marginal. And often, computing the marginal is tricky in and of itself.

Suppose I have $n$ words, of lengths $\ell_1, \ldots, \ell_n$, and we’re given $k_1, \ldots, k_{n-1}$. What is the probability that, for all $i$, the last $k_i$ letters of word $i$ match the first $k_i$ letters of word $i+1$? Of course, we’d need some information about the word distribution, but part of the problem is figuring out what information we can collect that’s either quick to find or reasonable to precompute.
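
For illustration, here’s one crude way to estimate that probability, assuming the words are drawn independently and that prefix/suffix frequency tables have already been precomputed; the table shape and function parameters are hypothetical, not what puzlink.js actually does.

```ts
// Crude estimate of the overlap probability under an independence assumption.
// An AffixTable maps a short prefix/suffix to its relative frequency among
// words of a given length; building those tables is the precomputation step.
type AffixTable = Map<string, number>;

function overlapProbability(
  lengths: number[],
  ks: number[],
  // suffixFreqs(len, k): distribution of length-k suffixes among len-letter words
  suffixFreqs: (len: number, k: number) => AffixTable,
  // prefixFreqs(len, k): distribution of length-k prefixes among len-letter words
  prefixFreqs: (len: number, k: number) => AffixTable,
): number {
  let prob = 1;
  for (let i = 0; i + 1 < lengths.length; i++) {
    const suffixes = suffixFreqs(lengths[i], ks[i]);
    const prefixes = prefixFreqs(lengths[i + 1], ks[i]);
    // P(suffix of word i equals prefix of word i+1)
    //   = sum over strings s of P(suffix = s) * P(prefix = s)
    let match = 0;
    for (const [s, pSuffix] of suffixes) {
      match += pSuffix * (prefixes.get(s) ?? 0);
    }
    prob *= match; // pretend consecutive overlaps are independent (they aren't, quite)
  }
  return prob;
}
```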

Client-side semantic relatedness. I wanted to implement links of the form “contains a word from (category)”. You can hardcode several categories, but you can’t hardcode all of them, so it’d be nice to have a general solution. And I wanted to be able to do this entirely client-side, with a total data size under 1 MB.

I thought of using word embeddings, but fitting things in 1 MB seemed hard. If you have standard 300-dimensional vectors of 32-bit floats, that’s around 850 distinct words. You can hash or quantize to maybe get up to 20k words, but we have at least 50k words we’re potentially interested in. Plus, once you have similar vectors, how do you describe their relationship? I ended up using WordNet, which has ontology data. This gets us checks for words in the same category, and in under 500 kB.
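
As a sketch of the kind of check this enables (the data shape here is a made-up compact encoding, not WordNet’s actual format): map each word to the set of category ids containing it, then intersect across the input words and report any category they all share.

```ts
// Hypothetical compact category data: word -> ids of categories containing it.
// A separate table (not shown) would map category ids back to readable names.
type CategoryData = Map<string, Set<number>>;

function sharedCategories(words: string[], data: CategoryData): Set<number> {
  let shared: Set<number> | null = null;
  for (const word of words) {
    const cats = data.get(word.toLowerCase()) ?? new Set<number>();
    // intersect the running set with this word's categories
    shared = shared === null
      ? new Set(cats)
      : new Set([...shared].filter((id) => cats.has(id)));
  }
  return shared ?? new Set();
}

// If sharedCategories(["lion", "tiger", "leopard"], data) is nonempty, we can
// report a link of the form "every word is in category X".
```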

Poset of cones under containment. A major idea I stole from Collective.jl was that of features, which are binary properties that a word can have, like “has at least two consonants in a row”. Many of the features I’ve generalized into metrics: in this case, the metric is “maximum number of consonants in a row”, which we can turn into features by requiring “at least two” of the metric, or “exactly three”, and so on. Using metrics lets us reuse computation and remove redundant feature reports: if we know every word has exactly three consonants in a row, we don’t have to report that every word has at least two consonants in a row.
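
A minimal sketch of the metric/feature split, with hypothetical names (and treating y as a consonant for simplicity): a metric maps a word to a number, and features are derived by thresholding or equality, so the metric is computed once per word.

```ts
// A metric maps a word to a number.
type Metric = (word: string) => number;

// Example metric: maximum number of consonants in a row.
const maxConsonantRun: Metric = (word) => {
  let best = 0;
  let run = 0;
  for (const ch of word.toLowerCase()) {
    run = /[bcdfghjklmnpqrstvwxyz]/.test(ch) ? run + 1 : 0;
    best = Math.max(best, run);
  }
  return best;
};

// Features are thin wrappers over a metric, so "at least two" and
// "exactly three" reuse the same computed values.
const atLeast = (metric: Metric, k: number) => (word: string) => metric(word) >= k;
const exactly = (metric: Metric, k: number) => (word: string) => metric(word) === k;

console.log(maxConsonantRun("puzzle"));             // 3 ("zzl")
console.log(atLeast(maxConsonantRun, 2)("puzzle")); // true
console.log(exactly(maxConsonantRun, 3)("puzzle")); // true
```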

I then tried to generalize further, by making metrics multi-dimensional. A two-dimensional metric is something like “has (number) bigrams, each repeating (number) times”. Converting this to a feature is about specifying a region of the space we’re interested in, and our regions happened to be affine cones generated by a subset of the standard basis. Being the nerd I am, I was excited to use some of the math words I learned in college in “real life”. Like, I started thinking about eliminating redundant cones by considering the poset under containment, taking the subposet induced by a subset of the input words, then keeping only the extrema. This worked! But none of the metrics I ended up using were multi-dimensional anyway, so I got rid of all that code.
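
For what it’s worth, here’s roughly what the containment check and redundancy filter could look like, under an assumed representation where a cone is stored as an apex plus a flag per dimension saying whether that coordinate is free to grow or pinned; this only covers the keep-the-extrema step, and it’s my own sketch of code that, as noted, no longer exists anyway.

```ts
// An affine cone generated by a subset of the standard basis: the region
// { x : x[j] >= apex[j] if free[j], x[j] == apex[j] otherwise }.
interface Cone {
  apex: number[];  // one value per metric dimension
  free: boolean[]; // free[j] means ">= apex[j]"; pinned means "= apex[j]"
}

// Cone A contains cone B when every point B allows is also allowed by A.
function contains(a: Cone, b: Cone): boolean {
  return a.apex.every((aj, j) =>
    a.free[j]
      ? aj <= b.apex[j]                 // ">= aj" covers anything at or above aj
      : !b.free[j] && b.apex[j] === aj, // "= aj" only covers the same pinned value
  );
}

// Keep only the extrema: drop any cone that strictly contains another cone,
// since the smaller, more specific cone makes it redundant to report.
function dropRedundant(cones: Cone[]): Cone[] {
  return cones.filter(
    (a) => !cones.some((b) => b !== a && contains(a, b) && !contains(b, a)),
  );
}

// Example: "at least 2 consonants in a row" contains "exactly 3", so only
// the exactly-3 cone survives.
const atLeastTwo: Cone = { apex: [2], free: [true] };
const exactlyThree: Cone = { apex: [3], free: [false] };
console.log(dropRedundant([atLeastTwo, exactlyThree])); // [ { apex: [3], free: [false] } ]
```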
