Puzzlehunts and institutions

This is my write-up on MIT Mystery Hunt 2024. While it’s aimed at a general audience, I’ll assume you know what a puzzlehunt is and what the MIT Mystery Hunt is; briefly, a puzzlehunt is an event where teams compete to solve a series of puzzles, and the Mystery Hunt is the world’s oldest, largest puzzlehunt. For more details, see my write-ups from 2020, 2021, 2022, and 2023.

As an aside: I consider my last four write-ups, the Two hundred puzzles series, to be complete. I mean, four times fifty is two hundred, right? Consider this a spiritual sequel of sorts, mostly because I like thinking about the non-puzzly aspects of puzzlehunts.

Introduction

In this post, I’ll take cues from institutional theory to drive my analysis of the MIT Mystery Hunt and the role that the 2024 Hunt plays in its history. I’m going off the textbook Why Institutions Matter for this post, mostly because I don’t know anything about political science and I couldn’t be bothered to read anything else.

What is an institution? It’s some form of social organization, like marriage, or a government. Depending on who you ask, a university or the stock market could also count. There isn’t a widely agreed-upon definition, and I don’t think I can give one that’ll help. Instead, here are characteristics that make something more institution-y:

Rules and practices. Governments have written rules called laws, and practices like court trials. These rules and practices constrain the people in the institution: they define what is and isn’t allowed.
Structures of meaning. Stock markets are a way for companies to raise money, and they help ease economic growth. Narratives like these explain why rules exist, and people can identify themselves with these narratives.
Resilience over time. A university’s president may change, but their responsibilities are stable. Roles produce recurring behavior. Institutions have predictable patterns. When they do change, they often change incrementally.
Structures of resources. The assets of people married to each other are often jointly owned and come with expectations on how they’re used. The resource structures empower the people in the institution: they create agency.

In Why Institutions Matter, there’s a chapter dedicated to rules, practices, and narratives, which are the three ways that an institution shapes someone’s behavior. There’s this excellent table in the book that summarizes the differences, and I’ll give an abbreviated and edited version here:

	Rules	Practices	Narratives
Recognition	Formally constructed and recorded	Shown through conduct	Expressed through words
Examples	Constitutions, laws	Ceremonies, meeting behavior	Speeches, tweets
Examples (Hunt)	MIT guidelines, Hunt rules	Writing and solving puzzles	Write-ups, discussions
Enacted through	Writing, interpretation	Consistent rehearsal	Explanation, persuasion
Interconnection	Rules formalize practices	Practices encourage narratives	Narratives justify rules
Sanctioned by	Formal rewards and punishments	Social disapproval, isolation	Credibility, reputation
Tensions	Differences created by rule gaps	Inconsistent expectations	Unpredictable reactions

This is what I’ll focus on in this post: the rules, practices, and narratives around Hunt, and how they interact. I’ll end by looking at the so-called “paradox” of institutions: how can institutions change when they’re so stable?

Rules

All puzzlehunts have some sort of rules:

restrictions on team size (maximum team size, recommended team size),
what answers are (say, strings of alphanumeric characters),
guessing limits (a maximum number of guesses, or timeout windows),
hint release and usage,
disallowing things like hacking, posting spoilers, being on more than one team, etc.

Mystery Hunt has another set of rules that aren’t puzzle-related at all, inherited from being an MIT event: rules about campus access, health and safety, conduct on-campus, and the supervision of minors.

Puzzlehunt runners create, enact, and enforce their own rules. Rules are enacted through writing and distributing them before the Hunt, through answering rules questions, and through making announcements. Rules are enforced through the Hunt website, communication from the running team, or outright disqualification. It’s the creation of the rules that’s interesting to me—what’s the process for deciding the rules?

The earliest record I could find for modern-ish Hunt rules is the introduction from 2002. A similar page exists for the next few Hunts: see 2003, 2005, 2006, 2011, and 2012. Similar pages probably exist for the years in between too, but I can’t find them in the archives. I’m not sure about 2013 or 2014, but 2014 did mention rules in kickoff slides. Since then, Hunt websites have either had an FAQ, as in 2015, 2016, 2018, 2021, and 2022, or a rules page like 2017, 2019, 2020, 2023, and 2024. Some notes:

From 2002 to 2012, it’s mentioned puzzles have a “word or short phrase” as answers.
From 2002 to 2005, teams called the running team to verify their answers. Hence the phrase call in an answer.
Since 2005, the running team referred to themselves as headquarters or HQ.
From 2006 to 2010, rather than calling HQ to verify answers, teams sent a request for HQ to call them, a callback.
From 2011 to 2019, instead of saying or spelling the answer over the phone, teams type the answer in a form.
In 2011, there’s the first mechanism for getting and using free answers.
In 2012, there’s the first rule about throttling automatic website requests.
In 2015, there’s the first formal hint system, where teams can ask yes/no questions on puzzles.
Since 2020, instead of teams getting a call from HQ verifying their answers, they get feedback from the website.
Since 2020, there’s a formal hint system, where teams can ask for freeform hints.
In 2020, there’s the first mechanism for getting and using free unlocks.
In 2021, it’s mentioned answers are “converted to uppercase and stripped of non-alphanumeric characters.”
Since 2022, there’s no mention of how answers are checked.
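As a rough illustration of that 2021 note on answer checking, here’s a minimal sketch. It reads the rule as uppercasing a guess and then dropping everything that isn’t a letter or digit. The names normalize_answer and is_correct are hypothetical, and this is an assumption about what such a check could look like, not the actual Hunt website code.

```python
import re


def normalize_answer(guess: str) -> str:
    """Uppercase a guess, then drop every character that is not A-Z or 0-9.

    Hypothetical helper: an assumed reading of the 2021 rule, not the
    real checker used by any Hunt website.
    """
    return re.sub(r"[^A-Z0-9]", "", guess.upper())


def is_correct(guess: str, answer: str) -> bool:
    """Compare a guess to the intended answer after normalizing both."""
    return normalize_answer(guess) == normalize_answer(answer)


if __name__ == "__main__":
    print(normalize_answer("call in an answer"))
    print(is_correct("Call-In An Answer!", "CALL IN AN ANSWER"))
```

Under this sketch, spacing, punctuation, and capitalization don’t matter, and a guess made only of emoji normalizes to the empty string. So a rule written this way quietly assumes that answers are made of letters and digits.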

The Hunt doesn’t have a set of meta-rules that dictate how rules are created or changed, unlike how a government might have a constitution. The closest thing would be the Puzzle Club constitution. It mentions:

The Implementation Team is responsible for the actual design and implementation of the current year’s Hunt. Under normal circumstances, the members of the Implementation Team are taken from the members of the winning team in the previous year’s Hunt. The definition of abnormal circumstances shall be decided by a vote of the Ordinary Members. Under normal circumstances the Implementation Team is to be given complete autonomy in the construction of the Hunt.

Note that this is the only rule that states who writes the Hunt each year. While it’s been a longstanding practice for the team that wins the Hunt to write the next one, it was only when the constitution was written (around 2017) that this practice became a formal rule. This is in line with institutional theory: many rules formalize existing practices, rather than create them.

The rules themselves are formal, but the process in which they’re written is an informal practice. There’s an unwritten expectation that teams keep the rules structure similar. Because it’s a practice, there’s no formal way that it’s enforced. Every change that the Hunt authors make to the rules comes with the potential for social disapproval, which is how practices are sanctioned. In 2020, the decision to remove HQ calls for answer verification was met with some social resistance; see the discussion on Puzzlvaria.

It’s interesting to note what’s not in the rules. Since 2015, there’s no mention that an answer is a “word or short phrase”. Indeed, in 2020, there was a round whose answers were all emoji; in 2023, there was a round whose answers were… not words or short phrases. Or consider how MIT rules require the Hunt to distribute first aid kits and namebadges, but don’t specify their contents. In 2014, the first aid kit was used for a puzzle; this year, the namebadges were used for a puzzle. Gaps in the rules are a way that rules can empower people, rather than constrain them.

Writing and distributing rules forms only part of enacting them; the rest is how these rules are interpreted. While the rules for Hunt in 2024 were unchanged from the rules in 2023, there were situations that the writing teams could’ve interpreted differently. Consider how this year, the Hunt website was down for the first few hours of the Hunt. After a few hours, TTBNL, the team running this year’s Hunt, responded by releasing some puzzles through Google Docs, and making a Google Form where teams could submit answers and get HQ calls for verification. Maybe teammate, who ran the 2023 Hunt, would’ve done a similar thing. Maybe not. There are no rules dictating how the writing team should respond here, a gap that leads to tension when it’s not discussed by those in power. TTBNL handled the issue with grace, but I can imagine a dozen ways it could’ve been worse.

Practices

What puzzlehunters call convention would be classified as practices under our rules, practices, and narratives framework. In particular, what Hunt puzzles are, how they’re written, and how they’re solved, are almost entirely decided by practice. When we discussed rules, the only rules that constrained puzzles were hunt-wide rules on safety, and the rule that puzzles have answers. Despite the limited formal restrictions, there’s grown to be a puzzlehunt canon, as described in, say, Introduction to Puzzlehunts. Puzzlehunts are expected to have clearly marked metapuzzles, puzzles which combine other puzzle answers. The strangest practice is extraction, techniques used to get an answer out of data; the linked post mentions first letters, indexing, ordering, encoding, recursion, and others.

In Why Institutions Matter, Lowndes and Roberts write that practices are enacted through consistent rehearsal. Indeed, these practices developed over time. Consider the 1997 Hunt. Puzzle 17 tells you that the FREE spaces spell out the answer word, something that’d probably go unstated in a puzzlehunt published today. The Hunts from the 1980s are even more different: some don’t even have metapuzzles. The most visible rehearsal is the Hunt happening every year, along with lots of other puzzlehunts since. Other forms of rehearsal are the archiving of past Hunts and the creation and distribution of puzzlehunt tools.

Maybe it’s just me, but it looks like puzzlehunts have only become more like each other over time, due to this rehearsal. Maybe consistency was harder in the past because these practices weren’t codified in archives or tools like they are now. Now, consistency itself isn’t a bad thing; other forms of media have conventions too. Sometimes I wonder if we only follow practices because we’re actors in an institution that reinforces them, rather than because they serve goals like making sure participants have fun.

As discussed earlier, though, if we stray too far from practices, we are sanctioned through social disapproval. Critique is only possible because practices are rehearsed: if there wasn’t anything to compare against, there’d be no baseline for calling something good or bad. Thus, practices form narratives that judge practices. Practices are enacted through rehearsal, but they’re enforced through narrative. Stronger practices mean that disapproval is stronger when they’re broken.

Tensions arise when practices aren’t strong enough to set expectations. Some perennial questions among the Mystery Hunt community fall here: Who is the audience of the Hunt? What is the importance of its story? How big should the Hunt be, how long should it take? The 2024 Hunt had the most puzzles the Hunt has ever had, and the winning team, Death & Mayhem, finished Monday 5 AM. There’s general consensus that this was too large and the Hunt took too long, but how much too large and how much too long isn’t agreed on. Of the data I could easily find, the winning team finished Hunt on:

Saturday 4 AM in 2017,
Saturday 10 PM in 2012,
Sunday 2 AM in 2014,
Sunday 5 AM in 2010, 2011, and 2015,
Sunday 10 AM in 2022,
Sunday 2 PM in 2020,
Sunday 4 PM in 2018,
Sunday 6 PM in 2016, 2019, and 2021,
Monday 3 AM in 2009,
Monday 7 AM in 2023,
Monday 3 PM in 2013.

Although the winning team in most of these Hunts finished on Sunday, there’s still a decent amount of variance. I’ve heard people say that the 2017 Hunt was too short, and I’ve heard others say it was fine. In a universe where the winning team always finished Hunt on Saturday night, I’d imagine that there wouldn’t be as big of an argument around this.
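To make the spread in the list above concrete, here’s a quick tally sketch. It counts finishes by day and measures each finish in hours from 00:00 on the Friday of Hunt weekend. That reference point is arbitrary and is my own assumption for illustration; it doesn’t affect the spread. The data is copied from the list, and the helper name hours_since_start_of_friday is hypothetical.

```python
from collections import Counter

# Days counted from the Friday of Hunt weekend (an arbitrary reference point).
DAY_INDEX = {"Friday": 0, "Saturday": 1, "Sunday": 2, "Monday": 3}

# (year, day, hour on a 24-hour clock), copied from the list above.
FINISHES = [
    (2017, "Saturday", 4),
    (2012, "Saturday", 22),
    (2014, "Sunday", 2),
    (2010, "Sunday", 5),
    (2011, "Sunday", 5),
    (2015, "Sunday", 5),
    (2022, "Sunday", 10),
    (2020, "Sunday", 14),
    (2018, "Sunday", 16),
    (2016, "Sunday", 18),
    (2019, "Sunday", 18),
    (2021, "Sunday", 18),
    (2009, "Monday", 3),
    (2023, "Monday", 7),
    (2013, "Monday", 15),
]


def hours_since_start_of_friday(day: str, hour: int) -> int:
    """Hours elapsed since 00:00 on Friday for a given day and hour."""
    return DAY_INDEX[day] * 24 + hour


# How many listed Hunts were won on each day.
by_day = Counter(day for _, day, _ in FINISHES)

# Gap in hours between the earliest and latest listed finishes.
hours = [hours_since_start_of_friday(day, hour) for _, day, hour in FINISHES]
spread = max(hours) - min(hours)

if __name__ == "__main__":
    print(dict(by_day))
    print(spread)
```

With this data, ten of the fifteen listed Hunts were won on Sunday, and the earliest and latest finishes (2017 and 2013) are 59 hours apart.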

Narratives

Narratives carry the why behind rules and practices. Lowndes and Roberts take narratives as things conveyed through spoken words or symbols, and talk about things like speeches and stories. I’m taking a more expansive view of narrative, and including things like mass media and social media.

The speeches during Mystery Hunt happen in kickoff and wrapup. Kickoff is set within the Hunt: aside from logistics and safety information, it serves to start the story presented in the Hunt. Wrapup is a narrative around the Hunt: it’s a series of speeches about what happened and how. All the wrapups I’ve seen are structured around a slideshow. In this year’s wrapup, TTBNL presented:

the goals they set for the Hunt, which included making the Hunt a good experience for smaller teams and giving authors lots of chances to write and act;
the story of the Hunt, which follows a rough three-act structure, all the in-character team interactions they did, and the coin;
some puzzles in the Hunt, including interesting metapuzzles and round structures, physical puzzles, and ones that had team submissions or interactions;
the Hunt construction process, from the puzzle writing timeline, to factchecking and testsolving;
the art and merch commissioned from artists;
videos of what happened in the events;
funny answer submissions, Hunt statistics, and the leaderboard;
sponsor reads, acknowledgements, and a call for contributions, either through donations or joining Puzzle Club.

While the content of wrapup is similar year-to-year (see 2020, 2021, 2022, and 2023), differences arise in the order of each section, the amount of time spent, who is mentioned and thanked, or what is and isn’t stressed. Wrapup enables the continued practices of Mystery Hunt by giving reasoning behind what happens and persuading listeners to contribute. As Why Institutions Matter says, narratives are enacted through explanation and persuasion, which is precisely what wrapup does.

Despite kickoff and wrapup being the only speeches during Hunt, most narratives happen outside of them. Consider the YouTube chat for this year’s wrapup. At around 22 minutes in, there’s a discussion about Hunt length. Then, around 28 minutes in, there’s a discussion on campus access. Both discussions continue for the next few minutes. Around 36 minutes in, there’s some discussion about accessibility. And there’s more discussion after that too, which I won’t go into.

This shows us how narratives relate to rules and practices. Narratives judge practices, either through criticism or praise. There’s criticism of the Hunt’s length, and praise for TTBNL’s focus on accessibility. And narratives justify why rules exist, as seen in the discussion on campus access, explaining that the restrictions come from MIT. Another good example of narratives justifying rules, though not from this chat, is from 2023. That year, teammate recommended a maximum team size of 60, rather than 75; this change was also justified in the wrapup.

These discussions happen during kickoff and wrapup, or when people bump into each other at events, or when people talk about the Hunt in the meals and meetups surrounding it. These discussions also happen outside the Hunt, from the stories that people tell their friends, to media coverage about the Hunt, to comments on blog posts, to the Mystery Hunt subreddit. Centralized Discord servers dedicated to puzzlehunting exist now, like Puzzle World. The internet’s only made the narratives surrounding Hunt grow stronger over time, an institution becoming more self-reinforcing.

Narratives are sanctioned through credibility and reputation. That’s where the tension comes from too. Saying anything too out-of-pocket risks people disagreeing with you, or dismissing you. This is related to the Overton window: if you’re too far outside it, you’ll be dismissed.

I was thinking about this for a puzzlehunt I helped write, Galactic Puzzle Hunt 2023. For context, GPH 2023 centered around a card game, which was used as the medium for half the puzzles; the remaining puzzles were more “traditional” in the ways outlined in the previous section. We hoped to push the boundaries of what people considered a puzzlehunt (rather than branding ourselves as a “puzzle game”). Feedback showed that some solvers liked this, and some didn’t, which was pretty much what we expected.

While I can’t speak for all of ✈✈✈ Galactic Trendsetters ✈✈✈, my gut said that by writing a “gimmicky” hunt, we were risking having a smaller audience of puzzlers interested in future puzzlehunts we write. The wrapup for GPH 2023 justifies our decision to make an unconventional puzzlehunt, and I can imagine someone reading it and thinking less of us as a result. I sure hope this didn’t happen, but it’s part of the tensions inherent to shifting what people think.

Change

This discussion of GPH 2023 leads me to one of the most difficult questions in institutional theory: how do institutions change? Or, maybe: how do we change an institution? We’ve talked a lot about how institutions are self-reinforcing, so they’re resistant to change. And yet, they change anyway: new rules are made, old practices die out, narratives shift, old rules are edited, new practices form. My answer is that change results from the tensions between rules, practices, and narratives.

Why Institutions Matter has a whole chapter dedicated to institutional change. It begins with theories like path dependence, which says that change is costly, so critical junctures like political upheaval are necessary for change. Or the garbage can model, which shows that organizations tend to use existing solutions rather than making decisions from first principles. Both are criticized as “stop-go” models that insufficiently explain change. Instead, change can be gradual, and must be understood through several factors, like the interaction of actors, institutions, and environments, or the interaction of rules, practices, and narratives:

Rules, practices and narratives may reinforce one another (the “overdetermination” of relatively stable institutions) or abrade and undermine each other. If we think about the banking crisis in the UK, customers still abide by the formal rules of financial institutions (managing their bank accounts, loans and savings plans) but the narratives that used to reinforce institutional constraint (about the reliability and trustworthiness of the banking system) have weakened to the point that they no longer reinforce these rules and, indeed, start to make space for the evolution of alternative practices (buying gold or property or accessing the stock market independently). Creative spaces open up (for better or worse) when the “fit” between different elements of an institutional configuration weakens. Indeed, change strategies may focus upon just this—how to undermine existing narratives, or model new practices, as a precursor to changing formal rules.

What’s described here as a weakening of “fit” is what I’ve called tension in this post, arising from gaps in the rules, or inconsistent practices, or narratives on the edge of the Overton window. Consider how this year’s Hunt contributes to the tension between them. TTBNL’s response to the Hunt site being unavailable could influence how future Hunts respond to the same issue. The length of this year’s Hunt, coupled with the length of last year’s Hunt, will probably push the next Hunts to be shorter. The praise for this Hunt’s accessibility might encourage future Hunts to care more about accessibility too. These tensions are individually small, but they hold the potential for small future changes that could build up to something larger.

How do we use this to cause change, though? Even if we’ve identified tension as the ultimate cause of change, this doesn’t tell us much about how to cause tension in the first place. I think this has to be specific to the institution. Puzzlehunt participants can themselves write and run puzzlehunts, in the hopes that other puzzlehunts adapt their practices; this is partly how I view GPH. Narratives can be advanced through writing blog posts (like this one), through starting discussions in communities, or through meetings and conferences. Change can happen through upheavals or critical junctures, but it can also happen through accumulating small, gradual changes. Change can happen through creation, but also through maintenance. I’ll end with another quote from the final chapter of Why Institutions Matter, which is on institutional design:

Once designed, institutions will wither away if they are not constantly and actively maintained. Maintenance serves not only to minimize the gaps in the institutional configuration but also to mitigate against contingent effects. So “design” involves not just institutional creation, but an ongoing commitment to enforce rules, model practices and rehearse stories. […]

We conclude that, in most cases, institutional designers can only hope to displace gradually pre-existing institutional configurations and cannot afford to neglect their new projects once they are in place. Attempts at intentional institutional change remain, with these provisos, worthwhile. However, those who seek to design and redesign institutions are best advised to approach their challenge with a reflexive and ironic cast of mind.

Conclusion

We talked about the Hunt through the framework of rules, practices, and narratives, and we talked about institutional change being driven by tensions between them, using Mystery Hunt 2024 as our main example throughout.

While I don’t see myself becoming a political analyst any time soon, reading Why Institutions Matter and taking on this case study has broadened my appreciation for politics, as a study of how organizations work. When I was an undergrad, I was involved in student groups that were quite different organizationally. This ranged from ESP, a stable student group with rapid turnover, to the Filipino Students Association, a club that had just started when I joined, to the Assassins Guild, a long-standing group with alumni involved that recovered after MIT’s pandemic closure. After the research I’ve done for this post, I now have more general language for describing how they’re different, language I can apply to the organizations I’m involved with now.

What fascinates me the most about institutions is change, and how intentional institutional change usually happens through gradual effort and improvisation. It’s interesting to me that Lowndes and Roberts chose to use the word design. They have this neat quote from Goodin that rings true to me about group creation in general:

The Myth of the Intentional Designer (still less the Myth of the Intentional Design) is greatly to be avoided in theories of institutional design. Typically, there is no single design or designer. There are just lots of localized attempts at partial design cutting across one another, and any sensible scheme for institutional design has to take account of that fact. Thus, even within the realm of our intentional interventions, what we should be aiming at is not the design of institutions directly. Rather, we should be aiming at designing schemes for designing institutions—schemes which will pay due regard to the multiplicity of designers and to the inevitably cross-cutting nature of their intentional interventions in the design process.

It’s about gardening as opposed to engineering. Architecting with change in mind. About design principles rather than design rules. To take an institutional perspective is to acknowledge the interlocking rules, practices, and narratives underlying an organization, and to set about changing them not through large, top-down reforms, but through small, gradual actions.