Categories
Uncategorized

Straw Man Proposal: Every Regex Should Have Its Own Class

Regular expressions are commonly written very casually on the fly based on some known examples. Regexes are densely packed with logic that is often a matter of one’s personal style as much as intentional decisions about what that regex should match or not. Many choices are overlooked or made unintentionally by the platform executing the regex. Some examples include whether or not to match across lines, or whether to be greedy (if the author even knows what that means).
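To make those two examples concrete, here is a minimal sketch using Python's standard `re` module. The sample strings are made up for illustration. It shows how the same-looking pattern changes behavior depending on greediness and on whether "." may cross a line break.

```python
import re

text = "<b>one</b> and <b>two</b>"  # made-up sample input

# Greedy: .* grabs as much as possible, so one match spans both tags.
greedy = re.findall(r"<b>.*</b>", text)

# Lazy: .*? stops at the first closing tag, giving two separate matches.
lazy = re.findall(r"<b>.*?</b>", text)

two_lines = "first line\nsecond line"

# By default "." does not match a newline, so the match ends at the line break.
default_dot = re.match(r".*", two_lines).group()

# With re.DOTALL the same pattern runs across the line break.
dotall_dot = re.match(r".*", two_lines, re.DOTALL).group()

print(greedy)
print(lazy)
print(repr(default_dot))
print(repr(dotall_dot))
```

Neither behavior is wrong in itself. The point is that the author picked one, possibly without noticing.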

A regular expression is usually pure implementation (unless it has embedded comments, which I’ve yet to see in the wild). I have a rule of thumb that most code logic should have two parts: what and how. Any non-trivial piece of logic should be wrapped in a function or class so that the next person coming by doesn’t have to execute the logic in their head to know what it’s doing. They can assume that the code does what it says it does unless they have further reason to doubt it. You could say this is another way of talking about the Single Level of Abstraction Principle.
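As a rough illustration of "what and how", the sketch below pairs a regex that carries embedded comments (Python's `re.VERBOSE` flag) with a named function. The names `ISO_DATE` and `find_iso_dates` are hypothetical, chosen only for this example: the name states the what, and the pattern stays the how.

```python
import re

# Hypothetical example: a date-shaped pattern with embedded comments.
ISO_DATE = re.compile(
    r"""
    (?P<year>\d{4})   # four-digit year
    -
    (?P<month>\d{2})  # two-digit month (not range-checked here)
    -
    (?P<day>\d{2})    # two-digit day (not range-checked here)
    """,
    re.VERBOSE,
)


def find_iso_dates(text):
    """Return every YYYY-MM-DD-shaped substring of text, in order."""
    return [m.group(0) for m in ISO_DATE.finditer(text)]


print(find_iso_dates("Due 2024-01-31, moved to 2024-02-15."))
```

A caller reading `find_iso_dates(...)` does not need to run the pattern in their head to know what it is for.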

The most important reason to give every regex its own class is unit testing. Every regex should be accompanied by a set of examples of what it’s intended to match and what it isn’t. Every regex represents bugs waiting to happen, so creating it with a set of unit tests from the start prevents regressions of the original test cases and encourages the accumulation of additional regression tests.
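Here is one way the proposal could look in practice, sketched in Python with the standard `unittest` module. The class `HexColorMatcher` and its test cases are hypothetical, invented for this example. The near-miss list is where new regression cases would accumulate over time.

```python
import re
import unittest


class HexColorMatcher:
    """Hypothetical wrapper: recognizes 6-digit hex colors such as #1a2B3c."""

    _PATTERN = re.compile(r"#[0-9a-fA-F]{6}")

    def matches(self, text):
        # fullmatch: the whole string must be the color, not just part of it.
        return self._PATTERN.fullmatch(text) is not None


class HexColorMatcherTest(unittest.TestCase):
    def setUp(self):
        self.matcher = HexColorMatcher()

    def test_accepts_intended_examples(self):
        for example in ["#000000", "#FFFFFF", "#1a2B3c"]:
            self.assertTrue(self.matcher.matches(example), example)

    def test_rejects_near_misses(self):
        # Unhappy-path cases: missing "#", wrong length, bad digit, trailing newline.
        for example in ["000000", "#12345", "#1234567", "#gggggg", "#000000\n"]:
            self.assertFalse(self.matcher.matches(example), example)
```

Saved as a module, the tests can be run with `python -m unittest`. The trailing-newline case is the kind of subtlety that is easy to miss when a regex is written on the fly against a few happy-path examples.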

Unit testing is a great mental hack to get around happy-path bias. I think regexes are naturally prone to happy-path bias.

Counter: Why not just a function?

Response: Not a bad point. I’m more confident in stating the proposal as “Don’t use a regex directly.” In the programming cultures I had in mind, by which I mean those passionate about testing, static functions are frowned upon to the point that, even when there’s no good reason against one in a particular case, a true class is considered better style, probably for consistency’s sake. In an FP codebase, I wouldn’t begrudge a regex wrapped in a function.
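For comparison, a function-only version might look like the sketch below. The name `collapse_whitespace` is hypothetical. The regex still sits behind a name, and it can still be tested with plain assertions.

```python
import re

_WHITESPACE_RUN = re.compile(r"\s+")


def collapse_whitespace(text):
    """Replace each run of whitespace with one space and trim both ends."""
    return _WHITESPACE_RUN.sub(" ", text).strip()


# The examples live next to the function, just as they would next to a class.
assert collapse_whitespace("  a \t b\n\nc ") == "a b c"
assert collapse_whitespace("") == ""
```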

Counter: What about a checklist for writing regexes? To make sure you’ve considered subtleties like greediness.

Response: That makes sense in the imaginary world where code is written once and seldom changed. In the real world where code is a living document, tests ensure continued compliance.

Categories
Uncategorized

If a wheel keeps getting reinvented, the most important thing is for everyone to share the test cases that drove them to reinvent the wheel again.

Categories
Uncategorized

The Sorting Hat from Harry Potter is really a hash function.


How to install Perl 6 in Ubuntu

The old version of this post was stupid. Just use https://github.com/nxadm/rakudo-pkg


What is parsing?

This is not a pipe.

This is not the painting entitled “The Treachery of Images” by René Magritte.

This is an image of the painting “The Treachery of Images” by René Magritte.

123.45

This is not a number.

This is a piece of text containing numerals, symbols which have numeric values associated with them, each individually, and also together as a whole.

Parsing is the process of interpreting the representation of an idea to get at the idea itself.
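The 123.45 example can be sketched in a few lines of Python. This is only an illustration, using the standard `decimal` module as one possible parser: the text and the number it represents behave differently, and two different pieces of text can stand for the same number.

```python
from decimal import Decimal

text = "123.45"          # the representation: six characters
number = Decimal(text)   # the idea: a numeric value, obtained by parsing

text_doubled = text * 2      # string repetition, not arithmetic
number_doubled = number * 2  # arithmetic on the value

# A different representation of the same idea.
same_number = Decimal("123.450")

print(text_doubled)
print(number_doubled)
print(number == same_number)
```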


Podcasts

I’ve been a big podcast listener for several years. Here’s roughly the current list of podcasts I subscribe to, organized by how vehemently I recommend them.

Everyone Must Listen To

These are so good, it’s not worth explaining why, just listen to:

I Recommend

  • Planet Money 🔗
  • Tim Harford 🔗
    • 50 Things That Made the Modern Economy 🔗
    • Pop-Up Ideas 🔗
  • Flash Forward 🔗
  • BBC Analysis 🔗
  • TED Radio Hour 🔗
  • EconTalk 🔗
  • Embedded 🔗
  • BBC World Service Documentaries 🔗
    • It’s downright humbling to realize how diverse the world is.
  • BBC Seriously… 🔗
    • This one gets extra credit for being so sonically interesting.
  • Seminars about Long Term Thinking – The Long Now Foundation 🔗

I also listen to

Which is a recommendation in itself, just less strongly than the above.

  • ProPublica 🔗
  • C-Span After Words 🔗
  • NPR Story of the Day 🔗
  • Codebreaker 🔗
  • Intelligence Squared 🔗
  • The Infinite Monkey Cage 🔗
  • Reply All 🔗

Honorable Mention

I don’t really listen to these, but that’s no fault of theirs. They are worth checking out.

  • Hardcore History with Dan Carlin 🔗
  • The Joe Rogan Experience 🔗
  • Song Exploder 🔗
  • Democracy Now! 🔗
    • These guys do great journalism. I’ve contributed to them. I just can’t spare an hour a day on the daily news cycle.
  • Death, Sex and Money 🔗

I love the CockroachDB logo

I know nothing about design, but this is a great logo. The two circular arcs that make up the body and antennae create a partial Venn diagram, referencing the set theory and relational algebra that form the theoretical foundation for this and any relational database. The shape on the back of the cockroach evokes a funnel, the universal symbol for filtering: a fundamental database operation.


Git freebase