glunty


Reading regex, SQL, bash, and Excel in plain English

A practical guide to the small text languages developers read every day: decode by intent first, then by tokens, with notes on when AI explanation helps and when it misleads.

There is a category of text that almost every working developer, data analyst, and operator reads but few people read well. It is regex. It is SQL. It is bash one-liners. It is Excel formulas with five nested function calls. It is the inherited curl command in the deployment runbook that nobody has touched in three years. These are the small text languages that mediate between humans and machines, and they share a property that makes them harder to read than ordinary code: they are deliberately compact, expressively dense, and unforgiving about what they actually mean.

This post walks through why these mini-languages exist, why they are hard to read even for experienced practitioners, and a strategy for reading them that scales beyond the moment of immediate confusion. It also covers when AI-assisted explanation actually helps and when it produces confident-sounding nonsense.

Why these mini-languages exist

Each of the four mini-languages developed its compact, hard-to-read form for a real reason:

  • Regular expressions compress complex string-matching logic into a handful of characters because they originated in line-oriented Unix tools (grep, sed, awk) where the input was always a stream and the output had to be a single shell expression. There was no room for “match a phone number that may have parentheses and dashes” written out longhand. So ^\(?(\d{3})\)?[-\s]?(\d{3})[-\s]?(\d{4})$ happened, and once it worked it became canon.
  • SQL chose declarative syntax because the original goal in the 1970s was to let non-programmers express set logic. That trade succeeded; SQL is genuinely more accessible than the procedural alternatives of its era. But declarative means the engine decides the order of operations, and that hides what is actually happening, especially when you nest CTEs three deep with window functions on the inside.
  • Bash is what you get when 50 years of incremental additions accrue on top of a 1970s shell scripting language whose original goal was just to glue Unix tools together. Every flag, every parameter expansion, every ${var:-default} form was added because someone needed it once. The result is a language with extreme expressive density and no consistent design vocabulary.
  • Excel and Google Sheets formulas evolved as a UI for spreadsheet calculations where the only available metaphor was the cell. Formulas chain function calls inside other function calls because nesting is the only way to express composition without variable names. Functions like IFERROR(VLOOKUP(...)) are not awkward by accident; they are the natural shape of programs written without identifiers.
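The phone-number pattern above behaves the way a quick check suggests; a minimal Python sketch (the sample numbers are invented, and Python's re flavor may differ slightly from the flavor the pattern was originally written for):

```python
import re

# The pattern from above: optional parentheses around the area code,
# optional dash or whitespace between groups, anchored at both ends.
phone = re.compile(r"^\(?(\d{3})\)?[-\s]?(\d{3})[-\s]?(\d{4})$")

for candidate in ["(555) 123-4567", "555-123-4567", "5551234567", "555-1234"]:
    m = phone.match(candidate)
    print(candidate, "->", m.groups() if m else "no match")
```

The first three candidates match and yield the three capture groups; the last one fails because only ten digits satisfy the pattern.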

The common thread: each language traded readability for compactness because the use case demanded it. None of them are bad designs. They are, in the language of their domain, locally optimal. The cost is that anyone reading them later has to reverse the optimization.

Why they are hard to read even for experienced practitioners

The languages share four reading traps:

  1. Cryptic syntax. Regex (?P<name>...) is a Python named capture group. Bash ${var:-default} is parameter expansion with a fallback value. SQL WHERE name LIKE '%bob%' is a substring match using SQL’s wildcard character. Excel =IFERROR(VLOOKUP(A2,$D$2:$E$100,2,FALSE),"not found") is a lookup with an error fallback. None of these are intuitive on first reading. All of them are completely standard once you know.

  2. Order-of-operations is hidden. SQL does not run in the order it is written. The query parser reads SELECT ... FROM ... WHERE ... GROUP BY ... HAVING ... ORDER BY ... LIMIT, but the engine evaluates FROM, then WHERE, then GROUP BY, then HAVING, then SELECT, then ORDER BY, then LIMIT. This is why SELECT col_alias cannot be used in WHERE but can be used in ORDER BY: the alias is not bound when WHERE runs. Regex evaluates left to right with backtracking on quantifiers; reading the pattern in order does not match the engine’s behavior. Bash runs left to right but processes redirections before the command, so cmd > file 2>&1 and cmd 2>&1 > file redirect different streams to different places.

  3. Vendor-specific blind spots. PCRE regex is not JavaScript regex. PostgreSQL is not MySQL. Bash is not POSIX sh. Excel is not Google Sheets. Each pair shares roughly 90% of its syntax and diverges in the remaining 10%. The diverging cases are usually the interesting ones: lookbehind in JavaScript regex (added in ES2018, missing for years), MySQL group_concat (no standard equivalent), bash arrays (POSIX sh has none), Excel LET (Google Sheets gained it later with different semantics).

  4. Bugs are silent. A regex that should match a phone number but matches a substring of an email instead does not error. A SQL query that should join one-to-many but joins many-to-many does not error. A bash command that should delete logs but deletes everything because IFS was set incorrectly does not error. An Excel formula that should sum a range but sums the wrong range does not error. The output looks plausible. You only know it is wrong when you check.
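The redirection ordering in trap 2 can be verified empirically; a minimal sketch driving sh from Python (the producer command and temp file are invented for illustration):

```python
import subprocess, tempfile, os

producer = "echo out; echo err >&2"  # one line to stdout, one to stderr
tmp = tempfile.NamedTemporaryFile(delete=False).name

# cmd > file 2>&1: stdout is pointed at the file first, THEN stderr is
# duplicated onto stdout, so both lines land in the file.
r1 = subprocess.run(["sh", "-c", f"{{ {producer}; }} > {tmp} 2>&1"],
                    capture_output=True, text=True)
with open(tmp) as f:
    both = f.read()  # "out\nerr\n"

# cmd 2>&1 > file: stderr is duplicated onto stdout's CURRENT target (our
# capture pipe) first, THEN stdout is redirected, so "err" escapes the file.
r2 = subprocess.run(["sh", "-c", f"{{ {producer}; }} 2>&1 > {tmp}"],
                    capture_output=True, text=True)

print(repr(r1.stdout), repr(both), repr(r2.stdout))
os.remove(tmp)
```

Same tokens, different order, different destinations: the first form captures nothing and the file gets both lines; the second leaks "err" back to the caller.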

These traps are well-documented; everyone who has worked with these languages has a war story. What they share is that careful reading on the front end is much cheaper than debugging on the back end.

A strategy for reading: by intent first, then by tokens

The strategy that scales is the same regardless of language: read for intent before reading for syntax.

Step 1: Establish what the writer was trying to achieve

Before parsing tokens, ask: what is this snippet supposed to do? The answer should be a one-sentence statement in plain English. For regex ^(\d{3})-(\d{4})$: “match exactly three digits, a dash, then four digits, from start to end.” For SQL SELECT u.id, COUNT(o.id) FROM users u LEFT JOIN orders o ON o.user_id = u.id GROUP BY u.id: “for each user, count their orders.” For bash find . -type f -name '*.log' -mtime +30 -exec rm {} \;: “delete log files older than 30 days under the current directory.”
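The SQL intent statement can be checked against a toy dataset; a minimal sqlite3 sketch (the tables and rows are invented for illustration, and sqlite's dialect may differ from whatever engine the original query targets):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users  VALUES (1), (2);
    INSERT INTO orders VALUES (10, 1), (11, 1);  -- user 1 has two orders, user 2 none
""")

rows = conn.execute("""
    SELECT u.id, COUNT(o.id)
    FROM users u LEFT JOIN orders o ON o.user_id = u.id
    GROUP BY u.id
    ORDER BY u.id
""").fetchall()

print(rows)  # [(1, 2), (2, 0)] -- LEFT JOIN keeps user 2 with a count of 0
```

Note that COUNT(o.id) skips the NULLs the LEFT JOIN produces, which is exactly what gives user 2 a count of zero rather than one.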

If you cannot state the intent in one sentence, that is the problem. The snippet may be correct or incorrect, but you cannot evaluate it until you know what it is trying to do. Ask the author. If the author is no longer reachable, look for the surrounding context (comments, commit messages, documentation, or test cases that exercise this code).

Step 2: Decompose into independent clauses

Split the snippet into the smallest pieces that have meaning on their own. For regex, this means treating each group, anchor, or quantifier as a clause. For SQL, this means treating each clause (FROM, WHERE, JOIN, GROUP BY, HAVING, ORDER BY) and each subquery as independent. For bash, this means each command in a pipeline. For Excel, this means each function call.

Then describe each clause in plain English. The point is not to be exhaustive; the point is to make the structure visible. A regex with three groups becomes three statements. A SQL query with two CTEs becomes two named blocks plus a final query. A bash pipeline becomes a chain of “produces X, transforms to Y, filters to Z.”

Step 3: Verify each clause’s intent matches its syntax

This is where most reading errors get caught. The decomposition gives you intent statements; now check whether each syntax fragment actually does what its intent statement claims.

For regex, the most common mismatch: a quantifier matches differently than expected because of greedy versus lazy semantics. .* is greedy and grabs as much as possible. .*? is lazy and grabs as little as possible. A pattern like <.*> against <a>foo<b> matches <a>foo<b> (the whole string), not <a> and <b> separately. If your intent says “match each tag” but your pattern is greedy, the intent and syntax do not match.
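The greedy-versus-lazy mismatch is easy to see in Python's re module:

```python
import re

text = "<a>foo<b>"

greedy = re.findall(r"<.*>", text)   # .* grabs as much as possible
lazy   = re.findall(r"<.*?>", text)  # .*? stops at the first closing >

print(greedy)  # ['<a>foo<b>']  -- one match spanning the whole string
print(lazy)    # ['<a>', '<b>'] -- one match per tag, as the intent wanted
```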

For SQL, the most common mismatch: NULL handling. WHERE col != 'x' does not match rows where col IS NULL, because comparison with NULL yields NULL (not true). If your intent is “all rows where col is not ‘x’ (including nulls),” the syntax needs WHERE col IS NULL OR col != 'x'.
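The NULL pitfall reproduces in a few lines of sqlite3 (table name and values invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("x",), ("y",), (None,)])

# Comparison with NULL yields NULL, which is not true, so the NULL row drops out.
naive = conn.execute("SELECT col FROM t WHERE col != 'x'").fetchall()
# Spelling out the NULL case keeps it.
fixed = conn.execute("SELECT col FROM t WHERE col IS NULL OR col != 'x'").fetchall()

print(naive)  # [('y',)]            -- the NULL row silently vanished
print(fixed)  # the NULL row survives alongside ('y',)
```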

For bash, the most common mismatch: word splitting. for f in $(ls *.log); do rm "$f"; done looks correct. It is not. The output of ls is split on whitespace, so a filename with a space breaks. The correct form is for f in *.log; do rm "$f"; done, which uses bash’s own glob expansion that respects spaces.
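The splitting failure can be demonstrated harmlessly by substituting echo for rm; a minimal sketch driving sh from Python (the temp directory and filename are invented):

```python
import subprocess, tempfile, pathlib

tmp = tempfile.mkdtemp()
pathlib.Path(tmp, "a b.log").touch()  # a filename containing a space

# Broken: $(ls *.log) is split on whitespace, so one file becomes two words.
broken = subprocess.run(["sh", "-c", 'for f in $(ls *.log); do echo "[$f]"; done'],
                        cwd=tmp, capture_output=True, text=True).stdout
# Correct: the shell's own glob expansion keeps each filename as one word.
safe = subprocess.run(["sh", "-c", 'for f in *.log; do echo "[$f]"; done'],
                      cwd=tmp, capture_output=True, text=True).stdout

print(broken)  # [a]
               # [b.log]
print(safe)    # [a b.log]
```

Had the loop body been rm instead of echo, the broken form would have attempted to delete two files named a and b.log, neither of which exists.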

For Excel, the most common mismatch: absolute versus relative references. =A1+B1 copied down becomes =A2+B2 (relative). =$A$1+$B$1 copied down stays the same (absolute). A formula that mixes them produces output that looks right in the original cell and wrong in copies.

Step 4: Test on edge cases

Once intent and syntax align, validate with edge cases the writer probably did not think of. For regex, test the empty string, strings that match the start of the pattern but not the end, strings with special characters. For SQL, test with NULLs, with empty result sets, with maximum row counts. For bash, test with filenames containing spaces, quotes, newlines, and Unicode. For Excel, test with text in number cells, with division by zero, with empty cells.
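An edge-case harness for the regex case can be a handful of lines; a sketch using the ^(\d{3})-(\d{4})$ pattern from step 1 (the edge-case strings are invented):

```python
import re

pat = re.compile(r"^(\d{3})-(\d{4})$")
edge_cases = ["", "123-4567", "123-456", "123-45678", " 123-4567", "123-4567\n"]

results = {s: bool(pat.match(s)) for s in edge_cases}
for s, ok in results.items():
    print(repr(s), "->", ok)
# Surprise: "123-4567\n" matches, because Python's $ also matches just before
# a final newline. Use re.fullmatch or \Z when trailing whitespace must fail.
```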

This is also where you discover bugs that the original writer did not catch. The point is not to reject the snippet; it is to know its actual contract, which often differs from the intended one.

Tool tour: when AI explanation helps

For each of the four languages, glunty has a free AI-augmented explainer that walks through the snippet token by token in plain English. They are useful when you want a sanity check, when you want to understand something written by someone else, or when you want to teach the language to a new team member by showing the explanation alongside the snippet.

  • Regex to English explainer decomposes a pattern into anchors, groups, character classes, quantifiers, and notes when something is greedy versus lazy.
  • SQL query explainer walks a query clause by clause and flags common bugs (Cartesian products, missing GROUP BY columns, NULL comparison pitfalls).
  • Bash command explainer describes each token in a command, including pipes, redirections, parameter expansion, and command substitution. Flags potentially-destructive patterns.
  • Excel and Sheets formula explainer walks a formula by function, calling out volatile functions (NOW, RAND, INDIRECT) that recalculate on every change.
  • Code snippet explainer is the general-purpose version for any language, useful when none of the more specific explainers fits.
  • Curl command parser is local (not AI) but lives in the same family: it decomposes a curl command into the structured HTTP request it represents.

The pattern: paste the snippet, read the explanation, do not skip steps 1 through 4 above just because you have an explainer. The explainer accelerates the reading; it does not replace the reading.

When AI explanation gets it right

For idiomatic snippets in mainstream languages with clearly stated intent in the surrounding context, AI explanation is reliable. It catches what the snippet says, what each piece is for, and (often) what edge cases it might handle wrong. The “notes” section that the explainers emit usually flags genuine concerns: an unanchored regex that will match substrings, a SQL query whose JOIN cardinality looks suspicious, a bash command that recursively operates on the home directory.

This is the highest-value case: a confused reader meets an idiomatic snippet, the explainer translates it once, the reader internalizes it, and next time the same idiom appears the reader does not need the explainer.

When AI explanation gets it wrong

There are three categories where AI explanation produces confidently wrong output:

  1. Vendor-specific extensions. A PostgreSQL query using RETURNING clauses, JSON operators (->>, #>), or window functions with FILTER clauses may be parsed as standard SQL and the dialect-specific semantics missed. Same with bash arrays (POSIX sh does not have them), Excel LAMBDA (Excel 365 only), JavaScript regex lookbehind in older flavors. Always provide the dialect when the explainer asks.

  2. Novel idioms. A regex pattern someone wrote for a specific purpose that does not match a common idiom may be explained piece by piece correctly but miss the intent. The explainer says “this pattern matches a sequence of letters followed by digits,” and that is true at the token level, but the actual purpose was to validate a specific identifier format and the pattern happens to permit invalid forms.

  3. Intent versus syntax. When the writer’s intent is clear from context the explainer does not have, the explanation may describe what the snippet actually does in a way that obscures whether it matches what was wanted. A regex \d+ extracts digits. The explainer says so. But if the writer’s intent was “extract a US ZIP code,” the regex is wrong (it matches digit sequences of any length, not exactly five). The explainer is not lying; it is just describing the syntax, and the intent gap is the reader’s job to spot.
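The intent gap in point 3 is concrete enough to demonstrate; a small sketch contrasting \d+ with an exact-width pattern (the candidate strings are invented):

```python
import re

candidates = ["90210", "123", "902101", "9021a"]

loose  = [s for s in candidates if re.fullmatch(r"\d+", s)]
strict = [s for s in candidates if re.fullmatch(r"\d{5}", s)]

print(loose)   # ['90210', '123', '902101'] -- any digit run passes
print(strict)  # ['90210'] -- only exactly five digits pass
```

Both patterns "extract digits," and an explainer would describe each accurately; only the exact-width version matches the ZIP-code intent.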

The robust approach: use the explainer to decode tokens, then ask yourself whether the decoded behavior matches what you would expect for this context. Mismatch means the writer made an error or the explainer missed a dialect issue. Either way, you have learned something.

The compounding effect

Reading these languages well is one of the higher-leverage skills in software work. The compounding shows up in three places:

  • Bug-debugging time. Most regex/SQL/bash/Excel bugs are not deep; they are intent-syntax mismatches that careful reading catches. The cost of catching them at read time is a few minutes. The cost of catching them at runtime is at least an hour, often a day, sometimes a postmortem.
  • Onboarding friction. New team members spend a disproportionate amount of their first weeks reading existing code. Every snippet they cannot decode is a question they have to ask. A team that writes for readability, with brief comments stating intent, scales onboarding much better than one that writes the densest possible code.
  • Documentation as a side effect. When you decompose a snippet into intent statements during reading, write them down in a comment. The next person reading the same snippet (often you, six months later) gets the intent for free. The cost is one minute now; the savings are recurring.

Where this strategy goes wrong

There are cases where the careful-reading approach is the wrong tool:

  • Cryptographic primitives. Reading the source of an AES implementation does not validate it. Cryptography requires test vectors, formal review, and timing-attack analysis. AI explanation can describe what the code does at a high level but cannot verify cryptographic correctness.
  • Production security filters. A regex used as a security filter (an SQL injection pattern matcher, a path traversal detector) needs to be tested against real attack patterns, not just read. AI explanation that says “this pattern matches input ending in .. or /” is correct but missing the point: the question is whether it misses any attack pattern.
  • Performance-critical code. Reading a query and reading EXPLAIN against a real database with real statistics give different answers. The query-shape that looks slow on paper may be fast because of an index. The query-shape that looks fast may be slow because of a bad join order. Performance reading requires running the code with real data, not staring at it.

In all three cases, careful reading is still useful as a first pass; it just is not sufficient. Pair it with the appropriate verification (test vectors, attack corpus, EXPLAIN output).

Closing

The four languages this post covers (regex, SQL, bash, Excel) plus general code make up a large fraction of the text any working developer or analyst reads in a year. Reading them well is not a matter of having a photographic memory for syntax; it is a matter of having a strategy that holds up across languages: decompose into clauses, state intent in plain English, verify the syntax matches the intent, and test on edge cases. AI-augmented explainers accelerate the first three steps but do not replace them.

The cluster posts that follow this pillar walk through specific patterns and pitfalls in each of the four languages: regex matching more than expected, SQL bugs the explainer catches, reading curl-as-bash safely, and Excel formulas that look right but are not. Each post links back here for the broader strategy and forward to the relevant tool for hands-on practice.
