EXPLAIN tells you what your database will do with a query at runtime. It tells you the join order, the indexes it will use, the row estimates it has. It does not tell you whether the query expresses what you meant. That is a different kind of audit, and it happens at read time, not at execute time.
This post walks through SQL bugs that look correct on paper but produce wrong results, with patterns the SQL explainer flags in the notes section. The pillar post on reading code in plain English covers the broader strategy; this is the SQL-specific deep dive.
The Cartesian product that hides in plain sight
A JOIN without a predicate produces a Cartesian product: every row from the left table combined with every row from the right. For two 1000-row tables, that is a million rows. This is sometimes intentional but usually not.
The bug shape: SELECT u.id, o.id FROM users u JOIN orders o WHERE u.active = true. Note there is no ON clause. Some SQL dialects (older MySQL, SQL Server with implicit-join syntax) accept this and produce a Cartesian product before applying the WHERE. The result has way more rows than expected.
The fix: always specify a JOIN ... ON predicate. If you genuinely want a Cartesian product, use CROSS JOIN to make the intent explicit.
The subtler case: a JOIN predicate that is structurally correct but logically too permissive. JOIN orders o ON o.created_year = u.created_year joins on a coarse column and produces many-to-many results when the writer expected one-to-many. The query runs without errors; the result has duplicates the writer did not anticipate.
The explainer catches the first case (missing JOIN predicate) reliably and flags the second case (low-cardinality JOIN column) as a warning to verify against actual row counts.
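Both shapes can be reproduced in a few lines. Here is a minimal sketch using Python's sqlite3 module as a stand-in engine (SQLite is one of the dialects that accepts JOIN without ON); the table and column names are illustrative, not from any real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER, active INTEGER);
    CREATE TABLE orders (id INTEGER, user_id INTEGER);
    INSERT INTO users VALUES (1, 1), (2, 1), (3, 0);
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# JOIN with no ON clause: SQLite silently treats it as a cross join,
# then applies the WHERE to the exploded row set.
cartesian = conn.execute(
    "SELECT u.id, o.id FROM users u JOIN orders o WHERE u.active = 1"
).fetchall()
print(len(cartesian))  # 6 rows: 2 active users x 3 orders

# With the predicate in place, only matching pairs survive.
joined = conn.execute(
    "SELECT u.id, o.id FROM users u JOIN orders o ON o.user_id = u.id "
    "WHERE u.active = 1"
).fetchall()
print(len(joined))  # 3 rows
```

The row-count gap (6 versus 3) is small here, but it scales multiplicatively: with production-sized tables the Cartesian form produces millions of rows before the WHERE even runs.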
NULL comparison: the trap that bites everyone once
In SQL, NULL = NULL is not true. It is NULL. Same with NULL != 'x', NULL > 0, and most other comparisons. NULL means “unknown,” and any operation involving an unknown is itself unknown.
This breaks intuitions repeatedly. A query like SELECT * FROM users WHERE state != 'CA' does not return rows where state IS NULL. The comparison NULL != 'CA' is NULL, which is treated as false in the WHERE clause.
If your intent was “all users not in California (including those whose state is unknown),” the correct form is:
WHERE state IS NULL OR state != 'CA'
Other NULL traps:
- NOT IN (subquery) returns no rows if the subquery produces any NULL. WHERE id NOT IN (SELECT excluded_id FROM exclusions) returns nothing if any exclusion has a NULL id, even when the writer expected non-matching rows.
- COUNT(column) ignores NULLs; COUNT(*) does not. A change from COUNT(*) to COUNT(some_col) silently changes results when the column has NULLs.
- Concatenating NULL with 'foo' produces NULL in some databases (PostgreSQL's ||, MySQL's CONCAT) and 'foo' in others (Oracle's ||, which treats NULL as an empty string). Cross-database queries break here.
How to spot in reading: any comparison or aggregation where the column could be nullable. The schema definition tells you which columns are nullable. The query tells you what it does with them. When the two do not align, NULL bugs follow.
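The traps above are easy to reproduce. A minimal sketch with Python's sqlite3 (the users/exclusions tables are hypothetical; SQLite follows standard three-valued NULL logic here, so the behavior matches PostgreSQL and MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER, state TEXT);
    INSERT INTO users VALUES (1, 'CA'), (2, 'NY'), (3, NULL);
""")

# NULL != 'CA' evaluates to NULL, which WHERE treats as false:
rows = conn.execute("SELECT id FROM users WHERE state != 'CA'").fetchall()
print(rows)  # [(2,)] -- the NULL-state user is silently dropped

# The corrected predicate keeps the unknown-state user:
fixed = conn.execute(
    "SELECT id FROM users WHERE state IS NULL OR state != 'CA'"
).fetchall()
print(fixed)  # [(2,), (3,)]

# NOT IN against a set containing NULL matches nothing:
conn.execute("CREATE TABLE exclusions (excluded_id INTEGER)")
conn.executemany("INSERT INTO exclusions VALUES (?)", [(1,), (None,)])
none_back = conn.execute(
    "SELECT id FROM users WHERE id NOT IN (SELECT excluded_id FROM exclusions)"
).fetchall()
print(none_back)  # [] -- every comparison against NULL is unknown

# COUNT(*) counts rows; COUNT(state) skips NULLs:
counts = conn.execute("SELECT COUNT(*), COUNT(state) FROM users").fetchone()
print(counts)  # (3, 2)
```

Each of these runs without any error or warning, which is exactly why they survive into production.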
The GROUP BY column omission
A SELECT that mixes aggregate and non-aggregate columns has to GROUP BY all the non-aggregate columns. PostgreSQL, SQL Server, and Oracle enforce this strictly. MySQL (before 5.7 made ONLY_FULL_GROUP_BY the default) and SQLite do not; they pick an arbitrary value for each non-grouped column.
Bug shape: SELECT user_id, name, COUNT(*) FROM users JOIN orders ON ... GROUP BY user_id. The query says “for each user_id, give me the name and the order count.” But name is not in GROUP BY. PostgreSQL refuses; MySQL silently picks one of the names per user_id (which one is not specified) and gives you something that looks right and is not.
The fix: include all non-aggregate columns in the GROUP BY: GROUP BY user_id, name. Or aggregate the column: MIN(name) or MAX(name) if you do not care which value you get.
The explainer flags this pattern even on MySQL because it is almost always a bug regardless of what the engine accepts.
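SQLite still accepts the loose form, so it works as a demonstration engine. A sketch (table names hypothetical); note that the only thing you can safely assert about the bare-column query is its row count, which is precisely the problem:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (user_id INTEGER, product TEXT);
    INSERT INTO orders VALUES (1, 'apple'), (1, 'banana'), (2, 'cherry');
""")

# SQLite (like old-default MySQL) accepts a bare non-aggregate column
# and picks an arbitrary value from each group -- no error, no warning:
loose = conn.execute(
    "SELECT user_id, product, COUNT(*) FROM orders GROUP BY user_id"
).fetchall()
print(loose)  # product for user 1 could be 'apple' or 'banana' -- unspecified

# The deterministic form makes the choice explicit:
strict = sorted(conn.execute(
    "SELECT user_id, MAX(product), COUNT(*) FROM orders GROUP BY user_id"
).fetchall())
print(strict)  # [(1, 'banana', 2), (2, 'cherry', 1)]
```

PostgreSQL would reject the first query at parse time with "column orders.product must appear in the GROUP BY clause or be used in an aggregate function", which is the behavior you actually want.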
Implicit casts that change behavior across databases
Comparing a string to an integer is a portability minefield. PostgreSQL rejects integer = text outright when the string side is actually typed text (a text column, an explicit cast, or a driver-bound text parameter); a bare quoted literal has type "unknown" and gets coerced, so it happens to work until the value arrives as a parameter. MySQL silently casts the string to a number (and casts non-numeric strings to 0). SQL Server converts based on data type precedence: the integer wins, the string is converted, and a non-numeric string raises a runtime conversion error.
Query shape that breaks on portability: WHERE customer_id = '12345'. If customer_id is an integer and you wrote the value with quotes, the behavior depends on the engine and on how the value reaches it. Some engines accept it, some warn, some fail. Tests written against one database may pass there and fail on another.
Worse: WHERE customer_id IN ('12345', '67890') where the values look like integers but are strings. MySQL handles it; PostgreSQL coerces the bare literals but errors the moment they arrive as text-typed parameters. The test on local SQLite passes; the integration test on Postgres fails.
The fix: type your literals correctly. If a column is integer, compare to integer literals (no quotes). If a column is string, compare to string literals.
How to spot in reading: any comparison where the literal type does not obviously match the column type. The schema tells you the column type. Match it.
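SQLite's version of this is column affinity, and it cuts both ways. A sketch (hypothetical customers table) showing the sloppy literal working against an integer column and the mirror-image habit failing silently against a text column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, code TEXT)")
conn.execute("INSERT INTO customers VALUES (12345, '007')")

# INTEGER affinity converts '12345' to 12345 before comparing,
# so the quoted literal happens to match here...
hit = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE customer_id = '12345'"
).fetchone()[0]
print(hit)  # 1

# ...but the same habit against a TEXT column fails silently:
# the literal 7 gets TEXT affinity -> '7', which is not equal to '007'.
miss = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE code = 7"
).fetchone()[0]
print(miss)  # 0
```

The second query is the worse bug: no error, no warning, just a row that quietly never matches.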
ORDER BY with non-deterministic ties
SELECT * FROM users ORDER BY created_date LIMIT 10 is non-deterministic when multiple users share a created_date. Each query execution may return a different set of 10 rows from the tied group.
For pagination, this manifests as: page 1 returns rows A through J; page 2 should return K through T but returns G through P because the underlying engine reordered tied rows differently. Users see duplicates and missing items.
The fix: always tie-break on a column guaranteed to be unique. ORDER BY created_date, id ensures a stable ordering even when created_date repeats.
How to spot in reading: any ORDER BY without a final unique-tiebreaker. Especially common in pagination queries where the writer thought the time column was unique enough.
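A sketch of the fixed form with sqlite3, using a worst-case table where every row ties on created_date (names hypothetical). With the unique tiebreaker, consecutive pages partition the table exactly, which is the property pagination depends on:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, created_date TEXT)")
# Ten users, all sharing the same created_date -- every row is a tie.
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, "2026-01-01") for i in range(1, 11)])

def page(order_by, limit, offset):
    # order_by is trusted code here, not user input
    return [r[0] for r in conn.execute(
        f"SELECT id FROM users ORDER BY {order_by} LIMIT ? OFFSET ?",
        (limit, offset))]

# With a unique tiebreaker, the two pages partition the table exactly:
page1 = page("created_date, id", 5, 0)
page2 = page("created_date, id", 5, 5)
print(sorted(page1 + page2))  # [1, 2, ..., 10] -- no duplicates, nothing missing
```

Without the `, id`, any engine is free to order the tied rows differently on each execution, and no single-engine test can prove it won't.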
Date and timezone surprises
WHERE created_at >= '2026-01-01' may compare strings or dates depending on the database. PostgreSQL casts the untyped literal to the column's type and compares correctly. SQL Server converts the string, but ambiguous formats depend on the session's language and DATEFORMAT settings. MySQL parses ISO-8601 literals reliably; other formats depend on session settings. Stray from unambiguous formats and behavior varies.
Worse, timezone-naive comparisons in databases that store timezone-aware data. WHERE created_at >= '2026-01-01' against a TIMESTAMPTZ column treats the literal as session-local-timezone, which may not match production-timezone, which may not match user-timezone.
The fix: explicit casts and explicit timezones. WHERE created_at >= TIMESTAMPTZ '2026-01-01 00:00:00+00' or WHERE created_at >= '2026-01-01'::DATE (PostgreSQL syntax shown; note that a plain TIMESTAMP literal in PostgreSQL silently ignores any zone suffix). Different databases have different cast syntax; check yours.
How to spot in reading: any comparison against a date-shaped string literal. If the column is TIMESTAMP and the literal is a string, the behavior depends on cast rules you may not have read.
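The timezone half of the trap can be sketched with sqlite3 storing UTC ISO-8601 strings (a common convention, and an assumption of this example). The naive date literal and the session-aware cutoff disagree about which events belong to 2026:

```python
import sqlite3
from datetime import datetime, timezone, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, created_at TEXT)")  # UTC ISO-8601
conn.executemany("INSERT INTO events VALUES (?, ?)", [
    (1, "2025-12-31 20:00:00"),  # 15:00 on Dec 31 in UTC-5
    (2, "2026-01-01 03:00:00"),  # 22:00 on Dec 31 in UTC-5
])

# "Start of 2026" for a UTC-5 user is actually 05:00 UTC on Jan 1:
local = timezone(timedelta(hours=-5))
cutoff = datetime(2026, 1, 1, tzinfo=local).astimezone(timezone.utc)

# The naive '2026-01-01' literal counts event 2 as "2026"...
naive = conn.execute(
    "SELECT COUNT(*) FROM events WHERE created_at >= '2026-01-01'"
).fetchone()[0]
print(naive)  # 1

# ...but against the explicit UTC cutoff, event 2 is still 2025 local time:
explicit = conn.execute(
    "SELECT COUNT(*) FROM events WHERE created_at >= ?",
    (cutoff.strftime("%Y-%m-%d %H:%M:%S"),)
).fetchone()[0]
print(explicit)  # 0
```

Both queries run cleanly; they just answer different questions, and only one of them is the question the user asked.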
The subquery that should be a JOIN
A correlated subquery in the SELECT list runs once per row of the outer query. For a 100-row outer query whose subquery scans a 10,000-row inner table, that is 100 executions and up to a million row reads if the engine cannot rewrite it.
Bug shape:
SELECT u.id, (SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id) AS order_count
FROM users u
Each user row triggers a count subquery. Most modern query planners rewrite this as a LEFT JOIN with GROUP BY. Older planners do not. The query runs 100x slower than necessary on a 10K-user table.
The fix:
SELECT u.id, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id
How to spot in reading: any subquery in SELECT that depends on the outer row (correlated). If the same data could be expressed as a JOIN with GROUP BY, the JOIN form is usually faster.
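The two forms from above can be verified to agree on a toy dataset; a sketch with sqlite3 (hypothetical users/orders tables). Note the fix uses COUNT(o.id), not COUNT(*): after the LEFT JOIN, a user with no orders contributes one row of NULLs, and only COUNT(o.id) skips it to produce the correct zero:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1), (2), (3);
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# Correlated form: one subquery execution per user row.
correlated = conn.execute("""
    SELECT u.id,
           (SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id) AS order_count
    FROM users u ORDER BY u.id
""").fetchall()

# JOIN form: one pass, grouped. COUNT(o.id) ignores the NULL row
# produced by the LEFT JOIN for user 3.
rewritten = conn.execute("""
    SELECT u.id, COUNT(o.id) AS order_count
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.id
    GROUP BY u.id ORDER BY u.id
""").fetchall()

print(correlated)               # [(1, 2), (2, 1), (3, 0)]
print(rewritten == correlated)  # True
```

Equivalence checks like this are cheap insurance before swapping a correlated subquery for a JOIN: the rewrite is only safe when users with zero orders still appear with a count of 0.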
How to catch these in practice
For each query you read, walk the 4-step strategy from the pillar post:
- State the intent in one sentence.
- Decompose into clauses (FROM, WHERE, JOIN, GROUP BY, HAVING, ORDER BY, LIMIT).
- For each clause, verify the intent matches the syntax. Pay special attention to NULL handling and JOIN cardinality.
- Test on edge cases: empty tables, NULL columns, ties in ORDER BY, large result sets.
The SQL query explainer automates the decomposition and flags the structural bugs above. It does not run EXPLAIN; that is your database’s job. The two are complementary: read with the explainer to catch logical bugs; run EXPLAIN to catch performance bugs.
Closing
The bugs above account for most “the query runs but the data is wrong” issues in production SQL. None require the database to be running to detect; all are catchable at read time by anyone who knows what to look for. The SQL explainer flags the structural ones; the pillar post covers the general strategy. The remaining cases (intent-versus-syntax mismatches, vendor-specific quirks) require human judgment, but the search space is much smaller after the structural pass.