Key Takeaways
At Boundev, we've optimized database layers across 31 dedicated team engagements where SQL performance was the primary scaling constraint. The pattern is consistent: teams that understand B-tree mechanics and partitioning strategy reduce query latency by 10x-100x without hardware upgrades.
Most SQL performance problems are not caused by too much data. They are caused by queries that access data inefficiently, indexes that are structured incorrectly for the query patterns, and maintenance strategies that let fragmentation accumulate until the database crawls. Understanding exactly how B-tree indexes serve range queries, why certain SQL patterns defeat the optimizer, and when partitioning helps versus hurts is the foundation of production-grade database engineering.
The difference between a query that reads 5 blocks and a query that reads 500,000 blocks is not hardware. It is how well the developer understands index mechanics.
How Covering Indexes Eliminate Table Reads
A covering index contains every column that a query needs: the columns in the WHERE clause, the JOIN conditions, and the SELECT list. When all required data lives in the index, the database engine never needs to jump back to the base table to fetch additional columns. This eliminates the most expensive operation in non-covering index usage: random I/O to the table for each matching index entry.
When Covering Indexes Matter Most
Covering indexes deliver the largest performance gains when the query matches a small fraction of the table through the index but would otherwise require random reads back to the table for each match. In scenarios where the query returns a large percentage of the table, a full table scan is typically faster anyway, and the covering index provides minimal benefit. The sweet spot is queries that are selective enough to benefit from index traversal but would be slowed by the key lookups needed to fetch non-indexed columns.
Consider a reservations system where you frequently query client names for a specific hotel and date range. Without a covering index, the database finds matching rows via the index, then performs a separate lookup to the base table for each row to retrieve the client name. With a covering index that includes the client name column, the query is satisfied entirely from the index.
-- Query: Find client names for a specific hotel and date range
SELECT c.ClientName
FROM Reservations r
JOIN Clients c ON r.ClientID = c.ClientID
WHERE r.DateFrom BETWEEN '2020-08-13' AND '2020-08-14'
AND r.HotelID = 3;
-- Covering index on Reservations (HotelID, DateFrom, ClientID)
-- eliminates table lookups for the Reservations table entirely.
-- The database traverses only the index B-tree.
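This behavior is observable in any engine's plan output. As a quick sketch, SQLite's EXPLAIN QUERY PLAN labels a covered search explicitly; the schema below is a simplified stand-in for the reservations example (no Clients join, illustrative index name):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Reservations (
    ReservationID INTEGER PRIMARY KEY,
    HotelID INTEGER, DateFrom TEXT, ClientID INTEGER)""")
# Key columns from the WHERE clause first, then the selected column
conn.execute(
    "CREATE INDEX IX_Res_Cover ON Reservations (HotelID, DateFrom, ClientID)")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT ClientID FROM Reservations
    WHERE HotelID = 3 AND DateFrom BETWEEN '2020-08-13' AND '2020-08-14'
""").fetchall()
# SQLite reports 'USING COVERING INDEX' when the base table is never touched
print(plan[0][3])
```

Drop ClientID from the index definition and the same plan line changes to a plain index search followed by table lookups.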
The Range Query Execution Model
Range queries are the most common use case for B-tree indexes. They restrict data based on conditions like "Column A must be between X and Y." The execution model for a properly indexed range query is remarkably efficient: the database traverses the B-tree to find the first matching entry, then reads sequentially through the sorted leaf pages until it passes the range boundary. The total number of disk blocks read is determined by a simple formula.
1 B-tree Traversal (D blocks)
The database reads D blocks to navigate from the root of the B-tree to the first matching leaf node, where D is the depth of the tree. With a typical fan-out of roughly 1,000 entries per block, a 2-level tree holds approximately 1 million rows and a 3-level tree approximately 1 billion. For most production tables, D is 3 or 4.
2 Sequential Leaf Read (1 block for the first entry)
Once the first matching leaf node is found, the database reads that block. Because B-tree leaf pages form a doubly linked list, the database follows the chain sequentially through subsequent matches without any additional tree traversal.
3 Table Lookups (R blocks for non-covering indexes)
For each matching index entry, if the index is not covering, the database performs a random read to the base table to fetch the remaining columns. R equals the number of rows returned. This is the expensive part and the reason covering indexes exist.
The block formula: Total blocks read = D + 1 + R, where D is tree depth and R is rows returned (for non-covering indexes). For a covering index, R drops to the number of additional leaf pages needed to hold all matching entries, which is typically far fewer than R table lookups. This formula explains why range queries on a billion-row table can execute in under 10 milliseconds when properly indexed: D=3, plus a handful of leaf pages, equals single-digit block reads.
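The formula is easy to turn into a back-of-the-envelope estimator. The sketch below assumes a fan-out of 1,000 entries per block, a typical rough figure; real fan-out depends on key size and block size:

```python
def btree_depth(rows: int, fanout: int = 1000) -> int:
    """Tree levels needed to index `rows` entries at the given fan-out."""
    depth, capacity = 1, fanout
    while capacity < rows:
        depth += 1
        capacity *= fanout
    return depth

def blocks_read(table_rows: int, rows_returned: int, covering: bool,
                fanout: int = 1000) -> int:
    """Estimate total blocks for a range query: D + 1 + R."""
    d = btree_depth(table_rows, fanout)
    # covering: matching entries are packed into leaf pages instead of
    # costing one random table read per returned row
    r = -(-rows_returned // fanout) if covering else rows_returned
    return d + 1 + r

# 50 matching rows in a billion-row table:
print(blocks_read(10**9, 50, covering=False))  # 54
print(blocks_read(10**9, 50, covering=True))   # 5
```

The covering case reproduces the single-digit block count quoted above: D=3, one leaf for the first entry, one page of packed matches.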
Why Functions on Indexed Columns Kill Performance
One of the most common indexing mistakes we see in staff augmentation engagements is wrapping indexed columns in functions within WHERE clauses. When you apply a function to an indexed column, the database cannot use the index's sorted order because the function transforms the column value into something the index does not contain. The optimizer falls back to scanning every entry in the index and applying the function to each one.
Query That Defeats the Index:
Rewritten Query That Uses the Index:
Here is a concrete example. Suppose you want all reservations where the check-in date falls on August 15, 2020, and the CheckInTime column stores a full timestamp. The instinctive query wraps the column in a date-truncation function:
-- BAD: Function on the indexed column prevents range scan
SELECT c.ClientName
FROM Reservations r
JOIN Clients c ON r.ClientID = c.ClientID
WHERE TRUNC(r.CheckInTime) = DATE '2020-08-15';
-- Execution: scans EVERY index entry, applies TRUNC() to each
-- GOOD: Range condition on the raw column enables B-tree traversal
SELECT c.ClientName
FROM Reservations r
JOIN Clients c ON r.ClientID = c.ClientID
WHERE r.CheckInTime >= TO_DATE('2020-08-15 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
AND r.CheckInTime < TO_DATE('2020-08-16 00:00:00', 'YYYY-MM-DD HH24:MI:SS');
-- Execution: navigates to first match, reads sequentially, stops at boundary
The rewritten query produces identical results but uses a completely different execution plan. The database locates the first entry where CheckInTime is midnight on August 15, reads forward through the sorted index, and stops as soon as it encounters midnight on August 16. The difference can be orders of magnitude: scanning 500,000 index entries versus reading 47 entries directly.
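The rewrite generalizes to any calendar-day filter on a timestamp column: compute half-open bounds in application code and pass them as parameters, leaving the indexed column bare. A minimal sketch (table and column names follow the example above; the `?` placeholder style depends on your driver):

```python
from datetime import date, datetime, timedelta

def day_bounds(day: date) -> tuple[datetime, datetime]:
    """Half-open [midnight, next midnight) bounds for one calendar day."""
    start = datetime(day.year, day.month, day.day)
    return start, start + timedelta(days=1)

start, end = day_bounds(date(2020, 8, 15))
# Sargable: the indexed column stays untransformed on the left side
sql = ("SELECT ClientName FROM Reservations "
       "WHERE CheckInTime >= ? AND CheckInTime < ?")
print(start, end)  # 2020-08-15 00:00:00 2020-08-16 00:00:00
```

The half-open upper bound also handles month and year rollovers correctly, which hand-written '23:59:59' bounds do not.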
LIKE Patterns and Index Behavior
The position of the wildcard in a LIKE pattern determines whether the database can use the B-tree's sorted order. This distinction is critical for query design and is frequently misunderstood by developers who assume that any LIKE query benefits from an index on the filtered column.
Leading Wildcard: LIKE '%son'
A leading wildcard means "any characters followed by 'son'." The B-tree sorts values from their first character, so the database has no way to skip to names ending with "-son." It must read every entry in the index and check each one. This is functionally a full index scan, which for most scenarios is slower than a full table scan because the index scan adds the overhead of jumping back to the table for each match.
Trailing Wildcard: LIKE 'Rob%'
A trailing wildcard means "starts with 'Rob' followed by any characters." The B-tree's sorted order is perfectly suited for this pattern. The database navigates to the first entry starting with "Rob," reads forward through the sorted leaf pages, and stops as soon as it encounters a value that no longer starts with "Rob." This is a standard range scan, as efficient as a BETWEEN query.
-- Leading wildcard: full index scan, then table lookups (SLOW)
SELECT FirstName, LastName FROM Clients WHERE LastName LIKE '%son';
-- Reads every entry in IX_LastName, checks suffix for each
-- Trailing wildcard: B-tree range scan (FAST)
SELECT FirstName, LastName FROM Clients WHERE LastName LIKE 'Rob%';
-- Navigates to 'Rob', reads forward, stops at 'Roc'
-- Equivalent to: WHERE LastName >= 'Rob' AND LastName < 'Roc'
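The equivalence in the final comment can be computed mechanically: bump the last character of the prefix to get the exclusive upper bound. A sketch (ignoring the corner case where the prefix ends in the maximum character of the collation):

```python
def prefix_range(prefix: str) -> tuple[str, str]:
    """Inclusive lower and exclusive upper bounds equivalent to LIKE 'prefix%'."""
    # bump the final character: 'Rob' -> 'Roc'
    return prefix, prefix[:-1] + chr(ord(prefix[-1]) + 1)

low, high = prefix_range("Rob")
print(low, high)                 # Rob Roc
print(low <= "Robert" < high)    # True  ('Robert' is in range)
print(low <= "Rocco" < high)     # False ('Rocco' is past the boundary)
```

This is essentially what optimizers do internally when they convert a trailing-wildcard LIKE into a B-tree range scan.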
Database Performance Holding You Back?
We embed senior database engineers into your team who understand B-tree internals, query optimizer behavior, and partitioning strategy. Our staff augmentation model lets you add deep SQL expertise without long-term hiring overhead.
Talk to Our Team
B-tree Scaling: Why Table Size Is Rarely the Problem
When queries slow down as data volume grows, the instinctive reaction is to blame table size. This is almost always wrong for properly indexed range queries. The total blocks read for a range query using an index follows the formula B = D + 1 + R, where D is tree depth and R is rows returned. The only parameter that depends on total table size is D, and D grows logarithmically.
Growing from 10,000 rows to 1 billion rows adds at most 2 additional block reads to a 50-row range query. The execution time difference is unmeasurable. If your range queries are slowing down as data grows, the cause is almost certainly not increased tree depth. It is fragmentation, missing indexes, or queries that do not use range conditions effectively.
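This claim is easy to check numerically. Assuming a fan-out of roughly 1,000 entries per block:

```python
def btree_depth(rows: int, fanout: int = 1000) -> int:
    """Tree levels needed to index `rows` entries."""
    depth, capacity = 1, fanout
    while capacity < rows:
        depth += 1
        capacity *= fanout
    return depth

# Blocks read for a 50-row range query (D + 1 + R) at each table size
for rows in (10_000, 1_000_000, 10**9):
    d = btree_depth(rows)
    print(f"{rows:>13,} rows: depth {d}, blocks = {d} + 1 + 50 = {d + 51}")
```

Five orders of magnitude of growth move the total from 53 blocks to 54. Everything else in the formula is driven by the query, not the table.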
The Real Bottleneck: Index Fragmentation
B-tree indexes maintain sorted order by splitting blocks when they become full. When a new entry needs to be inserted into a full leaf block, the database splits it into two half-full blocks. Over time, with ongoing inserts, updates, and deletes, the physical order of blocks on disk diverges from the logical order of the index. This fragmentation means that "sequential" reads through the index actually involve random disk seeks between non-contiguous blocks.
How Fragmentation Accumulates
When a B-tree block splits, both resulting blocks are approximately half-full. If subsequent inserts fill one of those blocks again, it splits again, creating more half-full blocks scattered across the disk. Deletes leave gaps in existing blocks that may never be reused if the deleted values are not close to future inserts. The worst-case fragmentation scenario can result in reading twice as many blocks as a defragmented index would require for the same query.
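A toy simulation makes the half-full pattern visible. The sketch below uses a block capacity of 4 entries for readability; real blocks hold hundreds, and production engines often special-case splits at the rightmost edge:

```python
def insert(blocks: list[list[int]], value: int, capacity: int = 4) -> None:
    """Insert into the sorted block that should hold `value`, splitting on overflow."""
    # find the target block (blocks are kept in logical key order)
    i = next((i for i, b in enumerate(blocks) if not b or value <= b[-1]),
             len(blocks) - 1)
    block = blocks[i]
    block.append(value)
    block.sort()
    if len(block) > capacity:
        mid = len(block) // 2
        blocks[i:i + 1] = [block[:mid], block[mid:]]  # two half-full blocks

blocks = [[]]
for v in range(1, 21):          # 20 sequential inserts
    insert(blocks, v)
utilization = sum(map(len, blocks)) / (4 * len(blocks))
print(len(blocks), f"{utilization:.0%}")
```

After only 20 inserts the naive mid-point split leaves 9 blocks at roughly half occupancy, which is the fragmentation that a rebuild compacts away.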
-- SQL Server: Check fragmentation level
SELECT
OBJECT_NAME(ips.object_id) AS TableName,
i.name AS IndexName,
ips.avg_fragmentation_in_percent,
ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ips
JOIN sys.indexes i ON ips.object_id = i.object_id AND ips.index_id = i.index_id
WHERE ips.avg_fragmentation_in_percent > 30
ORDER BY ips.avg_fragmentation_in_percent DESC;
-- Rebuild fragmented indexes
ALTER INDEX IX_Reservations_DateFrom ON Reservations REBUILD;
-- PostgreSQL equivalent (CONCURRENTLY, available since version 12,
-- avoids blocking writes during the rebuild)
REINDEX INDEX CONCURRENTLY ix_reservations_datefrom;
Table Partitioning: Maintenance Solution, Query Complexity
When tables grow to billions of rows, the operational challenge shifts from query speed to maintenance. Deleting old data row-by-row is prohibitively slow and leaves behind massive fragmentation. Reindexing a billion-row table can take hours. Table partitioning solves these maintenance problems by splitting the table into smaller, independently manageable chunks, typically by date ranges.
1 Partition Pruning
When a query includes a condition on the partition key (e.g., DateFrom in a date-partitioned table), the database engine automatically excludes partitions that cannot contain matching data. A query for March data only reads the March partition, completely ignoring January, February, and every other month.
2 Instant Data Removal
Dropping a partition is a metadata operation, not a row-by-row delete. Removing a month of data from a billion-row table takes milliseconds instead of hours. No fragmentation is created because no individual rows are deleted.
3 Independent Maintenance
Each partition can be reindexed, backed up, or restored independently. Reindexing the current month's partition while last month's partition remains untouched allows maintenance during shorter windows with less system impact.
Partitioning and Index Strategy
The critical decision in a partitioned table is how indexes interact with partitions. The three approaches (global indexes, globally partitioned indexes, and local indexes) each have different trade-offs for query performance and maintenance.
-- Local index problem: query without partition key
-- must scan every partition's local index
SELECT DateFrom, DateTo
FROM Reservations
WHERE ClientID = 124 AND HotelID = 1;
-- With local IX_HotelID_ClientID, the RDBMS reads this index
-- from EVERY partition (Jan, Feb, Mar, Apr...) separately.
-- 12 monthly partitions = 12 separate index traversals.
-- Partition-aligned query: only reads relevant partition
SELECT HotelID, ReservationID, ClientID, DateFrom, DateTo
FROM Reservations
WHERE DateFrom BETWEEN '2021-03-01' AND '2021-03-03';
-- Partition pruning: only the March partition is accessed.
-- Local index on DateFrom within that partition works efficiently.
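The pruning arithmetic can be sketched as a count of partitions touched. The model below treats each monthly partition as a dictionary entry, a simplification of real catalog metadata:

```python
from datetime import date
from typing import Optional

# 12 monthly partitions for 2021, keyed by (year, month)
partitions = {(2021, m): f"reservations_2021_{m:02d}" for m in range(1, 13)}

def partitions_touched(date_from: Optional[date],
                       date_to: Optional[date]) -> int:
    """Count partitions the engine must visit for a given filter."""
    if date_from is None or date_to is None:
        # no predicate on the partition key: the local index in
        # every partition is traversed separately
        return len(partitions)
    lo = (date_from.year, date_from.month)
    hi = (date_to.year, date_to.month)
    return sum(1 for key in partitions if lo <= key <= hi)

print(partitions_touched(date(2021, 3, 1), date(2021, 3, 3)))  # 1: pruned to March
print(partitions_touched(None, None))                          # 12: every partition
```

The ClientID query above corresponds to the second call: with only local indexes and no partition-key predicate, every partition pays its own traversal cost.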
Practical Index Design Checklist
Based on our work across software outsourcing projects with SQL-heavy workloads, here is the index design checklist our database engineers follow before any production deployment:
Identify hot queries—profile the top 10 queries by execution frequency and cumulative CPU time. These are the queries worth optimizing with indexes.
Check for function-wrapped columns—audit WHERE clauses for TO_DATE(), UPPER(), CAST(), or any function applied to indexed columns. Rewrite as range conditions.
Evaluate covering index candidates—for high-frequency queries, check if adding 1-2 INCLUDE columns would eliminate table lookups entirely.
Audit LIKE patterns—replace leading-wildcard queries with full-text search or reverse-column indexes where possible. Ensure trailing-wildcard queries have prefix-compatible indexes.
Schedule reindexing—establish a fragmentation monitoring threshold (typically 30%) and automate reindex operations during low-traffic windows.
Partition strategy decision—for tables exceeding 500 million rows with data retention requirements, evaluate whether local or globally partitioned indexes best fit your query patterns.
The Bottom Line
SQL performance at scale comes down to understanding three things: how B-tree indexes serve range queries (with logarithmic depth and sequential leaf reads), what prevents the optimizer from using indexes effectively (functions on columns, leading wildcards, missing covering columns), and when partitioning helps versus introduces new complexity (maintenance benefits vs. local index limitations). Teams that invest in these fundamentals routinely achieve 10x-100x query performance improvements without any infrastructure changes, because the bottleneck was never the hardware. It was the gap between what the developer wrote and what the database engine needed to execute efficiently.
Frequently Asked Questions
What is a range query in SQL and why does it matter for indexing?
A range query restricts results to rows where a column's value falls between two boundaries, using operators like BETWEEN, greater-than, less-than, or bounded LIKE patterns. Range queries matter for indexing because B-tree indexes store data in sorted order, which means once the database finds the first matching entry, it can read sequentially through the sorted leaf pages and stop as soon as it passes the upper boundary. This transforms what would be a full table scan into a targeted read of only the matching rows, plus the logarithmic cost of navigating the B-tree to the starting point. The efficiency of a range query depends entirely on whether the database can recognize the condition as a range scan on the indexed column without any function transformation.
How does a covering index improve SQL query performance?
A covering index includes all columns that a query references: the columns in WHERE conditions, JOIN conditions, and the SELECT list. When every needed column exists in the index, the database satisfies the entire query by reading only the index without ever accessing the base table. This eliminates the most expensive operation in non-covering index queries: the key lookup, where the database uses the row address from the index to perform a random read to the table for each matching row. For a query returning 50 rows, a covering index might read 5 index leaf pages sequentially, while a non-covering index would read those 5 pages plus 50 random table pages. The trade-off is that covering indexes consume more storage and slightly increase the cost of INSERT, UPDATE, and DELETE operations because the index has more columns to maintain.
When should you use table partitioning versus adding more indexes?
Table partitioning and indexes solve different problems. Indexes optimize query retrieval speed by enabling the database to find matching rows without scanning the entire table. Partitioning optimizes data management by splitting the table into independently maintainable segments. Use indexes when your queries are slow due to full table scans or key lookups. Use partitioning when your operational challenges are about deleting historical data efficiently, reducing reindex time for maintenance windows, or isolating backup and restore operations. Partitioning can also improve query performance if your queries consistently filter on the partition key (enabling partition pruning), but it can degrade performance if queries frequently access columns that are not the partition key and you are using local indexes, because the database must scan the local index in every partition separately.
Why does applying a function to an indexed column prevent index usage?
B-tree indexes store the raw column values in sorted order. When you apply a function to the column in a WHERE clause (such as TO_DATE(column) or UPPER(column)), the database needs to compare the function's output against the search value. Since the index contains the raw values, not the function results, the database cannot use the B-tree's sorted structure to navigate directly to matching entries. Instead, it must read every entry in the index, apply the function to each value, and check whether the result matches the search condition. This is effectively a full index scan, which is only marginally faster than a full table scan. The solution is to restructure the condition so the indexed column remains on one side of the comparison without any transformation, and the function is applied to the constant values on the other side.
