SQL: how to determine which rows represent a range of consecutive values?

SQL Practical Cases: learning by example

Problem

You want to determine which rows represent a range of consecutive projects. Consider the following result set from view V, which contains data about a project and its start and end dates:

	select *
	  from V

	PROJ_ID PROJ_START  PROJ_END
	------- ----------- -----------
	      1 01-JAN-2005 02-JAN-2005
	      2 02-JAN-2005 03-JAN-2005
	      3 03-JAN-2005 04-JAN-2005
	      4 04-JAN-2005 05-JAN-2005
	      5 06-JAN-2005 07-JAN-2005
	      6 16-JAN-2005 17-JAN-2005
	      7 17-JAN-2005 18-JAN-2005
	      8 18-JAN-2005 19-JAN-2005
	      9 19-JAN-2005 20-JAN-2005
	     10 21-JAN-2005 22-JAN-2005
	     11 26-JAN-2005 27-JAN-2005
	     12 27-JAN-2005 28-JAN-2005
	     13 28-JAN-2005 29-JAN-2005
	     14 29-JAN-2005 30-JAN-2005

Excluding the first row, each row’s PROJ_START should equal the PROJ_END of the row before it (“before” is defined as PROJ_ID–1 for the current row). Examining the first five rows from view V, PROJ_IDs 1 through 3 are part of the same “group” as each PROJ_END equals the PROJ_START of the row after it. Because you want to find the range of dates for consecutive projects, you would like to return all rows where the current PROJ_END equals the next row’s PROJ_START. If the first five rows comprised the entire result set, you would like to return only the first three rows. The final result set (using all 14 rows from view V) should be:

	PROJ_ID PROJ_START  PROJ_END
	------- ----------- -----------
	     1  01-JAN-2005 02-JAN-2005
	     2  02-JAN-2005 03-JAN-2005
	     3  03-JAN-2005 04-JAN-2005
	     6  16-JAN-2005 17-JAN-2005
	     7  17-JAN-2005 18-JAN-2005
	     8  18-JAN-2005 19-JAN-2005
	    11  26-JAN-2005 27-JAN-2005
	    12  27-JAN-2005 28-JAN-2005
	    13  28-JAN-2005 29-JAN-2005

The rows with PROJ_IDs 4,5,9,10, and 14 are excluded from this result set because the PROJ_END of each of these rows does not match the PROJ_START of the row following it.

Solution

This solution takes best advantage of the window function LEAD OVER to look at the “next” row’s BEGIN_DATE, thus avoiding the need to self join, which was necessary before window functions were widely introduced:

	1 select proj_id, proj_start, proj_end
	2   from (
	3 select proj_id, proj_start, proj_end,
	4        lead(proj_start)over(order by proj_id) next_proj_start
	5   from V
	6        ) alias
	7 where next_proj_start = proj_end

Discussion

Applies to: DB2, MySQL, PostgreSQL, SQL Server and Oracle

Although it is possible to develop a solution using a self-join, the window function LEAD OVER is perfect for this type of problem, and more intuitive. The function LEAD OVER allows you to examine other rows without performing a self join (though the function must impose order on the result set to do so). Consider the results of the inline view (lines 3–5) for IDs 1 and 4:

	select *
	  from (
	select proj_id, proj_start, proj_end,
	       lead(proj_start)over(order by proj_id) next_proj_start
	  from v
	       )
	 where proj_id in ( 1, 4 )

	PROJ_ID PROJ_START  PROJ_END    NEXT_PROJ_START
	------- ----------- ----------- ---------------
	      1 01-JAN-2005 02-JAN-2005 02-JAN-2005
	      4 04-JAN-2005 05-JAN-2005 06-JAN-2005

 Examining the above snippet of code and its result set, it is particularly easy to see why PROJ_ID 4 is excluded from the final result set of the complete solution. It’s excluded because its PROJ_END date of 05-JAN-2005 does not match the “next” project’s start date of 06-JAN-2005.

The function LEAD OVER is extremely handy when it comes to problems such as this one, particularly when examining partial results. When working with window functions, keep in mind that they are evaluated after the FROM and WHERE clauses, so the LEAD OVER function in the preceding query must be embedded within an inline view. Otherwise the LEAD OVER function is applied to the result set after the WHERE clause has filtered out all rows except for PROJ_ID’s 1 and 4.

Now, depending on how you view the data, you may very well want to include PROJ_ID 4 in the final result set. Consider the first five rows from view V:

	select *
	  from V
	 where proj_id <= 5

	PROJ_ID PROJ_START  PROJ_END
	------- ----------- -----------
	      1 01-JAN-2005 02-JAN-2005
	      2 02-JAN-2005 03-JAN-2005
	      3 03-JAN-2005 04-JAN-2005
	      4 04-JAN-2005 05-JAN-2005
	      5 06-JAN-2005 07-JAN-2005

If your requirement is such that PROJ_ID 4 is in fact contiguous (because PROJ_ START for PROJ_ID 4 matches PROJ_END for PROJ_ID 3), and that only PROJ_ ID 5 should be discarded, the proposed solution for this recipe is incorrect (!), or at the very least, incomplete: 

	select proj_id, proj_start, proj_end
	  from (
	select proj_id, proj_start, proj_end, 
	       lead(proj_start)over(order by proj_id) next_start
	  from V
	where proj_id <= 5
	      )
	where proj_end = next_start

	PROJ_ID PROJ_START  PROJ_END
	------- ----------- -----------
	      1 01-JAN-2005 02-JAN-2005
	      2 02-JAN-2005 03-JAN-2005
	      3 03-JAN-2005 04-JAN-2005

If you believe PROJ_ID 4 should be included, simply add LAG OVER to the query and use an additional filter in the WHERE clause:

	select proj_id, proj_start, proj_end
	  from (
	select proj_id, proj_start, proj_end, 
	       lead(proj_start)over(order by proj_id) next_start,
	       lag(proj_end)over(order by proj_id) last_end
	  from V
	where proj_id <= 5
	      )
	where proj_end = next_start
	   or proj_start = last_end

	PROJ_ID PROJ_START  PROJ_END
	------- ----------- -----------
	      1 01-JAN-2005 02-JAN-2005
	      2 02-JAN-2005 03-JAN-2005
	      3 03-JAN-2005 04-JAN-2005
	      4 04-JAN-2005 05-JAN-2005

Now PROJ_ID 4 is included in the final result set, and only the evil PROJ_ID 5 is excluded. Please consider your exact requirements when applying these recipes to your code.

Вас заинтересует / Intresting for you:

A Brief History of SQL
A Brief History of SQL 2835 views Александров Попков Wed, 17 Oct 2018, 15:04:29
Referencing an Aliased Column ...
Referencing an Aliased Column ... 225 views Денис Wed, 14 Jul 2021, 12:59:03
How to Retrieving a Subset of ...
How to Retrieving a Subset of ... 242 views Денис Wed, 14 Jul 2021, 04:45:37
SQL: Locating the Beginning an...
SQL: Locating the Beginning an... 232 views Денис Sat, 10 Jul 2021, 18:02:52