How to count referrals/patients/anything between two dates in a given period of time.
I wrote a similar blog, with almost the same title and everything, but with R and now SQL gets its time to shine! I said in the previous blog that I’d spent a long time puzzling over the problem of counts over a period of time, particularly for referrals or caseload; in truth, most of that problem solving was for SQL and the R code was based on the underlying theory.
The blog referred to these scenarios of open referrals/caseload in a given period:
The methodology for R was different because I could use specific verbs: complete() and seq.Date() to create the observations/rows between dates but these don’t exist (that I know of) in SQL. To do the same thing in SQL this needs to be done using a JOIN.
I’ve coded with SQL for nearly a decade and still struggle with reproducing data in it whereas in R it’s a doddle2 so I exported the R data frame I created and then imported it into SQL. It did mean that I could double check the final counts were the same which was useful!
The data warehouse that I use has a look up table of dates from 1900 until well into the future, with a single row of observations for each day. Other information is in there like, what was the day of the week, what financial year was it, is it today? But in the absence of such a table you will need to create a table with dates like so3:
--Create date sequence in SQL using the first start date and the last end date from the data,
--in this case I saved it in the SQL table under my own schema ZT and called the table Data
DECLARE @startDate date = (SELECT MIN(start_date) FROM ZT.Data),
date = (SELECT MAX(end_date) FROM ZT.Data);
@endDate
SELECT DATEADD(day, number - 1, @startDate) AS [Date]
INTO #calendar
FROM (
SELECT ROW_NUMBER() OVER (
ORDER BY n.object_id
)FROM sys.all_objects n
number)
) S(WHERE number <= DATEDIFF(day, @startDate, @endDate) + 1;
Also adding in the floored dates for month and year4
SELECT *
MONTH,DATEDIFF(MONTH,0,date),0) AS month_year
,DATEADD(YEAR,DATEDIFF(YEAR,0,date),0) AS [year]
,DATEADD(INTO #extended
FROM ZT.[data]
INNER JOIN #calendar AS cal ON [start_date] <= cal.date AND [end_date] > = cal.date
To match the previous counts in R where the individual referrals by patient were counted in a period, in SQL you have to merge the two data together because DISTINCT(patient_id, referral_id) won’t work
By month and year
SELECT month_year
COUNT(DISTINCT CONCAT(patient_id, referral_id))
,FROM #extended
GROUP BY month_year
ORDER BY month_year
By year
SELECT [year]
COUNT(DISTINCT CONCAT(patient_id, referral_id))
,FROM #extended
GROUP BY [year]
ORDER BY [year]
If you only want one period of time and don’t want, or need, to create the calendar lookup counts can be done for ‘being open’ in the WHERE clause5:
DECLARE @start_period datetime2 = '2020-12-01 00:00:00.0000000'
DECLARE @end_period datetime2 = '2021-12-31 00:00:00.0000000'
SELECT COUNT(DISTINCT CONCAT(patient_id, referral_id))
FROM ZT.data
WHERE ((end_date >= @start_period) AND [start_date] <= @end_period)
Or more explicitly:
DECLARE @start_period datetime2 = '2020-12-01 00:00:00.0000000'
DECLARE @end_period datetime2 = '2021-12-31 00:00:00.0000000'
SELECT COUNT(DISTINCT CONCAT(patient_id, referral_id))
FROM ZT.data
--Started before period and ended after period or still open (Patient 6)
WHERE (([start_date] < @start_period AND (end_date > = @end_period))
--Started and ended in period (Patient 3) and (Patient 2)
OR ([start_date] < = @end_period AND end_date > = @start_period)
--Started in period and ended after period or still open (Patient 4)
OR ([start_date]> = @start_period AND (end_date > = @end_period)))
This is possibly something analysts across the NHS have solved so I’d be really interested in hearing about any other ways of creating caseloads or counting open referrals in a given period.
The link no longer works but I’ve emailed AphA about the document which was here https://www.aphanalysts.org/wp-content/uploads/2016/08/JOIS_2016_038_Diagnosing_the_Flow_Constraint_i.pdf↩︎
Informal British meaning very easy to do↩︎
From a blog by Andrey Zavadskiy↩︎
From Stackoverflow↩︎
The code for this WHERE clause was shared with me from a previous colleague, Barney Lawrence whose blog and Twitter are worth checking out for more SQL information.↩︎
If you see mistakes or want to suggest changes, please create an issue on the source repository.
For attribution, please cite this work as
Turner (2021, May 2). Blog: Creating a caseload over time in SQL. Retrieved from https://philosopher-analyst.netlify.app/posts/2021-05-02-creating-a-caseload-over-time-in-sql/
BibTeX citation
@misc{turner2021creating, author = {Turner, Zoë}, title = {Blog: Creating a caseload over time in SQL}, url = {https://philosopher-analyst.netlify.app/posts/2021-05-02-creating-a-caseload-over-time-in-sql/}, year = {2021} }