How to count referrals/patients/anything between two dates in a given period of time.
My wonderful colleague Lori asked a question on some analysis that I hadn’t really considered for a long time but I had spent an inordinate amount of time solving a few years ago. She wanted to assess caseload1 over time to see if things had changed.
The logic of finding people is summed up nicely here:
It covers all the scenarios of a particular time period:
And needs to exclude those who are:
If the period of time you are looking for is repeated, so over several years or several months, thoughts may turn to some sort of loop to repeatedly count the people.
Stop!
It’s all in the dates. The solution is to create an observation, a row essentially, for every date between the start and end dates. This then offers the flexibility to count by any period.
Note that tables can be huge if you are working with large datasets which cover long periods of time and are creating observations at the day level rather than month or year. Even so, R coped well with a dataset that had 18.5k over several years, ending up as 5,5 million observations. It took about 30 seconds to complete but if you find things are unbearably slow the options are either to increase RAM capacity, use the power of a SQL server, or the other options may be to recode using other packages like data.table.
library(lubridate)
library(tidyverse)
set.seed(130) # so the numbers generated will replicate
# create random start and end dates and ids which can be repeated
data <- data.frame(
start_date = sample(seq(as.Date('2019/01/01'), as.Date('2021/01/01'), by = "day"), 300),
end_date = sample(seq(as.Date('2019/01/01'), as.Date('2021/01/01'), by = "day"), 300),
patient_id = floor(runif(300, min = 1, max = 300))
)
# Add a referral_id which is realistic and also because a patient_id can have multiple dates generated like patient_id 10 for example
data_filtered <- data %>%
filter(end_date > start_date) %>%
group_by(patient_id) %>%
mutate(referral_id = row_number(start_date))
The function that gets all the dates betweeen the start and end date is complete() from the tidyr package (part of tidyverse). In this case the code is creating a sequence using seq.Date and filling in by day. This could be by month or by year but for this example it’s by day as it’s not too big a dataset and gives greater flexibility on the later counts which are by month and then by year.
# Create an observation for every date between the start and end date
data_expanded <- data_filtered %>%
group_by(patient_id,
referral_id) %>%
pivot_longer(cols = ends_with("date"),
names_to = "caseload",
values_to = "dates") %>%
complete(dates = seq.Date(min(dates), max(dates), by="day")) %>%
ungroup() # affects any counts or summarising later
By month and year
data_expanded %>%
mutate(month_year = lubridate::floor_date(dates, "1 month")) %>%
group_by(month_year) %>%
summarise(count = n_distinct(patient_id, referral_id))
# A tibble: 25 × 2
month_year count
<date> <int>
1 2019-01-01 14
2 2019-02-01 25
3 2019-03-01 38
4 2019-04-01 48
5 2019-05-01 57
6 2019-06-01 68
7 2019-07-01 75
8 2019-08-01 78
9 2019-09-01 80
10 2019-10-01 78
# ℹ 15 more rows
By year
data_expanded %>%
mutate(year = lubridate::year(dates)) %>%
group_by(year) %>%
summarise(count = n_distinct(patient_id, referral_id))
# A tibble: 3 × 2
year count
<dbl> <int>
1 2019 120
2 2020 116
3 2021 1
also known as open referrals but this can be anything with a start and end date↩︎
The link no longer works but I’ve emailed AphA about the document which was here https://www.aphanalysts.org/wp-content/uploads/2016/08/JOIS_2016_038_Diagnosing_the_Flow_Constraint_i.pdf↩︎
If you see mistakes or want to suggest changes, please create an issue on the source repository.
For attribution, please cite this work as
Turner (2021, May 1). Blog: Creating a caseload over time in R. Retrieved from https://philosopher-analyst.netlify.app/posts/2021-05-01-creating-a-caseload-over-time-in-r/
BibTeX citation
@misc{turner2021creating, author = {Turner, Zoë}, title = {Blog: Creating a caseload over time in R}, url = {https://philosopher-analyst.netlify.app/posts/2021-05-01-creating-a-caseload-over-time-in-r/}, year = {2021} }