Creating sample datetime data in R

R Generate data

How to produce a reproducible example (reprex) for datetime

Zoë Turner
02-05-2021

Figure 1: Frosty leaves

The product of my drifting mind

I was writing a post for my team’s blog about working in the open and midway through I drifted into writing code. For some reason I had a burning desire to highlight a common issue in SQL with the use of BETWEEN for dates, and then ended up spending time working out how to create fake date/time data in R.

Reading through my team blog I realised it didn’t make sense to have the R data creation code in it so I’ve promoted it to its own blog - this! The SQL thing with BETWEEN will also get its own blog in due course.

Reproducible examples

I like producing reprexes[1] and I have a few gists where I’ve answered questions from places like Stack Overflow and RStudio Community by first creating reprexes. These were questions which were removed, unfortunately, before I had a chance to send the reply. One time I had an answer, with a reprex, in just 20 minutes and was about to post, but the question had been deleted! I wasn’t prepared to lose that code so I posted it in my own GitHub gist and I’ve used the reprex code many times since.

Reproducing what?

I was trying to recreate the SQL date time format YYYY-MM-DD hh:mm:ss[.nnn] that I have in my work’s data warehouse. For this example I’ve only reproduced a random sample in the SmallDateTime YYYY-MM-DD hh:mm:ss format.
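As a minimal sketch of that target layout (base R only, with a made-up timestamp and a UTC time zone chosen purely for illustration), a POSIXct value can be rendered as YYYY-MM-DD hh:mm:ss with format():

```r
# A made-up timestamp, for illustration only
x <- as.POSIXct("2020-10-01 13:45:09", tz = "UTC")

# Render it in the YYYY-MM-DD hh:mm:ss layout
sql_style <- format(x, "%Y-%m-%d %H:%M:%S")
sql_style
```

The result is a character string, which is handy for showing data in a reprex; the date-time object itself is what gets sampled in the rest of this post.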

Losing the code

I spent a long time trying to work out how to randomly sample the constituent parts of a date time (including hours, minutes and seconds), even using base R code and the chron package to get hm (but not s)[2].

By the end of the day I was very tired and I did a terrible thing: I cut the code from an Untitled and unsaved file, moved to another R project and then copied something else. I didn’t stop there. Oh no, I then copied and pasted several other things and, when I finally went to paste what I’d cut originally, it was, of course, gone.

Windows history clipboard

Turns out Windows 10 has a clipboard history, but you have to switch it on first by going to Windows settings/Clipboard settings and switching on history.

I looked for a GIPHY for “nice to know” but none conveyed the right level of sarcasm for that phrase.

Every cloud has a silver lining and all that…

Frantically re-writing code after deleting huge swathes means that you do get an opportunity to improve the code. At least, that’s what I told myself. And so I re-wrote what I could remember and realised I’d missed the seconds and that I’d not even checked the {lubridate} package, which does indeed produce sequential dates and times that can be sampled:

library(lubridate)

lubridate_dhms <- data.frame(
  hms = sample(seq(ymd_hms("2020-1-1 0:00:00"), ymd_hms("2021-1-1 0:00:00"), 
                    by = "hour"), 15)
)

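But by = "hour" was the only unit I could get to work: spelling out "minute" or "second" was rejected (seq() expects the abbreviations "min" and "sec"), so the minutes and seconds are all 00:00.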
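For comparison, here is a small sketch of the by units that seq() does take for date-times. It uses base R's as.POSIXct() with tz = "UTC" rather than ymd_hms() so that it runs without extra packages (an assumption made for this example), and the one-hour window is made up:

```r
start <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC")
end   <- as.POSIXct("2020-01-01 01:00:00", tz = "UTC")

# Abbreviated units work as steps for date-time sequences
by_min <- seq(start, end, by = "min")  # one value per minute
by_sec <- seq(start, end, by = "sec")  # one value per second

length(by_min)  # 61 values: both end points are included
length(by_sec)  # 3601 values
```

A multiple can also be given in the string, for example by = "15 min".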

Help!!!!

By this point I was a bit fed up so I did what all good coders who have exploited the internet for help do - I asked my NHS-R Community colleagues on Slack:

Is there any way to generate random hours, minutes and seconds for made-up data?

And I included examples of what I’d attempted.

Thus ensued a great thread with my boss, Chris Beeley, who answered the question within minutes.

# naming the column keeps the data frame readable when printed
base_r_dhms <- data.frame(
  dttm = sample(seq(as.POSIXlt("2020-10-01"),
      as.POSIXlt("2020-10-10"), by = 1), 15)
)

I asked what the by = 1 means and Chris confirmed this is a 1-second interval, so seq(…) creates the sequence in 1-second steps and then sample() takes, in this case, 15 data points from that sequence.

It’s also possible to write “s” or “sec” in place of the 1:

base_r_dhms_same <- data.frame(
  dttm = sample(seq(as.POSIXlt("2020-10-01"),
      as.POSIXlt("2020-10-10"), by = "s"), 15)
)

# or

base_r_dhms_same <- data.frame(
  dttm = sample(seq(as.POSIXlt("2020-10-01"),
      as.POSIXlt("2020-10-10"), by = "sec"), 15)
)
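Two things may be worth adding when this goes into a reprex. The draws change on every run unless a seed is set, and a sequence in 1-second steps gets large for long date ranges. The sketch below is one possible alternative, not Chris's code: it fixes a seed (the value 123 is arbitrary) and adds randomly drawn whole-second offsets to the start time instead of building the full sequence. The names n_secs and random_dttm and the UTC time zone are assumptions for the example.

```r
set.seed(123)  # arbitrary seed so the same "random" rows come back each run

start <- as.POSIXct("2020-10-01", tz = "UTC")
end   <- as.POSIXct("2020-10-10", tz = "UTC")

# Number of whole seconds between the two end points
n_secs <- as.numeric(difftime(end, start, units = "secs"))

# Draw 15 distinct offsets between 0 and n_secs and add them to the start
random_dttm <- data.frame(
  dttm = start + (sample.int(n_secs + 1, 15) - 1)
)

random_dttm
```

Because sample.int() draws without replacement by default, the 15 date-times are all different, which matches the behaviour of sampling from the full sequence.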

Because I’d also asked about {lubridate} generating random minutes and seconds, and Chris was having too much fun with this, he answered that too:

# using the code I shared that generates random dates and hours

hour_min_sec <- data.frame(
  hms = seq(ymd_hms("2020-1-1 0:00:00"), ymd_hms("2021-1-1 0:00:00"), 
                    by = "hour")
)

# overwrite the seconds of every row with a random value from 0 to 59
lubridate::second(hour_min_sec$hms) <- sample(0 : 59, nrow(hour_min_sec), replace = TRUE)

# overwrite the minutes of every row with a random value from 0 to 59
lubridate::minute(hour_min_sec$hms) <- sample(0 : 59, nrow(hour_min_sec), replace = TRUE)
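To round the example off, the rows still need to be sampled and, if the goal is SQL-style text, formatted. The following is a base R sketch of the same idea rather than the {lubridate} code above: it adds a random number of seconds (0 to 3599) to each hourly value, which randomises minutes and seconds in one step, then samples 15 rows. The object names, the seed and the UTC time zone are assumptions for illustration.

```r
set.seed(2021)  # arbitrary seed for a repeatable example

hourly <- seq(as.POSIXct("2020-01-01 00:00:00", tz = "UTC"),
              as.POSIXct("2021-01-01 00:00:00", tz = "UTC"),
              by = "hour")

# A random offset within the hour randomises both minutes and seconds
jittered <- hourly + sample(0:3599, length(hourly), replace = TRUE)

# Take 15 rows and add a character column in YYYY-MM-DD hh:mm:ss layout
fake_dttm <- data.frame(dttm = sample(jittered, 15))
fake_dttm$dttm_chr <- format(fake_dttm$dttm, "%Y-%m-%d %H:%M:%S")

fake_dttm
```

Note that the last hourly value is the end point itself, so a jittered value can land up to an hour past it; drop the final element of hourly first if that matters.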

Happy ending

In response to my saying:

You won’t believe how long I’ve been working on this and how many lines of code I have written!

Chris said:

I bet you learned a lot though

And it’s true.


  1. Reproducible example or data that has been copied or made up to help explain a problem in data.

  2. I used this blog: http://datacornering.com/how-to-generate-time-intervals-or-date-sequence-in-r/

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Citation

For attribution, please cite this work as

Turner (2021, Feb. 5). Blog: Creating sample datetime data in R. Retrieved from https://philosopher-analyst.netlify.app/posts/2021-02-05-creating-sample-datetime-data-in-r/

BibTeX citation

@misc{turner2021creating,
  author = {Turner, Zoë},
  title = {Blog: Creating sample datetime data in R},
  url = {https://philosopher-analyst.netlify.app/posts/2021-02-05-creating-sample-datetime-data-in-r/},
  year = {2021}
}