October 2019

Background

6 days work by 1 analyst

Only completed 3x in a year

9 clinical databases on 2 systems (Rio and S1)

2 non clinical datasets (incidents and Coroners Verdict)

How?…

Excel copying, checking, pasting and ultimately: VLOOKUPS

But…

What about patients who access multiple services?

Do deaths by ethnicity and gender reflect the same Public Health patterns?

SQL solution v1.0

First attempt is wide data.

PatientID SystmOne Rio instance1 Rio instance2 Rio instance3
Patient A 1 1 0 0

Wide and long form data

id Ward Referrals
Patient A Ward 1 Team A
Patient B Ward 2 Team B

Long table (in R uses gather function in dplyr)

id Activity Detail
Patient A Ward Ward 1
Patient A Referrals Team A
Patient B Ward Ward 2
Patient B Referrals Team B

Further reading

SQL solution v2.0

PatientID System DOB DOD
Patient A SystmOne 01/01/1900 02/02/1910
Patient A Rio instance1 02/02/2010
Patient A Rio instance2 01/01/2000

Data warehouse rights

New server = BORIS (no rights to update/delete/drop table)

Old server = Temp_tables (full rights)

-- In BORIS newServer

INSERT OPENQUERY([oldServer],'
SELECT *
  FROM [oldServer].[ZT].[PatientIndex]')
SELECT *
 FROM #tmp

SQL solution v2.1

Originally for all patients:

  • 1 script to build
  • OPENQUERY to OldServer
  • 1 script to amend (merged DOD, DOB, Ethnicity, Gender…)

SQL solution v2.2

Now:

  • 1 script to build and amend
  • OPENQUERY only those who have died since 2015

SQL solution v2.3

Now:

  • 1 script to build an amend
  • OPENQUERY only those who have died since 2015

  • AND who are not already in the existing table

Things I learned

  • Back up tables before running any code that makes changes
  • Add an id to any table you create
  • Write notes to your future self
  • Number the chunks of SQL code
  • I probably haven't finished at v2.3 …

Now for the data feed to R…

That's the data created

Data feed before analysis

Back to wide data!

PatientID SystmOIne Rio Instance1 Rio Instance2
Patient A 1 1 0
Patient B 0 1 0
Patient C 1 1 0

SQL MAX(CASE….)

SELECT 
p.MergedID
,p.DateOfBirth_skMerge
,SystmOne = MAX(CASE WHEN p.System = 'S1' THEN 1 ELSE 0 END)
,RioLOCAL = MAX(CASE WHEN p.System = 'LOCAL' THEN 1 ELSE 0 END)
FROM #pat                           AS p 
GROUP BY  p.MergedID
,p.DateOfBirth_skMerge

Output

MergedID SystmOIne Rio Instance1 EthnicityMerged DateofDeath_skMerged
123 1 1 White 20190101
124 0 1 Mixed 20190807
125 1 1 Unknown 20180101

Then back to long data!

MergedID System Description Active LoadedDate_sk
123 S1 NA 1 20190513
123 Rio Referrals AMH 0 20190513
123 Rio Inpatients Ward A 0 20190513
126 S1 NA 1 20190513

Now to the analysis in R…

This was presented originally to R-Ladies