name: hello class: title-slide, right, bottom # Why do I need to learn R when I can use SQL? ---- ## Zoë Turner, Senior Information Analyst ## November 2020 --- ### Why a talk on this? .pull-left[ <img src="https://media.giphy.com/media/L0lW0pxa3HbD128xO9/giphy.gif" height="450px" /> ] .pull-right[ * Analysts in the NHS and Local Authorities use SQL and Excel for their analysis. * 'I can do everything I learned in the introduction course in SQL. Why do I need R?' * Really, what's all the fuss about? ] --- ### SQL SPC .left-col[.center[ <video width="640" height="470" controls> <source src="videos/SPC_SQL_code_words.mp4" type="video/mp4"> </video> ]] .footnote[SPC = Statistical Process Control] --- background-image: url("https://media.giphy.com/media/KZNGTvSbtlHR90BpyM/giphy.gif") background-position: center background-size: contain --- ### A better way is to... .pull-left[Use a package like [qicharts2](https://cran.r-project.org/web/packages/qicharts2/index.html) from CRAN (or [runcharter](https://github.com/johnmackintosh/runcharter) or [spcccharter](https://github.com/johnmackintosh/spccharter) by @_johnmackintosh and code can be as small as this... ```r qic(month, n, days, data = hospital_infections, * facets = infection ~ hospital, chart = 'u', multiply = 10000, scales = 'free_y', x.angle = 45, title = 'Hospital acquired infections in the Capital Region of Denmark', ylab = 'Cases per 10,000 risk days', xlab = 'Month') ``` ] -- .pull-right[...creating multiple charts in one go ![](index_files/figure-html/unnamed-chunk-1-1.png)<!-- --> ] --- class: middle, center ## But this isn't about charts, this is about code... So let's compare R and SQL coding to create dummy/fake data --- ### SQL .pull-left[ <img src="img/sqlFillDown.PNG" height="450px" /> ] .pull-right[ * 76 lines of code * I may have many years' experience but I spent several hours learning to produce fake data * this requires a server to run, even when using dummy data ] --- ### And now in R .pull-left[ ![](img/rInSQLForComparison.PNG) ] .pull-right[ * 25 lines of code * I have 2 years' experience and took less than an hour to produce fake data * This only requires R/RStudio (free software) and existing hardware (my laptop) ] --- ### Filling down Imagine you have some data and you want to [fill the empty cells with previous row value](https://stackoverflow.com/questions/3465847/sql-how-to-fill-empty-cells-with-previous-row-value): <table> <thead> <tr> <th style="text-align:right;"> Dates </th> <th style="text-align:right;"> Customer </th> <th style="text-align:right;"> Value </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 20100101 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 12 </td> </tr> <tr> <td style="text-align:right;"> 20100102 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> NA </td> </tr> <tr> <td style="text-align:right;"> 20100101 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> NA </td> </tr> <tr> <td style="text-align:right;"> 20100102 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> NA </td> </tr> <tr> <td style="text-align:right;"> 20100101 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 32 </td> </tr> <tr> <td style="text-align:right;"> 20100102 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 39 </td> </tr> <tr> <td style="text-align:right;"> 20100101 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 42 </td> </tr> <tr> <td style="text-align:right;"> 20100102 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> NA </td> </tr> <tr> <td style="text-align:right;"> 20100101 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 15 </td> </tr> <tr> <td style="text-align:right;"> 20100102 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 16 </td> </tr> </tbody> </table> --- ### SQL code requires CTE <br/> ... roughly 34 lines of SQL code <img src="img/sqlFillScreenshot.PNG" height="390px"/> .footnote[CTE = Common Text Expression] --- ### R code <br/> .pull-left[ In contrast, R only take 1 line of code ```r example %>% fill(Rule, .direction = 'down') ``` ] .pull-right[ leaving lots of space for a GIF ![](https://media.giphy.com/media/xT8qBhrlNooHBYR9f2/giphy.gif) ] --- ### Still not convinced... .pull-left[ <img src="https://media.giphy.com/media/yj5oYHjoIwv28/giphy.gif" width="90%"/> ] .pull-right[ **Let's see if I can convince you with...** ] --- ### Two words... -- ![](https://media.giphy.com/media/dwpbGUm18BAfm/giphy.gif) .pull-right[ # Pivot Tables ] --- ### Pivoting .panelset[ .panel[.panel-name[Long to Wide data] .pull-left[ Let's take a small long data set like this... <table> <thead> <tr> <th style="text-align:right;"> Patient </th> <th style="text-align:left;"> System </th> <th style="text-align:right;"> DateofBirth_sk </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> S1 </td> <td style="text-align:right;"> 19680103 </td> </tr> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> RIO </td> <td style="text-align:right;"> 19680103 </td> </tr> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> IAPT </td> <td style="text-align:right;"> 19680103 </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:left;"> S1 </td> <td style="text-align:right;"> 19970509 </td> </tr> <tr> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> RIO </td> <td style="text-align:right;"> 19471209 </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:left;"> S1 </td> <td style="text-align:right;"> 19660321 </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> IAPT </td> <td style="text-align:right;"> 19780131 </td> </tr> </tbody> </table> ] .pull-right[ and turn it into this wide... <table> <thead> <tr> <th style="text-align:right;"> Patient </th> <th style="text-align:right;"> S1 </th> <th style="text-align:right;"> RIO </th> <th style="text-align:right;"> IAPT </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> </tr> </tbody> </table> ] ] <!--end--> .panel[.panel-name[SQL operators] .green[Hard coded] ```r SELECT * FROM ( SELECT Patient, Service, DateofBirth_sk FROM #patients ) AS src PIVOT ( COUNT(DateofBirth_sk) FOR Service IN ([S1], [RIO], [IAPT]) ) AS pivt; ``` ] <!--end--> .panel[.panel-name[SQL Joins] .green[Code is verbose] ```r SELECT p.Patient ,CASE WHEN S1.Patient IS NULL THEN 0 ELSE 1 END AS S1 ,CASE WHEN rio.Patient IS NULL THEN 0 ELSE 1 END AS RIO ,CASE WHEN iapt.Patient IS NULL THEN 0 ELSE 1 END AS IAPT FROM #patients AS p LEFT JOIN (SELECT Patient ,Service FROM #patients WHERE Service = 'S1') AS S1 ON S1.Patient = p.Patient LEFT JOIN (SELECT Patient ,Service FROM #patients WHERE Service = 'RIO') AS rio ON rio.Patient = p.Patient GROUP BY p.Patient ,CASE WHEN S1.Patient IS NULL THEN 0 ELSE 1 END ,CASE WHEN rio.Patient IS NULL THEN 0 ELSE 1 END ``` ] <!--end--> .panel[.panel-name[SQL CASE WHEN] .green[Flexible and concise] ```r SELECT Patient ,DateofBirth_sk ,S1 = MAX(CASE WHEN Service = 'S1' THEN 1 ELSE 0 END) ,Rio = MAX(CASE WHEN Service = 'RIO' THEN 1 ELSE 0 END) ,IAPT = MAX(CASE WHEN Service = 'IAPT' THEN 1 ELSE 0 END) FROM #patients GROUP BY Patient ,DateofBirth_sk ORDER BY Patient ``` ] <!--end--> .panel[.panel-name[R code] .green[About the same amount of code in this example but many more services could be added with no need for more coding.] ```r library(tidyr) wideData <- examplePivot %>% mutate(value = 1) %>% select(-DateofBirth_sk) %>% pivot_wider(names_from = System, values_from = value, values_fill = 0) ``` ] <!--end--> .panel[.panel-name[R code - cont] .green[It's also possible to move data from wide to long and repeat this within the same amount of code... as many times as required] ```r library(tidyr) examplePivot %>% mutate(value = 1) %>% select(-DateofBirth_sk) %>% pivot_wider(names_from = System, values_from = value, values_fill = 0) %>% pivot_longer(cols = -Patient, names_to = "service", values_to = "values") ``` ] <!--end--> ] <!--end of panelset--> --- ### Dummy Data .panelset[ .panel[.panel-name[SQL Dummy Data] .green[Based on real life example of having multiple databases] ```r CREATE TABLE #patients ( Patient integer, Service varchar(5), DateofBirth_sk int ); INSERT INTO #Patients VALUES (1, 'S1', 19680103), (1, 'RIO', 19680103), (1, 'IAPT', 19680103), (3, 'S1', 19970509), (4, 'RIO', 19471209), (5, 'S1', 19660321), (6, 'IAPT', 19780131) ``` ] <!--end--> .panel[.panel-name[R Dummy Data] .green[Based on real life example of having multiple databases] ```r examplePivot <- tibble::tribble( ~Patient, ~System, ~DateofBirth_sk, 1, 'S1', 19680103, 1, 'RIO', 19680103, 1, 'IAPT', 19680103, 3, 'S1', 19970509, 4, 'RIO', 19471209, 5, 'S1', 19660321, 6, 'IAPT', 19780131) examplePivot %>% knitr::kable(format = "html") ``` ] <!--end--> ] <!--end of panelset--> --- name: goodbye class: middle, inverse # **Thank you!** <br/> Acknowledgements: the professional look of this presentation, using NHS and Nottinghamshire Healthcare NHS Foundation Trust colour branding, exists because of the amazing work of Silvia Canelón, details of the workshops she ran at the [NHS-R Community conference](https://spcanelon.github.io/xaringan-basics-and-beyond/index.html). Thanks too to Simon Wellesley-Miller who ran an [R Markdown workshop](https://github.com/SimonW-M/Markdown) at the conference, where I finally found out how to get GIFs into my slides. .warmyellow[It goes without saying that SQL can't produce presentations.] [<svg viewBox="0 0 512 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;"> [ comment ] <path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"></path></svg> @Letxuga007](https://twitter.com/Letxuga007)<br/> [<svg viewBox="0 0 496 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;"> [ comment ] <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg> @Lextuga007](https://github.com/Lextuga007)<br/> [<svg viewBox="0 0 512 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;"> [ comment ] <path d="M440 6.5L24 246.4c-34.4 19.9-31.1 70.8 5.7 85.9L144 379.6V464c0 46.4 59.2 65.5 86.6 28.6l43.8-59.1 111.9 46.2c5.9 2.4 12.1 3.6 18.3 3.6 8.2 0 16.3-2.1 23.6-6.2 12.8-7.2 21.6-20 23.9-34.5l59.4-387.2c6.1-40.1-36.9-68.8-71.5-48.9zM192 464v-64.6l36.6 15.1L192 464zm212.6-28.7l-153.8-63.5L391 169.5c10.7-15.5-9.5-33.5-23.7-21.2L155.8 332.6 48 288 464 48l-59.4 387.3z"></path></svg> zoe.turner2@notthshc.nhs.uk](mailto:zoe.turner2@nottshc.nhs.uk)