Summary

This is an R Markdown report of the analysis of hospital prescription data.

I have transformed the ten original datasets, joined them together and analysed. I have prepared nine questions about this dataset. I have shown the total number and the total price of the prescriptions. Only in hospitals the spending on prescriptions from January till October 2022 consists of around £800M. The most prescribed medical area (by BNF chapter) is Central Nervous System. The most prescribed medicine by cost is Physeptone by number of items is Methadone. I also showed the time series analysis of the growing expenses by month and found the three-month moving average that is around £80M.

Objectives

My aim doing this project was to improve my wrangling skills using dplyr and tidyverse packages. And demonstrate the ability to produce insights from the real world data.

Data

Data comes from NHS Business Service Authority (NHSBSA) : https://www.nhsbsa.nhs.uk/prescription-data/prescribing-data/hospital-prescribing-dispensed-community It is a link where more information can be found.

The link to download actual data is here https://opendata.nhsbsa.net/dataset/hospital-prescribing-dispensed-in-the-community

The data captured from prescription processing is used to calculate reimbursement and remuneration. The data provided is based on England Hospital Trust prescribing only and includes prescription items which were prescribed in England and dispensed in the community in England as well as items prescribed in England and dispensed in Wales, Scotland, and the Isle of Man.

It is a monthly dataset. The data was downloaded in December 2022. The datasets available at the time were from January to October 2022.

This data excludes:

Items not dispensed, disallowed and those returned to the contractor for further clarification
Prescriptions prescribed and dispensed in Prisons, Hospitals and Private prescriptions
Items prescribed but not presented for dispensing or not submitted to NHS Prescription Services by the dispenser

This data does not contain any information about the patients including their diagnosis and demographics.

Each monthly dataset consists of the next variables:

period name
BNF name
BNF code
hospital trust code
hospital trust name
quantity
total quantity
total items
total actual cost
total NIC

Period name is the month and year when prescriptions was recorded through the system.

BNF name and BNF code are the British National Formulary name of a medical product and a 15 digit code in which the first seven digits are allocated according to the categories in the BNF and the last 8 digits represent the medicinal product.

Hospital Trust code and names are the code and name of the trust where the product was prescribed.

Quantity is the quantity prescribed.

Total Items shows the number of times a product appears on a prescription form not the quantity prescribed.

Total Quantity is the number of items multiplied by the quantity prescribed.

Total Net Ingredient cost NIC is the basic price of a drug.

Total Actual Cost is the calculated value including discounts and some adjustments. Actual Cost only relates to England.

Glimpse at data

As an example I am showing here January 2022 data

#load the data
hosp_prescrip_01_22_tbl <- read_rds("00_data/hosp_prescrip_01_22_tbl.rds")

hosp_pr_2022_post <- read_rds("00_data/hosp_pr_2022_post.rds")


hosp_prescrip_01_22_tbl %>% glimpse()

## Rows: 140,478
## Columns: 10
## $ PERIOD              <int> 202201, 202201, 202201, 202201, 202201, 202201, 20…
## $ BNF.NAME            <chr> "Movicol Paediatric Chocolate oral powder 6.9g sac…
## $ BNF.CODE            <chr> "0106040M0BBAJAB", "0404000M0BGAEAQ", "0501060D0AA…
## $ HOSPITAL.TRUST.CODE <chr> "R0A00", "R0A00", "R0A00", "R0A00", "R0A00", "R0B0…
## $ HOSPITAL.TRUST      <chr> "MANCHESTER UNIVERSITY NHS FT", "MANCHESTER UNIVER…
## $ QUANTITY            <dbl> 21.0, 20.0, 168.0, 224.0, 60.0, 30.0, 56.0, 56.0, …
## $ TOTAL.QUANTITY      <dbl> 21.0, 20.0, 168.0, 224.0, 120.0, 30.0, 56.0, 56.0,…
## $ TOTAL.ITEMS         <int> 1, 1, 1, 1, 2, 1, 1, 1, 1, 3, 1, 1, 2, 1, 1, 4, 1,…
## $ TOTAL.ACTUAL.COST   <dbl> 2.9844190, 18.1116328, 11.0794958, 28.9009815, 23.…
## $ TOTAL.NIC           <dbl> 3.07, 19.24, 11.83, 30.88, 25.52, 72.00, 2.90, 109…

Data Transformation

I have made the transformations and joined individual datasets together to have one table.

The transformation included:

transformation of period in dates to use it in time series analysis using lubridate package
transformation of BNF code to obtain the BNF chapter and then to obtain a BNF chapter description
calculating the TOTAL.ITEMS.COST by multiplying TOTAL.ITEMS and TOTAL.ACTUAL.COST
cleaning and creation of “look up table” with hospital trusts, their most up to date name, address and postcode using HOSPITAL.TRUST.CODE for future analysis
joining “look up table” to create final table with all the data in one place

The first 6 rows of a final table that will be used for further analysis:

hosp_pr_2022_post %>% glimpse()

## Rows: 1,471,460
## Columns: 24
## $ PERIOD                <int> 202201, 202201, 202201, 202201, 202201, 202201, …
## $ PERIOD.DATE           <date> 2022-01-01, 2022-01-01, 2022-01-01, 2022-01-01,…
## $ PERIOD.YEAR           <dbl> 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, …
## $ PERIOD.MONTH          <ord> Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan…
## $ BNF.NAME              <chr> "Movicol Paediatric Chocolate oral powder 6.9g s…
## $ BNF.CODE              <chr> "0106040M0BBAJAB", "0404000M0BGAEAQ", "0501060D0…
## $ BNF.CHAPTER.CODE      <chr> "01", "04", "05", "10", "13", "11", "05", "04", …
## $ BNF.CHAPTER.PLUS.CODE <chr> "01: Gastro-Intestinal System", "04: Central Ner…
## $ BNF.CHAPTER.DESCRPT   <chr> " Gastro-Intestinal System", " Central Nervous S…
## $ HOSPITAL.TRUST.CODE   <chr> "R0A00", "R0A00", "R0A00", "R0A00", "R0A00", "R0…
## $ QUANTITY              <dbl> 21.0, 20.0, 168.0, 224.0, 60.0, 30.0, 56.0, 56.0…
## $ TOTAL.QUANTITY        <dbl> 21.0, 20.0, 168.0, 224.0, 120.0, 30.0, 56.0, 56.…
## $ TOTAL.ITEMS           <int> 1, 1, 1, 1, 2, 1, 1, 1, 1, 3, 1, 1, 2, 1, 1, 4, …
## $ TOTAL.ACTUAL.COST     <dbl> 2.9844190, 18.1116328, 11.0794958, 28.9009815, 2…
## $ TOTAL.NIC             <dbl> 3.07, 19.24, 11.83, 30.88, 25.52, 72.00, 2.90, 1…
## $ TOTAL.ITEMS.COST      <dbl> 2.9844190, 18.1116328, 11.0794958, 28.9009815, 4…
## $ HOSPITAL.TRUST        <chr> "MANCHESTER UNIVERSITY NHS FOUNDATION TRUST", "M…
## $ CODE                  <chr> "R0A", "R0A", "R0A", "R0A", "R0A", "R0B", "R0D",…
## $ ORGANISATION          <chr> "MANCHESTER UNIVERSITY NHS FOUNDATION TRUST", "M…
## $ ADDRESS.1             <chr> "COBBETT HOUSE", "COBBETT HOUSE", "COBBETT HOUSE…
## $ ADDRESS.2             <chr> "OXFORD ROAD, MANCHESTER", "OXFORD ROAD, MANCHES…
## $ AREA                  <chr> "MANCHESTER", "MANCHESTER", "MANCHESTER", "MANCH…
## $ REGION                <chr> "GREATER MANCHESTER", "GREATER MANCHESTER", "GRE…
## $ POST.CODE             <chr> "M13 9WL", "M13 9WL", "M13 9WL", "M13 9WL", "M13…

hosp_pr_2022_post %>% head()

Questions

In my report I wanted to answer the next questions:

What is the total number and cost of prescriptions?
What is the total cost of prescriptions by BNF chapter?
What is three top medicines by cost that are prescribed in each chapter?
What is the total number of prescriptions in each chapter?
What are the top prescribed medicines by number of items prescribed?
Which NHS trust spent the most?
Which NHS trust prescribed the most items?
How the expenses are growing each month?
What is the rolling mean spending by month?

Answers to the questions

1. What is the total number and cost of prescriptions?

hosp_pr_2022_post %>% 
    summarise(total.items = sum(TOTAL.ITEMS),
              total.cost  = sum(TOTAL.ITEMS.COST)) %>%
    
    mutate(total.items.chr = scales::number(total.items),
           total.cost.chr  = scales::dollar(total.cost, prefix = "£")) %>% 
    
    select(total.items.chr, total.cost.chr) %>% 

#rename the columns
    rename("Total Items" = total.items.chr,
           "Total Cost"  = total.cost.chr) %>%
    
    #add a table
    flextable() %>% 
    autofit() %>%
    bold(part = "header") %>% 
    add_footer_lines("") %>% 
    add_header_lines("Table 1. Prescriptions in 2022")

Table 1. Prescriptions in 2022
3 915 437	£786,398,249

2. What is the total cost of prescriptions by BNF chapter?

We can see on the Fig. 1 that the most spending on medicines is in the next BNF chapters: Central Nervous System, Skin and Malignant disease and Immunosuppression. Those three chapters cover 73.4% of all prescriptions in 2022. Already the first chapter, Central Nervous System, contributes to more than half of the spending and consists of more than £459M.

##data transformation
n <- 7

hosp_pr_2022_post %>% 
    select(BNF.CHAPTER.CODE, BNF.CHAPTER.DESCRPT, TOTAL.ITEMS, TOTAL.ITEMS.COST) %>% 
    mutate(BNF.CHAPTER.CODE = as_factor(BNF.CHAPTER.CODE) %>% 
               fct_lump(n = n, w = TOTAL.ITEMS.COST)) %>% 
    group_by(BNF.CHAPTER.CODE, BNF.CHAPTER.DESCRPT) %>% 
    summarise(N.ITEMS = sum(TOTAL.ITEMS), 
              COST    = sum(TOTAL.ITEMS.COST)) %>%
   ungroup() %>% view()
#I've got not really what I wanted. I wanted to have one description as a factor and than total price and number, anyway try again below:

top_prescr_by_chapt_tbl <-  hosp_pr_2022_post %>% 
    select(BNF.CHAPTER.PLUS.CODE, TOTAL.ITEMS, TOTAL.ITEMS.COST) %>% 
    mutate(BNF.CHAPTER.PLUS.CODE = as_factor(BNF.CHAPTER.PLUS.CODE) %>% 
               fct_lump(n = n, w = TOTAL.ITEMS.COST)) %>% 
    group_by(BNF.CHAPTER.PLUS.CODE) %>% 
    summarise(N.ITEMS = sum(TOTAL.ITEMS), 
              COST    = sum(TOTAL.ITEMS.COST)) %>%
   ungroup() %>% 
    
    #I want to arrange it in the right order for a plot
    
    mutate(BNF.CHAPTER.PLUS.CODE = BNF.CHAPTER.PLUS.CODE %>% fct_reorder(COST)) %>%
    mutate(BNF.CHAPTER.PLUS.CODE = BNF.CHAPTER.PLUS.CODE %>% fct_relevel("Other", after = 0)) %>% 
    
    arrange(desc(BNF.CHAPTER.PLUS.CODE)) %>% 
    #I want to add label to my plot
    mutate(N.ITEMS_text = scales::number(N.ITEMS),
           COST_text    = scales::dollar(COST, scale = 1e-6, prefix = "£", suffix = "M")) %>% 
    
    #Cumulative percent
    mutate(cum_pct = cumsum(COST)/sum(COST)) %>% 
    mutate(cum_pct_text = scales::percent(cum_pct)) %>% 
    
    
    #Rank
    mutate(rank = row_number()) %>% 
    mutate(rank = case_when(
        rank == max(rank) ~ NA_integer_,
        TRUE ~ rank)) %>% 
    
    #Label text
    mutate(label_text = str_glue("Rank: {rank}\nCost: {COST_text}\nCumPct: {cum_pct_text}"))
    
    #Now the visualisation
    
 g1 <-   top_prescr_by_chapt_tbl %>% 
       
       #plot
       ggplot(aes(COST, BNF.CHAPTER.PLUS.CODE)) +
      
   
        #geometries
        geom_segment(aes(xend = 0, yend = BNF.CHAPTER.PLUS.CODE),
                     colour = palette_light()[1],
                     size = 1) +
       geom_point(aes(size = COST),
                  colour = palette_light()[1]) +
       
       #labels
       geom_label(aes(label = label_text),
                  hjust = "inward",
                  size  = 2.5,
                  colour = palette_light()[1]) +
   #Formating
   scale_x_continuous(labels = scales::dollar_format(scale = 1e-6, prefix = "£", suffix = "M")) +
       
       labs(
           title = str_glue("Fig.1. Top {n} prescriptions in 2022"),
           x     = "Cost (£M)",
           y     = "BNF chapter",
           caption = str_glue("Already first chapter contributes
                              to half of the spending on prescriptions")
            
       ) +
       
       theme_tq() +
       
       theme(
               legend.position = "none",
               title = element_text(face = "bold"),
               plot.caption = element_text(face = "bold.italic")
           )
   g1

3. What are three top medicines by cost that are prescribed in each BNF chapter (top 7)?

We can see that the first chapter number 04 contributes to the most expensive spending that exceed £148M (for the first three medicines). Physotone oral solution is the most expensive medicine taking into consideration the price and the quantity (around £60M and more than 76k items).

#data transformation
top_3_med_bnf_chap_price <- hosp_pr_2022_post %>% group_by(BNF.CHAPTER.CODE, BNF.CHAPTER.DESCRPT, BNF.NAME) %>% 
    summarise(N.ITEMS = sum(TOTAL.ITEMS), 
              COST    = sum(TOTAL.ITEMS.COST)) %>%
    arrange(desc(COST)) %>% 
    
    ungroup() %>% 
    
    group_by(BNF.CHAPTER.CODE) %>% 
    top_n(3) 

#level adjustment
sorted <- hosp_pr_2022_post %>% 
    select(BNF.CHAPTER.CODE, BNF.CHAPTER.DESCRPT, BNF.NAME, TOTAL.ITEMS, TOTAL.ITEMS.COST) %>% 
    group_by(BNF.CHAPTER.CODE, BNF.CHAPTER.DESCRPT) %>% 
    summarise(N.ITEMS = sum(TOTAL.ITEMS), 
              COST    = sum(TOTAL.ITEMS.COST)) %>%
    arrange(desc(COST)) %>% 
    
    ungroup() %>% view()
 
a <-  sorted$BNF.CHAPTER.CODE

#presentation into table

top_3_med_bnf_chap_price %>% 
    arrange(BNF.CHAPTER.CODE, desc(COST)) %>%
    
    arrange(factor(BNF.CHAPTER.CODE, levels = a)) %>% 
    
    filter(BNF.CHAPTER.CODE %in% a[1:7]) %>% 
    
    #change the presentation of the numbers in table
    mutate(N.ITEMS = scales::number(N.ITEMS),
           COST    = scales::dollar(COST, prefix = "£")) %>% 
    #rename the columns
    rename("BNF Chapter\nCode" = BNF.CHAPTER.CODE,
           "Chapter Description" = BNF.CHAPTER.DESCRPT,
           "BNF Name" = BNF.NAME,
         "Number of Items" = N.ITEMS,
        "Total Cost" = COST) %>%

    #add a table
    flextable() %>% 
    autofit() %>%
    bold(part = "header") %>% 
    add_footer_lines("") %>% 
    add_header_lines("Table 2. Top 3 medicine in each BNF chapter (top 7) sorted by cost")

Table 2. Top 3 medicine in each BNF chapter (top 7) sorted by cost
04	Central Nervous System	Physeptone 1mg/ml oral solution sugar free	76 060	£59,905,519
04	Central Nervous System	Circadin 2mg modified-release tablets	40 066	£48,753,512
04	Central Nervous System	Espranor 8mg oral lyophilisates	14 632	£39,958,794
13	Skin	Efudix 5% cream	15 763	£27,826,439
13	Skin	Stelara 90mg/1ml solution for injection pre-filled syringes	462	£27,666,071
13	Skin	Enstilar 50micrograms/g / 0.5 mg/g cutaneous foam	6 486	£3,986,420
21	Appliances	Thealoz Duo eye drops preservative free	14 283	£14,098,579
21	Appliances	Hylo-Forte 0.2% eye drops preservative free	18 175	£13,119,399
21	Appliances	FreeStyle Libre 2 Sensor	1 927	£3,514,666
11	Eye	Atropine 1% eye drops	2 170	£4,143,625
11	Eye	Chloramphenicol 0.5% eye drops	9 552	£3,639,891
11	Eye	Chloramphenicol 1% eye ointment	15 760	£3,552,518
08	Malignant Disease and Immunosuppression	Adoport 1mg capsules	2 647	£10,692,071
08	Malignant Disease and Immunosuppression	Adoport 0.5mg capsules	1 387	£3,355,469
08	Malignant Disease and Immunosuppression	Mycophenolate mofetil 500mg tablets	9 202	£2,478,820
05	Infections	Co-amoxiclav 500mg/125mg tablets	55 125	£6,566,015
05	Infections	Flucloxacillin 500mg capsules	41 584	£6,135,470
05	Infections	Nitrofurantoin 100mg modified-release capsules	16 788	£5,256,293
01	Gastro-Intestinal System	Plenvu oral powder sachets	2 231	£6,082,030
01	Gastro-Intestinal System	Moviprep oral powder sachets	2 716	£5,143,292
01	Gastro-Intestinal System	Sulfasalazine 500mg gastro-resistant tablets	4 816	£5,035,670

4. What is the total number of prescriptions in each BNF chapter?

The most prescribed medicines aka area of disease are Central nervous systems, Infections and Eye treatment. Indicating the prevalent disease areas in 2022 for people who get the prescriptions from hospitals.

hosp_pr_2022_post %>% group_by(BNF.CHAPTER.CODE, BNF.CHAPTER.DESCRPT) %>% 
    summarise(N.ITEMS = sum(TOTAL.ITEMS), 
              COST    = sum(TOTAL.ITEMS.COST)) %>%
    arrange(desc(N.ITEMS)) %>% 
    ungroup() %>% 

    #change the presentation of the numbers in table
    mutate(N.ITEMS  = scales::number(N.ITEMS),
            COST    = scales::dollar(COST, prefix = "£")) %>% 
    #rename the columns
    rename("BNF Chapter\nCode" = BNF.CHAPTER.CODE,
           "Chapter Description" = BNF.CHAPTER.DESCRPT,
           "Number of Items" = N.ITEMS,
           "Total Cost" = COST) %>%
    
    #add a table
    flextable() %>% 
    autofit() %>%
    bold(part = "header") %>% 
    add_footer_lines("") %>% 
    add_header_lines("Table 3. Total number of prescriptions in each BNF chapter and its cost")

Table 3. Total number of prescriptions in each BNF chapter and its cost
04	Central Nervous System	1 711 497	£459,156,957
05	Infections	398 647	£31,145,176
11	Eye	249 645	£33,656,583
02	Cardiovascular System	239 765	£19,488,521
13	Skin	216 778	£79,805,701
01	Gastro-Intestinal System	197 437	£26,380,974
06	Endocrine System	179 162	£19,560,811
21	Appliances	130 264	£37,931,817
09	Nutrition and Blood	123 207	£13,121,650
03	Respiratory System	119 621	£12,006,112
10	Musculoskeletal and Joint Diseases	116 545	£8,085,556
12	Ear, Nose and Oropharynx	84 935	£8,294,682
08	Malignant Disease and Immunosuppression	59 172	£31,492,260
07	Obstetrics, Gynaecology and Urinary-Tract Disorders	51 876	£3,656,976
19	Other Drugs and Preparations	18 767	£968,891
20	Dressings	8 377	£389,353
15	Anaesthesia	7 303	£1,129,120
23	Stoma Appliances	1 752	£63,026
22	Incontinence Appliances	668	£63,078
14	Immunological Products and Vaccines	12	£578
18	Preparations used in Diagnosis	7	£426

5. What are the top prescribed medicines (top 7) by number of items prescribed?

When we are looking at the total number prescribed it is methadone oral solution that is most frequent but less expensive compared to physeptone that is prescribed the second (but is much more expensive).

n <- 7

sorted_3 <- hosp_pr_2022_post %>% 
    group_by(BNF.CHAPTER.PLUS.CODE, BNF.NAME) %>% 
    summarise(N.ITEMS = sum(TOTAL.ITEMS), 
              COST    = sum(TOTAL.ITEMS.COST)) %>%
    arrange(desc(N.ITEMS)) %>% 
    
    ungroup() %>% 
    
    slice(1:n)



#data transformation and table presentation



top_number_by_chapt_tbl <-  hosp_pr_2022_post %>% 
    select(BNF.CHAPTER.PLUS.CODE, BNF.NAME, TOTAL.ITEMS, TOTAL.ITEMS.COST) %>% 
    mutate(BNF.NAME = as_factor(BNF.NAME) %>% 
               fct_lump(n = n, w = TOTAL.ITEMS)) %>% 
    group_by(BNF.NAME) %>% 
    summarise(N.ITEMS = sum(TOTAL.ITEMS), 
              COST    = sum(TOTAL.ITEMS.COST)) %>%
    ungroup() %>% 
    
    #I want to arrange it in the right order for a plot
    
    mutate(BNF.NAME = BNF.NAME %>% fct_reorder(N.ITEMS)) %>%
    mutate(BNF.NAME = BNF.NAME %>% fct_relevel("Other", after = 0)) %>% 
    
    arrange(desc(BNF.NAME)) %>% 
    #I want to add label to my plot
    mutate(N.ITEMS_text = scales::number(N.ITEMS),
           COST_text    = scales::dollar(COST, scale = 1e-6, prefix = "£", suffix = "M")) %>% 
    
    #Cumulative percent
    mutate(cum_pct = cumsum(N.ITEMS)/sum(N.ITEMS)) %>% 
    mutate(cum_pct_text = scales::percent(cum_pct)) %>% 
    
    
    #Rank
    mutate(rank = row_number()) %>% 
    mutate(rank = case_when(
        rank == max(rank) ~ NA_integer_,
        TRUE ~ rank)) %>% 
    
    #Label text
    mutate(label_text = str_glue("Rank: {rank}\nItems: {N.ITEMS_text}\nCost: {COST_text}"))

#join the column from another table
top_number_by_chapt_u_tbl <- sorted_3 %>% 
    select(BNF.CHAPTER.PLUS.CODE, N.ITEMS) %>% 
    full_join(top_number_by_chapt_tbl)

#I am not sure that vis is better here I prefer to create a table!

top_number_by_chapt_u_tbl %>% 
    select(BNF.NAME, BNF.CHAPTER.PLUS.CODE, N.ITEMS_text, COST_text) %>% 
    
    #flaxtable
    rename("BNF Name" = BNF.NAME,
           "BNF Chapter" = BNF.CHAPTER.PLUS.CODE,
           "Number of Items" = N.ITEMS_text,
           "Total Cost" = COST_text) %>%
    
    #add a table
    flextable() %>% 
    autofit() %>%
    bold(part = "header") %>% 
    add_footer_lines("") %>% 
    add_header_lines(str_glue("Table 4. Top {n} most prescribed medicine sorted by number of items"))

Table 4. Top 7 most prescribed medicine sorted by number of items
Methadone 1mg/ml oral solution sugar free	04: Central Nervous System	97 831	£29.63M
Physeptone 1mg/ml oral solution sugar free	04: Central Nervous System	76 060	£59.91M
Prednisolone 5mg tablets	06: Endocrine System	64 170	£1.41M
Co-amoxiclav 500mg/125mg tablets	05: Infections	55 125	£6.57M
Sertraline 50mg tablets	04: Central Nervous System	52 230	£1.83M
Methadone 1mg/ml oral solution	04: Central Nervous System	43 702	£14.11M
Folic acid 5mg tablets	09: Nutrition and Blood	42 106	£1.02M
Other		3 484 213	£671.93M

6. Which NHS trust (top 10) spent the most?

It should be emphasized that It is not the way to compare trusts between them as each of the hospital is specialise in different medical areas and can be bigger or smaller depending on the region and population.

hosp_pr_2022_post %>% group_by(HOSPITAL.TRUST) %>% 
    summarise(N.ITEMS = sum(TOTAL.ITEMS), 
              COST    = sum(TOTAL.ITEMS.COST)) %>%
    arrange(desc(COST)) %>% 
    ungroup() %>% 
    top_n(10) %>% 
#this mutate() transform it to character, so no possible to plot but better visibility - if I want to build a table
mutate(N.ITEMS = scales::number(N.ITEMS),
       COST    = scales::dollar(COST, prefix = "£")) %>% 
    
    
    rename("NHS Trust Name" = HOSPITAL.TRUST,
           "Number of Items" = N.ITEMS,
           "Total Cost" = COST) %>%
    
    #add a table
    flextable() %>% 
    autofit() %>%
    bold(part = "header") %>% 
    add_footer_lines("") %>% 
    add_header_lines(str_glue("Table 5. Top 10 NHS Trusts that prescribed the most considering cost"))

Table 5. Top 10 NHS Trusts that prescribed the most considering cost
MIDLANDS PARTNERSHIP NHS FOUNDATION TRUST	122 509	£108,654,040
SANDWELL AND WEST BIRMINGHAM HOSPITALS NHS TRUST	113 434	£45,699,559
ALDER HEY CHILDREN'S NHS FOUNDATION TRUST	16 654	£36,677,264
NORTH BRISTOL NHS TRUST	34 003	£34,966,829
CENTRA AND NORTH WEST LONDON NHS FOUNDATION TRUST	64 742	£27,624,692
SUSSEX PARTNERSHIP NHS FOUNDATION TRUST	60 553	£27,098,929
ESSEX PARTNERSHIP UNIVERSITY NHS FOUNDATION TRUST	49 704	£24,076,846
NORTH EAST LONDON NHS FOUNDATION TRUST	38 709	£22,308,378
EAST KENT HOSPITALS UNIVERSITY NHS FOUNDATION TRUST	106 436	£22,167,541
TEES, ESK AND WEAR VALLEYS NHS FOUNDATION TRUST	102 025	£19,982,558

7. Which NHS trust (top 10) prescribed the most items?

hosp_pr_2022_post %>% group_by(HOSPITAL.TRUST) %>% 
    summarise(N.ITEMS = sum(TOTAL.ITEMS), 
              COST    = sum(TOTAL.ITEMS.COST)) %>%
    arrange(desc(N.ITEMS)) %>% 
    ungroup() %>% 
    
    top_n(10) %>% 
    
    #this mutate() transform it to character, so no possible to plot but better visibility - if I want to build a table
    mutate(N.ITEMS = scales::number(N.ITEMS, big.mark = ","),
           COST    = scales::dollar(COST, prefix = "£")) %>% 
    
    rename("NHS Trust Name" = HOSPITAL.TRUST,
           "Number of Items" = N.ITEMS,
           "Total Cost" = COST) %>%
    
    #add a table
    flextable() %>% 
    autofit() %>%
    bold(part = "header") %>% 
    add_footer_lines("") %>% 
    add_header_lines(str_glue("Table 6. Top 10 NHS Trusts that prescribed the most considering number of items"))

Table 6. Top 10 NHS Trusts that prescribed the most considering number of items
MIDLANDS PARTNERSHIP NHS FOUNDATION TRUST	122,509	£108,654,040
SANDWELL AND WEST BIRMINGHAM HOSPITALS NHS TRUST	113,434	£45,699,559
EAST KENT HOSPITALS UNIVERSITY NHS FOUNDATION TRUST	106,436	£22,167,541
TEES, ESK AND WEAR VALLEYS NHS FOUNDATION TRUST	102,025	£19,982,558
CENTRA AND NORTH WEST LONDON NHS FOUNDATION TRUST	64,742	£27,624,692
SUSSEX PARTNERSHIP NHS FOUNDATION TRUST	60,553	£27,098,929
ESSEX PARTNERSHIP UNIVERSITY NHS FOUNDATION TRUST	49,704	£24,076,846
NORTH EAST LONDON NHS FOUNDATION TRUST	38,709	£22,308,378
NORTH BRISTOL NHS TRUST	34,003	£34,966,829
ALDER HEY CHILDREN'S NHS FOUNDATION TRUST	16,654	£36,677,264

8. How the expenses are growing each month?

The total expenses reach almost £800M and almost 4M items from January to October in 2022. It is interesting to observe the gradual increase month over month.

#create a data set grouped by month
hosp_pr_2022_sum_month <-     hosp_pr_2022_post  %>%
    select(PERIOD, PERIOD.YEAR, PERIOD.MONTH, TOTAL.ITEMS, TOTAL.ITEMS.COST) %>% 
    #group by month so there are 10 groups now
    group_by(PERIOD.MONTH) %>% 
    
    #summarise to get the sum of prescriptions by each month
    summarise(monthly_items = sum(TOTAL.ITEMS),
              monthly_cost  = sum(TOTAL.ITEMS.COST)) %>% 
    ungroup()
    
#Now to do the calculations
hosp_pr_2022_sum_month %>% 
    #find the cumulative amount items prescribed and cumulative cost for each month
    mutate(cumulative_items = cumsum(monthly_items),
           cumulative_cost = cumsum(monthly_cost)) %>% 
    #transform it to percentage but they are exactly the same, so delete one
    mutate(cumulative_items_pct = cumulative_items/sum(monthly_items),
           cumulative_cost_pct  = cumulative_cost/sum(monthly_cost)) %>% 
    #but it is a character
    mutate(cumulative_items_pct_chr = scales::percent(cumulative_items_pct),
           cumulative_cost_pct_chr  = scales::percent(cumulative_cost_pct),
           cumulative_items_chr = scales::number(cumulative_items)) %>%
    
    #Label text
    mutate(label_text = str_glue("N Items: {cumulative_items_chr}\nCumPct items: {cumulative_items_pct_chr}\nCumPct Cost: {cumulative_cost_pct_chr}")) %>% 


#plot    
    ggplot(aes(PERIOD.MONTH, cumulative_cost, group = 1)) +
    
    
    geom_point(size = 5, colour = "#2C3E50") +    
    geom_line(size = 1.5, colour = "#2C3E50") +
    
    #labels
    geom_label(aes(label = label_text),
               hjust = "inward",
               size  = 2,
               colour = "#2C3E50") +
    scale_y_continuous(labels = scales::dollar_format(scale = 1e-6, prefix = "£", suffix = "M")) +
    
    labs(title = "Fig. 2. Cumulative cost of the prescriptions by month in 2022",
        x = "",
        y = "Cumulative Cost"
    ) +
    
    theme_tq()

9. What is the rolling mean spending by month?

The three-month moving average is around £80M.

#I want to plot moving averages
hosp_pr_2022_sum_month %>%
    mutate(roll_mean_3 = rollmean(monthly_cost, k = 3, na.pad = TRUE, align = "right")) %>% 
   
    #plot
    ggplot(aes(x = PERIOD.MONTH, y = monthly_cost, group = 1))+   # start ggplot
    geom_line(                        # plot raw values
        size = 1,
        alpha = 0.2                       # semi-transparent line
    ) + 
    
    #adjust the limits and label on the scale
   
    scale_y_continuous(labels=scales::dollar_format(prefix = "£", 
                                                    scale = 1e-6, suffix = "M"), 
                       limits = c(40000000, 100000000)) +
   
    #add moving averages using tidyquant package
     geom_ma(                 # plot moving average
        n = 3,           
        size = 1,
        color = "blue")+ 
    
    labs(title = "Fig. 3. Time series analysis of total cost of prescriptions by month in 2022", 
         subtitle = "Three months moving averages(blue line)", 
         y = "Total Items Cost", x = "") + 
    
    theme_tq()

Analysis of hospital prescription data

Elena Thomson

2023-01-30