Data Science Masterclass

The field of machine learning, the science of designing algorithms that can learn, is advancing rapidly and becoming widespread in critical care medicine, given the large amounts of data routinely collected in intensive care units. Typical tasks are concept learning, function learning or “predictive modelling”, clustering and finding predictive patterns. These tasks are learned from available data, observed through experience or instruction.

The goal of this Data Science masterclass is to teach doctors and other healthcare professionals basic concepts and skills and to give them tools for working more effectively with data. Moreover, the literature contains an increasing number of papers describing AI/machine learning algorithms and prediction models, so clinicians and other healthcare providers must know the key concepts of data science to correctly interpret the results.

The Data Science masterclass was a very interactive and practical course where participants had the opportunity to discover insights in large, rich and complex data sets, to find new ways to answer clinical questions using large datasets of electronic health records, to cooperate with specialists from different fields, and to learn more about the potential of medical data, machine learning and predictive modelling to provide new insights and improve patient care.

To start familiarizing yourself with clinical data science for critical care you need:

  1. a laptop
  2. to install R and RStudio
  3. to have, or to sign up for, a Google Docs account (optional)
  4. to download and install spreadsheet software

Moreover, you need to understand how files and folders (directories) are named on your computer because, unlike your usual habit of pointing and clicking to open something, there is no graphical user interface (GUI): you will need to start writing instructions/scripts in the R console.
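
As a first orientation, here is a minimal sketch of a few base R commands typed at the console; the folder name and package are only examples.

    # Where am I, and what files can R see?
    getwd()                    # print the current working directory
    setwd("~/datascience")     # move to a folder of your choice (example path)
    list.files()               # list the files in that folder

    # Install a package from CRAN and load it
    install.packages("tidyverse")
    library(tidyverse)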

What is R?

R is a free, cross-platform (UNIX platforms, Windows and MacOS) software environment for statistical computing and graphics, well suited to data analysis. R has no graphical user interface (GUI); instead it is based on scripts, and the learning curve might be steeper than with other software. Working with scripts forces you to have a deeper understanding of what you are doing.

Why R?

3 good reasons:

  1. You can do anything in R
  2. Science should be reproducible
  3. You have a vast support network

People think R is hard because it has no graphical user interface (GUI): you have to describe the tasks you want the computer to complete in text, using the R language.

Data pipeline

Building data pipelines is a core component of data science. A data pipeline is a set of actions, driven by an R script, that extracts data (or analytics and visualisations directly) from various sources to produce an output: tables, plots, manuscripts, presentations.

After obtaining data from electronic health record databases, web servers, logs or online open-source repositories, you have your data in a spreadsheet; you write instructions/scripts in the R language and you obtain an output: a table, a plot or an entire manuscript. You can change your data, or add new data, run the script again and instantly regenerate the output.
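
A minimal sketch of such a pipeline, assuming a hypothetical file admissions.csv with icu_unit and length_of_stay columns (not the course's actual dataset); re-running the script regenerates both outputs:

    library(readr)
    library(dplyr)
    library(ggplot2)

    # 1. Extract: read the raw data (hypothetical file name)
    admissions <- read_csv("admissions.csv")

    # 2. Transform: produce a summary table per ICU unit
    by_unit <- admissions %>%
      group_by(icu_unit) %>%
      summarise(n = n(), mean_los = mean(length_of_stay, na.rm = TRUE))

    # 3. Output: save a table and a plot
    write_csv(by_unit, "los_by_unit.csv")
    p <- ggplot(by_unit, aes(x = icu_unit, y = mean_los)) + geom_col()
    ggsave("los_by_unit.png", plot = p)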

Data preparation

Data preparation is the combination of data cleaning and data modelling. To be able to describe, plot and test it, data must be tidy, following the rule that “Each column is a variable. Each row is an observation.” Data preparation includes renaming variables, extracting numbers and strings, parsing dates, reshaping columns to rows, and handling missing and duplicate values.

Types of data: not all data are equal. Aim for consistency in every column and never record more than one type in a column (integers, decimals, strings, datetimes, booleans, factors); try to think like a computer.
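
A minimal tidying sketch with the tidyverse, assuming a hypothetical data frame obs whose column names (hr, admit_date, resp_rate, sbp) are made up for illustration:

    library(dplyr)
    library(tidyr)
    library(lubridate)

    tidy_obs <- obs %>%
      rename(heart_rate = hr) %>%                   # rename a variable
      mutate(admit_date = dmy(admit_date)) %>%      # parse day-month-year dates
      pivot_longer(c(heart_rate, resp_rate, sbp),   # columns to rows
                   names_to = "variable", values_to = "value") %>%
      distinct() %>%                                # drop duplicate rows
      drop_na(value)                                # drop missing values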

Data visualisation

Complex ideas must be communicated with clarity, precision and efficiency: storytelling, decluttering, avoiding misleading charts (and pie chart horror), scaling sensibly and using colours rationally.

Visualisation is a fundamentally human activity. A good visualisation will show you things that you did not expect, or raise new questions about the data. A good visualisation might also indicate that you are asking the wrong question, or that you need to collect different data.
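
A minimal ggplot2 sketch, using a hypothetical data frame patients with age, lactate and outcome columns (illustrative names only):

    library(ggplot2)

    ggplot(patients, aes(x = age, y = lactate, colour = outcome)) +
      geom_point(alpha = 0.5) +                 # one point per patient
      geom_smooth(method = "loess") +           # smoothed trend per outcome group
      labs(x = "Age (years)", y = "Lactate (mmol/L)",
           title = "Lactate vs age by outcome")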

Statistical modeling

Models are complementary tools to visualisation. Once you have made your questions sufficiently precise, you can use a model to answer them. Machine learning algorithms are divided into three categories (a minimal supervised example follows the list):

  1. supervised: model training, focused on predictive tasks (e.g. risk of death, readmission, length of stay, early deterioration, …);
  2. unsupervised: discovery of latent structure/subclasses in a dataset, useful to define subgroups and phenotypes;
  3. reinforcement learning: virtual agents take actions in an environment so as to maximise some notion of cumulative reward. This is the most immature branch of machine learning.
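
As a minimal illustration of the supervised case (not from the course materials), a logistic regression predicting risk of death from a hypothetical data frame icu with a 0/1 column died and a few numeric predictors:

    # Supervised learning sketch: logistic regression for risk of death
    fit <- glm(died ~ age + lactate + creatinine,
               data = icu, family = binomial)

    summary(fit)                                  # inspect coefficients
    icu$risk <- predict(fit, type = "response")   # predicted probability of death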

Communication

The last step of data science is communication, a critical part: it doesn’t matter how well your models and visualisations have led you to understand the data unless you can also communicate your results to others.

Tips in case of error messages

If you encounter any error messages during your data science practice, just try copying and pasting the error message into stackoverflow.com; most of the time you’ll find an answer.

Resources

Most of the material and sample code used in this Data Science masterclass is available online at datascibc.org/Data-Science-f

The suggested book for starting to learn R for data science is “R for Data Science”, available online at r4ds.had.co.nz. Moreover, remember that Google is your friend.

Infographic

To conclude, here is my infographic from the Data Science masterclass summarising the key concepts. Follow me on Twitter: Tommaso Scquizzato @tscquizzato.

ESICM Datathon: Day 3

Hemodynamics in septic shock Speaker: G. Baselli
Amazing how a ‘routine’ haemodynamic system can be viewed so differently from the point of view of an engineer/data scientist/medic

In conclusion

  1. Autonomic nervous system function and CV regulation are dynamic, and hence the data need to be dynamic and not constant
  2. Need to pick out the meaningful physiological parameters to feed into the machine learning algorithm
  3. Important to have large open-access databases
  4. These databases need to integrate multi-scale information both in dimension and in time

Neurointensive care Speaker: A. Ercole

Cerebral perfusion pressure (CPP = MAP – ICP) is an example of a simple mathematical concept.
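For example, a mean arterial pressure of 80 mmHg with an intracranial pressure of 20 mmHg gives CPP = 80 – 20 = 60 mmHg.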

We measure what we can, NOT what we should

Perhaps the autoregulation status of the TBI patient is more important: cerebrovascular pressure reactivity (PRx).

Data studies need the same robustness as any other drug study.

Data Access quality and Curation for Observational Research Designs

Reinforcement learning in sepsis Speaker: M. Komorowski

The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care

Can a computer help the clinician do the right thing?

Looked at 2 groups of treatment in septic patients – vasopressor use and fluid administration.

In general, pts received more IV fluids and fewer vasopressors than recommended by the AI policy.

The machine considered many more variables than the human clinician.

Conclusion

  • Reinforcement learning could provide clinically interpretable treatment suggestions
  • The models could improve outcomes in sepsis
  • Flexible framework transferable to other clinical questions

Gradient-boosted decision trees Speaker: C. Cosgriff

“It’s not about doctors versus computers, it’s doctors with computers versus doctors without computers” – @cosgriffc

If you don’t know what gradient boosting is (I don’t) but would like to find out more, OR want to learn some basic coding in R/Python/SQL, have a look at kaggle.com

The power of XGBoost for the ICU is to shift towards ‘human intelligence’, supporting clinicians and intensivists (NOT REPLACING THEM)
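
A minimal gradient-boosted trees sketch in R with the xgboost package, assuming a hypothetical data frame icu with a 0/1 outcome column died and a few numeric predictors; this is not the speaker's code.

    library(xgboost)

    # Hypothetical predictor matrix and binary label
    X <- as.matrix(icu[, c("age", "lactate", "creatinine")])
    y <- icu$died

    dtrain <- xgb.DMatrix(data = X, label = y)

    fit <- xgb.train(
      params  = list(objective = "binary:logistic", eta = 0.1, max_depth = 3),
      data    = dtrain,
      nrounds = 100
    )

    # Predicted probability of death
    p <- predict(fit, dtrain)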

References:

  • An Introduction to Statistical Learning
  • The Elements of Statistical Learning
  • Deep Learning with Python
  • XGBoost: A Scalable Tree Boosting System

Clinical research and AI Speaker: R. Furlan

This is highlighted by a local project looking at the ability to diagnose syncope using natural language processing algorithms on EHRs, compared with human clinicians.

Moving models to the bedside Speaker: P. Thoral

Insight into the challenges of bringing machine learning and artificial intelligence models to the bedside.

Highlights the various legislative requirements with regard to introducing such systems (medical software is still considered a medical device).

Need to engage the stakeholders

The future: physicians, engineers, machines Speaker: R. Barbieri

Various scenarios of the future predicted by Prof Barbieri

Armageddon

  • Technology used in warfare
  • Technology cannot overcome environmental disasters

Best Scenario

  • Technology cures, predicts and prevents diseases
  • No need for ICUs

Idiocracy

  • Technology blunts human knowledge
  • Humans lose ability to think
  • Technology takes over but is unable to make critical decisions

Oligarchy (aka BladeRunner)

  • Technology and knowledge controlled by a few
  • Progress without common wealth
  • Humans and machines start merging

Terminator

  • Machines take over
  • Humans and human nature are irrelevant

Optimistic

ESICM Datathon: Day 2

Session 3: Advanced data analysis Chairs: J. De Waele, A. Girbes

AI & machine learning for clinical predictive analytics Speaker: M. Ferrario

An Interpretable Machine Learning Model for Accurate Prediction of Sepsis in the ICU.

The combination of genomics and metabonomics coupled with artificial intelligence and machine learning is an incredible one, BUT its application is still an open challenge.

Given the complexity and heterogeneity of the data, there is no well-defined set of procedures to interrogate them.

BUT THERE ARE ISSUES WITH MACHINE LEARNING – Potential Biases in Machine Learning Algorithms Using Electronic Health Record Data

New meaning to observational studies Speaker: S. Finazzi

Data sources

  • Prospective data collection
  • Administrative databases
  • Registries
  • Electronic health records

Research questions that can be tackled

  • Evaluation of quality of care
  • Study of clinical and decision-making processes
  • Analysis of pathophysiological phenomena

Besides MIMIC/PhysioNet, there are other collaborative databases out there

http://giviti.marionegri.it/

These data have been used to improve care and processes in participating departments. They can also act as a benchmarking exercise.

The big issue is the quality of the data!

Predictive models and clinical support Speaker: G. Meyfroidt

Examples of application:

Computerized prediction of intensive care unit discharge after cardiac surgery: development and validation of a Gaussian processes model

Predictive models may help us to predict patient discharge from the ICU, intracranial pressure increases, and the onset of acute kidney injury.

Medical data science 101 Speaker: M. Komorowski

Why should we conduct secondary analysis of EHR?

  • RCT results not always applicable to real life patients
  • RCTs are negative!
  • RCTs won’t allow precision medicine
  • Not using the data is unethical

Limitations

  • Observational data: difficult to examine causality
  • Availability of the data?
  • Data quality

Matt then did a LIVE demo on how to build a machine learning model – follow my thoughts here

State of the art of EMRs in Europe Speaker: T. Kyprianou

Cognitive Informatics in Health and Biomedicine

Adverse effects in medicine: easy to count, complicated to understand and complex to prevent.

There needs to be a shift in focus from error intolerance to error recognition and recovery.

The data for EHR are driven by 4 sources:

  • Patient
  • Unit
  • Education
  • Research

Problems and promises of innovation: why healthcare needs to rethink its love/hate relationship with the new

Improving the Electronic Health Record—Are Clinicians Getting What They Wished For?

The tragedy of the electronic health record

Opportunities for ICU CIS/PDMS

  • Direct link/real time updates of patient’s medical records
  • Healthcare professionals can access all the information and services they need in one place
  • Patients/family-centric decision-making based on best clinical evidence
  • Improve data quality and analysis
  • Development of better and more effective security protocols
  • Faster test turnaround times to provide quicker diagnosis for patients.

GDPR and pseudonymization Speaker: D. Fulco, A. Di Stasio

This was an absolutely fascinating insight into the GDPR from a legal perspective.

I really think that GDPR is a good thing.

General Data Protection Regulation (GDPR)

Privacy in the age of medical big data

Blackout: when IT fails Speaker: C. Hinske

*Great title slide*

3 types of failure

  • Failure to use (e.g. IT blackout)
  • Failure to support (e.g. incorrect information)
  • Failure to enable (e.g. too much information)

Top 3 tips

  1. Risk assessment: a contingency plan where you tolerate workflow disruption (with a strict time limit), followed by a fallback plan
  2. Failure strategy: move from failure prevention to a failure management strategy
  3. Train your team: simulate system failures

Prediction and deep learning Speaker: A. Ercole

If you were blown away by Matt’s SQL and Python prowess, wait till you see Ari’s demo. I was mesmerised when he did his party trick at LIVES2016 in Milan.

This time around, he constructed mortality prediction models in real time using R (here)

Critical Care Health Informatics Collaborative (CCHIC): Data, tools and methods for reproducible research: A multi-centre UK intensive care database

Another mesmerising site he introduced us to was the Neural Network Playground.

*TOP TIP FROM ARI – IF YOU WANT TO LEARN R/PYTHON/SQL, DOWNLOAD THE PACKAGE AND USE IT

The issue of data quality Speaker: S. Vieira

A Data Quality Assessment Guideline for Electronic Health Record Data Reuse

The types of missing data (a small handling sketch follows this list):

  • Missing completely at random, e.g. loss of a label on a lab test
  • Missing at random, e.g. arterial pH and PaCO2 measurements in blood
  • Missing not at random, e.g. blood counts which the doctor decides not to order
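
A minimal sketch of inspecting and handling missingness in base R, assuming a hypothetical data frame labs with a pH column:

    # Count missing values per column
    colSums(is.na(labs))

    # Keep only rows with no missing values (listwise deletion)
    complete <- labs[complete.cases(labs), ]

    # Simple mean imputation for one column; only defensible when values
    # are missing (completely) at random, and biased otherwise
    labs$pH[is.na(labs$pH)] <- mean(labs$pH, na.rm = TRUE)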

Harmonization of data sources Speaker: B. Illigence

This was a fascinating insight into the process of introducing a national EHR in Germany (it is not completed yet).

Making sense of a big data mess Speaker: H. Hovenkamp

From the founder of PACMED (https://pacmed.ai//) based in Amsterdam

Once upon a time: the story of MIMIC Speaker: R. Mark

This is probably my highlight of day 2. The story of how the MIMIC database came into being from Prof R Mark. Amazing and inspirational. A call for further collaboration. Furthermore, if you use the MIMIC data and publish your research, you must submit your code to an open repository.

ESICM Datathon: Day 1

This is my first datathon and this blog just summarises some of the themes/discussions at the conference. As a declaration of interest, I believe in the collaborative use of healthcare data to improve patient care, BUT I am NOT a data scientist and can barely write a Python/R script.

Physicians: the need for machine learning (G. Meyfroidt @GMeyfroid)

Predicting the Future — Big Data, Machine Learning, and Clinical Medicine

Do you know the difference??

There is just too much data in the ICU – you need to understand it.

Data by themselves are useless. To be useful, data must be analysed, interpreted and acted upon. Thus, it is the algorithms – not the data sets – that will prove transformative.

The transformation will be in the form of:

  • Decision support, prognostication and diagnostics
  • Personalised medicine
  • Continuous learning
  • Knowledge discovery

By freeing physicians from the tasks that interfere with human connection, AI will create space for the real healing that takes place between a doctor who can listen and a patient who needs to be heard.

High-performance medicine: the convergence of human and artificial intelligence

Geert has a team of data scientists working with the clinical team. One should not try to be the other.

Data issues

  • Quality
    • Lack of standards
    • Missing or incomplete data
      • Can be unbiased or random
      • Most often biased (eg. lactate measurements in sickest pts)
    • Will influence the performance of Machine Learning models
  • Access to data, privacy and regulatory issues
    • Who owns shared data?
    • Who oversees the correct use
    • GDPR

The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care (published 22 October 2018) by @matkomorowski

 

Data analysts: why invest in ICM? (M. Flechet @FlechetMarine)

*I love her slideset

Data Scientist: The Sexiest Job of the 21st Century

The Vs of Big Data

  • Velocity
  • Volume
  • Variety
  • Value
  • Veracity

Healthcare Big Data and the Promise of Value-Based Care

The data scientist as part of the medical team and the doctor as an information coach (L. Celi @MITcriticaldata)

Healthcare is a failed business model

  • Under-reported and under-appreciated degree of medical errors
  • Inequalities in care delivery
  • Enormous waste of resources: over-testing, over-diagnosis, over-treatment
  • Large information gaps from imperfect medical knowledge system
  • Inefficiencies in workflow
  • High level of workforce burnout

Why doctors hate their computers – Atul Gawande

 

Opportunities in AI in healthcare

  • Classification: image recognition, risk stratification
  • Prediction: disease trajectory and prognosis, clinical events for triaging, treatment response
  • Optimisation aka precision medicine: diagnostic and screening strategies, defining therapeutic targets

Challenges for AI in healthcare

  • Labelling, a requirement for classification and prediction, is not straightforward
  • Model validity is limited by time and space
  • Machine bias
  • Optimal outcomes may vary across different stakeholders
  • Short-term gains may not translate to long-term benefits
  • Over-diagnosis (and over-treatment) will surge

Using machine learning, the degree of uncertainty may actually increase

Tolerating Uncertainty — The Next Medical Revolution?

Artificial intelligence systems for complex decision-making in acute care medicine: a review

In the AI Age, “Being Smart” Will Mean Something Completely Different

The new smart will be determined not by what or how we know, but by the quality of our thinking, listening, relating, collaborating and learning.

* I would highly recommend the following links as a good starting point if you are interested in database research

MIT Critical Care Data

eICU Collaborative Research Database

MIMIC Critical Care Database

MIMIC-III is arguably the best-known freely accessible critical care database.

Secondary analysis of electronic health records (FREE ebook)