
2nd Milan Critical Care Datathon & ESICM Big Datatalk


The 2nd Milan Critical Care Datathon & ESICM Big Datatalk was a 3-day data science meeting (involving 350 attendees, 23 faculty members & 20 mentors) held at Humanitas Hospital. There was a lot of enthusiasm and interest, with standing room only for the whole event, which included:

  • the datathon, a competition to learn how to implement and work with data at the bedside, with physicians, allied healthcare professionals and engineers/data scientists working together
  • the datatalk, a meeting focused on data science, artificial intelligence and machine learning, allowing attendees to learn more about these new but promising strategies, including some first examples of clinical applications showing how these tools could improve outcomes (e.g. in the perioperative setting, in septic patients, in radiology…), reflections on potential ethical and legal/regulatory issues, and some practical demonstrations of how to query the data
  • an important European Society of Intensive Care Medicine (ESICM) & Society of Critical Care Medicine (SCCM) joint meeting, involving:
    ESICM President Jozef Kesecioglu
    ESICM President Elect Maurizio Cecconi
    ESICM Past President Massimo Antonelli
    SCCM President Heatherlee Bailey
    SCCM President Elect Lewis J. Kaplan
    SCCM President Elect Greg S. Martin
    with a related joint initiative between the two Societies being announced.

We need to share data in order to improve clinical outcomes and scientific progress: free databases are now available, e.g. MIMIC-III (Medical Information Mart for Intensive Care; Johnson AEW et al, Sci Data. 2016) and the new Amsterdam database, and more are expected in the near future. ESICM has launched a new, very active section dedicated to data science.
It is not just artificial intelligence (AI) & machine learning (ML) but data science (DS): using (all) healthcare-related data that we are currently unable to use, as the data volume is overwhelming and physicians and all healthcare professionals are overworked. And what about the human? Since part of our job is caring for the patient, not only curing, and sometimes what we should do is take the patient's hand, we hope these tools could give us the time we need to be (more) human again (take a moment to read Topol E, Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again). In the infographic, some notes from the talks given during the three days; enjoy! PS: you can also look up the official hashtag #ESICMdata20 to find all the posts from the meeting!

Here are group pics of the teams competing in the datathon!

And the winners!

Data Science Masterclass

The field of machine learning, the science of designing algorithms that can learn, is advancing rapidly and is becoming widespread in critical care medicine, given the large amounts of data routinely collected in intensive care units. Typical tasks are concept learning, function learning or “predictive modeling”, clustering and finding predictive patterns. These tasks are learned from available data, observed through experience or instruction.

The goal of this Data Science masterclass is to teach doctors and other healthcare professionals basic concepts and skills, and to give them tools for working more effectively with data. Moreover, the literature contains an increasing number of papers describing AI/machine learning algorithms and prediction models, so clinicians and other healthcare providers must know the key concepts of data science to correctly interpret the results.

The Data Science masterclass was a very interactive and practical course where participants had the opportunity to discover insights in large, rich and complex data sets, find new ways to answer clinical questions using large datasets of electronic health records, cooperate with specialists from different fields, and learn more about the potential of medical data, machine learning and predictive modelling to provide new insights and improve patient care.

To start familiarizing yourself with Clinical Data Science for Critical Care, you need:

  1. a laptop
  2. to install R and RStudio
  3. to have or to sign up for a Google Docs account (optional)
  4. to download and install a spreadsheet software

Moreover, you need an understanding of how files and folders (directories) are named on your computer because, unlike your usual habit of pointing and clicking to open things, there is no graphical user interface (GUI) and you will need to write instructions/scripts in the R terminal.

What is R?

R is a free cross-platform (UNIX, Windows and MacOS) software environment for statistical computing and graphics, well suited to data analysis. R does not have a graphical user interface (GUI); instead it is based on scripts, and the learning curve may be steeper than with other software. Working with scripts forces you to develop a deeper understanding of what you are doing.
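For a first taste, here is a minimal script you might type into the R console (the heart rate values are made up for illustration):

```r
# A vector of (hypothetical) heart rate measurements
hr <- c(72, 88, 95, 110, 64)

mean(hr)     # average heart rate: 85.8
summary(hr)  # min, quartiles, median, mean, max

# A derived logical variable: which measurements are tachycardic?
tachy <- hr > 100
sum(tachy)   # 1 measurement above 100 bpm
```

Everything above is plain text that you type (or save in a script file); that is all a GUI-free workflow means.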

Why R?

3 good reasons:

  1. You can do anything in R
  2. Science should be reproducible
  3. You have a vast support network

People think R is hard because it does not have a graphical user interface (GUI) and you have to describe the tasks you want the computer to complete in text, using the R language.

Data pipeline

Building data pipelines is a core component of data science. A data pipeline is a set of actions that extracts data from various sources and produces an output (tables, plots, analytics and visualisations, manuscripts, presentations), for example via an R script.

After obtaining data from electronic health record databases, web servers, logs, or online open-source repositories, you have your data in a spreadsheet; you write instructions/scripts in the R language and you obtain an output: a table, a plot, or an entire manuscript. You can change your data, or add new data, and run the script again to instantly regenerate the output.
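A minimal sketch of such a pipeline in base R; the variables and file name are hypothetical, and in a real pipeline the first step would read from your actual data source (e.g. `read.csv` on an EHR export):

```r
# Extract: in practice e.g. vitals <- read.csv("icu_vitals.csv");
# here a small inline data frame stands in for the source
vitals <- data.frame(
  unit       = c("ICU-A", "ICU-A", "ICU-B", "ICU-B"),
  heart_rate = c(80, 90, 100, 110)
)

# Transform: mean heart rate per unit
by_unit <- aggregate(heart_rate ~ unit, data = vitals, FUN = mean)

# Output: a table regenerated on every run; add new rows to `vitals`
# and rerun the script to refresh it instantly
write.csv(by_unit, "mean_hr_by_unit.csv", row.names = FALSE)
```

The point is reproducibility: the script, not a sequence of clicks, is the record of how the output was made.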

Data preparation

Data preparation is the combination of data cleaning and data modelling. To describe, plot and test data, they must be tidy, following the rule that “each column is a variable; each row is an observation”. Data preparation includes renaming variables, extracting numbers and strings, parsing dates, reshaping columns to rows, and handling missing and duplicate values.

Types of data: not all data are equal. Aim for consistency in every column, and never record more than one type in a column (integers, decimals, strings, datetimes, booleans, factors); try to think like a computer.
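A small base R sketch of these preparation steps, on a deliberately messy hypothetical extract:

```r
# Hypothetical messy extract: inconsistent names, dates stored as text,
# sodium stored as text, and a duplicate row
raw <- data.frame(
  PatID   = c(1, 1, 2),
  AdmDate = c("2020-02-01", "2020-02-01", "2020-02-03"),
  Na.     = c("140", "140", "138")
)

# Rename variables to something consistent
names(raw) <- c("patient_id", "admission_date", "sodium")

# Parse types: dates as Date, sodium as numeric (one type per column)
raw$admission_date <- as.Date(raw$admission_date)
raw$sodium <- as.numeric(raw$sodium)

# Drop duplicate observations
tidy <- unique(raw)
```

After these steps, each column holds exactly one type and each row is one observation.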

Data visualisation

Complex ideas must be communicated with clarity, precision and efficiency: tell a story, declutter, avoid misleading charts (and the horror of pie charts), scale appropriately and use colour rationally.

Visualisation is a fundamentally human activity. A good visualisation will show you things that you did not expect, or raise new questions about the data. A good visualisation might also indicate that you are asking the wrong question, or that you need to collect different data.
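As a small base R sketch (simulated values, not real patient data), a clearly labelled plot already does much of the work of clarity and decluttering:

```r
# Simulated data: admission lactate vs. ICU length of stay
set.seed(42)
lactate  <- runif(50, 0.5, 8)
los_days <- 2 + 1.5 * lactate + rnorm(50, sd = 2)

# A labelled scatter plot written to file; informative axis labels and a
# clear title replace chart junk
pdf("lactate_vs_los.pdf")
plot(lactate, los_days,
     xlab = "Admission lactate (mmol/L)",
     ylab = "ICU length of stay (days)",
     main = "Exploring a possible association")
dev.off()
```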

Statistical modeling

Models are complementary tools to visualisation. Once you have made your questions sufficiently precise, you can use a model to answer them. Machine learning algorithms are divided into three categories:

  1. supervised: model training, focused on predictive tasks (e.g. risk of death, readmission, length of stay, early deterioration, …);
  2. unsupervised: discovery of latent structure/subclasses in a dataset, useful to define subgroups and phenotypes;
  3. reinforcement learning: virtual agents take actions in an environment so as to maximize some notion of cumulative reward. This is the most immature branch of machine learning.
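The first two categories can be sketched in a few lines of base R on a simulated cohort (all variables and coefficients are hypothetical):

```r
# Simulated cohort: age, lactate, and a mortality outcome
set.seed(1)
n <- 200
age     <- rnorm(n, 65, 10)
lactate <- rexp(n, rate = 0.5)
died    <- rbinom(n, 1, plogis(-6 + 0.05 * age + 0.4 * lactate))
cohort  <- data.frame(age, lactate, died)

# 1. Supervised: logistic regression predicting risk of death
fit  <- glm(died ~ age + lactate, data = cohort, family = binomial)
risk <- predict(fit, type = "response")   # predicted probability of death

# 2. Unsupervised: k-means to look for latent subgroups/phenotypes
clusters <- kmeans(scale(cohort[, c("age", "lactate")]), centers = 2)
table(clusters$cluster)                   # cohort split into two subgroups
```

Supervised learning needs a labelled outcome (`died`); the clustering step uses only the features and discovers structure on its own.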


The last step of data science is communication, a critical part: it doesn't matter how well your models and visualisations have led you to understand the data unless you can also communicate your results to others.

Tips in case of error messages

If you encounter any error messages during your data science practice, just try copying and pasting the error message into a search engine; most of the time you'll find an answer.


Most of the material and sample code used in this Data Science masterclass is available online here

The suggested book for starting to learn R for data science is “R for Data Science”, which is available online. Moreover, remember that Google is your friend.


To conclude, my infographic from the masterclass in Data Science, summarising the key concepts. Follow me on Twitter: Tommaso Scquizzato @tscquizzato.

ESICM Datathon: Day 3

Hemodynamics in septic shock Speaker: G. Baselli
Amazing how a ‘routine’ haemodynamic system can be viewed so differently from the point of view of an engineer/data scientist/medic

In conclusion

  1. Autonomic nervous system function and CV regulation are dynamic, and hence the data need to be dynamic, not constant
  2. Need to pick out the meaningful physiological parameters to feed into the machine learning algorithm
  3. Important to have large open-access databases
  4. These databases need to integrate multi-scale information both in dimension and in time

Neurointensive care Speaker: A. Ercole

The concept of cerebral perfusion pressure (CPP = MAP − ICP) is an example of a simple mathematical model
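In R, the same concept is a one-line function (all pressures in mmHg):

```r
# Cerebral perfusion pressure = mean arterial pressure - intracranial pressure
cpp <- function(map, icp) map - icp

cpp(80, 15)  # 65 mmHg
```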

We measure what we can, NOT what we should

Perhaps the autoregulation status of the TBI patient is more important: cerebrovascular pressure reactivity (PRx)

Data studies need the same robustness as any drug study

Data Access quality and Curation for Observational Research Designs

Reinforcement learning in sepsis Speaker: M. Komorowski

The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care

Can a computer help the clinician do the right thing?

Looked at two treatment variables in septic patients: vasopressor use and fluid administration.

In general, patients received more IV fluids and fewer vasopressors than recommended by the AI policy.

The machine considered many more variables than the human clinician


  • Reinforcement learning could provide clinically interpretable treatment suggestions
  • The models could improve outcomes in sepsis
  • Flexible framework transferable to other clinical questions

Gradient-boosted decision trees Speaker: C. Cosgriff

“It's not about doctors versus computers, it's doctors with computers versus doctors without computers” – @cosgriffc

If you don’t know what gradient boosting is (I don’t) but would like to find out more, OR want to learn some basic coding in R/Python/SQL, have a look at the resources below.

The power of XGBoost for the ICU is to shift towards ‘human intelligence’, supporting clinicians and intensivists (NOT REPLACING THEM)


  • Introduction to Statistical Learning
  • Elements of statistical learning
  • Deep Learning with Python
  • XGBoost: A Scalable Tree Boosting System
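For intuition, gradient boosting for squared-error regression simply keeps fitting a small model to the current residuals and adds a shrunken version of it to the ensemble. A toy base R sketch using depth-1 "stumps" (not the optimized XGBoost algorithm, which adds regularization and second-order gradient information):

```r
set.seed(7)
x <- runif(200, 0, 10)
y <- sin(x) + rnorm(200, sd = 0.3)

# A "stump": pick the split of x that best fits the residuals r with a
# constant prediction on each side
fit_stump <- function(x, r) {
  best <- NULL; best_sse <- Inf
  for (s in quantile(x, probs = seq(0.1, 0.9, 0.1))) {
    left <- x <= s
    pred <- ifelse(left, mean(r[left]), mean(r[!left]))
    sse  <- sum((r - pred)^2)
    if (sse < best_sse) {
      best_sse <- sse
      best <- list(split = s, left = mean(r[left]), right = mean(r[!left]))
    }
  }
  best
}
predict_stump <- function(m, x) ifelse(x <= m$split, m$left, m$right)

# Boosting loop: fit stumps to residuals, shrink by a learning rate
eta <- 0.1; n_rounds <- 100
pred <- rep(mean(y), length(y))
for (i in seq_len(n_rounds)) {
  r <- y - pred          # residuals = negative gradient of squared error
  m <- fit_stump(x, r)
  pred <- pred + eta * predict_stump(m, x)
}

# Training error is now far below that of the initial constant model
mean((y - pred)^2)
```

Real gradient-boosting libraries replace the stump with a regularized tree and add many engineering refinements, but the residual-fitting loop is the core idea.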

Clinical research and AI Speaker: R. Furlan

This was highlighted by a local project looking at the ability to diagnose syncope using natural language processing of EHRs compared with human clinicians

Moving models to the bedside Speaker: P. Thoral

Insight into the challenges of bringing machine learning and artificial intelligence models to the bedside.

Highlights the various legislative requirements regarding introducing such systems (medical software is still considered a medical device)

Need to engage the stakeholders

The future: physicians, engineers, machines Speaker: R. Barbieri

Various scenarios of the future predicted by Prof Barbieri


  • Technology used in warfare
  • Technology cannot overcome environmental disasters

Best Scenario

  • Technology cures, predicts and prevents diseases
  • No need for ICUs


  • Technology blunts human knowledge
  • Humans lose ability to think
  • Technology takes over but is unable to make critical decisions

Oligarchy (aka Blade Runner)

  • Technology and knowledge controlled by a few
  • Progress without common wealth
  • Humans and machines start merging


  • Machines take over
  • Humans and human nature are irrelevant


ESICM Datathon: Day 2

Session 3: Advanced data analysis Chairs: J. De Waele, A. Girbes

AI & machine learning for clinical predictive analytics Speaker: M. Ferrario

An Interpretable Machine Learning Model for Accurate Prediction of Sepsis in the ICU.

The combination of genomics and metabonomics coupled with artificial intelligence and machine learning is an incredible one, BUT its application is still an open challenge.

Given the complexity and heterogeneity of the data, there is no well-defined set of procedures to interrogate them.

BUT THERE ARE ISSUES WITH MACHINE LEARNING – Potential Biases in Machine Learning Algorithms Using Electronic Health Record Data

New meaning to observational studies Speaker: S. Finazzi

Data sources

  • Prospective data collection
  • Administrative databases
  • Registries
  • Electronic health records

Research question that can be tackled

  • Evaluation of quality of care
  • Study clinical and decision-making processes
  • Analyse pathophysiological phenomena

Besides MIMIC/PhysioNet, there are other collaborative databases out there

These data have been used to improve care and processes in participating departments. They can also serve as a benchmarking exercise.

The big issue is the quality of the data!

Predictive models and clinical support Speaker: G. Meyfroidt

Examples of application:

Computerized prediction of intensive care unit discharge after cardiac surgery: development and validation of a Gaussian processes model

Predictive models may help us predict patient discharge from the ICU, intracranial pressure increases, and the onset of acute kidney injury.

Medical data science 101 Speaker: M. Komorowski

Why should we conduct secondary analysis of EHR?

  • RCT results not always applicable to real life patients
  • RCTs are negative!
  • RCTs won’t allow precision medicine
  • Not using the data is unethical


  • Observational data: difficult to examine causality
  • Availability of the data?
  • Data quality

Matt then did a LIVE demo on how to build a machine learning model – follow my thoughts here

State of the art of EMRs in Europe Speaker: T. Kyprianou

Cognitive Informatics in Health and Biomedicine

Adverse effects in medicine: easy to count, complicated to understand and complex to prevent.

There needs to be a shift in focus from error intolerance to error recognition and recovery.

The data for an EHR are driven by four sources:

  • Patient
  • Unit
  • Education
  • Research

Problems and promises of innovation: why healthcare needs to rethink its love/hate relationship with the new

Improving the Electronic Health Record—Are Clinicians Getting What They Wished For?

The tragedy of the electronic health record

Opportunities for ICU CIS/PDMS

  • Direct link/real time updates of patient’s medical records
  • Healthcare professionals access to all information and services they need in one place
  • Patients/family-centric decision-making based on best clinical evidence
  • Improve data quality and analysis
  • Development of better and more effective security protocols
  • Faster test turnaround times to provide quicker diagnosis for patients.

GDPR and pseudonymization Speaker: D. Fulco, A. Di Stasio

This was an absolutely fascinating insight into the GDPR from a legal perspective.

I really think that GDPR is a good thing.

General Data Protection Regulation (GDPR)

Privacy in the age of medical big data

Blackout: when IT fails Speaker: C. Hinske

*Great title slide*

3 types of failure

  • Failure to use (e.g. IT blackout)
  • Failure to support (e.g. incorrect information)
  • Failure to enable (e.g. too much information)

Top 3 tips

    Risk assessment

    • Contingency plan where you tolerate workflow disruption (with strict time limit) followed by a fallback plan

    Failure strategy

    • Failure prevention -> failure management strategy

    Train your team

    • Simulate system failures

Prediction and deep learning Speaker: A. Ercole

If you were blown away by Matt’s SQL and Python prowess, wait till you see Ari’s demo. I was mesmerised when he did his party trick at LIVES2016 in Milan.

This time around, he constructed mortality prediction models in real time using R (here)

Critical Care Health Informatics Collaborative (CCHIC): Data, tools and methods for reproducible research: A multi-centre UK intensive care database

Another mesmerising site he introduced us to was the Neural Network Playground


The issue of data quality Speaker: S. Vieira

A Data Quality Assessment Guideline for Electronic Health Record Data Reuse

The types of missing data

  • Missing Completely at Random, e.g. loss of a label on a lab test
  • Missing at Random, e.g. arterial pH and PaCO2 measurements in blood
  • Missing Not at Random, e.g. blood counts that the doctor decides not to do
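A base R sketch of two common (and very different) ways of dealing with missing values; whether simple imputation is defensible depends on the missingness mechanism above:

```r
# Hypothetical lab values with missing entries
sodium <- c(140, NA, 138, 142, NA, 139)

# Complete-case analysis: simply ignore missing values
mean(sodium, na.rm = TRUE)   # mean of the observed values only

# Simple mean imputation (only defensible under strong assumptions,
# e.g. values Missing Completely at Random)
imputed <- ifelse(is.na(sodium), mean(sodium, na.rm = TRUE), sodium)
```

More principled approaches (e.g. multiple imputation) model the missingness instead of replacing each gap with a single value.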

Harmonization of data sources Speaker: B. Illigence
This was a fascinating insight into the process of introducing a national EHR in Germany (it is not complete yet)

Making sense of a big data mess Speaker: H. Hovenkamp

From the founder of PACMED, based in Amsterdam

Once upon a time: the story of MIMIC Speaker: R. Mark

This is probably my highlight of day 2. The story of how the MIMIC database came into being from Prof R Mark. Amazing and inspirational. A call for further collaboration. Furthermore, if you use the MIMIC data and publish your research, you must submit your code to an open repository.