
2nd Milan Critical Care Datathon & ESICM Big Datatalk

 

The 2nd Milan Critical Care Datathon & ESICM Big Datatalk was a three-day data science meeting (350 attendees, 23 faculty members and 20 mentors) held at Humanitas Hospital. There was a lot of enthusiasm and interest, with standing room only for the whole event, which included:

  • the datathon, a competition to learn how to work with data at the bedside, with physicians, allied health care professionals and engineers/data scientists working together
  • the datatalk, a meeting focused on data science, artificial intelligence and machine learning, to help attendees learn more about these new, challenging but promising strategies, including some first examples of clinical applications showing how these tools could improve outcomes (e.g. in the perioperative setting, in septic patients, in radiology…), reflections on potential ethical and legal/regulatory issues, and some practical demonstrations of how to run a query
  • an important joint meeting of the European Society of Intensive Care Medicine (ESICM) and the Society of Critical Care Medicine (SCCM), involving:
    ESICM President Jozef Kesecioglu
    ESICM President Elect Maurizio Cecconi
    ESICM Past President Massimo Antonelli
    SCCM President Heatherlee Bailey
    SCCM President Elect Lewis J. Kaplan
    SCCM President Elect Greg S. Martin
    A related joint initiative between the two Societies was also announced.

We need to share data in order to improve clinical outcomes and advance scientific progress. Free databases are now available, e.g. MIMIC-III (Medical Information Mart for Intensive Care; Johnson AEW et al, Sci Data. 2016) and the new Amsterdam one, and more are expected in the near future. ESICM has launched a new, really active section dedicated to data science.
It is not just artificial intelligence (AI) and machine learning (ML) but data science (DS): using (all) the health care related data that we are currently unable to use, because the volume of data is overwhelming and physicians and all healthcare professionals are overworked. And the human side? Part of our job is caring for the patient, not only curing, and sometimes what we should do is take the patient’s hand. We hope these tools can give us the time we need to be (more) human again (take a moment to read Topol E. Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again). The infographic collects some notes from the talks given during the three days, enjoy! PS: you can also search for the official hashtag #ESICMdata20 to find all the posts from the meeting!

Here are group pictures of the teams competing in the datathon!

And the winners!

Data Science Masterclass

Machine learning, the science that studies the design of algorithms that can learn, is advancing rapidly and becoming widespread in critical care medicine, given the large amounts of data collected routinely in intensive care units. Typical tasks are concept learning, function learning or “predictive modeling”, clustering and finding predictive patterns. These tasks are learned from available data, observed through experience or instruction.

The goal of this Data Science masterclass is to teach doctors and other health care professionals basic concepts and skills, and to give them tools for working more effectively with data. Moreover, a growing number of papers in the literature describe AI/machine learning algorithms and prediction models, so clinicians and other healthcare providers must know the key concepts of Data Science to interpret results correctly.

The Data Science masterclass was a very interactive and practical course where participants had the opportunity to discover insights in large, rich and complex data sets, to find new ways to answer clinical questions using large datasets of electronic health records, to cooperate with specialists from different fields, and to learn more about the potential of medical data, machine learning and predictive modelling to provide new insights and improve patient care.

To start familiarizing yourself with Clinical Data Science for Critical Care you need:

  1. a laptop
  2. to install R and RStudio
  3. to have or to sign up for a Google Docs account (optional)
  4. to download and install spreadsheet software

Moreover, you need to understand how files and folders (directories) are named on your computer. Unlike your usual habit of pointing and clicking to open something, here you do not have a graphical user interface (GUI): you will need to write instructions/scripts in the R console.
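The masterclass works in R, but the idea of spelling out file locations in text is the same in any scripting language. As a rough illustration only, here is a minimal Python sketch; the folder name `datathon` and the file name `patients.csv` are made up for the example.

```python
from pathlib import Path

# Hypothetical project layout: the folder and file names are made up.
project = Path("datathon")
data_file = project / "data" / "patients.csv"

print(data_file)         # the relative path, written out in full
print(data_file.name)    # file name including the extension
print(data_file.suffix)  # the extension only
print(data_file.parent)  # the folder that contains the file
print(Path.cwd())        # the working directory that relative paths start from
```

A script refers to a file by a path like this instead of a double click, which is why knowing where your files live matters.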

What is R?

R is a free, cross-platform (UNIX platforms, Windows and macOS) software environment for statistical computing and graphics, well suited to data analysis. R is not based on a graphical user interface (GUI) but on scripts, so the learning curve might be steeper than with other software. Working with scripts forces you to have a deeper understanding of what you are doing.

Why R?

3 good reasons:

  1. You can do anything in R
  2. Science should be reproducible
  3. You have a vast support network

People think R is hard because it does not have a graphical user interface (GUI): you have to describe the tasks you want the computer to complete in text, using the R language.

Data pipeline

Building data pipelines is a core component of data science. A data pipeline is a set of actions that extracts data (or directly analytics and visualisations) from various sources and produces an output (tables, plots, manuscripts, presentations) by means of an R script.

After obtaining data from electronic health record databases, web servers, logs or online open-source repositories, you have your data in a spreadsheet. You write instructions/scripts in the R language and obtain an output: a table, a plot or an entire manuscript. You can change your data, or add new data, then run the script again and instantly regenerate the output.
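The course builds its pipelines in R; purely as an illustration of the same read, script, output loop, here is a minimal Python sketch using only the standard library. The column names (`patient_id`, `lactate`), the inline CSV standing in for a spreadsheet export, the values in it, and the function name `summarise` are all made up for this example.

```python
import csv
import io
import statistics

# Hypothetical spreadsheet export with made-up values; in practice this
# would be a CSV file on disk.
RAW = """patient_id,lactate
1,1.2
2,3.4
3,2.0
"""

def summarise(csv_text):
    """Read the data and return a one-row summary table as a dict."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    values = [float(row["lactate"]) for row in rows]
    return {
        "n": len(values),
        "mean_lactate": round(statistics.mean(values), 2),
        "max_lactate": max(values),
    }

print(summarise(RAW))
# Add a new observation and rerun the same code: the output is regenerated.
print(summarise(RAW + "4,5.4\n"))
```

Because the output is computed from the data every time, nothing has to be edited by hand when the spreadsheet changes.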

Data preparation

Data preparation is the combination of data cleaning and data modelling. To be described, plotted and tested, data must be tidy, following the rule that “Each column is a variable. Each row is an observation.” Data preparation includes renaming variables, extracting numbers and strings, parsing dates, reshaping columns to rows, and handling missing and duplicate values.

Types of data: not all data is equal. Aim for consistency in every column and never record more than one type in a column: integers, decimals, strings, datetimes, booleans, factors. Try to think like a computer.
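The masterclass does these steps in R; as a hedged illustration in Python (standard library only), the sketch below applies several of them to a made-up "wide" export. The column names, the `NA` marker for missing values, the day/month/year date format and every value are assumptions chosen for the example.

```python
import csv
import io
from datetime import datetime

# Made-up "wide" export: one column per measurement day, one duplicated row,
# one empty cell and one "NA" marker for missing values.
WIDE = """Patient ID,day1,day2,Admission
1,1.2,0.9,03/02/2020
2,3.4,,04/02/2020
2,3.4,,04/02/2020
3,NA,2.1,05/02/2020
"""

def to_float(cell):
    """Return a float, or None when the cell is empty or marked NA."""
    cell = cell.strip()
    return None if cell in ("", "NA") else float(cell)

def tidy(csv_text):
    """Return long-format rows: one (patient, day) observation per row."""
    seen, out = set(), []
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = tuple(row.values())
        if key in seen:  # drop exact duplicate rows
            continue
        seen.add(key)
        # Parse the date (assumed to be day/month/year).
        admitted = datetime.strptime(row["Admission"], "%d/%m/%Y").date()
        for day in ("day1", "day2"):  # columns to rows
            out.append({
                "patient_id": int(row["Patient ID"]),  # renamed variable
                "admission_date": admitted,
                "day": int(day[3:]),                   # extract the number
                "lactate": to_float(row[day]),         # one type per column
            })
    return out

for record in tidy(WIDE):
    print(record)
```

Each output row is now a single observation, and each column holds one type only (integer, date, or a decimal that may be missing).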

Data visualisation

Complex ideas must be communicated with clarity, precision and efficiency: use storytelling, declutter, avoid misleading charts and pie chart horrors, scale up, and use colours rationally.

Visualisation is a fundamentally human activity. A good visualisation will show you things that you did not expect, or raise new questions about the data. A good visualisation might also indicate that you are asking the wrong question or that you need to collect different data.

Statistical modeling

Models are complementary tools to visualisation. Once you have made your questions sufficiently precise, you can use a model to answer them. Machine learning algorithms are divided into three categories:

  1. supervised: model training, focused on predictive tasks (e.g. risk of death, readmission, length of stay, early deterioration, …);
  2. unsupervised: discovery of latent structure/subclasses in a dataset, useful to define subgroups and phenotypes;
  3. reinforcement learning: virtual agents take actions in an environment so as to maximize some notion of cumulative reward. This is the most immature branch of machine learning.
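To make the difference between the first two categories concrete, here is a toy Python sketch (the masterclass itself uses R, and this is not a method taken from the course). The supervised part is a nearest-class-mean classifier that learns from outcome labels; the unsupervised part is a two-group k-means on the same values with the labels removed. All numbers are made up, and reinforcement learning is not covered.

```python
# Made-up toy values: (lactate in mmol/L, died in hospital: 1 = yes, 0 = no)
LABELLED = [(1.0, 0), (1.4, 0), (2.0, 0), (4.5, 1), (5.2, 1), (6.1, 1)]

def mean(xs):
    return sum(xs) / len(xs)

def fit_centroids(data):
    """Supervised: use the known outcome labels to learn one mean per class."""
    return {label: mean([x for x, y in data if y == label]) for label in (0, 1)}

def predict(centroids, x):
    """Predict the class whose mean is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def two_means(values, iterations=10):
    """Unsupervised: split values into two groups without using any labels.

    Assumes the values are not all identical.
    """
    low, high = min(values), max(values)  # initial cluster centres
    for _ in range(iterations):
        a = [v for v in values if abs(v - low) <= abs(v - high)]
        b = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = mean(a), mean(b)
    return sorted(a), sorted(b)

centroids = fit_centroids(LABELLED)
print(predict(centroids, 1.8), predict(centroids, 5.0))
print(two_means([x for x, _ in LABELLED]))
```

The supervised model needs the outcome column to be trained and then predicts it for new patients; the clustering step never sees the outcome and only proposes subgroups, which a clinician still has to interpret.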

Communication

The last step of data science is communication. It is a critical part, because it doesn’t matter how well your models and visualisations have led you to understand the data unless you can also communicate your results to others.

Tips in case of error messages

If you encounter an error message during your Data Science practice, try copying and pasting it into stackoverflow.com: most of the time you’ll find an answer.

Resources

Most of the material and sample code used in this Data Science masterclass is available online at datascibc.org/Data-Science-f

The suggested book for starting to learn R for Data Science is “R for Data Science”, available online at r4ds.had.co.nz. Moreover, remember that Google is your friend.

Infographic

To conclude, here is my infographic from the masterclass in Data Science, summarising the key concepts. Follow me on Twitter: Scquizzato Tommaso @tscquizzato.