VITech Lab Survey Insights: State of Data Science and ML in Healthcare

VITech Lab is pleased to reveal the results of the State of Data Science and ML in Healthcare survey that we conducted on LinkedIn in 2019. The survey looked into the scope and patterns of adoption of data science and machine learning in the healthcare industry. Over 50 qualified respondents represented a variety of company sizes, from startups to corporations with more than 10,000 employees, in pharma, the care sector, biotechnology, and medical device development. Among them were C-level execs, Directors, and VPs (50% of the pool), as well as ML engineers, data scientists, and software developers. The respondents have a strong grasp of the industry’s challenges and objectives, enabling us to analyze and assess the trends and the actual state of tech in healthcare.

Data Is the Key Concern

Around 50% of respondents surveyed in State of Data Science and ML in Healthcare report they have already adopted AI; however, data remains the key concern. Finding reliable and relevant data, data evaluation, data extraction, data processing and cleansing, and data transfer are reported as major challenges that hold back AI initiatives. Others wrestle with visualizing large datasets to enable more efficient analytics and decision-making.

At the same time, although AI adoption is growing rapidly among the leaders of the healthcare industry, half of respondents note their companies do not use any data science or machine learning methods to drive either research or business outcomes. However, some of them indicate that they are planning to explore, or are already exploring, solutions relevant to their needs. Trend identification, diagnostic imaging results, prognostics, and predictive analytics are among current requests.

Among other AI/ML adoption challenges in healthcare are:

  • Decision-making. AI officers and dedicated AI teams are not common. Being able to connect business needs to specific ML tasks remains an organizational struggle.

  • Team training. The shortage of qualified staff remains an issue. Training employees to use AI systems takes time and resources that smaller players may not have.

  • ML infrastructure issues. IT teams struggle to efficiently integrate ready-to-use ML models, much less design and develop their own solutions from scratch. They face data pipeline issues and wrestle with updates and optimizations.

  • Tools integration and use. Though the tooling for ML has come a long way in recent years, organizations lack expertise in using these tools and integrating them into their systems. For instance, Google’s AutoML and Amazon SageMaker Ground Truth are reported by the surveyed as challenging to use with limited ML skills.

  • End-user disposition towards AI suggestions. Respondents report that users do not trust suggestions generated by AI. Educating users about potential and real-world benefits, such as increased diagnostic accuracy and flexibility, can be a challenge.

Analytics, Reporting, and Image Analysis Remain a Priority

Machine learning is enabling healthcare companies to optimize a wide range of business processes, but analytics and image analysis are reported as the most common tasks by 45% of the surveyed.


Analytics is applied in healthcare in many ways, but mostly for data analysis, interpretation, and statistics. Respondents report using the analytics capabilities of data science and machine learning to predict the efficacy of medical outcomes, enhance procedures through better insights, and perform text analysis, medical market research, analysis of biometrical research datasets, NGS data analysis, and molecular analysis.


Image analysis covers a wide range of tasks as well, from image classification and processing to image reconstruction. In practical terms, automated image analysis helps detect anomalies in images at scale, making it more efficient to diagnose a variety of diseases and conditions such as skin cancer, diabetes, and blindness.


Quality and security represent another group of tasks. For example, respondents report using ML solutions to quantify and track quality control parameters in laboratory environments. They apply ML to assess and enhance standard operating procedures and to mitigate risks through the Quality Through Training Program (QTTP). ML is also used to increase the security of critical business and patient data.


Other applications of ML include patent search, molecule search, pharmacovigilance, chemical space navigation, and clinical robotics.

Clouds and ML Tooling

The respondents report that they are progressively migrating enterprise workloads to the cloud to take advantage of its network of services for data storage, data processing, and machine learning. Amazon Web Services and Microsoft Azure remain the dominant cloud providers in healthcare, with 40% and 30% of the market, respectively. They are also the platforms of choice for data science and machine learning tasks, with Python and R as the top open-source programming languages for data analysis.


Among the most frequently mentioned ML platforms and libraries are Keras, PyTorch, TensorFlow, and OpenCV. Django, Docker, SQL Server, and Apache Kafka have also been picked as preferred data analysis and data streaming tools. Security-wise, the leaders are Darktrace and Sophos.


Among healthcare-specific tools, REAL Space Navigator, PharmaPendium, Reaxys, and SciFinder are primarily reported for research, Qlik Sense and Spotfire for decision-making and data visualization, and Teradata and Databricks for data analytics.


Of course, organizations in healthcare also continue to rely on a wide range of statistical tools, including ones developed in-house, as well as Excel sheets.

Plans for the Future

Around 50% of the organizations and teams that did not apply any data science or machine learning techniques in 2019 plan to kick off their AI adoption journey next year. Primarily, they plan to cover a variety of prediction tasks, from volume/demand predictions and prognostics to predictive toxicology and predictive risk alerts.


Among other notable tasks are CAD and image analysis, NGS for pharmacovigilance, staff performance analysis to cut man-hours (i.e., better case processing), clinical trial analysis and optimization, drug design, analysis of sales team data, and PLC communications.

Note: 57% of the surveyed report that their organizations do not struggle with any IT infrastructure challenges; 30% report otherwise.

Thank you for reading! 
