An in-depth analysis of AWS’ HealthLake versus Google Cloud’s Healthcare API for FHIR as of June 2021
Disclaimer: I am employed at Google Cloud as a Solution Architect.
I am a FHIR advocate, and thus am always interested in the next addition to the FHIR family. Previously, I wrote a blog on the difference between Azure and Google’s FHIR APIs here. As a continuation of the managed API series, I decided to look into AWS’ HealthLake. AWS announced HealthLake at their re:Invent conference in December 2020. Unfortunately, it is still in preview and I am unable to test the product, so all of my research is based on their documentation and their conference presentations.
This is an analysis of AWS’ HealthLake product in comparison to Google Cloud’s FHIR and healthcare services. As always, I attempt an unbiased review of both with documentation and resources to support. This reflects the state of both services as of June 2021.
Infrastructure and Scalability
Google Cloud (from previous blog)
The underlying infrastructure for the API is built on Spanner, the same technology that powers Google Cloud Spanner. Google Cloud Spanner is described as a “fully managed relational database with unlimited scale, strong consistency and up to 99.999% availability.” It is a first-of-its-kind database, claiming to break the CAP theorem, with both analytics and transactional functionalities. Spanner relies on “TrueTime” to provide external consistency, which is the strongest form of consistency one can achieve for transactions in a distributed system. On top of the Spanner infrastructure, Google Cloud’s Healthcare API is tightly integrated with additional proprietary technologies, such as indexing and queuing features optimized for FHIR workloads, to achieve its performance and scalability.
Amazon does not have clear documentation on exactly how FHIR data is persisted. Based on their FHIR search functionality documentation, AWS states that data is first stored in Amazon DynamoDB for CRUD operations and then processed into an Amazon Elasticsearch Service cluster for search operations. It is unclear whether the data is persisted in both products, depending on how the user interacts with it, or whether it is removed from DynamoDB once it has been processed. It is also unclear what the latency and performance implications are of having a data processing pipeline between the two products. In fact, the documentation alludes to some latency, as it states the store has “eventual consistency” for searchability.
Spanner and DynamoDB have plenty of comparisons done by third parties, so I will not cover the database differences in this article. Instead, I will focus on the difference between the APIs provided by Google Cloud and AWS HealthLake to access the FHIR data. Both APIs leverage performant, high-availability back-end systems for their FHIR data storage, yet neither API provides direct access to the underlying datastore.
Google Cloud FHIR service has tight integration with the other offerings in the Google Cloud product stack, as stated below.
First and foremost, there is an out-of-the-box integration with Google’s cloud data warehouse product, BigQuery. This means that with one click, users can bulk export FHIR data to BigQuery, or choose to stream data from the FHIR store to BigQuery in real time. This saves organizations the hassle of creating and maintaining a custom pipeline between the products, as it is done automatically for users without any code. It also saves users from writing DDL or maintaining the schema, as Google follows community best practices by leveraging the SQL on FHIR schema.
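For readers who prefer the API to the console, the bulk export described above can be sketched as a single REST call. This is a minimal sketch, not a definitive implementation: the `:export` method and the `bigqueryDestination`, `datasetUri`, and `schemaConfig` field names are written as I recall them from the public Cloud Healthcare API reference and should be checked there, and every project, dataset, and store ID is a placeholder. The code only builds the request; it does not send it.

```python
import json

# Base URL of the Cloud Healthcare API (v1). All IDs used below are placeholders.
API_ROOT = "https://healthcare.googleapis.com/v1"


def build_bigquery_export_request(project, location, dataset, fhir_store, bq_dataset):
    """Return the (url, body) pair for a FHIR-store-to-BigQuery export call."""
    store_path = (
        f"projects/{project}/locations/{location}"
        f"/datasets/{dataset}/fhirStores/{fhir_store}"
    )
    url = f"{API_ROOT}/{store_path}:export"
    body = {
        "bigqueryDestination": {
            # BigQuery dataset that receives the exported tables (assumed field name).
            "datasetUri": f"bq://{project}.{bq_dataset}",
            # ANALYTICS requests the SQL on FHIR style schema (assumed value).
            "schemaConfig": {"schemaType": "ANALYTICS"},
        }
    }
    return url, body


url, body = build_bigquery_export_request(
    "my-project", "us-central1", "my-dataset", "my-fhir-store", "fhir_export"
)
print("POST", url)
print(json.dumps(body, indent=2))
```

Sending this body with an authenticated POST would start the export as a long-running operation; no table definitions are supplied because the schema is generated by the service.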
BigQuery also is a main point of integration for Google Cloud’s ML suite, such as their recently announced Vertex AI. Once data is exported or streamed to BigQuery, users can utilize their data to train and deploy models, enrich with BigQuery’s public datasets, or integrate with any number of partners in the BigQuery ecosystem.
Google Cloud Storage Integration
A second out of the box integration option is with Google’s object storage product, Google Cloud Storage. With the same one-click method used for BigQuery, users can import and export FHIR data to and from Google Cloud Storage. This is useful for data interoperability and sharing use cases.
Google Cloud Pub/Sub Integration
Finally, Google’s Healthcare API has out-of-the-box integration (again, one click, no code) with their Pub/Sub product. This is a real-time notification service for events that occur on the FHIR store. The notification, or message, is published when a resource is created, updated, or deleted in a FHIR store. This notification can be used to kick off any number of real-time workflows, such as triggering an action in an application or using the new resource as an input for an ML model. Pub/Sub allows for integration with most of Google Cloud’s data suite.
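To make the notification workflow above concrete, here is a small sketch of how a subscriber might route one message. It assumes, based on my reading of the public documentation, that the message data carries the full name of the changed resource and that the `action` and `resourceType` attributes describe the event (with values such as `CreateResource`, `UpdateResource`, and `DeleteResource`). The `Notification` class is a hypothetical stand-in for the message object a real Pub/Sub client library would hand to a callback, and the routing rules are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Notification:
    """Hypothetical stand-in for a received Pub/Sub message."""
    data: bytes
    attributes: dict = field(default_factory=dict)


def route_notification(msg):
    """Decide what a downstream workflow should do with one FHIR store event."""
    resource_name = msg.data.decode("utf-8")  # assumed: path of the changed resource
    action = msg.attributes.get("action", "")
    resource_type = msg.attributes.get("resourceType", "")

    if action == "DeleteResource":
        return f"drop cached copy of {resource_name}"
    if resource_type == "Observation":
        # New or updated observations are sent on to a model for scoring.
        return f"score {resource_name} with the ML model"
    return f"refresh application view of {resource_name}"


example = Notification(
    data=b"projects/p/locations/l/datasets/d/fhirStores/s/fhir/Observation/abc",
    attributes={"action": "CreateResource", "resourceType": "Observation"},
)
print(route_notification(example))
```

Keeping the routing logic in a plain function like this makes it easy to test without a live subscription.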
The overall architecture for Google Cloud’s Healthcare APIs is demonstrated in the image below. Note: there are capabilities shown that are out-of-scope for this article.
HealthLake’s main value add is their integration with Amazon Comprehend Medical, a natural language processing service specific to medical use cases. AWS provides the seamless ability to search on text data that has been through Comprehend Medical and stored in the FHIR format. This is a value add for analyzing free text and clinical notes, which may otherwise be difficult to abstract, in combination with structured clinical data stored natively in the FHIR format.
Other than Comprehend Medical, the only other data integration Amazon HealthLake has with its product suite is through their object storage product, Amazon S3. Users can import and export data from S3 into Amazon HealthLake in batch/bulk only. The overall architecture for Amazon’s HealthLake is demonstrated in the image below. Note that the only way to utilize Amazon SageMaker, their ML tool suite, is by exporting to object storage, creating a custom pipeline, and pushing it to the rest of the architecture.
The main external data integration AWS offers is with their object storage product, S3, for batch import and export. This means Amazon only supports bulk/batch use cases and cannot be utilized for streaming use cases. Google offers the same functionality with their Google Cloud Storage integration, but also has additional integrations with their data warehouse and notification services. Fundamentally, if a user wanted to utilize the FHIR data outside of the CRUD and search operations provided by HealthLake, they would have to build custom pipelines to other products and build their own event management system. Google Cloud has removed the need for users to write or manage any integration code by providing UI-based product integrations and coupling with a comprehensive database notification system.
Accessing the FHIR Data API Comparison
FHIR servers traditionally offer an API specification defined by the HL7 community on how to interact with the FHIR server. FHIR servers typically also need to accommodate enterprise functionality beyond the API methods defined by the standard FHIR specification, for example, import and export to other systems and analytics on FHIR data. The ability to use SQL directly over FHIR is becoming increasingly popular in the community, as is the need to visualize the data for dashboards or administrative tasks. FHIR data viewers have also risen in popularity, as they give non-technical users access to the raw data without having to manually parse the JSON to understand the resource.
Currently, AWS only lets users access data as raw JSON, whether in a storage object or in an API response. Google’s suite also offers direct API access to FHIR content, as well as integration to use the raw JSON as Google Cloud Storage objects. Google has two additional functionalities not supported by AWS’s HealthLake: direct access through a SQL database and a UI to view data. Google allows for structured SQL querying and viewing of FHIR data in BigQuery through a direct integration between the products.
In addition, Google has launched its own FHIR viewer. Google’s FHIR viewer allows users to access their resources through the Cloud Console, providing an easily digestible way to view FHIR data for all users, regardless of skill set. It provides options for an overview display, a tree-structured visualization (shown below), and a direct view of the raw JSON object. Finally, the FHIR viewer allows users to navigate between referenced resources by simply clicking on the reference. FHIR accessibility is a clear area where Google has increased functionality and usability.
FHIR Version Support Comparison
The following table lists the currently supported versions by each service.
FHIR Operations and Server-Wide Operations Comparison
Import and export functionalities for HealthLake are only designed to work to and from storage buckets, or object storage, not databases. This means that developers cannot query the data, nor can they visualize or analyze it, without creating custom pipelines to other AWS database products. This is in stark contrast with Google’s integration with their managed data warehouse product, BigQuery, which is done automatically for users. There is also no streaming in or out of the AWS product, making it less likely to be utilized for AI/ML workflows. Below is a more in-depth comparison of the operations supported by the two services.
There are also several extended operations the FHIR community has defined that Google Cloud supports and that are currently not supported in the AWS product. Patient $everything gives users the ability to fetch all the resources around one patient and is currently only supported in the Google Cloud product. Observation $lastn is a second extended operation that allows users to pull the last “n” observations for a certain subject, and is currently also only supported in the Google Cloud product.
Furthermore, Google has robust console integration for its FHIR stores: import and export can be done in the console UI, as demonstrated, as can creation and deletion of stores and additional functionality such as de-identification.
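As a rough illustration of the two extended operations mentioned above, the sketch below builds the request URLs in the form the FHIR specification defines for them. The base URL, patient ID, and observation code are placeholders, and `max` is the parameter the specification uses for “n”; the code only constructs the URLs and does not call any service.

```python
from urllib.parse import urlencode


def everything_url(fhir_base, patient_id):
    """URL that fetches all resources related to one patient."""
    return f"{fhir_base}/Patient/{patient_id}/$everything"


def lastn_url(fhir_base, patient_id, code, n=1):
    """URL that fetches the last n observations with a given code for a patient."""
    params = urlencode({"patient": f"Patient/{patient_id}", "code": code, "max": n})
    return f"{fhir_base}/Observation/$lastn?{params}"


base = "https://example.org/fhir"  # placeholder FHIR base URL
print(everything_url(base, "123"))
print(lastn_url(base, "123", "8867-4", n=3))
```

Both would be sent as GET requests against the FHIR store's base URL with the usual authorization header.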
REST API Calls Comparison
The table below highlights the differences between the REST API calls of the two services. AWS supports minimal API functionality compared to the API specification outlined by the FHIR community.
Search can be an extremely useful FHIR operation for users. The following table lists general search functionalities and what each product supports. Beyond what is in this table, Google supports a number of advanced search capabilities, such as sorting, chained parameters, and reverse chaining, which are not supported in the AWS environment.
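For readers less familiar with the advanced search features named above, here is a short sketch of what such queries look like as URLs, following the FHIR search syntax (`_sort`, dotted chained parameters, and `_has` for reverse chaining). The base URL, names, and codes are placeholders, and `search_url` is a hypothetical helper written for this example.

```python
from urllib.parse import urlencode


def search_url(fhir_base, resource_type, params):
    """Build a FHIR search URL; colons are kept readable in parameter names."""
    return f"{fhir_base}/{resource_type}?{urlencode(params, safe=':')}"


base = "https://example.org/fhir"  # placeholder FHIR base URL

# Sorting: a patient's observations, newest first.
sorted_search = search_url(base, "Observation", {"patient": "123", "_sort": "-date"})

# Chained parameter: observations whose subject is a patient named Smith.
chained_search = search_url(base, "Observation", {"subject:Patient.name": "Smith"})

# Reverse chaining: patients that have an observation with a given code.
reverse_chained_search = search_url(base, "Patient", {"_has:Observation:patient:code": "8867-4"})

for url in (sorted_search, chained_search, reverse_chained_search):
    print(url)
```

Chaining and reverse chaining let one request express a join that would otherwise take several round trips and client-side filtering.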
Note: AWS’ HealthLake is currently free as it is still in preview. This pricing is based on their public documentation, and both vendors’ pricing is subject to change. This pricing analysis was conducted on 05/12/2021.
Google Cloud’s pricing is based on data storage and API requests. Structured storage has multiple tiers: a free tier up to 1 GB per month, $0.39 per GB per month up to 1 TB, and $0.34 per GB per month above 1 TB. The API pricing also has multiple tiers, with the free tier covering up to 2.5 billion requests per month. After the free tier, requests are broken down into standard, complex, and multi-operation requests. Standard and multi-operation requests cost $0.39 per 100,000 a month, and $0.29 after a quintillion of them (wow!). Complex requests run at $0.69 per 100,000 a month, dropping to $0.59.
AWS’ pricing is also broken out into data storage and API requests, with an additional runtime charge for server costs. Running a Data Store costs $0.27 per Data Store per hour. The free tier includes 10 GB of storage, after which storage costs $0.25 per GB stored per month. For API requests, AWS offers the first 2.52 million requests per month for free, and then charges $0.015 per 10,000 queries per hour, rounded up to the next 10,000.
AWS does have a larger free tier for data storage than Google Cloud, at 10 GB versus Google’s 1 GB. That being said, AWS also charges for server runtime, a cost GCP’s FHIR API does not incur. If a user were to run a Data Store 24/7 for 30 days, that would cost $0.27 × 24 × 30 = $194.40. AWS has a lower storage cost at $0.25 per GB per month, compared to Google Cloud’s $0.39 per GB per month. The API cost for AWS is calculated per hour, while Google Cloud’s is calculated per month, making it difficult to make a 1:1 comparison.
I have calculated the amount required to store 100 GB of data for a month while querying the FHIR store 1,000 times a minute for both services.
HealthLake is a cheaper FHIR option for storage and API calls; however, it charges almost 200 dollars a month just for running a FHIR store. This could be an unfriendly option for smaller organizations or small workloads looking to test FHIR, or for larger organizations whose strategy revolves around having many FHIR stores.
Note on the API free tier: 1,000 queries a minute is 60,000 queries an hour. The 2.52 million free requests per month work out to 3,500 free queries per hour over a 30-day month, which leaves 56,500 billable queries an hour. That being said, since AWS rounds up to the nearest 10,000 queries per hour, this is still charged at 60,000 an hour, or $0.09 an hour. The free tier for AWS seems a little misleading because of this rounding. For example, if a user queried 13,501 times in an hour, the free tier brings that down to 10,001 billable queries, yet AWS would charge the user for 20,000.
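The AWS side of this arithmetic can be sketched in a few lines. This is a rough sketch using only the preview prices quoted in this article, and it assumes a 30-day month, a free tier applied evenly at 3,500 queries per hour, and rounding up to the next 10,000 queries each hour; the function names are my own and actual billing may differ.

```python
import math

HOURS_PER_MONTH = 24 * 30        # 30-day month, as used in the article
DATA_STORE_PER_HOUR = 0.27       # USD per Data Store per hour
STORAGE_PER_GB = 0.25            # USD per GB per month after the free tier
FREE_STORAGE_GB = 10
FREE_QUERIES_PER_HOUR = 3_500    # 2.52 million per month / 720 hours
PRICE_PER_BLOCK = 0.015          # USD per block of 10,000 queries
BLOCK = 10_000


def billed_queries(queries_per_hour):
    """Queries charged for one hour after the free tier and rounding up."""
    billable = max(0, queries_per_hour - FREE_QUERIES_PER_HOUR)
    return math.ceil(billable / BLOCK) * BLOCK


def healthlake_monthly_cost(storage_gb, queries_per_minute):
    """Estimated monthly cost in USD, split into its three components."""
    runtime = DATA_STORE_PER_HOUR * HOURS_PER_MONTH
    storage = max(0, storage_gb - FREE_STORAGE_GB) * STORAGE_PER_GB
    blocks = billed_queries(queries_per_minute * 60) // BLOCK
    api = blocks * PRICE_PER_BLOCK * HOURS_PER_MONTH
    return {
        "runtime": round(runtime, 2),
        "storage": round(storage, 2),
        "api": round(api, 2),
        "total": round(runtime + storage + api, 2),
    }


print(billed_queries(13_501))
print(healthlake_monthly_cost(storage_gb=100, queries_per_minute=1_000))
```

Under these assumptions the sketch reproduces the $194.40 runtime charge and the $0.09 per hour API charge discussed above, and it shows how 13,501 queries in an hour end up billed as 20,000.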
Limitations in Preview
AWS has restrictions in preview, currently documented here. These restrictions are significant, and because of them and the early preview status, an enterprise organization would likely be unable to use HealthLake in production.
Out of Scope
Both Google Cloud and AWS have medical NLP products as part of their healthcare offerings. This article does not dive into the functionalities of either, as the most useful comparison would likely be a review of the accuracy of each, which would require a third-party study and review.
Google’s FHIR API, when compared 1:1 with AWS’s HealthLake, supports double the number of FHIR resources, all FHIR versions (in comparison with just R4 in AWS), and more capabilities to view and analyze the FHIR data. In a feature-by-feature review, Google has significantly more API functionality than AWS offers, including extended operations and conditional operations. Google’s API has more comprehensive search capabilities and broader support for things like FHIR profile validation and pagination. Google has tighter integration with its larger product suite, which means less overhead and management for users. Google supports streaming of data, unlike AWS’s batch-only model, and thus is a more attractive option for machine learning or application use cases. Finally, AWS pricing is more attractive; however, Google’s FHIR API offers more advanced FHIR capabilities that are necessary for enterprise data management. Try both out and decide for yourself at Google Cloud and AWS.