Healthcare Data Warehousing

Arunachalam S. *  Tom Page **  Thorsteinsson G. ***
* Senior Lecturer, Department of Mechanical Engineering, University of East London, London.
** Senior Lecturer, Electronic Product Design, Loughborough University, UK.
*** Professor, Department of Design and Craft, University of Iceland, Iceland.

Abstract

The term data warehouse was first used by Inmon in 1990 which he defined as “A warehouse is a subject oriented, integrated, time- invariant and nonvolatile collection of data in support of management decision making process” [Inmon, 1990]. On the other hand, Ralph Kimball provided a much simpler definition of the data warehouse. In the book ‘The Data Warehouse Toolkit’ he said that a data warehouse is “a copy of transaction data specifically structured for query and analysis” [Inmon and Kimball, 1996]. This paper provides theoretical description of data ware housing. Implementation of clinical Data warehousing were analyzed, finally evaluating them. The proto type considered for implementation has been collected from the East Hospital London Clinical Data Warehouse Project.

Keywords :

Introduction

A data warehouse can be normalized or denormalised, as it can be a relational database, multidimensional database, flat file, hierarchical database, object database, etc. With the widespread availability of the ever changing computer technologies, telecommunication technologies and internet access, most of the business handles huge amounts of data. The business organizations collect, examine and take their decision for the next move. The information they collect from different sources help the decision makers to take the right decision at the right time. Data warehouse which acts as a decision support system, extracts, integrates, and stores relevant information. This is to support the information needed by the information workers and decision makers. Finally, “Data warehouse is a process, not a product, for assembling and managing data from various sources for the purpose of gaining a single, detailed view of part or all of a business” [Stephen, 1998].

Today data warehousing is a part and parcel of the whole decision support solution for a business organization. For the huge potentiality of the data warehouse, most corporations have discovered that a data warehouse is a must have strategic weapon in today's business world. Now most of the big corporations have built or are building data warehouses to support business analysis activities and decision making tasks.

The Clinical Data Warehouse

The Clinical Data Warehouse (CDW) is a place where healthcare providers can gain access to clinical data gathered in the patient care process. It is also anticipated that such data warehouse may provide information to users in areas ranging from research to management [Sen, and Jacob, 1998]. A clinical data warehouse is a tailored data warehouse for the needs of users in clinical environment. The clinical data warehouse combines information from a variety of legacy healthcare database and extract operational data to form a centralized repository to answer the informational needs of all clinical users. The data in a clinical warehouse are not only used by the patients, nurses and doctors, but also used by researchers, scientists, and medical students. The clinical data warehouse is all the above, for clinical administrators “for example, a clinical responsible for utilization management may retrieve data on all patients with a specific DRG/CMG and determine how many lab tests of which type were ordered, to see if services appear to be under or over utilized. An administrator may want to look at the average length of stay over all patients or selected groupings. A department manager may want to review the number of paid staff hours that is required for each type of test performed for those tests that actual turnaround times within a certain range during the summer versus the winter” [Covvy, 2000].

The clinical data warehouse can provide significant improvements in patient care, hospital management, organisational effectiveness, etc. Clinical data warehouse can facilitate efficient storage, enhance timely analysis and increase the quality of real time decision making process. The clinical data warehouse is a place where health care providers can gain access to clinical data as a part of the patient care process. The clinical data warehouse can produce improved patient care service and improved clinical research. They can reduce the medical errors and enhance quality measurement. They also can decrease costs and at times, increase revenue.

Clinical data warehouse is much more complex than the business facts of average data warehouse data. Furthermore, a clinical data warehouse demand even more security measures and less downtime than a conventional data warehouse. But a conventional data warehouse and a clinical data warehouse do not differ so much. After all, they use the same technology, but in the case of a clinical data warehouse everything gets more complex and complicated.

The Project Objectives

The primary objective of this paper is to evaluate a data warehouse in a clinical context. Data warehouse in clinical context has traditionally been administrative in nature, focusing on patient-billing and patient-care management. The clinical data warehouse combines information from a variety of legacy healthcare database and extract operational data to form a centralized repository. The data is, then used for making crucial decisions. The data in a clinical warehouse are not only used by the patients, nurses and doctors, but also used by researchers, scientists, and medical students. To design a clinical data warehouse it is necessary to find out the requirements of the client. Since the clinical data warehouse is used by various medical organizations, it is necessar y to choose the correct vendor or methodologies to avoid complications.

Research Methodology

The research for the paper started with several literature and journal study covering data warehouse, data warehouse architecture, data warehouse methodologies, and the importance of data warehousing in a clinical context. After that reading books in the related field and taking important notes and writing down for the future use of the clinical data warehouse.

The Project Scope

This research describes the data warehouse as well as the clinical data warehouse elaborately. This research also describes the problems that rise while designing a clinical data warehouse and gives the possible solutions. This research also contains a case study based on the University of Virginia Clinical Data Warehouse which helps someone to understand the clinical data warehouse. Those who want to build a clinical data warehouse can have a picture of the problems. At the same time they will get the possible solutions for those problems. This paper also used some data warehouse and clinical related terminology which is assumed to be understandable to the interested readers of this project.

1. An Overview of Data Warehousing

1.1 The Definition

The term data warehouse was first used by Inmon in 1990 which he defined as “A warehouse is a subject oriented, integrated, time-invariant and nonvolatile collection of data in support of management decision making process” [Inmon, 1990]. Inmon (1990) defined the terms in the sentence as follows:

A single subject data warehouse is typically referred as data mart, whereas data warehouses are generally enterprise in scope. Data warehouses can also be volatile. Due to the large amount of storage required for data warehouse, only a certain number of periods of history are kept in the data warehouse.

On the other hand, Ralph Kimball provided a much simpler definition of the data warehouse. In his book, ‘The Data Warehouse Toolkit’, he said that a data warehouse is “a copy of transaction data specifically structured for query and analysis” [Kimball and Inmon, 1996] .

Both Bill Inmon and Ralph Kimball agree that a data warehouse integrates data from various operational source systems. In Inmon's approach, the data warehouse is physically implemented as a normalized data store. In Kimball's approach, the data warehouse is physically implemented in a dimensional data store.

2. The Clinical Data Warehouse (CDW)

This section provides an overview of Clinical Data Warehouse. Section 2.1 provides the brief introduction to the Clinical Data Warehouse. Section 2.2 describes the fields of application of the Clinical Data Warehouse. Section 2.3 describes the comparison between the Clinical Data Warehouse and the data warehouse.

2.1 The Introduction of the Clinical Data Warehouse

Data warehouse in clinical context has traditionally been administrative in nature, focusing on patientbilling, and patient-care management. Patient-billing systems manage financial systems and are primarily concerned with providing information for financial optimization, managing costs, and coordinating health insurance providers. They typically contain demographic information about the patient, length of stay in the hospital, costs, etc. On the other hand, patient-care systems generally manage information about individual patients such as patient's location in the hospital, medical tests, prescribed medicines, etc. They are mainly concerned with optimizing administrative efforts within the hospital. They were mainly operational databases where large quantities of data were collected. But later it was also being required sorting, combining and analysis of data. As a result, the clinical data warehouse comes into the limelight. A clinical data warehouse is a tailored data warehouse for the needs of users in clinical environment. The clinical data warehouse combines information from a variety of legacy health-care database and extract operational data to form a centralized repository to answer the informational needs of all clinical users. The data in a clinical warehouse are not only used by the patients, nurses and doctors, but also used by researchers, scientists, and medical students.

2.2 Fields of Application

The clinical data warehouse has various fields of application. Some of them are as follows:

2.2.1 Clinical Research

2.2.2 Patient Service

2.2.3 Quality Improvement

2.3 A Comparison between Data Warehouse and Clinical Data Warehouse

As the authors have mentioned earlier, the data warehouse is a data structure that is optimized for distribution, mass storage, and complex quer y processing. Data warehousing methodologies share a common set of tasks, including business requirements analysis, data design, implementation, and deployment. On the other hand, the clinical data warehouse is a place where healthcare providers can gain access to clinical data gathered in the patient care process. It is also anticipated that such data warehouse many provide information to users in areas ranging from research to management (Sen and Jacob, 1998). The clinical data warehouse is all the above, for clinical administrators “for example, a clinical responsible for utilization management may retrieve data on all patients with a specific DRG/CMG and determine how many lab tests of which type were ordered, to see if services appear to be under or over utilized. An administrator may want to look at the average length of stay over all patients or selected groupings. A department manager may want to review the number of paid staff hours that is required for each type of test performed for those tests that actual turnaround times within a certain range during the summer versus the winter ” [Covvy, 2000]. At the same time, clinical data warehouse is a tool for care providers and researchers. Care providers check out data of an individual patient and compare it to the previous ones. They also check the medical records, diagnosis, medications, etc. On the other hand, researchers are interested in many patients just like conventional data warehouse users. Their interest lies in the retrieval of complex clinical data of the patients that is organized in more complex networks. In these cases, exploratory knowledge discovery and advanced data mining techniques are used.

The clinical data warehouse is much more complex than the business facts of average data warehouse data. Sometimes physicians or researchers have to go into the deep to get a patient's original healthcare record. For this, fast calculating power and massive storage are required. Furthermore, a clinical data warehouse demand even more security measures and less downtime, but a conventional data warehouse and a clinical data warehouse do not differ so much. After all, they use the same technology, but in the case of a clinical data warehouse everything gets more complex and complicated.

3. The Challenges and Issues of Clinical Data Warehouse

From the previous section, we have learned that the clinical data warehouses are complex in nature in comparison to a data warehouse. It is important to build a clinical data warehouse successfully. Clinical data warehouses require extensive analysis on data design, architectural design, implementation, and deployment. When designing a clinical data warehouse, data integration tasks of the medical data store are challenging scenarios. As the clinical data warehouse systems are much more complex, they must operate reliably to deliver continuous patient care service. Any fault in any medication or test result could be fatal to a patient. We should keep this in mind to avoid mistakes and ignorance. For these reasons and others, it is therefore important to train people who understand healthcare, information technology, health information regulation and have knowledge about using clinical data warehouse to work in a clinical data warehouse. Only a successfully implemented clinical data warehouse can provide patient safety, quality and organizational efficiency, but there are many challenges and risks involved while implementing a clinical data warehouse. Among the many challenges and issues, the most prominent are architectural issue, data quality, patient privacy, report consistency, scalability, and user involvement.

3.1 Architectural Issue

While designing a clinical data warehouse, several technical issues are to be considered, but among them the architectural challenge is the most important. There are some technologies for data warehouse available in the market, but not every one of them are suitable for designing a clinical data warehouse. Due to the confidentiality of healthcare data and the privacy policy of the patient, it is important to choose the right technology for designing a clinical data warehouse. During the design, and planning process, it is important to study the functionality of different departments. For example, patient management scenarios may differ from the mental health management scenarios. Inpatient care management analysis is conducted for admission, discharges and transfers, length of stay, patient's diagnosis, severity of illness, etc., but for the mental health management, the scenarios are different. In this context, it is essential to monitor health service activities and investigate patient outcomes. The middle or senior management could neither effectively monitor levels of team activities, nor they can determine the predictive factors of the clinical outcomes of mental health patients. As a result, the reporting on the behaviour of the mental is an important element for analysis while designing a clinical data warehouse for a mental health care.

Some issues need to be addressed while designing a clinical data warehouse. Among them data location, technical platforms, and data formats are important. Organisational behaviour on processing the data is also important while designing a clinical data warehouse. These factors are very important and if these factors are not dealt with properly, the required levels of quality decision making and analysis cannot be achieved. Data integration is another factor to be dealt with while designing a clinical data warehouse. In a clinical data warehouse, scenario data is collected from various departments, laboratories, and related administrative processes. Not every department or laboratories would use the same platform or the same vendor to store data. A clinical data warehouse should be able to collect data from different sources. So it is important that the clinical data warehouse has the ability to data integration.

3.2 Data Quality

One of the major issues in the development of the clinical data warehouse is data quality. As a decision support system is one of the main aspects of the data warehouse and it is not exceptionally for the clinical data warehouse. Poor data quality can lead to poor decision which could be even fatal for the patient. Poor data quality can destroy the whole clinical data warehouse project. Data quality in a clinical data warehouse is critical as strategic decision is made on the basis of the data available in the clinical data warehouse. Improved data quality is one of the main issues of running a clinical data warehouse project successfully. Improved data quality:

It is important that data in a clinical data warehouse is good in quality. Bad data can destroy the whole clinical data warehouse project. The data that is stored in the clinical data warehouse, may lead to a bad decision or incorrect medication. These reasons could be fatal for a patient. A bad quality data:

Error in data is the main reason for bad quality data. There are mainly five data errors:

Design, collection, integration and query errors are technical in nature. Data in a clinical data warehouse is collected from different sources. Integration of these kinds of data are an important task and also important problem to tackle, but the main problem of a clinical data warehouse is data collection. “Incorrectly recorded values, mixed records, dropped data, incorrect units, inconsistent standards or untimely collection practices can contaminate the data with scant evidence of error ” [Berndt, et al., 2001] .

3.3 Patient Privacy

Another major issue for the clinical data warehouse is the concern regarding the protection of the patient privacy. As the personal data as well as the medical data of a patient stored in the clinical data warehouse, any user who has the authority to access the clinical data warehouse can see the personal data of a patient. A clinical data warehouse not only is used by the doctors or nurses, but also by the researchers or medical students for their research. So the priority of the patient privacy should be high while designing a clinical data warehouse. It is important to keep a patient's data private. If a patient's privacy be broken, the patient might lose his confidence in coming to the health care centre. The patient might also be in the embarrassing situation if the patient suffers any sexual disease and it comes to the public. The clinical data warehouse contains sensitive data and so, security measures are essential.

3.4 Report Consistency

Report consistency is another issue for the clinical data warehouse. In a clinical data warehouse, data are collected from different sources. The sources of data that is collected may be from several disparate mainframe environments. This may be time-consuming and at the same time, it increases the chances of inaccuracies and conflicts of data. Sometimes it leaves a little time to analyze the data and produce a correct report. Usually a doctor takes his decision based on the report he got from different data analysis. Based on the test report, diagnosis report and medication report, the doctor takes his decision for the patient whether the patient needs an operation or further medication. For example, the situation of a patient is critical and doctors need to take a quick decision about the patient. If the collection of data takes long time, it leaves only a little time to analyze the data. This may lead to delay in taking decisions which may be fatal for the patient. So it is important that data is collected accurately which helps making report consistent.

4. Implementation: Designing a Clinical Data Warehouse (Prototype)

This section presents an implementation of a clinical data warehouse project. Section 4.1 provides the conventional way designing a data warehouse. Sections 4.2 describes the project overview. Section 4.2 and 4.3 give a brief description about MySQL and SQLyog, respectively. Section 4.4 describes the implementation process of the prototype clinical data warehouse. Finally, section 4.5 provides the summary of this section.

4.1 The Conventional Data Warehouse Design Process

Data warehouse design is complex. It requires extensive survey and analysis. There are some steps in designing a data warehouse. Before designing a data warehouse, it is very important to understand and represent the user requirements accurately. The strategies used for this are interviews, document analysis, surveys, company activities, etc. After getting the requirements, the data model is created. Then logical data is created by using an Entity Relation Diagram (ERD) or some other type of conceptual model. Then in the process of physical design, this logical data is translated into a star schema or snowflake schema. The next is the architecture design procedure. There are mainly four strategies available to design data warehouse architectures. They are centralized data warehouse, data warehouse and data mart, distributed data warehouse and hybrid deployment strategy. Based on their demands, the organizations will choose the most suitable methodology. This project has been developed by following centralized data warehouse methodology. The next step is the implementation process. The implementation can be facilitated by some RAD tools. SQLyog tool has been used for implementation. One of the important aspects of designing a data warehouse is the metadata management. Almost all vendors focus on metadata management. Building a data warehouse will be successful if only all the procedures be followed.

4.2 Project Overview

The aim of the project is to build a clinical data warehouse for the proposed “East London Hospital”. The authors have designed and created a clinical data warehouse, and also created a centralized database which will be residing in the hospital. Administrators and doctors in the hospital can collect data from different sources. For example, a patient is referred to the hospital from the clinic. So the doctor in the hospital need not go to the clinic to collect data. The doctor or administrator can extract data from the clinic. On the other hand, if a doctor wants to see the report for a patient, the doctor does not have to go to the laboratory. The doctor can easily collect data from the laboratory. At the same time, a doctor can see the patient's past record whether the patient was suffering from any major illness. After collecting all the information, the doctor can prescribe the best possible medication for the patient. If it is a serious condition, the hospital administration can create a medical board and analyse all the records, test reports, diagnosis, previous medication, etc. After analysis, the medical board can take their decision. The data warehouse here acts as a decision support system for them. This makes the doctors, work easier than before. Not only that, because of the massive calculating power and reliability, the chance of errors are less likely. A clinical data warehouse requires massive storage support because it stores the patient's history and medical data. The more the patients are the more the data storage requires. A clinical data warehouse, typically supports more than 10 terabytes. To record data in a clinic, doctors chamber, laboratory, etc., front end tools like Microsoft Excel, Microsoft Access or any other customised software can be used. All these features support MySQL. That’s why MySQL is getting more and more popularity. By using tools like SQLyog Enterprise, we can create a database, create table design schemas, etc.

4.3 MySQL

MySQL is one of the leading open source database solutions in the market. It provides consistent fast performance, and high reliability. Furthermore, it is easy to use. MySQL powers many online enterprises, embedded, and business intelligence applications for over twelve years.

4.4 SQLyog

SQLyog is the MySQL Graphical User Interface (GUI). It is one of the most powerful MySQL manager and admin tools. It combines the features of the MySQL query browser, administrator and various other MySQL front ends and MySQL clients in a single interface. It helps creating databases, tables, and generates schemas. SQLyog is an 'All in One' tool. We do not have to switch between different applications to perform database related tasks.

4.5 Project Implementation

The proposed East London Clinical Data Warehouse will be implemented now. There are some steps to implement. The following sub-sections describe the steps.

4.5.1 Installing MySQL

From the previous sections, the authors find that to build a clinical data warehouse one need to plan, analyse and choose the right database vendor. In this case, they will use MySQL which is open source and free. It is reliable and easy to maintain. It supports almost all the features of other expensive database vendors. Many well recognized companies use this MySQL. So for this project, they have choosen MySQL. It is open source and free. So they download it to their laptop. For the project they used MySQL server 6.0 version.

4.5.2 Installing SQLyog

To build a data warehouse, first we need to create tables and design a schema. For creating table effectively and easily, we will use SQLyog Enterprise Edition (30 day trial version). This software also helps to create tables and insert data. To install the SQLyog Enterprise, they need to 'click' on the download link from the vendor's website.

4.5.3 Schema Design

To design a data warehouse, they need to design a schema. There are mainly two schemas available to design a data warehouse. One is star schema and the other is a snowflake schema.

4.5.3.1 Star Schema

Star schema is the simplest form of data warehouse schema which contains one or more dimensions and fact tables. The central table of the star schema is called as a fact table. The entity-relationship diagram between dimensions and fact tables are like a star and so, it is called star schema. In this schema, one fact table is connected to multiple dimensions. In the centre of the star schema, there is a large fact table and it points towards the dimension tables. For this project, they will use star schema. Figure 1 shows the diagram of a star schema.

Figure 1. Star Schema Design of the Project

4.5.4 Schema Diagram of the Project

For the project, star schema is used to design the schema is used. The star schema will meet their goal to design the project. In this project there is a fact table which is called hospital and four dimensional tables, namely private clinic, lab, patient, and doctor.

4.5.5 Creating Tables

The next step is to create tables in the database. This project is about clinical database and so, tables are created where clinical data can be stored. The data warehouse will be residing in the hospital. From the schema design, we will find that we need to create five tables. They are a Hospital, lab, private clinic, patient and doctor. To create a table, we will use SQLyog which is a GUI for MySQL. We can easily create table using this software. It is very easy to use and improves performance. It also reduces a significant amount of time. The steps to create tables in the database:

Figure 2. The Doctor Table

Figure 3. The Patient Table

Figure 4. The Private Clinic Table

Figure 5. The Lab Table

Figure 6. The Hospital Table

5. Evaluation

The title of this research is “An Evaluation of the Clinical Data Warehouse”. So, the primary task is to evaluate a clinical data warehouse. In this process the research shows the benefits and that application of clinical data warehouse in theory and practice. The design process and the methodologies are also described. These descriptions, the authors believe, would be very beneficial for the interested readers and for those who would like to develop a clinical data warehouse.

Another aspect of this research is to find out the challenges and issues available in the present world. There are many challenges and issues. After learning all the major challenges, they tried to figure out some possible solutions. These solutions are based on their research in this field. By reading some very good articles, they have gathered huge knowledge which will be beneficial for future life. The challenges and issues and the possible solutions of those would be very beneficial for the clinical data warehouse developers.

Today data warehouse is considered as a must strategic business weapon. So using the appropriate product for the business is important. There are many vendors who provide data warehouse solutions. Some vendors provide DBMS engines while others give infrastructure solutions. The infrastructure tools typically work with database engines. There are many vendors who provide data warehouse solutions like Microsoft, Oracle, Sybase, etc. Different vendors support different attributes and features.

These vendors as well as some of the vendors who give data warehouse solutions are analyzed based on different attributes. This, it is believed that it would be a great help for the interested readers and the data warehouse developers.

To get the overall activity of a clinical data warehouse, a case study has been looked upon. This case study is based on the clinical data warehouse for academic purpose. This is based at the University of Virginia. This case study provides the design and architectural background. It also provides user statistics. This case study is of great benefit for their understanding and helpful for their implementation.

The East London Hospital Clinical Data Warehouse project prototype is implemented to depict the architectural and design aspects that are discussed throughout the theoretical work in the paper. At the same time, the ELHCDW is intended as a clinical data warehouse design solution for the interested reader. Building a clinical data warehouse in real world environment is a massive work to do. There are design steps, data modelling steps, schema design steps, architectural steps, implementation steps, etc. So designing a clinical warehouse in three months is not enough. During this research period they have gathered knowledge about the data warehouse and the clinical data warehouse. They have also learned the architectural designs and the methodologies behind it. The benefits of the data warehouse and the clinical data warehouse were also learnt.

A data warehouse extracts data from different sources, integrates them, and stores them in a centralize data repository system to support decision making. In this implementation they have tried to show this process. In the project, the proposed East London hospital has a centralized data repository database where all the data were collected from different sources like private clinic or laboratory. This data is then analysed and the doctors and administrators of the hospital could make a decision for a patient. For this prototype clinical data warehouse design, they use MySQL which one of the most popular databases available. To facilitate the creation of tables and query, they also use SQLyog which is a GUI for MySQL. This software is useful, easy to use, and saves time.

The ideas are that are given in this research are based on extensive research, prior knowledge, and good understanding of a data warehouse. Due to time constraint, implementation, development was only completed with 40% of the overall development. The rest 60% could be completed as a future work along with the possible modification and upgrades. The design and implementation of the project closely follow the rules for developing the clinical data warehouse as presented in the previous sections. Had the prototype been done completely, this prototype should be able to benefit the healthcare sectors. The interested readers and the data warehouse developers will get the knowledge of designing a clinical data warehouse. This will help them to build and develop a data warehouse in a clinical context.

Conclusion and Future Work

The paper provides a good theoretical description of data warehousing. As mentioned earlier, because of the short period of time, the project implementation was only completed with 40% of the overall project. The rest 60% could be completed as a future work along with the possible modification and upgrades. Future research could also focus on developing and extending the prototype and incorporating more features to make the prototype more applicable to the healthcare sector. Finally, in further research, for more analysis about the clinical data warehouse and to identify new challenges and issues with the activities to resolve those, is suggested.

References

[1]. Berndt D. J., Fisher J. W., Hevner A. R., and Studnicki J., (2001). “Healthcare Data Warehousing and Quality Assurance”. IEEE Computer Society, Vol.34, No.12, pp.56- 65.
[2]. Covvy D., (2000). “What is a Data Warehouse? ” Healthcare Information Management in Canada. Vol.14, No.2, pp.36-38.
[3]. Einbinder J. S., Scully K., Pates R. D., Schubart J.R., Reynolds R. E., Spraggings T. A. and Krumholz R.M., (1999). “Web-Accessible Patient data Warehouse at the University of Virginia”. AMIA Symposium, pp.12-16
[4]. Inmon W H, (1990). Building the Data Warehouse. QED Information Sciences.
[5]. Sen A. and Jacob VS. (1998). “Industrial Strength Data Warehousing”. Communication of the ACM, Vol.41, No.9, pp.28-31.
[6]. Stephen R., (1998). “Building the Data Warehouse”. Communication of the ACM, Vol.41, No.9, pp.52-60.
[7]. Kimball and Inmon, (1996). The Data Warehouse Toolkit. John Wiley: New York.