Tuesday, March 16, 2010

Data Management

Computers and the Internet have made possible Web-based or network-based software systems for managing not only business processes but also many school processes. Today, everything relating to the school environment, such as attendance rosters, can be handled by computer systems, and more and more student assessment is managed with computers, generating data that can feed further software analysis (Doe, 2009). Education data, which are simply numerical information gathered about the operations of the education system, are essential tools for educational decision-making (Durosaro, n.d.). This essay reports on the data problems encountered in an educational organization in Cyprus and the measures taken to solve them. It also relates these problems to data quality dimensions such as accuracy, accessibility, relevance, timeliness, and completeness.

Data management systems are producing large amounts of information that can be stored, combined, and analyzed for data-driven instructional leadership. The need for this type of information and analysis is further fueled by funding accountability and by the demands of state and national standards, including the No Child Left Behind (NCLB) legislation (Doe, 2009). As Turban, Leidner, McLean and Wetherbe (2008) outlined, the goal of data management is to provide the infrastructure needed to transform raw data into corporate information of the highest quality. The foundation of data management rests on building blocks such as data profiling, data quality management, data integration, and data augmentation.

An important issue for the data gathered in any organization, whether in business, industry, medicine, or education, is data quality. Data quality is crucial because it determines not only the usefulness of the data but also the quality of the decisions based on them (Turban et al., 2008). As stated by the Melissa Data Corporation (2010), “In order for the analyst to determine the scope of the underlying root causes and to plan the ways that tools can be used to address data quality issues, it is valuable to understand…data quality dimensions” (¶2). The most common and basic data quality dimensions are accuracy, accessibility, relevance, timeliness, and completeness. “Accuracy of data is the degree to which data correctly reflects the real world object or an event being described” (Building Intelligent and Performing Enterprises Institute, n.d., ¶2). For instance, as reported by the Melissa Data Corporation, incorrect spellings of the names of persons or products, incorrect addresses, or even untimely or outdated data can affect operational and analytical applications.

Accessibility refers to the ease with which customers can identify, obtain, and use the information in the data products, and relevance refers to the degree to which the data products provide information that meets the needs of the customers (Tupek, 2006). According to the Building Intelligent and Performing Enterprises Institute (n.d.), timeliness is also an important data quality dimension, as reflected in requirements such as these: organizations and companies must publish their quarterly results within a given time frame, and customer services need to provide customers with up-to-date information. As Wand and Wang (1996) noted, “Timeliness has been defined in terms of whether the data is out of date and availability of output on time…Timeliness is affected by three factors: How fast the information system state is updated after the real-world system changes (system currency); the rate of change of the real-world system (volatility); and the time the data is actually used” (p. 8).

As Wand and Wang (1996) reported, completeness is achieved when all the necessary values for a certain variable are included, so that the set of data is complete. In some cases missing data are irrelevant, but when the missing information is critical to a specific process, completeness becomes an issue (Melissa Data Corporation, 2010). As noted by the Building Intelligent and Performing Enterprises Institute (n.d.), data completeness refers to the extent to which the expected attributes of data are provided. Completeness is therefore measured against expectations: data may be unavailable and still be considered complete as long as they meet the expectations of the user.
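
The idea of expected completeness can be illustrated with a short sketch. It is not taken from any of the cited sources; the function name, field names, and sample records are hypothetical, and it assumes that a value counts as missing when it is None or an empty string.

```python
from typing import Iterable, Mapping, Sequence


def completeness(records: Sequence[Mapping[str, object]],
                 expected_fields: Iterable[str]) -> float:
    """Share of expected attribute values that are actually present.

    Assumption for this sketch: None and "" both count as missing.
    """
    fields = list(expected_fields)
    total = len(records) * len(fields)
    if total == 0:
        return 1.0  # nothing was expected, so nothing is missing
    present = sum(
        1
        for record in records
        for field in fields
        if record.get(field) not in (None, "")
    )
    return present / total


# Hypothetical student records; "phone" is absent in both.
students = [
    {"name": "A. Student", "grade": 5, "phone": None},
    {"name": "B. Student", "grade": None, "phone": None},
]

# If the user expects a phone number, the missing values lower the score.
strict = completeness(students, ["name", "grade", "phone"])

# If the user only expects name and grade, the missing phone is irrelevant.
expected_only = completeness(students, ["name", "grade"])
```

Under these assumptions the same records score differently depending on which attributes the user expects, which is the sense in which data can be unavailable yet still complete.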

The author interviewed a knowledge worker in an educational organization in Cyprus and identified the data problems encountered in the organization as well as the measures taken to solve them. What follows is a report on those data problems, as they relate to the data quality dimensions discussed above, and on the measures taken to solve them. The educational organization faced data problems because the data it gathered did not meet data quality dimensions such as accuracy, accessibility, relevance, timeliness, and completeness. In general, the problems involved incorrect data, redundant data, irrelevant data, missing data, etc.

The data available at the educational organization lacked an important data quality dimension: accuracy. As a result, the organization encountered problems of data validity, data inconsistency, data integrity, and data inaccuracy, as well as concurrency problems. Specifically, the organization’s data were created and used offline, and because such data do not go through quality control checks, their validity and hence their accuracy could not be assured (Turban et al., 2008). The data were also inconsistent. As Turban et al. also reported, the actual values across the various copies of the data were not synchronized; for instance, changes in students’ information were not made in all the applications in the educational organization that require this information.
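
A minimal sketch of how unsynchronized copies might be detected is given here. It assumes, purely for illustration, that two applications each hold student records keyed by a shared student ID; the application names, field names, and values are hypothetical and do not describe the organization's actual systems.

```python
def find_inconsistencies(copy_a, copy_b):
    """Return {student_id: {field: (value_a, value_b)}} for records that differ.

    Only IDs present in both copies are compared.
    """
    differences = {}
    for student_id in sorted(copy_a.keys() & copy_b.keys()):
        rec_a, rec_b = copy_a[student_id], copy_b[student_id]
        changed = {
            field: (rec_a.get(field), rec_b.get(field))
            for field in sorted(rec_a.keys() | rec_b.keys())
            if rec_a.get(field) != rec_b.get(field)
        }
        if changed:
            differences[student_id] = changed
    return differences


# Hypothetical copies of the same records held by two applications.
attendance_app = {
    101: {"name": "A. Student", "address": "1 Old Street"},
    102: {"name": "B. Student", "address": "3 High Road"},
}
grading_app = {
    101: {"name": "A. Student", "address": "7 New Avenue"},  # address updated here only
    102: {"name": "B. Student", "address": "3 High Road"},
}

report = find_inconsistencies(attendance_app, grading_app)
```

Here the address change for student 101 reached only one application, so the comparison flags that field while leaving the consistent record for student 102 out of the report.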

Moreover, the educational organization encountered data integrity and concurrency problems. In the terms of Turban et al., the data values did not meet the integrity constraints, and while one application was updating a record, another application could not access that record and hence could not get the desired information. Furthermore, problems of inaccuracy occurred because the data did not accurately represent the real-world values they were expected to model; for instance, the operational applications were affected by incorrect spellings of students’ and teachers’ names and addresses (Melissa Data Corporation, 2010). In addition, data inaccuracy arose because head teachers did not keep accurate records for their schools, in some cases deliberately, in order to influence financial allocation to their schools, and in other cases out of ignorance about record keeping (Durosaro, n.d.).

The educational organization encountered problems not only because of the low accuracy of the data but also because of the lack of accessibility to the data. Specifically, the problem of accessibility arose from the growth of the data, from poor storage and retrieval, and from the fact that the data were redundant and scattered. The amount of data gathered at the educational organization was increasing rapidly because, year by year, the number of students and teachers was increasing while data about former students and teachers also kept accumulating. As Turban et al. (2008) noted, much past data must be kept for a long time, and new data are added quickly; yet only a small portion of the organization’s data is relevant to any specific application, and that relevant portion must be identified and found in order to be useful.
Problems with accessibility also occurred because the educational organization’s data were scattered. The data were stored on several servers and in different computing systems, databases, and formats, and they were collected by many individuals using various methods and devices (Turban et al., 2008). As a result, some of the data were at times difficult to access. Accessibility also suffered from poor storage and retrieval of the data. In line with Durosaro (n.d.), much of the educational organization’s data were kept in files and folders and stored in drawers, there were no statistics units in the organization to help gather and store the necessary data, and the data-based management information system was insufficient. This method of collection hindered retrieval and also resulted in the loss of data. In addition, as Durosaro outlined, “The problems facing educators in the area of data storage are such that people are careless with data. People don’t preserve documents even personal documents such as pay slips, declaration of age, marriage certificates, receipts of payment made on schools fees and even certificates are being poorly kept and lost” (p. 4).

Apart from these, the fact that the data were redundant also caused problems with accessibility. In line with Turban et al. (2008), the data throughout the educational organization were often out of date and redundant, so data managers faced problems in maintaining them. Because applications and their data files were created by different programmers over a period of time, the same data were also duplicated in various files. In addition, as Turban et al. noted, accessing data from different applications led to the problem of data isolation, since the data were organized differently, were stored in different formats, and were often inaccessible to other applications.

The educational organization’s data also lacked another data quality dimension, timeliness, owing to further data problems such as the non-availability of the data. Specifically, some important records were not kept at all while others were poorly kept, so the needed information could not be found, either because the organization had never obtained it or because it had been lost through poor storage (Durosaro, n.d.). In addition, as Durosaro also reported, some data had been mixed up to the extent that retrieval was very difficult when they were required for use.
The educational organization also encountered problems regarding data security. Because new applications were continually added to the system on an ad hoc basis, and with more applications more people had access to data, security was very difficult to enforce in the file environment (Turban et al., 2008). In addition, as Turban et al. also reported, selecting a data management tool was a problem because of the large number of products available. Another problem was that the applications were developed with regard to how the data were stored, so the applications and the data in the computer systems were not independent but dependent on each other. Finally, a further problem reported by Turban et al. and also encountered by the educational organization was the delegation of data-quality responsibilities to the technical teams, which negatively affected the quality of the data.

The educational organization in Cyprus encountered a variety of data problems, and as a result its data lacked important data quality dimensions such as accuracy, accessibility, relevance, timeliness, and completeness. To solve these problems, the organization took the measures discussed next. It used data and analysis to drive decision-making practices; specifically, it made use of an Education Data Warehouse (EDW) to store, manage, and analyze the data. “A data warehouse is a repository of data that are organized to be readily acceptable for analytical processing activities (such as data mining, decision support, querying, and other applications)” (Turban et al., 2008, p. 100). The data warehouse gathers data into reports that help guide decision making at the school, district, and individual student levels (Durosaro, n.d.). As Mills (2008) noted, data warehouses are structured to facilitate data collection, management, querying, and reporting for decision making. A data warehouse can be used to address issues in academic institutions regarding the effectiveness of new instructional techniques, student satisfaction, etc. (Chaplot, 2007). In addition, according to the Michigan Association of Intermediate School Administrators (2005), data warehousing is a tool that can help districts become data driven in order to meet the requirements of the No Child Left Behind legislation; it allows districts to ask complex questions and find answers that uncover underlying problems, thus leading to the design of data-driven student achievement and school improvement strategies.
As Chaplot argued, the fundamental goal of the data warehouse is to support strategic planning, modeling, and forecasting at the organizational level. It must also meet the need for knowledge in an area of uncertainty or growth in the organization, and to achieve this goal it must provide a comprehensive and consistent view of the organization.

As Turban et al. (2008) noted, data warehousing makes it easier and faster for organizations to process, analyze, and query data. It also provides for improved analytical processing, which involves the analysis of accumulated data and includes decision support systems, data mining, Web applications, enterprise information systems, querying, etc. As Mills (2008) stated, “A successful and sustainable data warehouse can be an important contributor to a district’s ongoing success” (¶9). Additionally, effective data warehousing can help create a meaningful relationship between information technology and organizations, thus facilitating enterprise-level strategic planning and growth (Chaplot, 2007). A data warehouse may include personnel data, student demographics and achievement data, financial data, and assessment data (Michigan Association of Intermediate School Administrators, 2005).
Moreover, as Chaplot noted, a data warehouse can address various phenomena and issues such as the following three: (a) how can instruction be modified in order to help students learn to write more effective essays; (b) are students who attend classes full-time more likely to succeed academically than those who take classes on a part-time basis; and (c) what kind of training is necessary for new employees?

As Chaplot (2007) reported, a data warehouse has four main components: operational systems of record, the data staging area, the data presentation area, and data access tools. Each of these components serves a unique function in preparing data for manipulation and examination. According to Turban et al. (2008), a data warehouse has the following eight characteristics: (a) organization, where the data are organized by subject and include information relevant for decision support; (b) time variant, where the data are kept for many years so that they can be used for trends, forecasting, and comparisons over time; (c) consistency, where data that may be encoded differently in the various source databases are coded consistently in the warehouse; (d) integration, where data from various sources are integrated, with integration also supported by the use of Web services; (e) real time, where real-time capabilities can be arranged even though most applications of data warehousing are not in real time; (f) nonvolatile, where the data are not updated once entered into the data warehouse; (g) Web-based, where data warehouses are designed to provide an efficient computing environment for Web-based applications; and (h) relational, where the data warehouse uses the client/server architecture to provide the user easy access to its data.
As noted previously, the educational organization faced various data problems, such as inaccurate, inaccessible, and irrelevant data, as well as problems regarding the security and completeness of the data. To solve these problems and put data management in place, the educational organization decided to use an Education Data Warehouse (EDW). In addition to solving these particular problems, as Turban et al. (2008) noted, this kind of data management can ease the burden of maintaining data and enhance the power derived from their use, while supporting easy data access and quick, accurate, and effective decision making. Furthermore, the aim of using an Education Data Warehouse is that better management and resource allocation decisions can follow once information is made available (Durosaro, n.d.).

To solve its education-related data problems, the educational organization built and used an education data warehouse. The education data warehouse helped the organization to easily view and analyze data collected from multiple data sources, and it is a key support for data-driven decision making (Sanders, Romond & Ferrara, n.d.). Specifically, the education data warehouse was created to unify data collection efforts and to allow the organization to conduct trend analyses and track students and teachers in order to evaluate programs (The Center for Teaching Quality, n.d.). As described by the Florida Department of Education (2005), the education data warehouse integrates existing, transformed data extracted from various sources available at the state level, and it provides a single repository of data concerning students served in the public education system as well as educational facilities, curriculum, and staff involved in instructional activities.

The education data warehouse has the following characteristics, as reported by the Florida Department of Education (2005): it allows longitudinal analyses, ensures confidentiality, includes historical and current data, is student-centric, and has state-of-the-art analytical capabilities. The data sources include student demographics and background, progression through grades, staff data, and basic teacher data such as demographics, addresses, certification data, instructional activities, teacher salary and compensation, licensure test scores, degrees earned, course assignments, licensure status, and endorsement areas. These teacher data are extracted from the databases of the Department of Education (The Center for Teaching Quality, n.d.). In addition, as the Florida Department of Education also reported, the education data warehouse includes student courses taken, enrollment, test scores, financial aid, awards, educational curriculum, and educational institutions.
In the education data warehouse, each teacher record is assigned a unique ID, and each year new ID assignments are checked using ethnicity, name, and birthday. The warehouse also includes data on teacher demographics, current work, and certification, with information on preparation (The Center for Teaching Quality, n.d.). Additionally, as noted by The Center for Teaching Quality, the warehouse includes individual records from public higher education institutions that allow teacher preparation graduates to be tracked into work in the schools, including information on their coursework.
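
The cited source states only that new ID assignments are checked using ethnicity, name, and birthday; it does not describe the procedure. The following sketch shows one way such a check could work, assuming a simple normalized match key. The class, function names, and sample values are hypothetical.

```python
from datetime import date
from itertools import count


def match_key(name: str, ethnicity: str, birthday: date) -> tuple:
    """Build a comparison key from the three checked attributes.

    Assumption: case and extra whitespace should not create a new identity.
    """
    normalized_name = " ".join(name.lower().split())
    return (normalized_name, ethnicity.strip().lower(), birthday.isoformat())


class TeacherRegistry:
    """Hypothetical registry that reuses an ID when the match key is already known."""

    def __init__(self) -> None:
        self._ids = {}
        self._next_id = count(1)

    def assign_id(self, name: str, ethnicity: str, birthday: date) -> int:
        key = match_key(name, ethnicity, birthday)
        if key not in self._ids:
            self._ids[key] = next(self._next_id)
        return self._ids[key]


registry = TeacherRegistry()
first = registry.assign_id("Maria  Example", "Group A", date(1975, 4, 2))
again = registry.assign_id("maria example ", "group a", date(1975, 4, 2))
other = registry.assign_id("Maria Example", "Group A", date(1980, 1, 1))
```

In this sketch the second call returns the ID already issued, because the three attributes match after normalization, while a different birthday yields a new ID. A real system would likely need fuzzier matching to cope with misspellings of the kind described earlier in the essay.
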
The education data warehouse was chosen as a solution to the data problems facing the educational organization because of the following three benefits, also reported by the Florida Department of Education (2005): (a) it provides capabilities to perform trend analyses and to track students and teachers over time and across delivery systems; (b) it allows the users of the educational organization to run their own queries against summarized data in a timely and efficient manner; and (c) it provides decision-makers with the tools and information necessary to make informed, fact-based decisions about education.

Apart from these, the education data warehouse stores information on student achievement and outcomes from various sources, which helps teachers and administrators better serve every student in the school; these data allow for the continual and longitudinal tracking of individual student achievement (Edudata Canada Team, 2005). In addition, as noted by the Edudata Canada Team, the education data warehouse provides student data that help identify specific learning strengths and weaknesses and develop strategies to address them. This is achieved by identifying students who need additional help and by informing teachers about student-specific weaknesses across elementary instruction. Information about individual students from elementary schools can also be easily shared with secondary schools through the data warehouse. As Doe (2009) stated, the warehouse forms a longitudinal history from the data that can provide insights into student achievement and educational effectiveness. Furthermore, according to Doe, users can query the data warehouse to compare different types of data, such as assessment scores according to demographics, and the warehouse enables the same kinds of data to be evaluated for various purposes.
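
As an illustration of the kind of query described here, the following sketch uses an in-memory SQLite database to compare average assessment scores by a demographic attribute and school year. The table names, column names, and figures are invented for the example and are not the organization's actual schema or results.

```python
import sqlite3

# In-memory database standing in for a (much larger) education data warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, gender TEXT);
    CREATE TABLE assessments (student_id INTEGER, school_year INTEGER, score REAL);

    -- Hypothetical sample data.
    INSERT INTO students VALUES (1, 'F'), (2, 'M'), (3, 'F');
    INSERT INTO assessments VALUES
        (1, 2008, 70), (1, 2009, 80),
        (2, 2008, 60), (2, 2009, 65),
        (3, 2008, 90), (3, 2009, 100);
""")

# Average score per demographic group and year: a simple longitudinal comparison.
rows = conn.execute("""
    SELECT s.gender, a.school_year, AVG(a.score)
    FROM assessments AS a
    JOIN students AS s ON s.student_id = a.student_id
    GROUP BY s.gender, a.school_year
    ORDER BY s.gender, a.school_year
""").fetchall()
conn.close()

for gender, year, average in rows:
    print(gender, year, average)
```

Because each assessment row carries a school year, the same tables can answer both the demographic comparison and the year-over-year question, which is one reading of how a warehouse lets the same data be evaluated for different purposes.
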
The use of the education data warehouse for consolidation of information helped the educational organization in Cyprus, in the terms used by the Microsoft Corporation (2010), to manage information in its many forms, such as documents, sensor data, and imagery; to access information where it is useful, from desktop to mobile devices; to share information with the people who need it within and across the organization; and to secure information for different users in various operations. The education data warehouse, as outlined by the Microsoft Corporation, “provide[s] the technologies that academic institutions need to contain proliferation, manage data in all its divergent forms, and make information easy to access and use” (¶12).

This essay has reported on the data problems encountered in an educational organization in Cyprus and the measures taken to solve them. The data problems discussed above occurred because the organization’s data lacked important data quality dimensions such as accuracy, accessibility, relevance, timeliness, and completeness. Specifically, the problems encountered included data growth, data validity, data integrity, data inconsistency, data security, data redundancy, non-availability of data, inaccuracy of data, poor storage and retrieval of the data, problems of data accessibility, and concurrency problems, as well as other problems that arose because the data were scattered. To solve these problems, the educational organization made use of an education data warehouse, which integrates and stores education data collected from multiple sources by various methods in order to support organizational decision-making (Chaplot, 2007). The use of an education data warehouse was found to be effective: it helped the educational organization manage its data and solve the data problems it had encountered. In closing, as Chaplot outlined, a data warehouse “provide[s] users access and control to a wide variety of centralized and formatted data to choose the best course to action and support…decisions. Users can manipulate and customize the data to support specific queries that will enable positive changes at various…levels. Since the various stages increase data accuracy and integrity, complex queries can be conducted with a strong sense of confidence” (p. 5).

References
Building Intelligent and Performing Enterprises Institute (n.d.). Data Quality Definition-What is Data Quality? Retrieved January 19, 2010, from http://www.bipminstitute.com/data-quality/accuracy-consistency-audit.php

Chaplot, P. (2007). An Introduction to Data Warehousing. Retrieved January 17, 2010, from http://www.mtsac.edu/administration/research/pdf/tips/DataWarehouses.pdf

Doe, C. G. (2009). A Look At…Data Management and Analysis Systems. Retrieved January 18, 2010, from http://www.mmischools.com/Articles/Editorial/Features/A-LOOK-AT...-Data-Management-and-Analysis-Systems-59877.aspx

Durosaro, D. O. (n.d.). Problems Confronting School Personnel in Educational Data Collection, Analysis and Storage. Retrieved January 16, 2010, from http://www.kwsubeb.com/data-collection-collation-analysis/PROBLEMS_CONFRONTING_SCHOOL_PERSONNEL_IN_EDUCATIONAL_DATA_COLLECTION_ANALYSIS_AND_STORAGE.pdf

Edudata Canada Team (2005). Data Warehouse. Retrieved January 18, 2010, from http://edudata.educ.ubc.ca/exampleproject/NorthVan/datawarehouse.htm

Florida Department of Education (2005). Education Data Warehouse Fact Sheet. Retrieved January 20, 2010, from http://edwapp.doe.state.fl.us/EDW_Facts.htm

Melissa Data Corporation (2010). 6 Key Quality Dimensions. Retrieved January 17, 2010, from http://www.melissadata.com/enews/articles/1007/2.htm

Michigan Association of Intermediate School Administrators (2005). Data Warehousing in Michigan Schools: Executive Summary. Retrieved January 20, 2010, from http://michiganedusource.org/Technology/DataWarehousingSummary.pdf

Microsoft Corporation (2010). Server Consolidation and Data Warehousing. Retrieved January 19, 2010, from http://www.microsoft.com/education/solutions/datamanagement.aspx

Mills, L. (2008). Getting Started with Data Warehousing. Retrieved January 19, 2010, from http://www.schoolcio.com/showarticle/1048

Sanders, D., Romond, B. & Ferrara, J. (n.d.). Vermont’s Education Data Warehouse & Analyzer. Retrieved January 23, 2010, from http://www.setda.org/c/document_library/get_file?folderId=23&name=Vermont+Education+Datahouse.pdf

The Center for Teaching Quality (n.d.). Florida TQ Data Landscape (K-20 Education Data Warehouse). Retrieved January 18, 2010, from http://www.teachingdata.org/pdfs/cpre_data_fl.pdf

Tupek, A. R. (2006). Definition of Data Quality. Retrieved January 21, 2010, from http://www.census.gov/quality/P01-0_v1.3_Definition_of_Quality.pdf

Turban, E., Leidner, D., McLean, E. & Wetherbe, J. (2008). Information technology for management: Transforming organizations in the digital economy (6th ed.). New York, NY: John Wiley & Sons.

Wand, Y. & Wang, R. Y. (1996). Anchoring Data Quality Dimensions in Ontological Foundations. Communications of the ACM, 39(11). Retrieved January 17, 2010, from http://web.mit.edu/tdqm/www/tdqmpub/WandWangCACMNov96.pdf
