Paying attention to metadata in DW 2.0

To succeed with second generation data warehousing (DW 2.0), organizations need to pay attention to metadata. Metadata is the key to reusability of data and analysis. With metadata the analyst can find out what has already been built. Without metadata, the analyst has a hard time finding out what data structures and infrastructure have already been built.

METADATA IN DW 2.0


Metadata is loosely defined as data about data. Though this definition is easy to remember, it is not very precise. The strength of this definition is in recognizing that metadata is data. As such, metadata can be stored and managed in a database, often called a registry or repository. Metadata is a concept that applies mainly to electronic data and is used to describe the definition, structure and administration of data files and their contents and context. In a broader sense, metadata provides meaning to all enterprise artifacts, including business processes, technology platforms, and so forth.

Metadata is one of the essential ingredients of the DW 2.0 environment. In first generation data warehouses, metadata either was not present or was an afterthought; in DW 2.0 it is one of the cornerstones.

There are many reasons why metadata is so important. Metadata is important to the developer who must align his/her efforts with work that has been done previously. Metadata is important to the maintenance technician, who must deal with day to day issues of keeping the data warehouse in order. But metadata is perhaps most important to the end user who needs to find out what the possibilities are for new analysis.

REUSABILITY OF DATA AND ANALYSIS

Consider the end user. The end user feels the need for information. That need may come from a management directive, from a corporate mandate, or simply from the end user's curiosity. However it arises, the end user ponders how to approach the analysis, and metadata is the logical place to turn. With metadata the analyst can determine what data is available. Once the analyst has determined the most likely place to start, the analyst can proceed to access the data.


Without metadata the analyst has a really hard time determining what the possible sources of data are. The analyst may be familiar with some sources of data. But it is questionable whether the analyst is aware of all of the possibilities. In this case the existence of metadata may save huge amounts of unnecessary work.

In the same vein, the end user needs to use metadata to determine if an analysis has already been done. Answering a question may be as simple as merely looking at what someone else has done. But without metadata, the end user analyst will never know what has already been done.

For these reasons then (and plenty more!), metadata is a very important component of the DW 2.0 environment.

THE LOCATION OF METADATA IN DW 2.0

Metadata has a special place in the DW 2.0 environment. There is separate metadata for each sector of DW 2.0: the interactive sector, the integrated sector, the near line sector, and the archival sector.

The metadata for the archival sector differs from the other metadata in that it is placed directly in the archival data. The reason is that, over time, the archival metadata must not become separated from the archival data it describes.
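As a rough illustration of keeping archival metadata inside the archival data, the sketch below packs rows and their description into a single JSON document. The function names (`pack_archive`, `unpack_archive`), the JSON layout, and the sample table are all assumptions made for this example; DW 2.0 does not prescribe a particular format.

```python
import json


def pack_archive(rows, metadata):
    """Serialize rows together with the metadata that describes them."""
    return json.dumps({"metadata": metadata, "rows": rows}, indent=2)


def unpack_archive(text):
    """Recover (metadata, rows) from a self-describing archive document."""
    doc = json.loads(text)
    return doc["metadata"], doc["rows"]


# Hypothetical sample: table name, columns, and values are invented.
metadata = {
    "table": "customer_orders",
    "columns": [
        {"name": "order_id", "type": "integer", "meaning": "unique order number"},
        {"name": "amount", "type": "decimal(10,2)", "meaning": "order total"},
    ],
}
rows = [[1001, "49.90"], [1002, "120.00"]]

archive_text = pack_archive(rows, metadata)
restored_metadata, restored_rows = unpack_archive(archive_text)
```

Because the description travels in the same document as the rows, whoever opens the archive years later can still interpret each column without consulting an external repository.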

Fig 1 shows the general location of metadata in the DW 2.0 environment.


There is a general structure for metadata as it exists in DW 2.0. There really are two parallel structures – one metadata structure for the unstructured environment and one metadata structure for the structured environment. Fig 2 shows the high level structure for the metadata for DW 2.0.


For unstructured data, there are really two types of metadata – enterprise metadata and local metadata. The enterprise metadata is also referred to as general metadata and the local metadata referred to as specific metadata.


For structured data, there are three levels of metadata – enterprise metadata, local metadata, and business and technical metadata. There is an important relationship between these different kinds of metadata. The best place to start to explain that relationship is at the local metadata level.

Local metadata is a good place to start because most people have the most familiarity with that type of metadata. Local metadata exists in many places and in many forms. Local metadata exists inside ETL processes. Local metadata exists inside a DBMS directory. Local metadata exists inside a business intelligence universe.

Local metadata is metadata that exists inside a tool and describes the data immediate to that tool. ETL metadata is metadata about sources and targets and the transformations that take place as data passes from source to target. DBMS directory metadata is metadata about tables, attributes, indexes, and the like. BI universe metadata is metadata about data used in analytical processing. There are many more forms of local metadata beyond these common ones.
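To make DBMS directory metadata concrete, the following sketch reads table, column, and index descriptions from SQLite's built-in catalog using Python's standard `sqlite3` module. SQLite is chosen only because it ships with Python; the `customer` table and the helper names `describe_table` and `list_objects` are hypothetical, and other DBMS products expose their directories through different catalogs.

```python
import sqlite3

# An in-memory database with one hypothetical table and index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_customer_name ON customer (name)")


def describe_table(conn, table):
    """Return column metadata for a table from SQLite's own catalog.

    The table name is interpolated directly, so it must come from trusted code.
    """
    # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk
    cur = conn.execute(f"PRAGMA table_info({table})")
    return [
        {
            "name": row[1],
            "type": row[2],
            "not_null": bool(row[3]),
            "primary_key": bool(row[5]),
        }
        for row in cur
    ]


def list_objects(conn):
    """Return (type, name) for every table and index recorded in the catalog."""
    cur = conn.execute("SELECT type, name FROM sqlite_master ORDER BY type, name")
    return cur.fetchall()


columns = describe_table(conn, "customer")
objects = list_objects(conn)
```

Nothing here was written down separately by a developer: the DBMS maintains this metadata itself, which is why it counts as local to the tool.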

Fig 3 shows some local metadata.


Local metadata is stored in a tool or technology that is central to the usage of the local metadata. Enterprise metadata, on the other hand, is stored in a location that is central to all of the tools and all of the processes that exist within DW 2.0.

Fig 4 shows that enterprise metadata is stored for each sector of the DW 2.0 environment in a repository.


In Fig 4 it is seen that sitting above each sector of DW 2.0 is a collection of enterprise metadata, and that all of the enterprise metadata taken together forms a repository. Actually, all of the sectors except the archival sector have their metadata stored in a repository.
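One way to picture such a repository is as a central collection of entries, each tagged with its sector and with the local tool it came from, that an analyst can search before starting new work. The sketch below is only an illustration under that assumption: the class names, fields, and sample entries are hypothetical, and it refuses archival entries simply to mirror the point that archival metadata lives with the archival data.

```python
from dataclasses import dataclass, field

SECTORS = ("interactive", "integrated", "near line", "archival")


@dataclass
class MetadataEntry:
    name: str
    sector: str
    description: str
    source_tool: str  # local tool the entry was gathered from, e.g. "ETL"


@dataclass
class EnterpriseRepository:
    entries: list = field(default_factory=list)

    def register(self, entry):
        """Add an entry for any sector except the archival sector."""
        if entry.sector not in SECTORS:
            raise ValueError(f"unknown sector: {entry.sector}")
        if entry.sector == "archival":
            raise ValueError("archival metadata is stored with the archival data")
        self.entries.append(entry)

    def search(self, keyword):
        """Return entries whose name or description mentions the keyword."""
        kw = keyword.lower()
        return [
            e for e in self.entries
            if kw in e.name.lower() or kw in e.description.lower()
        ]


# Hypothetical entries an analyst might find before building something new.
repo = EnterpriseRepository()
repo.register(MetadataEntry("monthly_revenue", "integrated",
                            "Revenue by month and region", "ETL"))
repo.register(MetadataEntry("open_orders", "interactive",
                            "Orders not yet shipped", "DBMS directory"))
matches = repo.search("revenue")
```

A search that returns a match tells the analyst that the data, or even the analysis, may already exist and can be reused rather than rebuilt.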

IN SUMMARY

Metadata is the key to reusability of data and analysis. With metadata the analyst can find out what has already been built. Without metadata, the analyst has a hard time finding out what data structures and outputs have already been built.

There are four levels of metadata – enterprise, local, business, technical. There is metadata for both the structured environment and the unstructured environment. Archival metadata is stored directly in the archival environment. By storing the metadata directly in the physical storage of archival data, a time capsule of data can be created.


WHAT IS DW 2.0?

In the two decades that data warehousing has been around, there has been much change. Older technologies have matured, new technology has appeared, and organizations have accepted Business Intelligence as a standard part of the infrastructure. Today there are many different renditions of what a data warehouse is: an active data warehouse, a federated data warehouse, a star schema data warehouse, and so forth. Unfortunately, none of these types of data warehouse are the same. There is no integrity in the definition of what a data warehouse is. In addition, first generation data warehouses have failed to take into account many important requirements that are now recognized as legitimate aspects of data warehousing. DW 2.0 is the definition of data warehouse architecture for the future of data warehousing.

Some of the more prominent features of DW 2.0 include the recognition of the life cycle of data within the data warehouse; inclusion of unstructured data along with structured data inside the data warehouse; inclusion of metadata as a tightly integrated part of the data warehouse; matching of unstructured data to structured data; and the ability to seamlessly handle massive amounts of data.

————————————-

DEREK STRAUSS

He is the founder, CEO, and principal consultant of Gavroshe USA. He has more than 25 years of experience in the IT industry and over 16 years of experience in Information Resource Management, Business Intelligence, and Data Warehousing. He is an active member of the Data Management Association and collaborates with the Data Warehouse Institute. He has spoken at many international conferences on data warehousing and gives seminars in the USA, Europe, and Africa. He gives seminars with Bill Inmon, with whom he wrote the book “DW 2.0. The Architecture of the next generation of Data Warehousing”.

He will present the seminar “DW 2.0: La nuova generazione del Data Warehousing” in Rome for Technology Transfer on 22 and 23 November 2010.