When we talk about master data, we often refer to data that appears in multiple transactions and documents: data defining customers, suppliers, products, parts, prices, and so on. Since this data is used in the inputs and outputs of many processes, its accuracy and completeness are critical. Master data, like any other data, changes over time. It is created, modified, deleted, transferred from one data repository to another, and used as a reference in multiple reports. Master data is usually spread across many systems that can author it, exchange it, and delete it. Thus, our chance of fully controlling the use of master data across all processes is very slim.
We often take it for granted that within a single system we can always inquire about the use of any data element. Suppose that any data field defined within a transportation management system could ask the system, "What am I?" The system should always be able to answer something like, "You are a string value in field '3' of table '342'." A better answer would be, "You are a system-defined attribute called 'address number' in the entity called 'address'." The best answer would be, "You are the 'street number' of the 'ship to address' of the final 'delivery point' on all internal 'transportation orders' and '3rd party delivery orders', but not on 'drop ship orders', because I don't handle those." Which of these answers do you think a legacy system would most likely give? I'd say the first. Which answer would an ERP or a specialized logistics system give? If we are lucky and this is a system-defined field, the second is the best we can hope for. Can we expect any system to give the third answer? No, because that system does not really know that the street number in a customer's ship to address is "master data", and does not know where else, in processes outside that system, it is being used, or how.
Examples like this abound in all process domains. In product development we have many entities in many systems representing requirements, functions, parameters, and components. The same goes for order items, suppliers, and contracts in sourcing and procurement, and for lead times, resources, capacity, and inventory in the supply chain. Is it really worth making sure that all these elements represent exactly the same thing? Is it so critically important that they all have a single source of definition? The common belief is YES, but I have not seen any analysis defining all possible uses of master data across all systems and considering all implications (preferably in monetary terms) for the integrity of the transactions and decisions they support, simply because the moment such an analysis is complete and accurate, the problem is probably solved. So why are we so concerned with what we call master data?
The main reason is fear of losing control. But haven't we already lost it? Our processes are not well enough defined for us to even identify all occurrences of "master" data within all the inputs and outputs that require them. Our systems are so proprietary and closed that we do not even dare question the accuracy and adequacy of the meaning of the data within them. We simply close our eyes to the fact that we are really not masters of the master data. Consequently, we cannot really govern it, we cannot control it, and we cannot influence its evolution. We can only hope for the comfort of knowing that all master data resides in a single system where we can at least see the field within a table where it is defined, regardless of its true semantic value in supporting all inputs and outputs of all processes that may need it.
Well, let's imagine a different approach: one where there is no master data, just data, and where data is not tied to any specific system's definition of what it is, but to our preferred understanding (even without full agreement) of all the things that data may represent. Let's then go back to our previous example. The data field asks the system within which it has been defined, "What am I?" The system says, "I think I know what you are, but let me ask the federation registry." Then the system asks the federation registry, "I have data that has been defined as field '3' in table '342'. Have I ever exchanged that data with you in any of the service calls?" The registry says, "Nope." Then the system goes back with the answer, "It does not matter; just stay there and be happy." Or say the registry comes back with, "Yes, you have sent me 35,564 calls with that field. In 31,504 of those cases it was used in the 'Transportation Order' informational reference document, which had undergone 3 versions since its first use and where the latest definition of the field was the 'street number' of the 'ship to address'. I have also seen it in 4,060 cases in a cloned informational reference document called 'Expediting Drop Ship Order', which had only one version and where the field was the 'street number' of the 'ship to address'." The system comes back to the field and says, "You are a 'street number of the ship to address'." There may be many kinds of inquiries and semantic searches returning results like this once the data crosses process and system domains.
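The registry dialog above can be sketched as a simple lookup structure. This is a minimal illustration of the idea, not any real product's API: the names (FederationRegistry, Usage) and identifiers are hypothetical, and the call counts are just the ones from the example.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Usage:
    """One semantic context in which a field was exchanged."""
    document_type: str  # informational reference document, e.g. 'Transportation Order'
    meaning: str        # latest definition of the field in that document

class FederationRegistry:
    """Hypothetical registry that records every service call carrying a field
    and answers a "What am I?" inquiry by aggregating those recorded uses."""

    def __init__(self):
        self._usages = {}  # (table, field) -> Counter of Usage

    def record_calls(self, table, field, usage, count=1):
        self._usages.setdefault((table, field), Counter())[usage] += count

    def what_am_i(self, table, field):
        usages = self._usages.get((table, field))
        if not usages:
            return None  # the registry's "Nope": never seen in a service call
        return usages.most_common()  # contexts of use, most frequent first

# Replaying the example: field '3' in table '342' seen in two document types.
registry = FederationRegistry()
ship_to = "street number of the ship to address"
registry.record_calls('342', '3', Usage('Transportation Order', ship_to), 31504)
registry.record_calls('342', '3', Usage('Expediting Drop Ship Order', ship_to), 4060)
```

A real registry would persist this across systems and carry document versions as well; the point is only that the field's meaning is derived from its recorded uses, not declared by the system that owns it.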
The example above follows the utility approach, since it tells us that "it does not matter who you think you are; it only matters what people think they can use you for." The same goes for master data. The only way to really control it is to understand all its different meanings in all the contexts within which the data adds value to information processing. The point at which the informational reference is placed between the content (data fields in a document type, e.g. 'street number of a ship to address') and the context (an input/output in the process flow using that document type, e.g. 'Transportation Order' or its clone 'Expediting Drop Ship Order') is where we need to capture that link. That is how decoupled content and context work in SOA process modeling. We have written about this extensively in our FERA research (http://www.cpd-associates.com/index.cfm?content=subpage&file=include_RPPage.cfm&ID=72404138&DOC=194759508), and this approach has been the basis for the SOA-IM standard offered by VCG (http://www.value-chain.org/).
The semantic approach is based on modeling the business process using the explicitly defined meaning of each building block (activity, input, and output), as in VRM (see link above), and then referencing each I/O with the document types that represent its content, as in SOA-IM (see link above). Only then does it make sense to point document types to the web services required for populating their content with data from federated systems. Each system can consequently retain its internal meaning of the data independently of other systems, yet they can all inquire about all the different uses of their data within the context of the connected process. Thus, "master" data really becomes data that has multiple uses, and for each use a specified meaning and a preferred source. In the semantic approach, we can even dynamically decide what source and what version to use given the context within which we want to use it. Orienting users relative to the data and the process is achieved by decoupling content and context and allowing for semantic reconciliation of data elements between different activities. It is in our best interest to master all our data. It is not in our best interest to pretend that we can manage master data, only to eventually apply governance that reduces its reach and limits its use.
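The layering described above can be roughly sketched as three separate mappings: content (document types and their fields), context (process activities and the document types on their I/Os), and the semantic link between them. All names here are hypothetical, invented for illustration; the key point is that the per-use meaning and the preferred federated source live on the link, not inside any one system.

```python
# Content: document types and the fields they carry.
document_types = {
    'Transportation Order': ['ship to address', 'delivery point'],
    'Expediting Drop Ship Order': ['ship to address'],
}

# Context: process activities (as in a VRM-style process model) and the
# document type referenced on their output. Activity names are made up.
process_ios = {
    'Plan Delivery': 'Transportation Order',
    'Expedite Order': 'Expediting Drop Ship Order',
}

# The semantic link: for each (document type, field) pair, a specified
# meaning and a preferred source system (system names are illustrative).
semantic_links = {
    ('Transportation Order', 'ship to address'):
        {'meaning': 'street number of the ship to address', 'source': 'TMS'},
    ('Expediting Drop Ship Order', 'ship to address'):
        {'meaning': 'street number of the ship to address', 'source': 'OMS'},
}

def resolve(activity, field):
    """Given a process context and a field, return the meaning and
    preferred source for that particular use of the data."""
    doc = process_ios[activity]
    return semantic_links[(doc, field)]
```

Because the source is selected per (document type, field) pair rather than fixed globally, the same data element can be served by different federated systems in different process contexts, which is the dynamic source selection described above.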