I’m often asked about Reference Models. What are they? Why do we need one? What benefit does it give? How do I use it? These questions commonly come up when we talk about e-Science and building e-infrastructures to support research, for example around research infrastructures for the environmental sciences (ENVRI), where we have the ENVRI Reference Model (ENVRI RM). They come up all the time.
Why is this? I think it’s because people often don’t understand reference models (RMs). When they meet them for the first time they’re overwhelmed. It’s not something most people are familiar with. Even technically literate IT people often find them abstract, saying things like “it’s not relevant to what I do”, “it’s not reflecting reality” or “it’s difficult to understand”.
In this article I want to give an introduction to RMs – what they are and why they’re useful – in the context of one specific RM, the ENVRI RM. I want to offer a few personal thoughts about adopting RMs. I also want to suggest how to engage with RMs for the first time and, in particular, how to get the best out of the ENVRI RM.
So, what is an RM? A good place to start is with this Wikipedia article on reference models. Its opening paragraph explains an RM as “an abstract framework consisting of an interlinked set of clearly defined concepts produced by an expert or body of experts in order to encourage clear communication. A reference model can represent the component parts of any consistent idea, from business functions to system components, … …”. It goes on to say that an RM can “… then be used to communicate ideas clearly among members of the same community”. This, then, is the essence of an RM. It’s a descriptive conceptual framework, establishing a common language of communication and understanding, about elements of a system and their significant relationships, within a community of interest. That’s particularly important when, as in the environmental research infrastructures sector, that community of interest brings together significant numbers of experts from vastly different scientific and technical backgrounds to talk about building distributed ICT infrastructures.
I’m not going to go into details about the differences between reference models, reference architectures and concrete architectures, nor the details of when a model becomes an architecture. Suffice it to say for our purposes that a) reference models drive the development and emergence of reference architectures, which in turn drive concrete architectures; and b) when the Engineering and Technology viewpoints are added to the already existing Enterprise (Science), Information and Computational viewpoints of the ENVRI Reference Model, then what you get is the ENVRI Reference Architecture. Note, however, that despite this expected evolution we will, for convenience, still rather loosely describe it as the “ENVRI Reference Model”. If you want to read a bit more about these distinctions, then I can suggest resources from OASIS, for example: Reference Model for Service Oriented Architecture 1.0. OASIS Standard, 12 October 2006 and an accompanying explanatory Wikipedia page. There is a helpful explanation, using residential housing as a straightforward example.
Let’s look now at why RMs are useful. When we talk, for example, about curating data it’s essential that we each have the same common understanding of what that means: what’s included in the definition and what’s out of scope, where the boundaries are and what’s on the other side of those boundaries. Boundaries in engineered systems often surround a block of functionalities for performing a particular task. Blocks exist within blocks. Boundaries imply interfaces and interfaces imply flows of information. These involve the formatting (syntax) of information and its meaning (semantics), as well as the protocols for exchanging that information. Information flows imply sources and sinks. Physical interfaces exist between different pieces of physical equipment. Interfaces can exist at different logical places too, especially in systems that are largely made of software and/or described in hierarchical and nested manners. They may be horizontal interfaces between blocks of equal standing. Or they may be vertical interfaces between blocks within different “levels” (layers) of a system. Curating data, for example, involves more detailed elements concerned with annotation of the data, cataloguing it and transferring it. How are the interfaces between annotation, cataloguing and transfer within a data curation block (subsystem) related to other blocks (subsystems) around it? It helps when everyone has a common understanding of what interfaces exist and where, how they can be referenced and what is supposed to take place (in terms of information transmission) across them. Bugs in communication across interfaces most often arise from misunderstanding and misinterpretation of the relevant specifications at that interface. They are a major source of problems in distributed systems.
Reference models do not inhibit design freedom. Quite the contrary. They leave details of practical implementation in the hands of those best equipped to do it – innovative equipment and software designers, and systems architects with vision and knowledge of the overall goals. RMs only say “if you have such and such a block (subsystem) then you may consider that it should have these functions”. And, “if you have subsystem A communicating with subsystem B, then here is a set of interface characteristics that can apply between them”. RMs do not say “you must implement it like this”.
I have seen over many years the benefits from adopting a Reference Model approach. RMs are used widely in the telecoms and defence sectors, as well as among architects of enterprise and public sector systems. All these sectors are characterised by what I would describe as “needing infrastructure at scale”. All these sectors involve multiple vendors who have to work, if not together, then to a common framework of principles and concepts to bring about widespread interoperability. Just think about why it’s so easy to make a phone call to more or less anywhere on the planet, or to receive streaming video there. That is the result of using reference models and standardising interfaces.
In international standardisation work (ITU-T, ISO, CEN, CENELEC, ETSI, as well as numerous industrial fora) RMs designate the “reference points” between blocks of functionality. Knowing this is fundamental to the correct preparation, writing and application of standards. The scope of a standard derives from knowing the reference points (i.e., possible interfaces) at which the standard can apply. Understanding how different standards relate to one another is assisted by having a conceptual framework in which to situate them. Think of the OSI and TCP/IP reference models, for example. Similarly, in regulatory work, knowing the reference points to which a particular piece of legislation applies has been critical to understanding the scope of the legislation, its interpretation and, if necessary, its enforcement.
In the research infrastructures sector we have to move to an RM-oriented approach for three reasons. Firstly, so that we can achieve interoperability within and between different infrastructures. Secondly, because there are multiple players and stakeholders in the sector that have to work together and talk to one another. And thirdly, so that the sector can achieve the economies of scale within and across infrastructures that we need for attracting the attention of industry. There is, of course, a role for bespoke design and development due to the unique attributes of individual infrastructures. But wherever possible, off-the-shelf capabilities should be adopted first. We can do this more easily when we have a commonly accepted conceptual foundation upon which to base procurement.
Each expert in a community brings their own set of terminology and understanding. If they’re scientists first and foremost and computer literate second, then their range of comprehension is often limited by what they’ve learnt out of necessity to perform their science. A common progression is to start small, designing and building something for their own use or as one part of a larger whole for a specific purpose. Even experts with a principal background in computing technology and practice may be encountering distributed software and systems at scale for the first time. Exposure to projects of increasing size and complexity increases people’s skills and experience. But the converse is also true. People can find themselves underequipped to deal with complexity. With clustering of communities using different approaches and technologies the problem is magnified.
How then can these individuals engage with RMs for the first time and gain some benefit?
A Google search for the term “reference model” will turn up more than a million results. Near the top of the list is the previously mentioned Wikipedia article on Reference Models. You can also find a companion Wikipedia article on Reference architectures to help you understand the differences between the two. Dotted among the top search results you’ll probably find mentions of the OSI model (ITU-T Rec. X.200 / ISO 7498), the OASIS SOA reference model, the OGC reference model for geospatial standards and the OAIS (Open Archival Information System) model for data curation, as well as many other examples. Reading the Wikipedia articles and browsing some of the different reference model descriptions gives an insight into the role of RMs and their many different forms and purposes. One particularly relevant Wikipedia article is the one explaining the Reference Model of Open Distributed Processing (RM-ODP). This is the basis of the ENVRI RM, as well as several other models of relevance (ORCHESTRA in OGC, LifeWatch).
Of course, even with just a basic background from the Wikipedia articles, you can get started directly with the ENVRI RM. All you need to know is self-contained there. Try this introductory overview video tutorial for example, or read: getting started with the ENVRI RM or some of the guidelines pages for using the model. Everything is more or less self-explanatory. Below I give some step-by-step practical suggestions for how you can begin to get the best out of the model, together with links into the relevant pages of the model documentation.
Start by thinking about infrastructure you’re familiar with and working with. Think about its purpose, the data it handles, the functions it performs. Step back and take an overall view. At the highest level try to match that with the subsystems defined by the ENVRI RM. Think about which subsystems apply to your context and focus on those, one at a time: data acquisition, data curation, data access, data processing and community support. It’s almost always possible to identify parts of your infrastructure that correspond to concepts in the RM. Take your architectural diagram, like the one below for ICOS, and draw lines on it to delineate the different subsystems. Look at the common functions within each subsystem to help you get an idea of the scope. Adopt the definitions.
ICOS did this and it helped them a lot to get a clearer description of their complex data streams and the related responsibilities. Werner Kutsch, Director General of ICOS, told me: “We had a long process in clarifying our data life cycle and the respective responsibilities and used the RM as a tool. The data life cycle in ICOS goes through several instances run by different institutions and as you can imagine everybody has interests and also the fear to be in a worse position than the others. Breaking the whole story down to single steps and having a visualization on the table helped a lot, brought a clear picture and will definitely increase the trust during the operational phase. I was very happy to have this tool and I hope we can develop it further.”
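The subsystem-matching exercise described above can also be written down as a simple table. What follows is only a minimal sketch in Python: the five subsystem names come from the ENVRI RM as listed above, but the component names are hypothetical placeholders (they are not taken from ICOS or any real infrastructure), and the helper function is my own illustration rather than part of the RM.

```python
# The five subsystems named by the ENVRI RM.
SUBSYSTEMS = (
    "data acquisition",
    "data curation",
    "data access",
    "data processing",
    "community support",
)

# Hypothetical components of an imaginary infrastructure, each assigned
# to the subsystem it most closely matches (all names are assumptions).
component_map = {
    "sensor network": "data acquisition",
    "quality control pipeline": "data curation",
    "metadata catalogue": "data curation",
    "data portal": "data access",
}


def group_by_subsystem(mapping):
    """Group components under each subsystem, rejecting unknown subsystem names."""
    groups = {subsystem: [] for subsystem in SUBSYSTEMS}
    for component, subsystem in mapping.items():
        if subsystem not in groups:
            raise ValueError(f"unknown subsystem: {subsystem!r}")
        groups[subsystem].append(component)
    return groups


groups = group_by_subsystem(component_map)

# Subsystems with no matching component yet: places to look more closely.
uncovered = [name for name, components in groups.items() if not components]

for name, components in groups.items():
    print(f"{name}: {', '.join(components) or '(nothing mapped yet)'}")
```

The `uncovered` list is the useful by-product: an empty subsystem either does not apply to your context or points to a part of the infrastructure you have not yet described.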
Think about things from different viewpoints. This takes you deeper into the RM. You don’t have to do this serially. You can jump backwards and forwards between different viewpoints and develop each in parallel. A tutorial video, Understanding the ENVRI Reference Model – An Overview, is available to take you through this in detail.
What is the business or science perspective (alternatively, the stakeholder perspective)? What “work” has to be accomplished by the scientists using the infrastructure? This is the science viewpoint. You can think about this in terms of the communities of people that have active roles and the roles played by various scientific instruments (passive roles). People (and instruments) can belong to multiple communities. What behaviours do they have when they play these roles? You can choose communities, roles and behaviours from the standard ones suggested by the model. Or you can define your own. If you can recognise the standard ones, this helps towards interoperability eventually.
What is the data/information that will be handled as the work is performed? This is the information viewpoint. You should think in terms of units of information (information objects) to be manipulated within the infrastructure. Typically, this is the scientific data that has to be collected, assured, identified, curated, published, processed, etc. But you can also think about the specifications for how the data is to be acquired and what information you need to track its provenance. The manipulations (or action types) cause changes of data state associated with each information object. Every information object should have at least one action type associated with it. So, for example: a specification of measurements or observations information object will have a perform measurement action type associated with it that leads to a measurement result information object being created. That object will have the data state “raw” until another action is performed upon it. The dynamic schemata describe how your information evolves from one state to another as it is manipulated and processed, perhaps guided by specific policies.
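To make the idea of information objects, action types and data states more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the ENVRI RM’s own definition: only the “perform measurement” action type, the two information objects and the “raw” state come from the description above. The other states, the other action names, the “defined” state of the specification and the `TRANSITIONS` table are hypothetical.

```python
from dataclasses import dataclass, field

# Illustrative transition table: (current state, action type) -> next state.
# Only the "raw" state is named in the text; all other entries are assumptions.
TRANSITIONS = {
    ("raw", "assure quality"): "assured",
    ("assured", "assign identifier"): "identified",
    ("identified", "publish"): "published",
}


@dataclass
class InformationObject:
    name: str
    state: str
    history: list = field(default_factory=list)

    def apply(self, action_type: str) -> None:
        """Apply an action type, moving the object to its next data state."""
        key = (self.state, action_type)
        if key not in TRANSITIONS:
            raise ValueError(
                f"action {action_type!r} is not allowed in state {self.state!r}"
            )
        self.history.append((self.state, action_type))
        self.state = TRANSITIONS[key]


def perform_measurement(specification: InformationObject) -> InformationObject:
    """'perform measurement': uses a specification, creates a result in state 'raw'."""
    result = InformationObject(name="measurement result", state="raw")
    result.history.append((specification.name, "perform measurement"))
    return result


# The "defined" state of the specification is a hypothetical placeholder.
spec = InformationObject(
    name="specification of measurements or observations", state="defined"
)
result = perform_measurement(spec)
result.apply("assure quality")
print(result.state, result.history)
```

The transition table plays the role of a very small dynamic schema: it records which state changes are permitted, independently of how any of the actions would actually be implemented.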
In the information viewpoint it is important to focus on WHAT is being manipulated more than on HOW it is being manipulated. The idea is to gain an understanding of what information is important and how it evolves over its entire lifecycle. Changing your focus to be more precise on how the information will be processed or transformed at each stage (as the work is performed) is the concern of the computational viewpoint. Here you will think about the computational objects, perhaps mapping functions you already know about to them, and the interactions between them.
Lastly, you have to consider correspondences between the different viewpoints. Correspondences are defined relations between entities in different viewpoints. So, for example: if a scientist (a role in the science viewpoint) configures an instrument (a behaviour in the science viewpoint), this corresponds to a specify measurement action (action type) that creates a specification of the measurement (information object) in the information viewpoint, which itself can correspond to a configure instrument interface on an instrument controller and a (data) acquisition service in the computational viewpoint. You can infer from that illustrative description that there is also a correspondence directly between the behaviour in the science viewpoint and the configure instrument interfaces in the computational viewpoint. You can see it illustrated below. There is a comprehensive tutorial video, Main Processes of the ENVRI Reference Model – Corresponding Viewpoints, available to take you through these ideas in detail.
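The inference described above, from two stated correspondences to a third, can be sketched as a small graph search. This Python example is a hedged illustration only: the three entities are taken from the example in the text, but the `Entity` type, the function names, and the treatment of correspondences as symmetric and chainable are my own simplifying assumptions, not something the ENVRI RM prescribes.

```python
from collections import defaultdict, deque
from typing import NamedTuple


class Entity(NamedTuple):
    viewpoint: str
    kind: str
    name: str


behaviour = Entity("science", "behaviour", "configure instrument")
action = Entity("information", "action type", "specify measurement")
interface = Entity("computational", "interface", "configure instrument")

# The two correspondences stated explicitly in the example.
declared = [(behaviour, action), (action, interface)]


def build_graph(pairs):
    """Build an undirected graph (assumption: correspondences are symmetric)."""
    graph = defaultdict(set)
    for left, right in pairs:
        graph[left].add(right)
        graph[right].add(left)
    return graph


def corresponding(start, graph):
    """Return every entity reachable from start via chains of correspondences."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)
    return seen


graph = build_graph(declared)
reachable = corresponding(behaviour, graph)

# The science-viewpoint behaviour reaches the computational interface
# even though that pair was never declared directly.
print(interface in reachable)
```

In a real model you would probably want to record inferred correspondences separately from declared ones, so that each can be reviewed by the people responsible for the viewpoints concerned.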
We haven’t completely developed the engineering and technology viewpoints of the ENVRI RM yet. But this shouldn’t stop you thinking about those aspects.
Basically, the engineering viewpoint is concerned with physical distribution of the logical entities described by the information viewpoint and the computational viewpoint. It is concerned with allocation to underlying platforms and with communications. Physically distributing functions implies communication between functions. Thus, the engineering viewpoint is concerned with the functional and non-functional aspects of the specific engineering needed to support the work to be performed by the scientists.
The technology viewpoint is where choices are made about the technologies necessary to implement the entities (the basic engineering objects) defined by the engineering viewpoint. Here, preferences for technologies that make the work possible to perform, including standards already in use or applicable, can be defined.
Alternatively, if you’re already working with a particular (sub)system, block of functionality, or component – however you define it – look to see how it fits into the RM. You can work the whole process from the bottom up, starting with what you already have to see how it supports the generic capabilities defined by the RM. How does it help the scientists to perform their work? How is it situated within the boundaries of the RM? Which subsystem does it fit into? Perhaps it straddles boundaries? What does it connect to – either practically, or ideally? Where are interfaces needed? Are they defined? Do they need to be? And if so, what standards apply there?
In conclusion, working with RMs is a particular way of thinking: a systems modelling approach that draws together all the conceptual elements and relations of a large class of very complex distributed systems. It gives us a means to cope with that complexity and to practically engineer and manage real systems. What I haven’t talked about at all in this article is the possibility of representing everything I have described using a modelling language like UML. This is very exciting because it means, for example, that the ENVRI RM and all its concepts can be built into software engineering IDEs, with all that implies for inheritance, conformance, etc.