2.4 : Modern trends in data modeling and databases

written by Oleg Shilovitsky

For many years, the relational database management system (RDBMS) was the only data management technology used by PLM vendors. This choice was driven by enterprise IT requirements and by the maturity and reliability customers demanded of an RDBMS. PLM vendors relied on mainstream relational technology to develop their solutions. All PLM systems available today use an RDBMS: IBM DB2, Oracle, Microsoft SQL Server and a few others. A few of the newest PLM systems are built on MySQL or Postgres.

Modern web and open source technologies have changed the data management landscape. The last decade of web development created new, fast-growing data management solutions. New web architectures, cloud adoption and open source made it possible to break out of the closed loop of IT-approved database systems. PLM vendors need to wake up and check how new developments in databases and data management can provide a competitive advantage or improve existing PLM solutions.

Data modeling and abstraction levels

The following diagram, which I captured on the DATAVERSITY blog, is a good demonstration of the different levels of data management and data models: the Data Model Pyramid.

[Figure: The Data Model Pyramid]

Picture credit: Steve Hoberman

Clearly, the two top levels, the Business Subject Area Model (BSAM) and the Application Subject Area Model (ASAM), represent a specific set of data models required for any database-driven solution. PLM is no exception to the rule. However, the high diversity of product development and manufacturing led software vendors to develop their own data modeling tools, which rely on private data management tools and abstractions. I found the following passage from Steve Hoberman’s post interesting:

There are dependencies between the different types of data models shown in the pyramid, between data models and other artifacts or models that represent other aspects of business and requirements, the enterprise and solutions architecture, and application design. The activities required when producing and managing data models are only part of a wider set of business and technology activities; integration with associated activities is key to the success of data modeling. Without a tool that provides specialized support for data modeling, the data modeler cannot hope to work effectively in this environment.

The effectiveness of the data modeling tool is an important element of every PLM system. It includes core modeling, usability, integration, collaboration, management and communication. It made me think about the future of PLM data modeling tools. For many years, these tools relied on proprietary technologies. Are we going to see some unification and standardization of tools to deliver a variety of BSAMs and ASAMs? The key unsolved problem is the ability to populate and maintain multiple BSAMs tailored to specific business needs.

For a long time, PLM relied on private, vendor-delivered tools to manage and operate data models. I believe the future of data modeling will bring a shift towards more open tools and, as a result, towards faster and more efficient data model tailoring and customization.

Diversity of data management options

The core of every PDM/PLM application is the database and its data model. The history of data modeling goes back a long way, to applications with proprietary data models. The cornerstone moment was the introduction of the RDBMS. Not many of us remember, but the original intent of the RDBMS inventors was to make the data model transparent and accessible to programmers. Codd’s vision was to let programmers code against a fixed database schema. Take a look at Codd’s original 1970 paper, A Relational Model of Data for Large Shared Data Banks. Here is my favorite passage:

Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation). A prompting service which supplies such information is not a satisfactory solution. Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed. Changes in data representation will often be needed as a result of changes in query, update, and report traffic and natural growth in the types of stored information.

What Codd said about internal data representation back in 1970 is still true for many product development solutions. The end of the quote captures the reality of many data-driven solutions today, and PLM systems are among them. The complexity of solutions, diversity of requirements, mergers and acquisitions, application upgrades: this is a short list of situations in which your data model and underlying code are going to change.
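To make Codd’s point concrete, here is a minimal sketch of application code that stays unaffected when the internal representation changes. It uses Python’s built-in sqlite3 module and a view as the stable interface. The table and view names (part_v1, part_core, part_physical, part) are my own assumptions for illustration, not taken from any PLM product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Version 1 of the internal representation: one table (hypothetical names).
conn.execute("CREATE TABLE part_v1 (number TEXT PRIMARY KEY, name TEXT, mass_kg REAL)")
conn.execute("INSERT INTO part_v1 VALUES ('P-100', 'Bracket', 0.25)")
# The application only ever queries this view.
conn.execute("CREATE VIEW part AS SELECT number, name, mass_kg FROM part_v1")


def list_parts(db):
    """Application code: written once against the view and never changed."""
    return db.execute(
        "SELECT number, name, mass_kg FROM part ORDER BY number"
    ).fetchall()


before = list_parts(conn)

# Version 2: the storage is reorganized into two tables.
conn.execute("CREATE TABLE part_core (number TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE part_physical (number TEXT PRIMARY KEY, mass_kg REAL)")
conn.execute("INSERT INTO part_core SELECT number, name FROM part_v1")
conn.execute("INSERT INTO part_physical SELECT number, mass_kg FROM part_v1")
conn.execute("DROP VIEW part")
conn.execute("DROP TABLE part_v1")
# The view is redefined over the new tables; its columns stay the same.
conn.execute(
    "CREATE VIEW part AS "
    "SELECT c.number, c.name, p.mass_kg "
    "FROM part_core c JOIN part_physical p ON p.number = c.number"
)

after = list_parts(conn)
print(before == after)
```

The function list_parts is the same before and after the reorganization. Only the view definition knows how the data is actually stored.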

RDBMS and Dynamic Data Models

The original introduction of the RDBMS assumed a static data model. The reality of solution development introduced a new concept: the dynamic data model, which can be extended with new types and attributes without redesigning the database. Dynamic data modeling is what most PDM/PLM solutions have today. I can hardly name a PLM system that doesn’t apply at least some elements of dynamic data models. The ability to customize data is on the short list of every prospect and advanced customer.
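One common way to get a dynamic data model on top of an RDBMS is to store attribute definitions as rows rather than as columns (often called entity-attribute-value). The sketch below is only an illustration of that idea under my own assumptions. The table names and the helpers define_attribute, set_attribute and get_item are hypothetical, and I am not claiming that any specific PLM system works this way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, item_type TEXT NOT NULL);
    -- Attribute definitions are data, so adding one needs no ALTER TABLE.
    CREATE TABLE attribute_def (
        item_type TEXT NOT NULL,
        name      TEXT NOT NULL,
        PRIMARY KEY (item_type, name)
    );
    CREATE TABLE attribute_value (
        item_id INTEGER NOT NULL REFERENCES item(id),
        name    TEXT NOT NULL,
        value   TEXT,
        PRIMARY KEY (item_id, name)
    );
""")


def define_attribute(item_type, name):
    """Extend the model at run time by registering a new attribute."""
    conn.execute("INSERT INTO attribute_def VALUES (?, ?)", (item_type, name))


def set_attribute(item_id, name, value):
    """Store a value, but only for attributes defined for the item's type."""
    row = conn.execute(
        "SELECT 1 FROM attribute_def d JOIN item i ON i.item_type = d.item_type "
        "WHERE i.id = ? AND d.name = ?",
        (item_id, name),
    ).fetchone()
    if row is None:
        raise KeyError(f"attribute {name!r} is not defined for item {item_id}")
    conn.execute(
        "INSERT OR REPLACE INTO attribute_value VALUES (?, ?, ?)",
        (item_id, name, str(value)),
    )


def get_item(item_id):
    """Return all attribute values of one item as a dictionary."""
    rows = conn.execute(
        "SELECT name, value FROM attribute_value WHERE item_id = ?", (item_id,)
    )
    return dict(rows.fetchall())


define_attribute("Part", "material")
part_id = conn.execute("INSERT INTO item (item_type) VALUES ('Part')").lastrowid
set_attribute(part_id, "material", "Aluminium")

# A customer-specific extension added later, with no schema migration.
define_attribute("Part", "supplier")
set_attribute(part_id, "supplier", "ACME")
print(get_item(part_id))
```

The price of this flexibility is visible even in a small sketch: every value is stored as text, and checks that a static schema would do for free move into application code.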

RDBMS vs. NoSQL

The dynamic data model is a technique that developers of PDM and PLM solutions have used for the last 15-20 years. Over the last decade, new data management solutions grew out of massive web and open source development. Branded these days under the broad name ‘NoSQL’, these solutions provide an alternative way to manage data. Some NoSQL data models are more flexible and make data model changes much easier. Programmers are not required to define a data model schema before they start coding, which can speed up development and solution delivery.
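To show what “no schema before coding” means in practice, here is a minimal in-memory sketch of a document-style collection in plain Python. It does not use any real NoSQL product. The collection, the field names and the helpers find and fields_in_use are assumptions made up for this example.

```python
import json

# A "collection" is just a list of JSON-like documents; no schema is declared.
parts = []

parts.append({"number": "P-100", "name": "Bracket", "material": "Aluminium"})

# A later document carries fields nobody planned for; nothing is migrated.
parts.append({
    "number": "P-200",
    "name": "Controller",
    "firmware": {"version": "1.4.2", "signed": True},
    "suppliers": ["ACME", "Globex"],
})


def find(collection, **criteria):
    """Return documents whose top-level fields equal all given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


def fields_in_use(collection):
    """The effective schema is discovered from the data, not defined up front."""
    return sorted({key for doc in collection for key in doc})


print(json.dumps(find(parts, name="Controller"), indent=2))
print(fields_in_use(parts))
```

Both documents live side by side even though they have different fields. The flip side is that the code reading them has to cope with fields that may be missing.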

Data Driven Business Processes

Business processes in an organization are driven by data. Organizational changes, M&A, application upgrades, diversification of supply chain and third-party data, social data, internet logs, internet of things (machine-produced data): this is only a very short list of data sources modern PLM systems need to manage. The diversity of data sources creates an even bigger demand for flexible PLM solutions than before.

Flexibility is one of the fundamental requirements for any PLM system. A growing number of data-related business processes will keep your data models in perpetual flux. The majority of PLM providers built their flexibility around customization of an RDBMS schema. Most of these technologies are 15-20 years old. New data modeling approaches will come from open source and the web to meet the needs of future data modeling and data-related processes.

PLM data management challenges

The last decades of PLM development created a stigma of PLM as costly, complicated, hard to implement and non-intuitive. For the last few years, you can hear many voices calling to simplify PLM and make it more user friendly.

In fact, the non-intuitiveness of PLM systems largely comes from core database management technologies. Most PDM/PLM software runs on top of data management technologies invented 30-40 years ago. The history of the RDBMS goes back to the invention made by Edgar Codd at IBM in 1970.

[Figure: relational database technology]

I was reading the Design News article Top automotive trends to watch in 2012. Have a read and form your own opinion. One of the trends was the growing complexity of electronic control units. Here is the quote:

As consumers demand more features and engineers comply, automakers face a dilemma: The number of electronic control units is reaching the point of unmanageability. Vehicles now employ 35 to 80 micro-controllers and 45 to 70 pounds of onboard wiring. And there’s more on the horizon as cameras, vision sensors, radar systems, lane-keeping, and collision avoidance systems creep into the vehicle.

It made me think about potential alternatives. Even though I cannot see any technology these days that can compete with RDBMS on cost, maturity and availability, in my view now is the right time to think about future challenges and possible options.

Key-Value Store, Columnar, Document

These types of stores became popular over the past few years. Navigate to the following article by Read Write Enterprise: Is the Relational Database Doomed? Have a read. The article (even if it is a bit dated) provides a good review of key-value stores as a technological alternative to RDBMS, including pros and cons. One of the biggest pros of a key-value store is scalability. The obvious con is the absence of good integrity control.
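Both sides of this trade-off can be shown in a few lines. The sketch below is a toy in-memory key-value store, not a real product. Keys are spread over shards by a hash, which is the basic idea behind scaling out, and nothing stops a value from referring to a key that does not exist. The class name ShardedKeyValueStore and the key naming scheme are my own assumptions.

```python
import hashlib


class ShardedKeyValueStore:
    """Toy store: each shard is a dict that could live on a separate machine."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash of the key decides which shard owns it.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        return self._shard_for(key).get(key, default)


store = ShardedKeyValueStore(shard_count=4)
store.put("part:P-100", {"name": "Bracket"})

# No foreign keys: the store accepts a reference to a part that was never saved.
store.put("bom:A-1", {"children": ["part:P-100", "part:P-999"]})

dangling = [
    ref for ref in store.get("bom:A-1")["children"] if store.get(ref) is None
]
print(dangling)
```

Each lookup touches exactly one shard, which is why this model scales so easily. Finding the dangling reference, on the other hand, is left to application code, where an RDBMS would reject it with a foreign key constraint.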

The definition and classification of NoSQL databases is not stable. Before jumping on the NoSQL bandwagon, analyze the potential impact of immaturity, complexity and absence of standards. However, over the last one to two years, I have seen a growing interest in this type of technology.

Semantic Web and Graph Databases

Semantic web (or web of data) is not a database technology. Unlike RDBMS, key-value stores and graph databases, semantic web is more about providing a logical and scalable way to represent data (I wanted to say in a “semantic way”, but I understand the potential for tautology :)). Semantic web relies on a set of W3C standards and combines a set of specifications, such as RDF and OWL, that describe ways to represent and model data. You can read more by navigating to the following link.
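The core idea of RDF is that every statement is a triple of subject, predicate and object. The sketch below mimics that idea with plain Python tuples so that it runs without extra libraries. A real project would more likely use an RDF library and proper URIs. The ex: names and the match helper are assumptions invented for this illustration.

```python
# Each fact is one (subject, predicate, object) triple.
triples = {
    ("ex:Assembly-A1", "rdf:type", "ex:Assembly"),
    ("ex:Part-P100", "rdf:type", "ex:Part"),
    ("ex:Assembly-A1", "ex:hasComponent", "ex:Part-P100"),
    ("ex:Part-P100", "ex:material", "Aluminium"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Which components does assembly A1 have?
components = [o for _, _, o in match(triples, "ex:Assembly-A1", "ex:hasComponent")]
print(components)

# Extending the model is just adding another statement.
triples.add(("ex:Part-P100", "ex:supplier", "ex:ACME"))
print(match(triples, subject="ex:Part-P100"))
```

A new kind of relationship does not require a new table or column. It is one more triple with a new predicate.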

I created the following diagram a few years ago to show the applicability of different data management technologies to Product Lifecycle Management. The full slide deck is here.

The following table can give you an idea how to compare different database technologies:

[Table: comparison of database technologies for PLM]

I think the weak point of existing RDBMS technologies in the context of PLM is the growing complexity of data, both structured and unstructured. The amount of data will raise lots of questions for enterprise IT in manufacturing companies and for PLM vendors.

Manual PLM data modeling is a thing of the past

One of the most complicated parts of any PLM implementation is data modeling. Depending on the PLM vendor, product and technology, the process of data modeling can go by different names. But fundamentally, you can see it in any PLM implementation. It is the process that creates an information model of the products and processes of a specific company. Getting it done is not simple and requires a lot of preparation work, which is usually part of implementation services. Moreover, once created, the data model needs to be extended with new data elements and features.

Is there a better way? How do other industries and products solve similar problems of data modeling and data curation? It made me think about the web and internet as a huge social and information system. How are data models managed on the web? How do large web companies solve these problems?

One example of creating a model for data on the web was Freebase. Google acquired Freebase and used it as one of the data sources for the Google Knowledge Graph. You can catch up on my post about why PLM vendors should learn about Google Knowledge Graph. Another attempt to create a model for web data is Schema.org, which is very promising in my view. Here is my earlier post about Schema.org: The future of Part Numbers and Unique Identification. Both are examples of curating data models for web data. The interesting part of Schema.org is that several web search vendors have agreed on some elements of the data model as well as on how to curate and manage Schema.org definitions.

However, it looks like manual curation of the Google Knowledge Graph and Schema.org is not an approach that makes web companies happy or lets them leapfrog into the future. Manual work is expensive and time consuming. At least some people are thinking about that. The Dataversity article “Opinion: Nova Spivack on a New Era in Semantic Web History” speaks about some interesting opportunities that can open a new page in the way data is captured and modeled. Spivack speaks about possible future trajectories of deep learning, data models and relationship detection. It can extend Schema.org, especially in the part related to automatically generated data models and classifications. Here is my favorite passage:

At some point in the future, when Deep Learning not only matures but the cost of computing is far cheaper than it is today, it might make sense to apply Deep Learning to build classifiers that recognize all of the core concepts that make up human consensus reality. But discovering and classifying how these concepts relate will still be difficult, unless systems that can learn about relationships with the subtly of humans become possible.

Is it possible to apply Deep Learning to relationship detection and classification? Probably yes, but this will likely be a second phase after Deep Learning is first broadly applied to entity classification. But ultimately I don’t see any technical reason why a combination of the Knowledge Graph, Knowledge Vault, and new Deep Learning capabilities, couldn’t be applied to automatically generating and curating the world’s knowledge graph to a level of richness that will resemble the original vision of the Semantic Web. But this will probably take two or three decades.

This article made me think that manual data curation for Freebase and Schema.org is a very similar process to what many PLM implementers do when they apply specific data and process models using PLM tools. Yes, PLM data modeling usually happens for a specific manufacturing company. At the same time, PLM service providers re-use elements of these models. Companies are also interconnected and work together. Communication between companies is painful and still requires some level of agreement between manufacturing companies and suppliers.

Data modeling is an interesting problem. For years, PLM vendors put significant focus on making flexible tools that help implementers create data and process models. Flexibility and dynamic data models are in high demand among customers, and this is one of the most important technological elements of every PLM platform today. New forms of computing and technologies can come and automate this process. They can help to generate data models automatically by capturing data about what a company does and about its processes. Sounds like a dream? Maybe… But manual curation is not an efficient way to model data. The last 30 years of PDM/PLM experience confirm that. Finding a better way to apply automatic data capture and configuration to PLM can be an interesting opportunity.
