Monday, June 4, 2012

The Big Data Value Chain

So Big Data systems are different in kind from 'traditional' information systems, and are motivated by the need for large volumes of information with little latency. What do these systems look like?

You could be forgiven for thinking that Big Data systems are all similar. In fact, 'Big Data' is actually a family of complementary systems, each of which fills a different role in a Big Data 'value chain'. Which one you select for your application depends on the specific problem you are trying to solve. Let's familiarise ourselves with this Value Chain. In later posts we can probe deeper into each individual stage:
Big Data Value Chain
Reading left to right -- fairly normal! -- we can see the Value Chain comprises 9 steps:
  1. Collection: getting the data we will use to create our information.
  2. Integration: combining data from multiple sources and applying context to the data: date & time, source, name, quality, etc so that we can use it.
  3. Modelling: applying a model of the real world the data came from -- a factory, a patient, a web-site, a social media service, with all the meaning and behaviour that goes with this context -- so that the disparate pieces of data we collected become information about the real-world environment we are interested in, and which has value to us.
  4. Analysis: making judgements about what is happening in the real world. This might be defect rates vs targets, bottlenecks, comparisons of physiological parameters to population norms, statistical trends, or free to paid user conversion rates, depending on the context.
  5. Presentation: remember that time -- Latency and Freshness -- are critical to the value of a Big Data system. In many Big Data applications, it is critical that information is pushed at those relying on it as it is created, or at the very least is available to them 'immediately' they request it. There is always a time attribute to information with value, so we will be saying a lot more about this.
  6. Storage: once we have created the information we need, we must preserve it in as rich a form as possible so that we can come back to it over time. Many discussions of Big Data only start at this point, as if the information came into being spontaneously! As we will see, some systems create information as a natural by-product. Many do not, and the Big Data practitioner, or 'Data Scientist', will find that even those that do rarely happen to provide the complete modelling and analysis their Users require.
  7. Integration: once again, even when the information required has been safely stored, it may not all be in the same store. Information required for historical analysis often needs to be integrated together from multiple stores.
  8. Finally we get to Historical Analysis and Reporting. We separate these because, although superficially similar, they are in fact quite different: a Report is a way to present information in the same way repeatedly, which is of use to multiple Users, while Analysis is typically more complex manipulation, performed by an individual to explore the information and gain new insights.
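To make the early stages of the chain concrete, here is a minimal sketch of Collection, Integration, Modelling and Analysis as a tiny pipeline. All source names, fields and thresholds are hypothetical, invented purely for illustration:

```python
from datetime import datetime, timezone

# Collection: hypothetical raw readings gathered from two different sources.
raw = [
    {"src": "line_1_plc", "temp_c": 71.2},
    {"src": "lab_system", "sample_ph": 6.9},
]

def integrate(record):
    """Integration: stamp each record with context (time, quality) so it can be combined."""
    record["collected_at"] = datetime.now(timezone.utc).isoformat()
    record["quality"] = "good"
    return record

def model(records):
    """Modelling: interpret the disparate data as facts about one real-world entity."""
    return {"entity": "filling_line_1", "readings": [integrate(r) for r in records]}

def analyse(info):
    """Analysis: a trivial judgement -- is the line within its temperature limit?"""
    temps = [r["temp_c"] for r in info["readings"] if "temp_c" in r]
    info["temp_ok"] = all(t < 75.0 for t in temps)
    return info

result = analyse(model(raw))
print(result["temp_ok"])  # True for the sample data above
```

A real system would of course spread these stages across many services and stores; the point is only the shape of the flow from raw data to a judgement with business value.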
I hope this has been a useful survey of the Big Data Value Chain. As I said, we will explore the different stages, their requirements and implications in more detail over the course of this Blog.

Tuesday, May 8, 2012

What is Big Data?

Big Data is a wonderful term, but what does it mean? Many IT professionals will say that they have been providing systems which capture, analyse and present data -- often in very "Big" quantities -- for years. What's new? I think the most useful definition of a Big Data system is a very pragmatic one: you need a Big Data system to capture, analyse and present information when traditional Relational Database-centred systems cannot meet your Users' requirements. The reason Big Data is such a hot topic right now is that so many businesses are running up against these limits and being forced to look beyond them. Why might this be? Here are the three key drivers, in my opinion. We will look at each in much more detail over the course of this blog:
  1. First and foremost is time. Time comes in two flavours: Latency -- how long it takes to get your data -- and what I'll call "Freshness" -- how up-to-date it is when you get it. If you cannot meet your Users' time expectations with traditional technologies, then you are into the realm of Big Data. These expectations are becoming more demanding, and data volume, which always costs time, is growing by orders of magnitude. We will talk a lot more about time in this blog.
  2. Second is flexibility. The need to be flexible and adaptable can overload a traditional approach to data systems in three ways: Variety of Sources, Variety of Structures and Future Uncertainty. A common requirement for Big Data systems is that they collect their data from an enormous variety of sources. This data comes organised in a similarly large variety of structures or "schemas". While it is possible to come up with ingenious ways to get data from many sources into relational databases, these databases do not take kindly to data which does not conform to a predefined structure, or schema. Unfortunately, Big Data systems are often required to accept data coming in a huge variety of structures, and often have to deal with new, previously unanticipated, data structures and analysis requirements as time goes on.
  3. Third is complexity. What we are talking about here is the ability to model complex real-world behaviour and perform complex analyses, often in real time. Not something talked about a lot in the world of Big Data at the moment, but I believe will become more and more important as the applications of these systems become more sophisticated, and again, not something relational database technology, with its focus on selection of lists and relatively simple aggregations -- average, max, count, etc -- is ideally suited for.
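The "Variety of Structures" point can be made concrete with a small sketch. In a schema-on-read, document-style approach, records of completely different shapes can live side by side, and each consumer extracts only the fields it understands; a fixed relational schema would reject two of the three records below. The sources, fields and threshold are hypothetical:

```python
# Hypothetical events from three different sources, each with its own structure.
events = [
    {"source": "web", "user": "u42", "page": "/pricing"},
    {"source": "sensor", "device": "pump-7", "vibration_mm_s": 2.3},
    {"source": "crm", "account": "acme", "notes": "renewal call booked"},
]

# No upfront table definition is required: each analysis reads only what it needs.
def pages_viewed(evts):
    return [e["page"] for e in evts if e.get("source") == "web"]

def noisy_devices(evts, limit=2.0):
    return [e["device"] for e in evts
            if e.get("source") == "sensor" and e.get("vibration_mm_s", 0) > limit]

print(pages_viewed(events))   # ['/pricing']
print(noisy_devices(events))  # ['pump-7']
```

When a fourth, previously unanticipated source appears, its records simply join the stream; only the analyses that care about it need to change.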
So in summary, Big Data systems are by definition different in kind from 'traditional', relational database-centric systems, and are driven to be so by User requirements which, beyond certain limits of volume, structure and sophistication, cannot be delivered by this 'traditional' technology. So if you are an IT professional, and your User or Customer is asking for something not achievable with your existing tool-set, it is not because they are wrong; it's because they are asking for Big Data!

Tuesday, May 1, 2012

Big Data

I have been quiet on the Fraysen blog for some time now, but very busy in the 'real world'! While we have been working away with Clients and on new product development, the market for the software Fraysen develops has acquired a name: "Big Data." I have spent some time thinking about how I can best contribute to the Big Data discussion. It is a very broad term, spanning a value chain of systems which can overwhelm and confuse, so my first goal will be to bring some clarity and simplicity to the conversation, give you some reference points, and a context to understand it in. I hope you find it useful.

Friday, April 2, 2010

OEE: how to approach root cause analysis?

An interesting question was posted on the LinkedIn MES - Manufacturing Execution Systems group recently by Elisa Rocca of Siemens: "OEE: how to approach root cause analysis?" She goes on to say that "I was wondering how to best execute the root cause analysis for a filling & packaging line." and later in the discussion that:

"I would like to decouple the RCA investigation from levels 1 and 2 [PLCs and SCADA systems directly controlling the machines - FG], conducting the entire analysis into MES.

...  I'm wondering whether you see the RCA as an "online" analysis performed during the acquisition of OEE data, or as an "offline" analysis done on historical data using reporting tools"


My Response:

The challenges with root cause analysis of any live process metric, including OEE, are:

1. A priori you do not know what data you are going to require to solve the problems that exist, since you do not know what these problems are!

2. You will have a range of users for this RCA, from Operators and Supervisors, whose needs will be near real time, to Engineers, who will perform their analysis mainly historically.

3. You will probably need to combine 'hard' data from your equipment with judgement / big picture input from humans in the process.

4. You need to ensure that the effort required to perform the RCA is minimized: the more work required, the less likely it is to get done, and the greater the opportunity for confusion and arguments!

What does this suggest?

You are right to decouple your RCA from levels 1 & 2: they are great for machine management, but are too inflexible, machine focused and simplistic (analytically) to provide the solution you need. Similarly, ERP (in a manufacturing context) and MES systems are focused on dispatching, routing and recording production, rather than analysing the performance of the process.

Your requirements are the motivation for Enterprise Manufacturing Intelligence systems. I would suggest looking for EMI systems and rating them by the following attributes:

1. Flexibility -- could a Process Engineer configure their own analytics?

2. Scalability -- you will rapidly scale to large numbers of data sources and volumes of data. Is the system architected to cope with this?

3. Analytical Power -- you are going to need to analyse your process along many axes: machine, product, batch, shift, material, etc, and to perform sophisticated analyses to correlate causes and effects. Does the system provide the power you need in its tool set?

4. Ability to combine data from many sources, including friendly interfaces for human operators. Is there an easy way to capture human input and combine it with machine data?

5. Automation -- the system should do 80% of the analytical work automatically, especially the 'drudge' number crunching that does not require human judgement, otherwise this will not get done at all! How is this handled, and how long does the user have to wait for the results? Again, if the answer is >> a minute, the User is unlikely to wait, no matter how useful your data!
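For readers less familiar with the metric being analysed here, OEE is conventionally computed as Availability × Performance × Quality. A minimal sketch follows; the shift figures are hypothetical, chosen only to illustrate the arithmetic:

```python
def oee(planned_time, run_time, ideal_cycle_time, total_count, good_count):
    """Overall Equipment Effectiveness = Availability x Performance x Quality."""
    availability = run_time / planned_time              # fraction of planned time actually running
    performance = (ideal_cycle_time * total_count) / run_time  # actual vs ideal speed
    quality = good_count / total_count                  # fraction of output that is good
    return availability * performance * quality

# Hypothetical shift on a filling line: 480 min planned, 400 min running,
# 0.5 min ideal cycle time, 700 units produced, 665 of them good.
print(round(oee(480, 400, 0.5, 700, 665), 3))  # 0.693
```

An RCA tool then asks *which* of the three factors is dragging the product down, and why -- which is exactly where the flexibility, analytical power and automation attributes above come into play.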

See here for the full discussion.

Monday, November 23, 2009

Impressive Opportunity from a Small Experiment

Just published an analysis on the Fraysen Systems website today called "Material Flow Co-ordination Case Study" which gives the results -- with their agreement, of course -- of a quick opportunity analysis we did for one of our Clients. What is striking is:
  1. The amount of information that we were able to collect from a simple experiment;
  2. The size of the improvement opportunity we uncovered together.
There are huge cost reduction opportunities in many businesses today. Sometimes they are 'traditional' OEE improvement projects, sometimes they are a little bit different, in this case just making sure all the parts arrive at the same time. The other nice thing is that this opportunity doesn't even require a plant wide solution. A point application of the right Information Technology can make a world of difference!

Thursday, November 5, 2009

There is a Gap between Operational Excellence and Business Reality

Why do Managers not rush to put into practice every recommendation by their Lean Consultants, or to install Manufacturing and Business Intelligence systems? There is plenty of data to suggest that in general these methods and systems produce outstanding returns, and this is backed up by detailed analyses of selected leading companies.

I think there are just too many "IF"'s between the proposition and the return. In essence, when a Lean Consultant presents their report, they are saying "if you invest management time in changing your organisation's behaviour, and if you get your people to behave in this way, and if they consistently apply these methods even if it appears more expedient in a crisis to go back to the old ways, then you will reap the benefit of my advice." Similarly, vendors of Operational and Business Intelligence tools are saying "if we pick the right KPIs (Key Performance Indicators) to monitor, and if you make monitoring these KPIs part of your people's management process, and if they respond quickly and appropriately when they are alerted to an issue, and if they carry the action through to closure, then our system will deliver a return on your investment."

Think about it: in a world where, at least on the face of it, a new machine will increase your throughput because its advertised cycle time is 20% lower, or you can save millions in labour costs by moving to China because wages there are a tenth of those in the West, it is asking a lot of your Client to make such a complicated and contingent case to their Board / Senior Management and then ask them to invest money in it. Should we just give up? No -- we just need to make a better case, and I believe that by taking responsibility for the Client's problem, rather than limiting ourselves to being just a part of the solution, we can!

Wednesday, October 21, 2009

The Operations Management Dilemma

Having spent a large part of my career developing and selling Operational Intelligence Systems, I am one of many businesspeople around the world -- Manufacturing Systems vendors, Business Intelligence providers, Lean Manufacturing consultants and Six Sigma Black Belts -- who make their money persuading corporations to focus senior management attention on how they operate, and to spend money on ways to improve it. After all, a company's operations are what deliver value to its Customers and Owners, so why wouldn't they want to invest effort and resources in making them better? Right?

A presentation by Eoin O'Driscoll, (chair of the Irish Government's Enterprise Strategy Review Group) at the 2009 Midwest Entrepreneur Showcase gave me pause. In discussing how to create value for Customers, he referred to Peter Drucker's assertion that "Because its purpose is to create a customer, the business enterprise has two—and only these two—basic functions: marketing and innovation. Marketing and innovation produce results; all the rest are costs" (my italics). Therefore we in the "Operational Excellence" industry are trying to persuade corporations to spend scarce management time thinking about things that they do not consider core to their success, and then follow that up by asking them to invest even more money into an area of the business they already consider to be primarily a cost. What’s wrong with this picture?

While the issue is not quite as black and white as I describe above -- otherwise there would be no manufacturing software companies or Lean Six Sigma consultants at all -- it does jibe to a startling degree with what we see in the market. In 2004, average western manufacturing efficiency was quoted as 45%, with world class companies at 85%. The figures have not changed that much since; the opportunity has been obvious for decades -- see the US vs. Japanese auto industry saga -- and yet many companies still, at best, reluctantly invest leadership time and effort into Operational Excellence as it's currently sold. They don't want what the industry is trying to sell them. Could they be right?