Wednesday, April 1, 2015

A new look at Moore’s Law – what it means for Data warehousing and BI

Today an article with a headline like “Cramming More Components Onto Integrated Circuits” would probably not do much for you as far as clickbait goes. But this was the title of Gordon Moore’s article, published in the April 19, 1965 issue of Electronics magazine, 50 years ago this month, that introduced the world to a singular, world-shaping idea that would later become known as Moore’s Law.

Moore begins his article thus: “The future of integrated electronics is the future of electronics itself. The advantages of integration will bring about a proliferation of electronics, pushing this science into many new areas. Integrated circuits will lead to such wonders as home computers–or at least terminals connected to a central computer–automatic controls for automobiles, and personal portable communications equipment. The electronic wristwatch needs only a display to be feasible today.”

The point Moore was trying to make was that semiconductor chips, invented just a few years before, were improving at a mind-boggling pace. He tried to plot these price and performance points on regular graph paper, but the slope was too steep, so he switched to logarithmic graph paper and got a nice straight line. It showed that the capacity of these chips was doubling at a steady interval (roughly every year in his original projection, closer to every two years today). It was undoubtedly impressive, but no one then, including Moore himself, realized this graph would continue to hold for 50 years and set the blistering pace of change for the modern world. We all now live in the world of Moore’s Law, and we likely will for at least another quarter-century.

As a result, data warehouses aren’t just bigger than a generation ago; they’re faster, support new data types, serve a wider range of business-critical functions, and are capable of providing actionable insights to anyone in the enterprise at any time or place. All of which makes the modern data warehouse more important than ever to business agility, innovation, and competitive advantage.

An interesting white paper from Oracle highlights, quite succinctly might I add, the top trends and opportunities in data warehousing. The following trends are a direct result of Moore’s Law:

In-memory technologies supercharge data warehouse performance


Making the most of big data means quickly acquiring and analyzing a high volume of data generated in many different formats. All warehouse data used to be stored on magnetic disks. Now that data is being moved into RAM to achieve performance improvements that are orders of magnitude beyond previous methods. Thanks to new database capabilities, the entire data warehouse doesn’t need to be placed in memory at the same time, enabling even the largest data warehouses to gain in-memory performance benefits. Data warehouse administrators can configure these environments to distribute data among RAM, flash, and disk-based storage based on heuristic access patterns. Important or frequently accessed data resides in RAM, where processing becomes near-instantaneous. This lightning-fast database processing can greatly reduce, or even eliminate, the need to create analytics indexes.
All that power opens up new analytics opportunities. For example, a restaurant chain might want to generate a daily consumption report to do dynamic spot promotions the following day. A bank might want to monitor unusual purchase patterns in real time to minimize loss from fraudulent credit card transactions.
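To make the tiering idea a little more concrete, here is a minimal, hypothetical Python sketch of the kind of heuristic an administrator might encode: hot segments go to RAM, warm ones to flash, cold ones stay on disk. The segment names, access counts, and thresholds are invented for illustration and do not reflect any particular product's configuration.

```python
# Hypothetical sketch: assign warehouse segments to storage tiers
# based on how frequently they are scanned. All numbers are invented.

ACCESS_STATS = {            # segment -> scans per day (illustrative)
    "sales_current_month": 4200,
    "sales_last_quarter": 650,
    "sales_2010_archive": 3,
}

def choose_tier(scans_per_day: int) -> str:
    """Hot data goes to RAM, warm data to flash, cold data stays on disk."""
    if scans_per_day >= 1000:
        return "RAM (in-memory column store)"
    if scans_per_day >= 100:
        return "flash"
    return "disk"

if __name__ == "__main__":
    for segment, scans in ACCESS_STATS.items():
        print(f"{segment:25s} -> {choose_tier(scans)}")
```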

On-demand sandbox analytics environments meet rising demand for rapid prototyping and information discovery


Business intelligence and analytics are resource-intensive activities that lend themselves to on-demand computing due to their iterative nature and fluctuating workloads. Forward-looking organizations are establishing analytics as a service (AaaS) environments within public and private clouds. These versatile “sandbox” environments can flex with a shifting volume and velocity of data, making them ideal for analyzing energy usage, monitoring shop floor operations, gauging consumer sentiment, and undertaking many other large-scale analytics challenges. Cutting-edge technologies, such as multitenant databases, enable organizations to set up one cloud environment with dozens or even hundreds of “pluggable” databases for people to use. Some leading companies are even monetizing their AaaS environments by setting up subscriber networks for other companies in their supply chains. On-demand provisioning makes it easy for business users both inside and outside the enterprise to utilize these environments as they respond to high-velocity demands of big data analysis.
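As a rough illustration of the "sandbox on demand" idea, here is a hypothetical Python sketch of the provisioning workflow an AaaS portal might automate. The class, quota numbers, and expiry policy are invented, not a real cloud or multitenant-database API.

```python
import datetime

class AnalyticsSandbox:
    """Hypothetical on-demand sandbox: an isolated workspace with a quota and an expiry date."""

    def __init__(self, owner: str, quota_gb: int = 500, days: int = 30):
        self.owner = owner
        self.quota_gb = quota_gb
        self.expires = datetime.date.today() + datetime.timedelta(days=days)
        self.datasets = []

    def load(self, dataset: str) -> None:
        # In a real environment this would clone or subset warehouse data.
        self.datasets.append(dataset)

    def is_expired(self) -> bool:
        return datetime.date.today() > self.expires

# A marketing analyst self-provisions a sandbox for a two-week discovery project.
sandbox = AnalyticsSandbox(owner="marketing_analyst", quota_gb=200, days=14)
sandbox.load("consumer_sentiment_q1")
print(sandbox.owner, sandbox.datasets, "expires", sandbox.expires)
```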

The “datafication” of the enterprise spawns more-capable data warehouses


Historically, data warehouses were populated with structured business data from enterprise applications. Today, however, data is pouring in from human-generated, cloud-generated, and machine-generated sources—more than 90 percent of which is unstructured. This data can be collected not only from computers, but also from billions of mobile phones, tens of billions of social media posts, and an ever-expanding array of networked sensors from cars, utility meters, shipping containers, shop floor equipment, point-of-sale terminals, and many other sources. With the help of new big data technologies, data warehouses are expanding in variety and scope. This improves the quality and speed of business decision-making as people learn how to acquire, organize, and analyze this massive influx of information.
For example, manufacturing companies commonly embed sensors in their machinery to monitor usage patterns, predict maintenance problems, and enhance production quality. Studying the data streaming from these sensors allows them to improve their products and devise more-accurate service cycles. CIOs are responding to an increased use of data in the enterprise by deploying newer, faster, and more-capable information management systems.
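A toy Python sketch of the sensor scenario, with invented readings and thresholds: stream vibration readings from shop floor equipment and flag machines whose recent average suggests a maintenance visit. Real predictive-maintenance models are far more involved; this only illustrates the shape of the data and the decision.

```python
from collections import deque
from statistics import mean

WINDOW = 5          # readings per machine to keep (illustrative)
THRESHOLD = 7.0     # invented vibration level that suggests wear

recent = {}         # machine id -> recent readings

def ingest(machine: str, vibration: float) -> None:
    recent.setdefault(machine, deque(maxlen=WINDOW)).append(vibration)

def needs_maintenance(machine: str) -> bool:
    readings = recent.get(machine, [])
    return len(readings) == WINDOW and mean(readings) > THRESHOLD

# Simulated stream of (machine, vibration) readings.
for machine, vib in [("press_01", 6.8), ("press_01", 7.4), ("press_01", 7.9),
                     ("press_01", 8.1), ("press_01", 8.3), ("lathe_02", 3.1)]:
    ingest(machine, vib)

flagged = [m for m in recent if needs_maintenance(m)]
print("schedule maintenance for:", flagged)
```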

My Take 

Storage and performance are no longer limiting factors. A company with a half-decent BI budget could create a DW monstrosity if it wished to. However, there is a tipping point, and it is coming. We’ve grown so accustomed to living in the world of Moore’s Law that we forget we’re dealing with one of the most explosive forces in history. We’ve become so adept at predicting, incorporating and assimilating each new, upward tick of the curve that we assume we have this monster under control. We don’t.





Thursday, March 5, 2015

Visualization and Presentation


As clichéd as it may sound, “a picture is worth a thousand words” could not be more apt in today’s world, where we are constantly bombarded by information. There is an increased need to present this information in a way that makes it easier to ‘get it’ without putting the onus of analyzing the data on the consumer.





Finance: Let’s talk money – Credit Card Spend Analyzer

“How did I end up racking up such a huge credit card bill?” – we have all been there. Well, some organizations are now making an attempt to help you better analyze your spending patterns, see purchases by category, drill down to individual transactions within a category, and sort them by any attribute such as transaction date, dollar value, etc.

Case in point, the Discover Spend Analyzer



I find this to be one of the better ways of showing credit card transactions.

What you see above is a combination of the following (a rough sketch of this layout in code appears after the list):
  • Pie chart - used to depict the categories of purchases for the selected period, which is one month in this case. In this example, you can immediately see that the major purchases for the selected period were supermarket and medical spending.
  • Bar chart - used to depict monthly spending for the current and past months. Again, it is very easy to see that spending for the selected period is within the average, whereas it was above average for the month of April.
  • A great value-add is the line on the bar chart depicting the average spend across all months, which immediately shows how each month compares with your typical spending. This lets you narrow down to the months where spending was way above average, drill down to see why, and then take corrective action if required.
  • You can also look at the pie chart over different time periods by selecting 1 month, 3 months, 24 months, etc.
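Here is a minimal matplotlib sketch of that pie-plus-bar layout. The spend figures are made up for illustration and have nothing to do with the real Discover Spend Analyzer; the point is only how little code the "average line over monthly bars" idea takes.

```python
import matplotlib.pyplot as plt

# Invented spend data for illustration (not real Discover output).
categories = {"Supermarkets": 420, "Medical": 310, "Gas": 90, "Restaurants": 120}
monthly_totals = {"Jan": 700, "Feb": 820, "Mar": 760, "Apr": 1400, "May": 940}
average = sum(monthly_totals.values()) / len(monthly_totals)

fig, (pie_ax, bar_ax) = plt.subplots(1, 2, figsize=(10, 4))

# Pie chart: where the selected month's money went.
pie_ax.pie(categories.values(), labels=categories.keys(), autopct="%1.0f%%")
pie_ax.set_title("Spend by category (selected month)")

# Bar chart: spend per month, with a dashed line marking the average.
bar_ax.bar(monthly_totals.keys(), monthly_totals.values())
bar_ax.axhline(average, color="red", linestyle="--", label=f"average ${average:.0f}")
bar_ax.set_title("Spend by month")
bar_ax.legend()

plt.tight_layout()
plt.show()
```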

Sports: Twitter Activity Analysis

This particular example should make football fans really happy (I am talking about the football that is played with your feet).

What you see below is an analysis of the Twitter feed for the game being played (for example, Germany versus Brazil), which includes:
  • The top 10 hashtags that were trending over time during the game (the radius of the circle is directly proportional to how many times the hashtag was used)
  • A line graph showing the total number of relevant tweets over time
  • Location information from where these tweets originated superimposed on a map


This is a classic example where a ton of data has been represented in a single view. The biggest advantage of such a representation is that it puts things in a new perspective and lends insight that would not have been possible by merely looking at the raw data. This is the power of visualization. The above dashboard was created using Tableau.
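Under the hood, the "top hashtags" panel boils down to counting. Here is a tiny Python sketch, with made-up tweets, of how the top hashtags for a match could be tallied before handing the results to a tool like Tableau for the bubble and map views.

```python
from collections import Counter
import re

# Made-up tweets for illustration.
tweets = [
    "What a goal! #GER #BRA #WorldCup",
    "Can't believe this score #GER #WorldCup",
    "Heartbreak in Belo Horizonte #BRA",
]

hashtag_counts = Counter(
    tag.lower()
    for tweet in tweets
    for tag in re.findall(r"#\w+", tweet)
)

for tag, count in hashtag_counts.most_common(10):
    print(tag, count)
```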


Retail: Sales Data Analysis


Let's look at leveraging dashboards and visualizations from a business perspective. Assume you are the CEO of a large consumer goods company and you want a bird's-eye view of your business to ensure everything is fine and to spot anything that needs special attention.


What you see above consists of three important business dashboards:

  • The first dashboard gives you an at-a-glance understanding of profitability, with views presented by geography, product category, and customer segment. Here you see bar charts of sales information by customer and product category. Overall sales by location are depicted by superimposing sales data onto a map, where the radius of each circle is directly proportional to sales quantity.
  • The second dashboard focuses on products - here the size of each square represents quantity sold. For example, it is clear that for the month of January, Paper and Binders sold the most. At the same time, the color of each box indicates profit or loss. For example, in January, Art Supplies led to a loss of $360 despite sales of 1,183 units.
  • The third focuses on customers - here you see a scatter plot which immediately lets you see who your most profitable customers are, as well as who aren't (intuitively colored red). The horizontal bar chart depicts similar information in a different way.

Each visualization offers much deeper layers of information, allowing you, the CEO (or any executive for that matter) to identify specific problems and opportunities in minutes.

Using the filter panels on each dashboard enables you to further navigate the data with your own criteria, giving you greater control and flexibility. In short, you get to dive into the information that is important to you.
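The roll-ups behind the product dashboard are simple aggregations. Here is a small pandas sketch, with invented order rows, of the "units sold and profit per category" summary that the treemap-style view colors by profit or loss.

```python
import pandas as pd

# Invented order-level data for illustration.
orders = pd.DataFrame({
    "category": ["Paper", "Binders", "Art Supplies", "Paper", "Art Supplies"],
    "region":   ["East",  "West",    "East",         "West",  "Central"],
    "units":    [400,     310,       600,            250,     583],
    "profit":   [1200.0,  950.0,     -210.0,         800.0,   -150.0],
})

# Roll up the way the product dashboard does: units sold and profit per category.
summary = (orders.groupby("category")[["units", "profit"]]
                 .sum()
                 .sort_values("units", ascending=False))
summary["status"] = summary["profit"].apply(lambda p: "loss" if p < 0 else "profit")
print(summary)
```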


Final words....

All industries and applications are headed the visual way. The above examples are just a few of the thousands out there that cement this fact. Not surprisingly, there is huge demand for professionals who understand the business and can create dashboards that cut analysis time from days to minutes, enabling everyone from big businesses to individual customers to make quick and informed decisions.







Thursday, February 19, 2015

Big Unstructured Data v/s Structured Relational Data – FIGHT!

This is one of those things where you need to completely understand one before attempting to understand the other. Hence, let’s start with Structured Relational Data.

Structured Relational Data

The last time you fired up Microsoft Excel (I am guessing it was for the BI homework of creating the fact table for the transcript) and filled in information in those cells formed by neat little rows and columns, what you created was, in essence, structured relational data. Imagine this at an industrial scale and you have database management systems such as Oracle Database, IBM DB2, Microsoft SQL Server, etc.
Here is the bookish definition of a relational database (I talk about the database and not just the data because all structured data is stored in a database of some sort, be it a table in an Excel workbook or Hilton's customer relationship management system running atop Oracle):
A relational database is a digital database whose organization is based on the relational model of data, as proposed by E.F. Codd in 1970. This model organizes data into one or more tables (or "relations") of rows and columns, with a unique key for each row. Generally, each entity type described in a database has its own table, the rows representing instances of that entity and the columns representing the attribute values describing each instance. Because each row in a table has its own unique key, rows in other tables that are related to it can be linked to it by storing the original row's unique key as an attribute of the secondary row (where it is known as a "foreign key"). Codd showed that data relationships of arbitrary complexity can be represented using this simple set of concepts.
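The key vocabulary in that definition (tables, rows, unique keys, foreign keys) is easy to see in a few lines of Python using the standard-library sqlite3 module. The customer/order schema below is invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Each entity type gets its own table; each row has a unique (primary) key.
cur.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE orders (
                   id INTEGER PRIMARY KEY,
                   customer_id INTEGER REFERENCES customer(id),  -- foreign key
                   amount REAL)""")

cur.execute("INSERT INTO customer VALUES (1, 'Alice')")
cur.execute("INSERT INTO orders VALUES (100, 1, 42.50)")

# The foreign key lets related rows be joined back together.
cur.execute("""SELECT customer.name, orders.amount
               FROM orders JOIN customer ON orders.customer_id = customer.id""")
print(cur.fetchall())   # [('Alice', 42.5)]
conn.close()
```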


Big Unstructured Data 

Coming to Big Unstructured Data: simply put, it is everything that structured relational data is NOT.
Here is the bookish definition of unstructured data:
Unstructured data (or unstructured information) refers to information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. This results in irregularities and ambiguities that make it difficult to understand using traditional computer programs as compared to data stored in fielded form in databases or annotated (semantically tagged) in documents.
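To see why "text-heavy" data is awkward for row-and-column tools, here is a small Python sketch (the tweet text is made up) that pulls a few structured fields out of an unstructured message with regular expressions. Everything the patterns miss stays, well, unstructured.

```python
import re

tweet = "Loving the new phone I bought on 03/02/2015 for $649!! #gadgets @BestBuy"

record = {
    "dates":    re.findall(r"\d{2}/\d{2}/\d{4}", tweet),
    "amounts":  re.findall(r"\$\d+(?:\.\d{2})?", tweet),
    "hashtags": re.findall(r"#\w+", tweet),
    "mentions": re.findall(r"@\w+", tweet),
}
print(record)
# The sentiment ("Loving"), sarcasm, and context remain unstructured.
```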
IBM has done a fantastic job of defining Big Unstructured Data using four characteristics, also called the Four V’s of Big Data. They are:

Volume – Scale of Data



Velocity – Analysis of Streaming Data



Variety – Different Forms of Data




Veracity – Uncertainty of Data





The Three types of Data – Data, Data and Data

Here is an interesting video explaining the three types of data, i.e., structured, unstructured and semi-structured. BEFORE you watch the video, here is a mind exercise for you: count how many times the word ‘data’ is spoken in the video.






Present Day Scenario

If it wasn't already clear from the above, unstructured big data is growing at an exponential rate. As per the infographic below by IBM, by 2015 structured data would account for only 20% of all data, whereas unstructured data such as VoIP, social media, and sensor and device data would account for four times as much.





Where does data warehousing fit in all this?

The short answer is “Use the best tool for the job”. What I mean by this is that the traditional data warehouses are not going anywhere anytime soon. Data warehouses have had staying power because the concept of a central data repository—fed by dozens or hundreds of databases, applications, and other source systems—continues to be the best, most efficient way for companies to get an enterprise-wide view of their customers, supply chains, sales, and operations.

That said, data warehouse and big data environments can come together in an integrated and very complementary way. Consider the following scenario, where the Hadoop system does the fast, heavy lifting. A high-tech company might want to extract data from its social networking page and cross-reference it with data from the data warehouse to update a client’s social network circle of friends. The environment might also use Hadoop to quickly “score” that person’s social influence. That data is then provisioned back to the data warehouse so that, for example, a campaign manager can view the person’s influence score and re-segment him/her as a result. (A greatly simplified sketch of this hand-off follows.)
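Here is a greatly simplified Python sketch of that hand-off, with invented interaction data and an invented scoring formula: a MapReduce-style pass computes an influence score per user, and the result is shaped into rows ready to load back into the warehouse. A real deployment would run this across a Hadoop cluster and load the output via standard ETL tools.

```python
from collections import defaultdict

# Invented social interaction events: (user, likes, shares, followers_reached)
events = [
    ("alice", 12, 3, 900),
    ("alice", 40, 9, 1500),
    ("bob",    2, 0, 120),
]

def score(likes, shares, reached):
    """Invented weighting: shares count more than likes, reach adds a little."""
    return likes + 3 * shares + reached / 100

# "Map": emit a partial score per event; "reduce": sum per user.
totals = defaultdict(float)
for user, likes, shares, reached in events:
    totals[user] += score(likes, shares, reached)

# Rows to provision back into the warehouse's customer dimension.
warehouse_rows = [{"customer": u, "influence_score": round(s, 1)} for u, s in totals.items()]
print(warehouse_rows)
```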



The point to make here is that each system is doing what it is best designed to do. Although every rule has an exception, big data and data warehouse technologies are optimized for different purposes. Again, the goal is to use these solutions for what they were designed to do. In other words: use the best tool for the job.



Limitations of Data Warehousing


Limitations of traditional Data Warehouses

These are based on the neat rows and columns we spoke of earlier. Throw an audio file or a tweet at them and they go for a toss, simply not knowing how to interpret it. They also struggle to handle more than a few terabytes of data efficiently, and all those ETL transformations inherently introduce latency.

Limitations of the fancy Big Data Warehouses

These warehouses eat millions of tweets for breakfast and a gazillion Facebook profiles for lunch. However, ask them to crunch hardcore reports and they struggle, bulky and lethargic from all the unstructured data they have gobbled up. That is where the traditional data warehouse reigns supreme. And good luck querying in Hadoop – SQL support is very limited.



Future of Data Warehousing

A white paper by Oracle explored the top 10 trends in data warehousing and I believe it pretty much sums up where Data Warehousing is headed.
  1. The “datafication” of the enterprise requires more adept data warehouses - Mobile devices, social media traffic, networked sensors (i.e. the Internet of Things), and other sources are generating an exponentially growing stream of data. IT teams are responding by adding new capabilities to data warehouses so they can handle new types of data, more data, and do so faster than ever before.
  2. Physical and logical consolidation help reduce costs - The answer to datafication isn’t simply to invest more money in these systems. In other words, 10x the data shouldn’t translate into 10x the cost. So expanding data warehouses must be consolidated through a blend of virtualization, compression, multitenant databases, and servers that are engineered to handle much higher data volumes and workloads.
  3. Hadoop optimizes data warehouse environments - The open source Hadoop program, given its distributed file system (HDFS) and parallel MapReduce paradigm, excels at processing enormous data sets. This makes Hadoop a great companion to “standard” data warehouses and explains why a growing number of data warehouse administrators are now using Hadoop to balance some of the heaviest workloads.
  4. Customer experience (CX) strategies use real-time analytics to improve marketing campaigns - Data warehouses play a pivotal role in CX initiatives because they house the data used to establish a comprehensive, 360-degree view of your customer base. A data warehouse of customer information can be used for sentiment analysis, personalization, marketing automation, sales, and customer service.
  5. Engineered systems are becoming a preferred approach for large scale information management - If one is not careful, data warehouses can become a complex association of disconnected pieces—servers, storage, database software, and other components—but not necessarily. Engineered systems such as Oracle Big Data Appliance and Oracle Exadata Database Machine are preconfigured and optimized for specific kinds of workloads, delivering the highest levels of performance without the pain of integration and configuration.
  6. On-demand analytics environments meet the growing demand for rapid prototyping and information discovery - Akin to cloud computing’s software-as-a-service model, the concept of “analytics as a service” is a technical breakthrough that allows administrators to provide “sandboxes” in a data warehouse environment for use in support of new analytics projects.
  7. Data compression enables higher-volume, higher-value analytics - The best way to counter non-stop data expansion is data compression. The organization’s data may be growing at 10x, but advanced compression methods can keep pace, enabling companies to capture and store more valuable data without 10x the cost and 10x the pain (a tiny compression experiment follows this list).
  8. In-database analytics simplify analysis - Ideally, a data warehouse will have a range of ready-to-use tools—native SQL, R integration, and data mining algorithms, for example–to kick start and expedite data analysis. Such in-database analytics capabilities minimize the need to move data back and forth between systems and applications for analysis, resulting in highly streamlined and optimized data discovery.
  9. In-memory technologies supercharge performance - The emergence of in-memory database architecture brings sports car-like performance to data warehouses. The term in memory refers to the ability to process large data sets in system RAM, accelerating number-crunching and reporting of actionable information.
  10. Data warehouses are more critical than ever to business operations - While it’s true that data warehouses have been around for years, their significance keeps increasing since they house a firm’s most valued assets: prized information on clients and business performance. Moreover, organizations are finding new applications for data warehouses; for example, healthcare providers are using enterprise DW/BI solutions to enhance patient care and streamline processes.
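To make trend 7 a little more tangible, here is a tiny Python experiment with the standard-library zlib module on some repetitive, warehouse-like rows. The data is invented, and real column-store compression is far more sophisticated, but the effect is the same in spirit: repetitive structured data compresses extremely well.

```python
import zlib

# Invented, highly repetitive fact rows, as warehouse data often is.
rows = "2015-02-19,STORE_042,SKU_1001,1,4.99\n" * 10_000
raw = rows.encode()

compressed = zlib.compress(raw, 9)   # maximum compression level
print(f"raw: {len(raw):,} bytes, compressed: {len(compressed):,} bytes, "
      f"ratio: {len(raw) / len(compressed):.0f}x")
```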










Tuesday, February 3, 2015

Business Intelligence and Analytics Platforms - Demystified

Ever walked into a Walmart and wondered why eggs are placed close to milk, or bread is placed near cereal, or for that matter how they decide which items should be placed on their shelves and where to place them? It is obvious that these decisions are not based merely on gut feeling. They are getting help – these decisions are driven by Business Intelligence.


Simply put, it is the ability to make intelligent decisions based on facts. In the Walmart example above, every time you go to a checkout counter, the attendant scans the items, generates a bill, and you pay for it. Now imagine this happening 10,000 times per second. Every such transaction is recorded and is, in essence, a fact. Walmart feeds this data into a piece of “magic” software and out come the results, in the form of rules like: if a customer buys A and B, then he/she is likely to buy C. Walmart then conveniently places the yogurt right next to the milk and eggs. This is a case of market basket analysis, which enables Walmart to make intelligent product placement decisions (a toy sketch of the idea follows).
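Here is a toy Python sketch of the market basket idea, with a handful of invented baskets: count how often items are bought together and estimate the confidence of a simple "if milk and eggs, then yogurt" rule. Real retailers use dedicated association-rule mining over millions of transactions; this only shows the arithmetic.

```python
from collections import Counter
from itertools import combinations

# Invented checkout baskets.
baskets = [
    {"milk", "eggs", "yogurt"},
    {"milk", "eggs", "yogurt", "bread"},
    {"milk", "eggs", "bread"},
    {"bread", "cereal"},
]

# How often does each pair of items appear in the same basket?
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Confidence of "bought milk and eggs -> also bought yogurt".
both = sum(1 for b in baskets if {"milk", "eggs"} <= b)
with_yogurt = sum(1 for b in baskets if {"milk", "eggs", "yogurt"} <= b)

print("top pairs:", pair_counts.most_common(3))
print(f"P(yogurt | milk & eggs) = {with_yogurt / both:.0%}")
```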

So what are the benefits? To put this into perspective, in 2009 Amazon’s revenue was $24.5 billion. A staggering $5 billion of it came from “recommended” products – nearly 20% of total revenue! All of this is made possible using Business Intelligence and Analytics Platforms.




Before we delve deeper into BI platforms, here is the textbook definition of BI:

Business intelligence (BI) is the set of techniques and tools for the transformation of raw data into meaningful and useful information for business analysis purposes. BI technologies are capable of handling large amounts of unstructured data to help identify, develop and otherwise create new strategic business opportunities. The goal of BI is to allow for the easy interpretation of these large volumes of data. Identifying new opportunities and implementing an effective strategy based on insights can provide businesses with a competitive market advantage and long-term stability.

Below is a pictorial representation of a BI implementation example:





Business Intelligence and Analytics Platforms

Parameters for comparison

  • Intuitive
    • This is a measure of how easy it is for a new user to start using the application and generate meaningful results.
    • This is also a measure of the learning curve: the steeper the curve, the lower the points awarded.
  • Cost to implement
    • This is an important factor since it could often be the deciding factor in the selection of the BI platform especially for small companies
    • Another aspect that this parameter implicitly accounts for is the value for money. Higher points if more features are provided at a lower cost.
    • Extra points if there are different versions available to cater to different market segments.
  • Dashboards
    • This is a platform's ability to enable its users to create rich, visual, and interactive dashboards and data visualizations within a short time and with minimal technical know-how.
    • Extra points for the ability to have dashboards in the cloud and delivery through mobile devices.
  • Support
    • This measures the customer experience during the sales process, after-sales support, and implementation support.
    • It also implicitly measures how often upgrades and updates are issued, how easy or difficult it is to apply them, and how much time they take (i.e. any downtime that the client might face).
  • Security
    • In light of recent security breaches and the huge financial implications they bring, security is as important as, if not more important than, the other parameters.
    • Severe point penalties for any vulnerabilities that could compromise data security and integrity.



Comparative Analysis

Below are some of the BI & analytics offerings from different vendors. For the purpose of this discussion, I have considered offerings from vendors that belong to the Leaders quadrant of Gartner's Magic Quadrant:




Tableau


Strengths - Tableau's biggest strength is that it offers an extremely intuitive, interactive data exploration experience, and many competitors have tried to follow in its footsteps. Tableau has carved out a huge market share with its ability to meet dominant and mainstream buying requirements: ease of use, breadth of use, and enabling business users to perform more complex types of analysis without extensive skills or IT assistance. This competitive differentiation continues to increase its momentum, even though it operates in an increasingly crowded market in which most other vendors view it as a target.

Areas of improvement - In spite of its strengths, many clients do not use Tableau as their primary BI platform. With traditional vendors investing heavily in data discovery capabilities, Tableau's dominance could be threatened. Another concern is a poor after-sales experience, which could keep potential buyers away. Its enterprise features, such as metadata management and BI infrastructure, are also rated below average.



Qlik - Qlikview

Strengths - Market leader in data discovery. A self-contained BI platform, based on an in-memory associative search engine. Its biggest, boldest move is to address the need for a BI platform standard that can fulfill both business users' requirements for ease of use and IT's requirements for enterprise features relating to reusability, data governance and control, and scalability.

Areas of improvement - Not yet enterprise-ready: below-average metadata management, BI infrastructure, and embeddable analytics. There are also concerns about its capability for managing security and administering large numbers of named users.


Microstrategy

Strengths - The key strength of MicroStrategy is its ability to store reports and dashboards in the cloud. It also supports big data and Hadoop. It is a very user-friendly tool for non-technical users, who can build reports with simple drag-and-drop functionality. Another distinguishing feature is its mobile experience, including the ability to access data in offline mode.

Areas of improvement - One area that needs improvement is its rigid data structures, which mean data processing must be done before the data can be used for analysis. It also lacks predictive analytics, which is disappointing for many users.


SAS

Strengths - SAS's core strength is its advanced analytical techniques, such as data mining, predictive modeling, simulation, and optimization. It offers industry- and domain-specific advanced analytics and supports extremely large volumes of data.

Areas of improvement - Higher-than-normal complexity makes it among the most difficult platforms to use and implement. It needs significant improvement in reporting, dashboards, OLAP, interactive visualization, and other traditional BI functionality.


IBM

Strengths - Amazing sales and product strategy coupled with support from IBM Global Services and a global presence. Capability to support larger deployments. Radical new approach to data discovery with the Watson Analytics offering. Simplified licensing model. Innovative features such as natural language query.

Areas of improvement - Significantly higher cost of procurement and implementation. A poor sales experience: clients have expressed frustration with IBM's sales and contracting, as well as a high number of audits. At 6.2 days, the time taken to generate a report is much higher than the industry average of 4.3 days. The move to "smart discovery", bypassing the traditional data discovery progression, may result in technical challenges for customers.



So how do they stack up against each other?

Here is how I rate the different platforms across the parameters discussed above:



Below is a composite graphical representation of the individual factor scores as well as the weighted totals:
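Since the score table and chart live in images, here is the arithmetic behind a weighted comparison as a small Python sketch. The weights and the 1-10 scores below are illustrative placeholders, not my actual ratings; swap in the real numbers to reproduce the weighted totals.

```python
# Illustrative weights (sum to 1.0) and 1-10 scores; placeholders, not the real ratings.
weights = {"intuitive": 0.25, "cost": 0.20, "dashboards": 0.25, "support": 0.15, "security": 0.15}

scores = {
    "Tableau":       {"intuitive": 9, "cost": 7, "dashboards": 9, "support": 6, "security": 7},
    "Qlikview":      {"intuitive": 8, "cost": 6, "dashboards": 8, "support": 7, "security": 6},
    "MicroStrategy": {"intuitive": 7, "cost": 6, "dashboards": 8, "support": 7, "security": 8},
}

def weighted_total(platform_scores):
    """Weighted sum of a platform's factor scores."""
    return sum(weights[factor] * value for factor, value in platform_scores.items())

for platform, s in sorted(scores.items(), key=lambda kv: -weighted_total(kv[1])):
    print(f"{platform:13s} {weighted_total(s):.2f}")
```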




My Recommendation

Many vendors do many things right and have their niche. However, if I were to pick one, it would undoubtedly be Tableau, and customer feedback echoes the same sentiment. Tableau checks all the important boxes: easy to implement and super intuitive, with its core differentiator being that it makes a range of analysis types (from simple to complex) accessible and easy for the ordinary business user, whom it effectively transforms into a "data superhero."



References:
http://www.statisticbrain.com/wal-mart-company-statistics/
http://ecr-all.org/files/I.Liiv_Gaining_Shopper_Insights_Using_Market_Basket_Analysis.pdf
http://en.wikipedia.org/wiki/Business_intelligence
http://www.gartner.com/technology/reprints.do?id=1-1QLGACN&ct=140210&st=sb
http://www.tableau.com/new-features/8.3
http://public.dhe.ibm.com/common/ssi/ecm/yt/en/ytw03250caen/YTW03250CAEN.PDF
http://www-01.ibm.com/software/analytics/cognos/solutions.html
http://www.microstrategy.com/us/analytics
http://www.qlik.com/us/explore/resources
http://www.sas.com/en_us/software/business-intelligence.html

Monday, January 19, 2015

Stuff Business Intelligence Users Say

Watch the video for some laughs before we delve into the depths of the Mariana Trench of the IT world, i.e., Business Intelligence!




PS: This video is meant as an ice-breaker and may be excluded from grading considerations