Written by Kaichang Zhang, PhD

Information is power, and big data certainly carries a lot of power because it is a huge set of information.  Big data is data that is too large for a company’s typical database systems to process: datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.  The data could simply be too big to process, or it might not be in a format that the database can work with.  In order to process big data, companies need to find alternative methods, and that is where big data analytics comes in.  Big data analytics also helps sort through the data, which can be unstructured, structured or multi-structured.

Why would companies want to compile such vast amounts of data?  Because inside that data are trends, patterns and correlations, all of which can prove useful to a business.  In health, there are vast amounts of data from multiple sources that need to be analyzed to better understand health.  Areas of interest include, but are not limited to, quality, outcomes, disease management, access to care, fraud, population health, and personalized medicine.  In fact, a 2011 McKinsey report estimated a potential $300 billion in annual value from leveraging big data.  In this blog, I’m going to take a quick look at big data for organizations in general rather than just health.

Big data analytics is different from business intelligence because it works with data that business intelligence does not often use, so the information found through big data analytics is different from the information found in business intelligence reports.  Business intelligence also involves data gathering and analysis, but big data analytics typically draws on data that business intelligence tools do not use, such as transaction data, clickstream data, web server logs, mobile phone call records, social media activity, and other data, such as metadata.

Unstructured data is data that is hard for traditional databases to read because it is not in a traditional data form.  It is often text heavy and not easy to organize.  Social media posts and Twitter tweets are unstructured data, as is metadata.  Structured data and multi-structured data come in different formats and are usually derived from interactions between machines and people, such as social networks or web applications.  Web log data, web form data and transactional data are good examples of structured data.  Structured data can include text and images.
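To make the distinction a little more concrete, here is a minimal Python sketch.  It assumes a made-up, space-separated web log format and an invented social media post; the names parse_log_line and word_counts are hypothetical and not part of any particular analytics tool.  The log line maps cleanly onto named fields, while the free-text post has no fields to map onto, so the most the sketch can do is count words.

```python
import re
from collections import Counter

# Assumed log layout (illustrative only): ip timestamp method path status
LOG_PATTERN = re.compile(
    r"(?P<ip>\S+) (?P<timestamp>\S+) (?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3})"
)

def parse_log_line(line):
    """Turn one log line into a dict of named fields, or None if it does not fit."""
    match = LOG_PATTERN.fullmatch(line.strip())
    return match.groupdict() if match else None

def word_counts(text):
    """Free text has no fixed fields, so fall back to counting lowercase words."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

# Structured: every value lands in a predictable, named slot.
record = parse_log_line("203.0.113.7 2014-03-01T12:00:00Z GET /products/42 200")
print(record["path"], record["status"])

# Unstructured: meaning has to be inferred from the words themselves.
counts = word_counts("Love the new phone, but the battery... the battery is terrible")
print(counts.most_common(2))
```

The point of the sketch is the difference in effort: the log record can be loaded straight into a table, while the post needs further text analysis before it says anything useful.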

When talking about big data there are three terms that are used often:

  • Variety – Variety refers to the variety in the data formatting.  Big data analytics must be able to handle data in a variety of formats.  By some estimates, unstructured data accounts for 85% of the information available to us.
  • Velocity – Velocity refers to the data speed.  Data arriving in real time needs to be processed quickly, so velocity is sometimes a challenge.
  • Volume – Volume refers to the actual volume of the data and how and where to store it.
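
As a small illustration of the velocity point, the Python sketch below keeps a running count of events seen in the last 60 seconds without ever storing the whole stream.  The class name SlidingWindowCounter and the timestamps are hypothetical, and the sketch assumes events arrive in time order; real streaming systems have to deal with far more than this.

```python
from collections import deque

class SlidingWindowCounter:
    """Count events seen within the last `window_seconds` seconds.

    Assumes timestamps (in seconds) are added in non-decreasing order.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def add(self, timestamp):
        """Record a new event and drop events that have aged out of the window."""
        self._timestamps.append(timestamp)
        cutoff = timestamp - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()

    def count(self):
        """Number of events currently inside the window."""
        return len(self._timestamps)

# Hypothetical click events arriving at these times (seconds).
counter = SlidingWindowCounter(window_seconds=60)
for event_time in [0, 10, 30, 65, 70]:
    counter.add(event_time)
print(counter.count())
```

Because old events are discarded as new ones arrive, memory use depends on the window size rather than on the total volume of the stream.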

Another attribute that needs to be considered is “data complexity”.  Building a system (or systems) that can handle and process extreme volumes of data from a variety of seemingly uncorrelated sources and formats, and find the needed pattern or singularity in a brief amount of time, is a high form of complex systems engineering.  Analytics tools and other computing technologies are an indispensable part of the data engineering process.

What is Big Data used for?

One of the values of big data analytics is that it can enable new products and services.  By analyzing market data and customer feedback, which is where social media data comes in handy, companies can develop new products and services that meet the needs and wants of their customer base.  If a company is looking to gain an advantage over the competition, big data analytics is certainly the way to go.  It is also used to optimize a company’s current products or services.

When it comes to analytics, big data analytics can help sort through the large volumes of data that would be too costly to process by conventional means.  Insights and patterns can be found by analyzing transactional data, geographical data and social data.  Big data analytics involves all of the data, instead of just data obtained through random sampling of customers.  When you take sampled data, you can never be sure that it is representative of your entire data set.  With big data analytics, you are receiving results based on your entire data set.
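
The sampling concern can be illustrated with a short Python sketch.  The order totals below are invented for the example, and full_and_sampled_mean is a hypothetical helper: most orders are small and a handful are very large, so a random sample of 50 may or may not happen to include any of the large ones.

```python
import random
import statistics

def full_and_sampled_mean(values, sample_size, seed=0):
    """Return (mean over all values, mean over a random sample of them)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(values, sample_size)
    return statistics.fmean(values), statistics.fmean(sample)

# Hypothetical order totals: 990 small orders and 10 very large ones.
orders = [20.0] * 990 + [5000.0] * 10

full_mean, sample_mean = full_and_sampled_mean(orders, sample_size=50)
print(f"mean over the full data set: {full_mean:.2f}")
print(f"mean over a 50-order sample: {sample_mean:.2f}")
```

The full-data mean is fixed by the data, while the sampled mean shifts depending on how many of the rare large orders the sample happens to catch.  Changing the seed value shows how much the sampled figure can move.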

In addition to new products and services, big data analytics can also help companies find ways to reduce their costs, improve their business decision-making, and improve efficiency.  Any company can acquire the data; it is what they do with it that makes the difference.  Smart use of big data analytics can help cut costs by determining the cause of failures or defects, finding better delivery routes, and helping to clear inventory and maximize profits by analyzing prices.

Big data analytics can also help cut down on fraud through the use of data mining and clickstream analysis.  It is also the driving force behind offering point-of-purchase retail coupons to customers based on their past or even current purchases, and behind sending mobile alerts or texts to customers as they enter a certain area, recommending products and services in that area.  Financial departments can use big data analytics to quickly analyze and recalculate risk portfolios.  This is only a fraction of what big data analytics can do for a company.

Best Practices for Big Data Analytics

When it comes to big data analytics, there are several factors that come into play.  There are many companies and vendors that offer big data analytics services and tools.  For companies that are using big data analytics, there are some best practices to keep in mind, no matter what analytics tool they are using.

  • Logistics and Planning.  Big data projects are complex, and unless there is a set plan, they can fail.  Companies need to have all of the logistics and execution steps clearly defined, as well as a plan for what the data is going to be used for.  Make sure to have all of the available resources required for the project, and that includes a staff able to handle the actual project.  Starting a big data project and then finding that the resources and/or staff cannot support it is a waste of time and money for any company.
  • Data Security.  Whether the data is stored on site or in a data warehouse, it needs to be secured.  There is very little point in collecting big data if it is not safeguarded.  Clearly defined procedures that determine who accesses the data, and for what purpose, should be in place before the project begins.  Without security protocols, sensitive data can end up being available to those who have no business having it.  Companies need to put their data handling policies in writing and have procedures in place before beginning.
  • Select Data properly.  Companies should know what data they need for their analytics before they start, so that they analyze only the necessary data subsets and not the entire data set.  By streamlining the data to fit their analytic goals, companies will save time and money.  Companies will always have more data than they will use, so picking out the useful data is important.
  • Data Maintenance.  Having a maintenance plan in place is also vitally important.  Big data analytics is an ongoing project that requires monitoring and maintenance, such as updates.  Big data analytics projects need to be managed just as any other business project would be.  Overlooking the maintenance portion is a mistake that can cost money.  As a company’s business requirements change, its big data analytics project should evolve to reflect those changes.
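
As a rough illustration of the "Select Data properly" practice, the Python sketch below narrows a data set to the rows and fields an analysis actually needs before any heavier processing happens.  The records, the field names, and the select_subset helper are all hypothetical.

```python
def select_subset(records, fields, predicate):
    """Keep only the records that match `predicate`, and only the requested fields."""
    return [
        {field: record[field] for field in fields}
        for record in records
        if predicate(record)
    ]

# Hypothetical sales records with more detail than the analysis requires.
records = [
    {"customer_id": 1, "region": "west", "amount": 120.0, "notes": "gift"},
    {"customer_id": 2, "region": "east", "amount": 80.0, "notes": ""},
    {"customer_id": 3, "region": "west", "amount": 45.5, "notes": "return"},
]

# The analysis only concerns western-region sales amounts.
west_sales = select_subset(
    records,
    fields=["customer_id", "amount"],
    predicate=lambda record: record["region"] == "west",
)
print(west_sales)
```

Deciding on the fields and the filter up front is the planning step the practice describes; the code itself is the easy part.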

Big data analytics can be handled in a variety of ways, but no matter how it is handled, it is a valuable asset for companies in a number of industries because the information that can be gained is highly beneficial.  The above practices will help to ensure that big data analytics projects are successful, no matter what technology route the company uses.