Friday 20 April 2012

Big Data - Future of Business Intelligence

The rise of social media on the internet, and the integration of business with social media, has fueled exponential growth in data. This data can be used to derive good information for making better business decisions. This exponentially growing data is called Big Data. Big Data is an umbrella term for huge amounts (petabytes, exabytes, or zettabytes) of unstructured and/or semi-structured data. Almost 95% of big data is in text format.

To better understand “Big Data”, imagine a scenario in the credit card business line of personal banking.
Suppose we have an application that tracks:
  1. How many people browsed the credit card pages of the financial institution's website?
  2. How long did they stay on a particular page?
  3. Which card types did they search for and compare?
  4. Which search strings did they use (rental car insurance, low interest, high reward points, etc.)?
And so on.
     These unstructured data are structured in our internal data warehouse and used by the business to design better products, better marketing campaigns, and so on. Now consider adding a set of related data provided to us by credit card aggregation websites such as creditcards.com, chargecards.ca, Canadiancards.ca, and others. Add to it the data set that captures where (Amazon, eBay, retail store websites) a person frequently uses their credit card. After adding these data sets to our data warehouse, our product team can design better product lines. They can even analyze where the market is moving. This sort of analysis is possible only with “Big Data”. This is “Big Data” in simple terms.

Size of Big Data:
     As the name itself says, it is very big: on the order of thousands of terabytes or several petabytes.
Ex: Suppose 15 million people each access an average of 5 credit card related websites, and each customer generates 25 records on average on each website. We are then looking at roughly 2 billion records per day (15 million × 5 × 25 ≈ 1.9 billion). Add to that the extra dimensional data required for better business decision making, and we are looking at roughly 4 billion records per day for a single business line. On top of this, big data can use data from social media (Facebook, Twitter, YouTube, etc.) for its analysis. Each day, terabytes (TB) of data are added to the system. This is the size of big data. (This is just a scenario to illustrate the growth and size of “Big Data”.)
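
The arithmetic in this scenario can be checked with a few lines of Python. This is only a rough sketch: the first three numbers come from the scenario above, the factor of 2 is inferred from the jump from 2 billion to 4 billion records, and the 500 bytes per record is a purely hypothetical size chosen to get a feel for the daily volume.

```python
# Inputs taken from the scenario in the text.
people = 15_000_000        # people accessing credit card related websites
sites_per_person = 5       # average number of websites per person
records_per_site = 25      # average records per person per website

base_records_per_day = people * sites_per_person * records_per_site

# Assumption: the extra dimensional data roughly doubles the record count
# (inferred from "2 billion" growing to "4 billion" in the text).
dimension_factor = 2
total_records_per_day = base_records_per_day * dimension_factor

# Hypothetical average record size, not a figure from the text.
assumed_bytes_per_record = 500
terabytes_per_day = total_records_per_day * assumed_bytes_per_record / 10**12

print(f"{base_records_per_day:,} base records per day")
print(f"{total_records_per_day:,} records per day with extra dimensions")
print(f"{terabytes_per_day:.3f} TB per day at {assumed_bytes_per_record} bytes per record")
```

Under these assumptions, a single business line already adds more than a terabyte per day before any social media data is included.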

Is it possible to have such a system?
     As big data came into existence with the growth of social media on the internet, such a system will become a reality in the future. Since the data is semi-structured or unstructured, it is very difficult or costly to convert it into useful information with existing database technology or the existing BI process. Even to process these petabytes of data, we require powerful computers that work in parallel. With new technologies like Hadoop, MapReduce, and cloud computing, this sort of parallel computing is becoming a reality.
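
To give a feel for the MapReduce idea, here is a minimal sketch in plain Python that counts how often each search string appears. It is not Hadoop code: the sample search strings, the function names, and the use of threads in place of cluster machines are all assumptions made for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical search strings, split into chunks as if stored on different machines.
chunks = [
    ["low interest", "high reward points", "low interest"],
    ["rental car insurance", "low interest"],
    ["high reward points", "rental car insurance"],
]

def map_phase(chunk):
    """Emit a (key, 1) pair for every search string in one chunk."""
    return [(search, 1) for search in chunk]

def shuffle_phase(mapped_chunks):
    """Group all emitted values by key across chunks."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the values for each key."""
    return {key: sum(values) for key, values in grouped.items()}

# Threads stand in for the worker machines of a real cluster.
with ThreadPoolExecutor(max_workers=3) as pool:
    mapped = list(pool.map(map_phase, chunks))

counts = reduce_phase(shuffle_phase(mapped))
print(counts)
```

Each chunk is mapped independently, which is what lets a real cluster spread the work across many machines; only the grouping and summing steps need to see results from all chunks.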

Are we not doing this today?
     Some applications already make use of part of the semi-structured or unstructured data. As of now, Google does this (advertising based on our searches and history), Netflix does this based on our user settings, Amazon does this based on market purchase patterns and our own purchase patterns, and the CIA and many international investigative organizations do this. Many multinational organizations do this sort of analysis by combining these data with a sampling process. We have to understand that all these organizations use only their internal data for their analysis.

     Overall, Big Data takes business intelligence to the next level. We are in the early stages of this technology's development. With this sort of “Big Data” analytics available, organizations can produce better products that satisfy their customers, design better marketing campaigns, and even predict market movement. The problem with big data is not only computational capability. Structuring the unstructured or semi-structured data is the biggest problem; we will look at this in detail in the next blog post.

Note: With this growth in “Big Data” analytics, we have to revisit the sampling theory that has long been used in the decision-making process.