Friday 11 May 2012

Big Data - Data Concerns..

All,

I hope the last post gave you an understanding of what big data is and what it can do. Let us now see what data problems we may face in moving ahead with this new technology.

The major problems in a big data implementation are

  1. Processing the unstructured and semi-structured data
  2. Deciphering the information from unstructured or semi-structured data

What is unstructured or semi-structured data?

In simple terms, any data elements that can be stored in rows and columns in a database are called structured data. If the data can't be stored in rows and columns and has to be stored as BLOBs (Binary Large Objects), it is called unstructured or semi-structured data.
(Note: There is still no precise, agreed definition of unstructured or semi-structured data, but this is the baseline the field is working from.)

From our personal banking credit card division example

Structured data                : credit card details like card type, interest rate, benefits , maximum limit etc...
Un / Semi structured data :  search parameters on bank website, email to bank representative, blogs on other websites etc...

Suppose we would like answers to the following:
  1. What are the credit card types having an interest rate of 19% p.a.?
  2. What are the credit card types having a minimum card limit of 5000 $? etc...

These questions can be answered by querying the structured data with specific inputs. From a technical standpoint, we can retrieve the information directly by writing simple queries.
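To make the "simple queries" point concrete, here is a minimal sketch in Python using the standard sqlite3 module. The table name, column names and sample rows are my own assumptions for illustration; they are not taken from any real bank system.

```python
import sqlite3

# Hypothetical table and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE credit_card ("
    "card_type TEXT, interest_rate_pct REAL, min_limit REAL)"
)
conn.executemany(
    "INSERT INTO credit_card VALUES (?, ?, ?)",
    [
        ("Classic", 19.0, 1000),
        ("Gold", 19.0, 5000),
        ("Platinum", 15.5, 5000),
    ],
)


def cards_with_rate(rate_pct):
    """Question 1: card types with a given interest rate (% p.a.)."""
    rows = conn.execute(
        "SELECT card_type FROM credit_card "
        "WHERE interest_rate_pct = ? ORDER BY card_type",
        (rate_pct,),
    )
    return [row[0] for row in rows]


def cards_with_min_limit(limit):
    """Question 2: card types with a given minimum card limit."""
    rows = conn.execute(
        "SELECT card_type FROM credit_card "
        "WHERE min_limit = ? ORDER BY card_type",
        (limit,),
    )
    return [row[0] for row in rows]


print(cards_with_rate(19.0))
print(cards_with_min_limit(5000))
```

Because every attribute sits in a known column, each business question maps to one WHERE clause. That is exactly what we lose with unstructured data.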

Now consider a scenario with unstructured data. We want to analyze "how many people searched / looked for credit cards with a maximum limit of 5000 $?" and these are the search parameters that were entered on our website:

  1. card limit 5k
  2. card limit 5000
  3. card limit 5000 cad
  4. card limit five thousand
  5. card limit five thousand canadian dollars
  6. limit five thousand dollars
  7. credit cards + 5000
  8. 5000

The first problem is understanding the unstructured data.
How can we conclude that the searches were made for credit cards with a limit of 5000 dollars?
       Example :
  •  Search parameter no 6 ("limit five thousand dollars"): the user may be searching for a savings account with a minimum balance of 5000, or may be looking for investments with a limit of 5000.
  •  Search parameter no 8 ("5000"): this parameter is too vague to correlate with credit cards.

Suppose we ignore this data understanding problem and assume all the searches were looking for credit cards with a limit of 5000 dollars.
What will my search parameters be? How does a typical query have to be structured? etc...
Currently we have to make a lot of assumptions to derive information from un / semi structured data.
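As a rough sketch of the assumptions involved, the Python snippet below tries to recognise the amount 5000 in the eight search strings listed earlier. The matching rules (the spellings "5k", "5000" and "five thousand", and the word "card") are my own hand-written assumptions, not a general solution; a real system would need far more than this.

```python
import re
from collections import Counter

# The eight search strings from the example above.
SEARCHES = [
    "card limit 5k",
    "card limit 5000",
    "card limit 5000 cad",
    "card limit five thousand",
    "card limit five thousand canadian dollars",
    "limit five thousand dollars",
    "credit cards + 5000",
    "5000",
]


def mentions_5000(text):
    """True if the text spells the amount 5000 in one of the known ways."""
    t = text.lower()
    if "five thousand" in t:
        return True
    if re.search(r"\b5k\b", t):
        return True
    return re.search(r"\b5[, ]?000\b", t) is not None


def mentions_card(text):
    """True if the text contains the word 'card' or 'cards'."""
    return re.search(r"\bcards?\b", text.lower()) is not None


def classify(text):
    """Label a search as a card search, an ambiguous amount, or no match."""
    if not mentions_5000(text):
        return "no match"
    return "card + 5000" if mentions_card(text) else "ambiguous 5000"


counts = Counter(classify(s) for s in SEARCHES)
for search in SEARCHES:
    print(f"{search!r:45} -> {classify(search)}")
print(dict(counts))
```

Even with these hand-made rules, searches 6 and 8 stay ambiguous: the text alone does not say whether the user meant a credit card.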

One approach that I can think of to tackle this problem is to capture the metadata of the search. By correlating the search parameters with the metadata of the search, we can come to certain conclusions.
     Ex:  On which page was the search made?
            If the search was made on the credit card page, then we can conclude that the user is looking for credit cards.
Even with this approach, how to correlate the metadata with the actual data element captured from the user is another problem.
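A minimal sketch of the metadata idea, assuming we log the page path along with each search. The page paths and the page-to-product mapping below are hypothetical; I made them up only to show the correlation step.

```python
# Hypothetical mapping from the page where the search was typed
# to the product that page is about.
PAGE_TO_PRODUCT = {
    "/credit-cards": "credit card",
    "/savings": "savings account",
    "/investments": "investment",
}


def infer_product(search_text, page):
    """Guess the product a search refers to.

    The search text wins when it names a card explicitly; otherwise
    we fall back on the page metadata, and give up if the page is
    not one we know.
    """
    if "card" in search_text.lower():
        return "credit card"
    return PAGE_TO_PRODUCT.get(page, "unknown")


print(infer_product("5000", "/credit-cards"))
print(infer_product("limit five thousand dollars", "/savings"))
print(infer_product("5000", "/home"))
```

The vague search "5000" becomes usable when it comes from the credit card page, but a search made from the home page is still unresolved, which is the correlation problem mentioned above.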

Conclusion:
Big data is like a gold mine. We will have to process a huge set of data to get useful information that helps the business in decision making. This technology is still in its infancy. With the growth of cloud computing and parallel processing technologies, big data will be a reality in the near future. The usefulness of big data is most visible in the fields of personal business intelligence, health care and marketing.
"To get one ounce of gold we have to process 33 tons of rock; the same goes for big data."

Friday 20 April 2012

Big Data - Future of Business Intelligence

The rise of social media on the internet, and the integration of business with social media, has fueled an exponential growth in data. This data can be used to derive good information for making better business decisions. This exponentially growing data is called Big Data. Big Data is an umbrella term for huge amounts (petabytes, exabytes or zettabytes) of unstructured and/or semi-structured data. Almost 95% of big data is in text format.

To better understand “Big Data”, imagine a scenario in the personal banking credit card business line.
Suppose we have an application that tracks
  1.     How many people browsed the credit cards page of the financial institution’s website?
  2.     How long did they stay on a particular page?
  3.     What card types did they search for and compare?
  4.    What search strings did they use (rental car insurance, low interest, high reward points etc...)?
Etc….
     This unstructured data is structured in our internal data warehouse and used by the business to design better products, better marketing campaigns etc…. Now consider adding a set of related data provided to us by credit card aggregation websites like creditcards.com, chargecards.ca, Canadiancards.ca etc… Add to it the data set that captures where (Amazon, eBay, retail store websites) a person uses their credit card frequently. After adding these data sets to our data warehouse, our product team can design better product lines. They can even analyze where the market is moving. This sort of analysis can be done only with “Big Data”. This is “Big Data” in simple terms.

Size of Big Data:
     As the name itself says, it is very big: thousands of terabytes, or several petabytes.
Ex: Suppose 15 million people each access an average of 5 credit card related websites, and each customer creates 25 records on average on each website. That is 15 million × 5 × 25, so roughly we are looking at 2 billion records per day. Add to that the different dimensional requirements for better business decision making, and roughly we are looking at 4 billion records per day for a single business line. On top of this, big data can use data from social media (Facebook, Twitter, YouTube etc…) for its analysis. Each day terabytes (TB) of data are added to the system. This is the size of big data. (This is just a scenario to understand the growth and size of “Big Data”.)
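The arithmetic in this scenario can be checked with a few lines of Python. The doubling factor for the extra dimensions follows the 2 billion to 4 billion step above; the 1,000 bytes per record is purely my assumption, used only to show how the record count turns into terabytes.

```python
people = 15_000_000          # people accessing the websites
sites_per_person = 5         # credit card related websites per person
records_per_site = 25        # records per person per website, per day

base_records = people * sites_per_person * records_per_site

# The post roughly doubles the count for the extra dimensions.
dimension_factor = 2
with_dimensions = base_records * dimension_factor

# Assumption for illustration only: about 1,000 bytes per record.
bytes_per_record = 1_000
terabytes_per_day = with_dimensions * bytes_per_record / 1e12

print(f"base records per day:       {base_records:,}")
print(f"with dimensions, per day:   {with_dimensions:,}")
print(f"approx. terabytes per day:  {terabytes_per_day:.2f}")
```

So the exact base figure is 1.875 billion records, which the post rounds up to 2 billion, and even a modest record size puts a single business line in the terabytes-per-day range.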

Is it possible to have such a system?
     As big data came into existence with the growth of social media on the internet, it will be a reality in the future. Since the data is semi-structured or unstructured, it is very difficult or costly to convert it into useful information with the existing database technology or the existing BI process. Even to process these petabytes of data we require powerful computers that work in parallel. With new technologies like Hadoop, MapReduce and cloud computing, this sort of parallel computing is becoming a reality.

Are we not doing this today?
     Some applications are already available that use part of the semi-structured or unstructured data. As of now Google does this (advertising based on our searches and history), Netflix does this based on our user settings, Amazon does this based on market purchase patterns and our own purchase patterns, and the CIA and many international investigative organizations do this. Many multinational organizations do this sort of analysis by combining this data with a sampling process. We have to understand that all these organizations use only their internal data for their analysis.

     Overall, big data takes business intelligence to the next level. We are in the early stages of this technology’s development. With this sort of “Big Data” analytics available, organizations can produce better products that satisfy their customers, design better marketing campaigns, and even predict market movement. The problem with big data is not only computational capability. Structuring the unstructured or semi-structured data is the biggest problem ... we will see this in detail in the next blog….

Note: With this growth in “Big Data” analytics, we have to revisit the sampling theory that has been used for centuries in the decision making process.

Thursday 9 February 2012

Business Intelligence for start-ups

I always wonder when the economic bureau news says that around 50% of start-ups fail in their first 2 years and more than 90% of start-ups fail in their first five years. Only 2% of start-ups survive after 10 years. I have heard the same news for the last 10 years.
On the other hand, the BI market is growing around 10% annually. As BI (Business Intelligence) is a process that helps the stakeholders take decisions, which should increase the life of the business, the economic bureau news should have reversed. Surprisingly not!!!
  • Is it really true, that BI helps the business to thrive?
  • Does the BI make any difference in the business life cycle?
  • Can BI help in avoiding the start-up failures?
  • Can’t start-ups and young companies with small bank balances use BI for their growth?
  • Or should a Business Intelligence solution be used only by big companies with bulk bank balances?
 Before answering these questions, let me explain what BI is and where it is making an impact.

Business intelligence (BI) is an umbrella term that refers to a variety of software applications used to collect and analyze an organization’s raw data. BI as a discipline is made up of several related activities, including data mining, analytical processing, querying and reporting.

Companies use BI to
Improve decision making                     - Restaurant chains like Wendy’s
Cut costs                                   - Walmart
Identify new business opportunities         - Target, GE
Launch new products                         - Wendy’s, Ruby Tuesday
Plan                                        - Professional sports teams like the Red Sox and the Patriots (they claim that BI is the reason they won the Super Bowl 3 times), the South African cricket team
Etc… The list is endless.

It is true that BI helps in increasing the life cycle of an organization (Ex: Walmart, the Patriots sports team etc...). There is no question about it.
Then our next question is: why do start-ups fail? Is BI only for big companies?

No. To utilize BI, a company need not be established, nor does it need a bulk balance in its bank. As older companies have established processes and product lines, they can reap benefits from BI immediately for process improvement, cost reduction, operational improvement, financial planning, etc… Start-ups don’t have anything established that they want to control, correct or improve upon. For a start-up, BI can be used to establish a process or to correct a process before it matures. Once a process matures it is very difficult to change. BI can be used in the areas that help start-ups survive and grow.

The discussion points for any organization considering BI are
1. Usability
2. Affordability

Usability:
 If you look at the reasons for start-up failures, they fall into one of these 4 categories
-          Wrong product and market
-          Customer dissatisfaction
-          Financial problems
-          Wrong Management

Having BI can help a company survive these problems.
Wrong product and market: Every entrepreneur has to do their due diligence even before entering the business. If there was a mistake in choosing the product or market, establishing a BI process may not be able to help the organization’s life cycle. Entrepreneurs have to use the existing BI reports (government or private agency reports) to reassess their situation and make a decision. Once the organization grows, the BI process plays a major role in product and market management. BI is required for analysing various things, from market reach and promotion effectiveness to new customer reach, product categorisation, etc…

Customer dissatisfaction: This is the second biggest reason for start-up failures. To survive, any business requires happy customers and word of mouth marketing from these satisfied customers. Customer references are required for business growth. Establishing a BI process for this area will help the organization not only to survive but to grow and to outgrow its competitors. The BI process will help the management identify and rectify customer dissatisfaction at the root cause level. It will also help them validate the effectiveness of the corrective actions the management has implemented.

Financial problems: This is the third reason, and the area where most entrepreneurs lack knowledge. They fail to observe the early signs. They fail to see the warnings in cost increases, profitability decreases, bills receivable, accounts payable etc... Having BI for this department will help entrepreneurs identify the early signs easily and take the necessary corrective actions before things become uncontrollable.

Wrong Management: If there are any red flags in the customer department or the financial department, they are probably due to wrong management. A BI process can be implemented to track the employee satisfaction level, which in turn drives the customer satisfaction of the business.

Overall, for a start-up a BI process is required in the financial department and the customer satisfaction department. Once the start-up has stabilised its cash flow, BI implementation is very much required in the marketing and product departments. Once the organization matures, BI can be implemented across the organization for better cost control and process improvement.

Ex: Infosys (one of the largest IT service providers in India, with 150,000 employees) had a BI process implemented in its HR department from day one of its operation. Now it has BI processes implemented across all departments.

Affordability:
Contrary to what most of us think, implementing BI doesn’t require millions of dollars to start with. For a start-up, spending 400 dollars on the Microsoft Office suite and another 500 $ on a developer is a good place to start. If the organization doesn’t want to spend 900 dollars initially, it can use any of the cloud solutions from Google, Microsoft web apps, Zoho etc… These products may cost a maximum of 30 dollars per month. By spending a maximum of 1000 dollars (if you know development in Excel / Access then it is only 500 $) we have a BI solution for a start-up in our hands. This investment will help the business survive the initial years.

Once the organization matures and the data set becomes big, the organization can either go for proprietary BI products like SAP, SAS, Oracle, IBM etc... (if ready to spend millions) or target open source solutions like Pentaho, Jasper, a PHP & MySQL combo etc… (these require only 25% of the cost of proprietary products). On average, the cost of implementing an open source solution will span between 30,000 $ and 500,000 $ annually, based on the complexity and the volume.

Once an organization crosses a threshold of 10+ years, it can either stick with the existing open source solution or move to a proprietary product solution. The reason I would prefer to move towards the proprietary solution is ease of implementation, and the enhanced insights and capabilities that proprietary products provide for top management decisions. Moreover, they are always updated with new technologies for better performance. To me, these enhanced features and technology advancements are not required at the initial stages of the business.
One more thing: as we have had a BI process implemented from day one, we can move from an open source to a proprietary tool based BI solution without any risk.

I hope that, with start-ups utilizing a BI solution, after 5 years we will have revised news from the business bureau: "only 2% of the start-ups fail after 5 years in operation".

Friday 3 February 2012

Facebook IPO Fever.....

The topic that is very hot around the world today is the Facebook IPO filing, with its 100 billion dollar valuation.

I ask myself these questions,
  • Why should I invest in this stock?
  • Is Facebook really worth a 100 billion valuation?
This company is well managed, with an interesting product, a huge customer base and good revenue for the past year. With 843 million users and 48% of users logging in daily, it will easily have 400 million visitors daily. With 10 cents of income per visitor, it can easily earn 40 million dollars daily. The numbers look very attractive for any investor to buy the share.


The revenue of Facebook depends completely on the users’ interest in logging in to the system.
Why should any user log in to Facebook in the first place?
The driving factor for end users to log in to this website is very fragile. I log in since all my friends log in to the website. There is no compelling reason, and no great benefit for me from logging in.
It is like a fad. Marketing companies are utilizing this fad.

How long will people be interested in Facebook? What is the risk of losing them?

It is unlike Google (takes users where they want to go in the web world), Apple (outstanding product lines) and Microsoft (from operating systems to different product lines), which provide something that people or businesses require for their day to day activities. This keeps users coming back to them. They have sustained product lines on which they can build.
Facebook doesn’t have a sustained product line. I receive e-mails from Facebook asking me to log in. Most of us would have received them if we didn’t log in to Facebook for a week. Facebook is not a necessity; it is always a good-to-have. The risk for an investor is when this fad for Facebook will end.
History has shown the impact of such fads on AOL, Yahoo, MySpace etc…

Facebook is already overvalued at around 100 times its current earnings. Most of the time, people compare Facebook with Google for its income potential.
Google stock, with an IPO price of 85$, has grown to 7 times that price (the current price is 595$) over 8 years. The market value of Google was 24 billion dollars at IPO and today it is valued at 154 billion dollars.
By this equation, if you want to get the same return considering a 22$ IPO price for Facebook, the market cap of the company has to grow to 700 billion dollars to reach a price of 154$ per share.
GE, the biggest conglomerate in the world, is valued at 200 billion today. So Facebook has to outgrow GE. Hope that will happen!!!
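The comparison above is simple multiplication, sketched here in Python with the figures quoted in this post. The variable names are mine, and the annualized rate is an extra step I added by assuming steady compounding over the 8 years.

```python
# Figures quoted in the post.
google_ipo_price = 85.0        # dollars per share at IPO
google_price_now = 595.0       # dollars per share today
years = 8

facebook_ipo_price = 22.0      # assumed IPO price per share
facebook_valuation_bn = 100.0  # IPO valuation, billions of dollars

# How many times the Google price has multiplied since its IPO.
multiple = google_price_now / google_ipo_price

# The same multiple applied to Facebook's price and market cap.
target_price = facebook_ipo_price * multiple
target_market_cap_bn = facebook_valuation_bn * multiple

# Extra step (my assumption): steady compounding over the period.
annual_return = multiple ** (1 / years) - 1

print(f"price multiple:        {multiple:.1f}x")
print(f"Facebook target price: {target_price:.0f} $")
print(f"Facebook target cap:   {target_market_cap_bn:.0f} billion $")
print(f"annualized return:     {annual_return:.1%}")
```

A price that ends at 7 times the IPO price is a gain of 600% on the original investment, and matching it would take Facebook from 100 billion to 700 billion dollars in market value.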

All said, one should not forget that Facebook already has 3 billion dollars in the bank, and with the IPO it will have around another 10 billion dollars. With 13 billion dollars in its kitty, it may venture into another business (as Google entered the mobile market). That business may be a sustainable business model. This calculation makes the company a very valuable target. It all depends upon what Facebook does with this 13 billion.

With these facts, I would say the Facebook stock is currently not as attractive for retail investors as it is claimed to be. If Facebook can provide a better business model projection, then retail investors can evaluate this stock at a reasonable price. For naïve retail investors, this stock and its income potential are like a mirage. Please control your emotions before jumping in. Please do your homework.

Institutional investors are well informed about this situation, and they will utilise this IPO to make some money. So there will be a rush from institutional investors.
Please be aware that in the stock market, "someone has to lose money for someone to gain money".
Retail investors, control your emotions and do your homework before jumping in.

Thursday 19 January 2012

Data warehousing for human Life - Contd 5


When I spoke about this topic to my teacher and well-wisher Mr. Bruce Andrews, he gave me the following points to answer for myself.
  • “The challenging part of designing a Data Warehouse is understanding what types of questions you may want to ask, which influences the structure of your data.”
  • “Are we going to try to measure ourselves through time?”
My response is:
 
“The challenging part of designing a Data Warehouse is understanding what types of questions you may want to ask, which influences the structure of your data.”

The application we are trying to build should answer the basic questions that directly impact our day to day life and that, on analysis, can improve it. These are the questions we have in mind as an adult, as an employee, as a young parent, as an ambitious person, as parents of an adult, as grandparents. Our application should be capable of answering 10-15 questions under each category.

As a young parent we would like to have the following questions answered:
  1. How active is my kid?
  2. How long does he / she spend on physical activity?
  3. How long does he / she spend on learning activity?
  4. How long does he / she spend on extracurricular activity (painting, drawing etc...)?
  5. How long does he / she spend on the Internet?
  6. How long does he / she spend on TV & electronic media?
  7. What are his / her eating habits?
  8. How many times has he / she had a minor medical problem over a period of time?
  9. What are his / her seasonal medical problems?
  10. What are his / her emotional reactions (mostly dissatisfied / happy / creepy)?
  11. What are his / her medical allergies?
  12. What are his / her physical allergies?
  13. Etc…
Note: These lists are for illustrative purposes only. We can shortlist the objective questions after doing research involving the SMEs.

Going by the 80-20 rule (Pareto rule), by understanding the 20% of the things that impact our life we should be able to tune our life for better living. Our data warehouse should be capable of covering that 20%.


“Are we going to try to measure ourselves through time?”

My answer is both ‘Yes’ and ‘No’.
Yes:  As I mentioned in my previous posts, certain things can be analysed / compared without being affected by time, like intellectual capabilities, healthy habits etc…
                        Ex: How many languages did my grandfather know at the age of 30?
                              How many hours did my parents spend on entertainment activities?

No: Things which are affected by time and may not provide value for today’s lifestyle, like driving a chariot, building a mud house etc…
                        Ex: (Being an Asian) did my grandfather know how to speak Italian at the age of 30?

TID BIT:

Nicholas Felton: One of the designers who built Facebook’s Timeline feature has virtually built a data warehouse for an individual’s life. The interesting part is that he landed at Facebook by building an analytical reporting tool for his personal life. Knowingly or unknowingly, he ended up building a personal data warehouse. Facebook is building a personal data warehouse, which answers some of our queries on entertainment habits.

COMMENTS:

I received an interesting email about last week’s posting
- Our brain is the best data warehouse and our senses are the input. We have to be very careful while interacting with the brain.
There has been a misunderstanding; we are not contemplating neuroscience here. The application we are talking about is an external product which helps us analyse the data set externally, like we do for a business. To be philosophical, the data that we are trying to capture in this application is already captured by our subconscious mind. It takes a long time for us to recognise what is in the subconscious mind, and it is difficult to analyse it objectively. Our application helps us analyse objectively and on demand.

Next week let us see how we can build such an application...

Thursday 12 January 2012

Data warehousing for human Life - Contd 4

I was asking my friends, “We are busy building applications for tuning other businesses to perfection, using various technologies (data warehousing, business intelligence, data mining, data analytics, business planning etc...), but why have we never thought of something to tune our personal life?”

Some of them said, "Yes, I agree, having such an application will give a better result".
Some said, "It is good to have".
Some said, "Do we require it? I’m good at taking my own decisions".
Some said, "What will I get tangibly …."

The conclusion I reached after a series of conversations with my friends is this:

We build this sort of application for tuning businesses because the results we achieve are explicit and noticed by others. There is no anonymity and, most importantly, people are rewarded tangibly (money, promotion, new car, house etc…).
The changes brought by a “DWH for human life” application are implicit. They are only for the individual to see and feel. There is anonymity.

For example:
No one can force individuals to follow an exercise schedule, nor can the individuals see a tangible reward like money or a promotion (excluding body building competitions). It is for the individual to feel the strength he gains and the healthy life he lives. If he feels the intangible rewards (confidence within himself, a healthy life etc…), then he will follow the schedule.

The same is true for the application that we are discussing. This application will provide implicit benefits, but individuals have to feel them in order to use it. There are individuals who track their daily habits, make out the patterns and tune their life accordingly.
  • Some achieve this by writing a diary.
  • Some do this for their financial progress by analyzing their bank balance, savings, loans, assets, liabilities etc….
  • Some do this only for their educational progress, by analyzing their grades, white papers and number of degrees.
  • Some do this only for their health progress, by analyzing their weight, height and medical expenditure.
But we do not have an overall application that covers most aspects of life and can show the correlations between them. The application we are talking about here will be able to provide this for us.
Most of us take things for granted. We take decisions about ourselves subjectively. This application will help us take decisions objectively.

We may not get any monetary benefit directly by using this application. But using this application and tuning our activities accordingly will provide more monetary benefit than ever. It will provide more indirect tangible benefits.

When we build a house, the person who benefits directly is the contractor. But the people who benefit indirectly are countless: the mason, brick maker, steel rod maker, cement factory worker, utilities transporter, carpenter, etc…
Similarly, we may not get a direct monetary benefit from this application, but once we start using it we can tune our life to gain wealth beyond our imagination.

Any other views on this are welcome.

Thursday 5 January 2012

Data warehousing for human Life - Contd 3


Hope everyone had a good “NEW YEAR CELEBRATION”

While discussing DWH for human beings, one of my colleagues asked me a question, 'What is the use of having such an application?'
I asked him, have you ever wondered what our parents’ concerns were while we were growing up and, in the same terms, what our concerns are about our kids?

He came back with a list,
  1. What is my kid’s educational interest?
  2. What is my kid’s sports interest?
  3. What is my kid’s health condition?
  4. Is there any change in attitude that I have to be concerned about?
  5. How much do I have to earn to let him pursue his interest? ………
I said, ‘All these questions can be answered by having a DWH for human beings, or for ourselves.’
The application may not provide the answers, but it will provide the necessary insight for us to find the answers.
Let us see how it can help.

What is my kid’s educational interest? Today, what we do to answer this question is
  1. Just ask our kids to go to school and let the school identify and report to us what their interest is.
  2. Some parents even force their kids to accept what is available & affordable to them.
  3. Some parents spend their time observing the kid’s activities and conclude for themselves that he is interested in some field. Sometimes this approach gives a good result, sometimes it doesn’t.
  4. Some parents ask or discuss with their close friends / relations to settle on the kid’s interest.
  5. In western countries, most parents leave the kids to explore the world and find their own interest.

I’m not saying the above methods are wrong. My point is: what is the probability of identifying the kid’s interest correctly by following these methods? Hardly a 50% success rate.
What we are talking about here is an application which will help us reach an 80% success rate.

Scientific researchers have shown that ‘a kid’s interest can be identified by closely monitoring their activities from the age of 6 to 14’.

Suppose each parent spends 5 minutes daily entering the child’s activities and reactions into this application (like the time spent on drawing, time spent on reading story books, time spent on dancing or physical activities, his energy level in doing various activities, his reaction to various activities suggested by us etc… (this list is for illustration purposes only)). Down the line, in 2 years, we will have a pattern for our kid in our hands. This will guide us to our kid’s interest. We can plan accordingly and help them in their growth.
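To picture what such a daily log could look like, here is a minimal Python sketch. The entry format (date, activity, minutes, energy level from 1 to 5) and the sample rows are my own assumptions for illustration; the real application would need a proper design.

```python
from collections import defaultdict

# Hypothetical daily entries typed in by a parent:
# (date, activity, minutes spent, energy level on a 1-5 scale)
LOG = [
    ("2012-01-02", "drawing", 40, 5),
    ("2012-01-02", "reading", 15, 2),
    ("2012-01-03", "drawing", 35, 4),
    ("2012-01-03", "dancing", 20, 3),
    ("2012-01-04", "reading", 10, 2),
    ("2012-01-04", "drawing", 45, 5),
]


def summarize(log):
    """Total minutes and average energy per activity, most time first."""
    minutes = defaultdict(int)
    energy = defaultdict(list)
    for _date, activity, mins, level in log:
        minutes[activity] += mins
        energy[activity].append(level)
    summary = {
        activity: (minutes[activity], sum(levels) / len(levels))
        for activity, levels in energy.items()
    }
    # Rank activities by total minutes, highest first.
    return sorted(summary.items(), key=lambda item: item[1][0], reverse=True)


ranking = summarize(LOG)
for activity, (total_minutes, avg_energy) in ranking:
    print(f"{activity:10} {total_minutes:4d} min   avg energy {avg_energy:.1f}")
```

With only three days of made-up entries the ranking means little, but two years of real entries summarized the same way is the kind of pattern the paragraph above has in mind.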

What we do today is take a slow, single dimensional approach, and by the time we find their interest, precious time is lost or we have already invested in a field which our kid hates.
The only thing we require is the discipline to update the application regularly so that the data builds up. Once we do that, we can analyse how the kids are performing. We can even compare with others across the world who have similar characteristics. We can even share this analysis with various parties (schools, doctors, friends, elderly parents etc…) to find where our kids are heading. This application can be used by everyone from toddlers to old people. Sometimes it helps us unlock the inherent strengths of our kids and, for that matter, even our own.

As has been known for ages, there are signs before anything happens in the world. For example, animals migrate before a change of season or before a major catastrophe (earthquake, volcano etc…). There are signs in our life too. All we have to do is look for the signs. This application helps us notice those signs before it is too late.

At the end of our conversation he said “Let us figure out how we can achieve this ….”

While continuing my thoughts, I came across one organization that does data warehousing (DWH) for human beings: 100plus.com
They use data warehousing technology and their analytics process to analyse our individual health inputs and come out with a better prediction of our health in the near future.
We are seeing that this technology (DWH for humans) is picking up pace in the world. I hope my friend’s 3rd concern about kids (What is my kid’s health condition?) is currently taken care of by 100plus.com.