All,
I hope the last post gave you an understanding of what big data is and what
it can do. Let us now look at the data problems we may face as we move ahead
with this new technology.
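The major problems in big data implementation are:
- Processing unstructured and semi-structured data
- Deciphering information from unstructured or semi-structured data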
What is unstructured or semi-structured data?
In simple terms, any data elements that can be stored in rows and columns in a
database are called structured data. Data that cannot be stored in rows and
columns, and has to be stored as BLOBs (Binary Large Objects) instead, is
called unstructured or semi-structured data.
(Note: There is not yet a clear, agreed definition of unstructured or semi-structured data, but this is the baseline the field is working from.)
From our personal banking credit card division example:
Structured data: credit card details such as card type, interest rate,
benefits, maximum limit, etc.
Unstructured / semi-structured data: search parameters on the bank website, emails to bank representatives, blogs on other websites, etc.
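Suppose we would like answers to the following:
- Which credit card types have an interest rate of 19% p.a.?
- Which credit card types have a minimum card limit of 5000 dollars?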
Questions like these can be answered by querying the structured data with
specific inputs. From a technical standpoint, we can retrieve the information
directly by writing simple queries.
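As a rough illustration of such a "simple query", here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and sample rows are all hypothetical and invented for this example; they are not taken from any real bank system. It answers two questions: which card types charge 19% p.a., and which have a minimum limit of 5000.

```python
import sqlite3

# In-memory database with a hypothetical credit_cards table.
# All names and rows below are made up purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE credit_cards ("
    " card_type TEXT,"
    " interest_rate_pct REAL,"
    " min_limit REAL,"
    " max_limit REAL)"
)
conn.executemany(
    "INSERT INTO credit_cards VALUES (?, ?, ?, ?)",
    [
        ("Classic", 19.0, 500.0, 5000.0),
        ("Gold", 19.0, 5000.0, 15000.0),
        ("Platinum", 17.5, 5000.0, 30000.0),
    ],
)


def card_types_with_rate(rate_pct):
    """Return card types whose interest rate equals rate_pct (% per annum)."""
    rows = conn.execute(
        "SELECT card_type FROM credit_cards"
        " WHERE interest_rate_pct = ? ORDER BY card_type",
        (rate_pct,),
    )
    return [row[0] for row in rows]


def card_types_with_min_limit(limit):
    """Return card types whose minimum card limit equals limit."""
    rows = conn.execute(
        "SELECT card_type FROM credit_cards"
        " WHERE min_limit = ? ORDER BY card_type",
        (limit,),
    )
    return [row[0] for row in rows]


print(card_types_with_rate(19.0))
print(card_types_with_min_limit(5000.0))
```

The point is only that each question maps to one WHERE clause on a known column, which is exactly what is missing once the data is free text.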
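Now let us consider a scenario with unstructured data. We want to analyze "how
many people searched for credit cards with a maximum limit of 5000 dollars?",
and these are the search parameters that were entered on our website:
1. card limit 5k
2. card limit 5000
3. card limit 5000 cad
4. card limit five thousand
5. card limit five thousand canadian dollars
6. limit five thousand dollars
7. credit cards + 5000
8. 5000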
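The first problem is understanding the unstructured data.
How can we conclude that these searches were made for credit cards with a limit of 5000 dollars?
Examples:
- Search parameter no. 6 ("limit five thousand dollars"): the user may be searching for a savings account with a minimum balance of 5000, or for investments with a limit of 5000.
- Search parameter no. 8 ("5000"): this parameter is too vague to correlate with credit cards at all.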
Even if we set this data understanding problem aside and assume that all the
searches were for credit cards with a limit of 5000 dollars, other questions
remain.
What would our search parameters be? How would a typical query have to be structured?
At present we have to make a lot of assumptions to derive information from unstructured or semi-structured data.
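To make those assumptions visible, here is a minimal sketch in Python that tries to pull an amount out of free-text searches. The function names, the tiny number-word vocabulary, and the "contains the word card" rule are my own assumptions for illustration; a real system would need far more than this.

```python
import re

# The eight sample searches from the credit card scenario.
SEARCHES = [
    "card limit 5k",
    "card limit 5000",
    "card limit 5000 cad",
    "card limit five thousand",
    "card limit five thousand canadian dollars",
    "limit five thousand dollars",
    "credit cards + 5000",
    "5000",
]

# Assumption: a hand-written vocabulary that only covers these samples.
NUMBER_WORDS = {"five": 5, "thousand": 1000}


def extract_amount(search):
    """Return the amount mentioned in a search string, or None if none is found."""
    text = search.lower()
    # Shorthand such as "5k" is read as 5 * 1000.
    match = re.search(r"\b(\d+)\s*k\b", text)
    if match:
        return int(match.group(1)) * 1000
    # Plain digits such as "5000".
    match = re.search(r"\b(\d+)\b", text)
    if match:
        return int(match.group(1))
    # Spelled-out numbers such as "five thousand".
    total = None
    for word in re.findall(r"[a-z]+", text):
        if word in NUMBER_WORDS:
            value = NUMBER_WORDS[word]
            if total is None:
                total = value
            elif value >= 1000:
                total *= value  # "five" then "thousand" gives 5 * 1000
            else:
                total += value
    return total


def mentions_card(search):
    """Assumption: a search is about cards only if it contains the word 'card'."""
    return "card" in search.lower()


with_amount = [s for s in SEARCHES if extract_amount(s) == 5000]
with_card = [s for s in with_amount if mentions_card(s)]
print(len(with_amount), len(with_card))
```

With these rules, all 8 searches mention the amount 5000, but only 6 of them mention a card. So the answer to "how many people searched for credit cards with a limit of 5000?" is anywhere between 6 and 8 depending on which assumption we accept, and that is before asking whether 5000 means a limit at all.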
One approach I can think of to tackle this problem is to capture the
metadata of the search. By correlating the search parameters with the metadata
of the search, we can come to certain conclusions.
Example: on which page was the search made?
If the search was made on the credit card page, we can conclude that the user is looking for credit cards.
Even with this approach, how to correlate the metadata with the actual data element captured from the user is another problem.
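A minimal sketch of the metadata idea in Python follows. It assumes the hard part is already solved, namely that each search arrives already paired with the page it was made on. The SearchEvent class, the page paths, and the rules are hypothetical and only meant to show the shape of the approach.

```python
from dataclasses import dataclass


@dataclass
class SearchEvent:
    query: str  # what the user typed
    page: str   # hypothetical metadata: the page on which the search was made


def infer_product(event):
    """Guess which product a search is about: query text first, page as fallback."""
    query = event.query.lower()
    if "card" in query:
        return "credit_card"
    # Assumed URL scheme: credit card pages live under /credit-cards.
    if event.page.startswith("/credit-cards"):
        return "credit_card"
    return "unknown"


events = [
    SearchEvent("5000", "/credit-cards"),
    SearchEvent("5000", "/savings"),
    SearchEvent("card limit 5k", "/home"),
]
for event in events:
    print(event.query, "->", infer_product(event))
```

Here the vague search "5000" becomes usable when it was made on the credit card page, and stays unknown when it was made elsewhere. Getting the query and the page reliably tied together in the first place is the open problem mentioned above, and this sketch does not address it.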
Conclusion:
Big data is like a gold mine: we have to process a huge set of data to get useful information that helps the business make decisions. The technology is still in its infancy. With the growth of cloud computing and parallel processing technologies, big data will be a reality in the near future. Its usefulness is most evident in personal business intelligence, the health care industry, and the marketing industry.
"To get one ounce of gold we have to process 33 tons of rock; the same goes for big data."