Friday, 25 July 2014

Do you know what big data is?

Big data is a collection of large volumes of unstructured data. It is so large that traditional database systems find it difficult to process. Big data is often defined by the 3 V's model:

  • high volume
  • high velocity
  • high variety
To manage big data we use Hadoop.
Apache Hadoop is an open-source framework for storing and processing big data.
It implements a distributed file system (HDFS) to store and process data across a cluster of machines.
To configure and set up Hadoop on your computer, install a Linux OS and then install Hadoop on it.
During setup, you need to edit the following configuration files:
1) core-site.xml
2) mapred-site.xml
3) hdfs-site.xml
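As a rough sketch, a minimal core-site.xml for a single-node (pseudo-distributed) setup might look like the following; `localhost` and port `9000` are common defaults for this kind of setup, not requirements:

```xml
<!-- core-site.xml: tells Hadoop clients where the default file system lives.
     hdfs://localhost:9000 is a typical single-node choice (an assumption here). -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

mapred-site.xml and hdfs-site.xml are edited in the same style, each holding `<property>` entries for the MapReduce and HDFS layers respectively.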
Apache Hive is a data warehouse infrastructure built on top of Hadoop for querying, summarization, and analysis.
Queries against Hive databases are written in HiveQL, a SQL-like language.
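For example, creating a table and running an aggregation in HiveQL looks much like standard SQL; the table and column names below are purely illustrative:

```sql
-- Hypothetical example: page_views and its columns are made-up names.
CREATE TABLE page_views (user_id INT, url STRING, view_time TIMESTAMP)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- A typical HiveQL summarization: count views per URL.
SELECT url, COUNT(*) AS views
FROM page_views
GROUP BY url
ORDER BY views DESC;
```

Behind the scenes, Hive compiles such queries into jobs that run on the Hadoop cluster, so you get SQL-style analysis over data stored in HDFS.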

