Welcome!
Welcome to my new blog! I’ve started this to document my research as I delve more into cloud computing. I’m currently on a major project with the government, gaining many insights into cloud infrastructure...
Pairs and Stripes
When reading Data-Intensive Text Processing with MapReduce by Jimmy Lin and Chris Dyer, I became quite confused by their description of the “pairs” and “stripes” approaches. I understand them now, and would like to...
The “Hello World” of Hadoop
In the world of MapReduce, the most common introductory program is WordCount. This is the process of going through a corpus of data and counting the instances of each word found. I’d...
Simple Hadoop Overview
As per the Hadoop website: Hadoop is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up...
MapReduce Overview
MapReduce is a paradigm for the distributed processing of massive amounts of data in a cloud environment. It is not limited to Hadoop; it is a theory, or design methodology. That said, my...
County Housing Search
I recently wrote a job to identify the counties with the most houses for sale (according to the 2010 Census). To do so, I ingested the data from the Census Bureau, and wrote a MapReduce job. My goal...
HBase Installation
This morning I installed HBase on my VM. I’ve dabbled in it before, but have my sights set on gaining greater experience in web applications and data access supported by HBase. Prerequisites I’m...