A Parallel Paradigm
A full 90 percent of all the data in the world has been generated over the last two years. Internet-based companies are awash with data that can be grouped and utilized.
– Excerpt from a ScienceDaily article, May 22, 2013
No doubt! Here are a few more facts to look at:
- As of July 2011, the Hubble Space Telescope has made over 1,000,000 observations. Those 21 years’ worth of observations have produced nearly 50 terabytes of data, and the orbiting observatory generates more than 80 gigabytes of data each month.
- When the Sloan Digital Sky Survey began in 2000, its telescope in New Mexico collected more data in its first few weeks than had been amassed in the entire history of astronomy. Now, a decade later, its archive contains a whopping 140 terabytes of information.
- By 2011, U.S. healthcare organizations had generated 150 exabytes (150 billion gigabytes) of data. Kaiser Permanente alone might have as much as 44 petabytes of patient data just from its electronic health record (EHR) system.
- Every day, an estimated 2.5 quintillion bytes of data are generated.
The sources are many (climate data, media, surveys, websites), but the fundamental requirements are the same: the data has to be restructured, and doing that takes the necessary processing capacity in terms of both infrastructure and platform software.
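To address the infrastructure requirements, the cloud has equipped us with the pay-as-you-go services needed to harness and process the data. But when it comes to platform software and frameworks, we need both distributed and parallel computing paradigms. For example, counting the words in a novel is a distributed task: the text is split into pieces, each piece is counted, and the partial counts are combined at the end. Converting a million color images to black and white, on the other hand, is a massively parallel task, because every image can be processed independently of the others.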
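To make the two kinds of work concrete, here is a minimal Python sketch using only the standard library. It is an illustration of the distinction, not Banyan's API: the function names are hypothetical, images are assumed to be plain lists of (r, g, b) tuples, and the gray level uses the common ITU-R BT.601 luma weights. Note how the word count needs a final merge step, while the image conversion needs none.

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor
from functools import reduce


def count_words(chunk):
    """Map step: count the words in one chunk of text."""
    return Counter(chunk.lower().split())


def merge_counts(left, right):
    """Reduce step: partial counts have to be combined into one result."""
    return left + right


def to_grayscale(image):
    """Convert one image (a list of (r, g, b) tuples) to gray levels.

    Assumption: ITU-R BT.601 luma weights. Each image is independent,
    so no result ever has to be combined with another.
    """
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in image]


def word_count(chunks, workers=2):
    """Count words per chunk in worker processes, then merge the partials."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(count_words, chunks)
        return reduce(merge_counts, partials, Counter())


def convert_all(images, workers=2):
    """Convert every image in its own task; there is no merge step."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(to_grayscale, images))


if __name__ == "__main__":
    chunks = ["the quick brown fox", "the lazy dog", "the end"]
    print(word_count(chunks).most_common(1))

    images = [[(255, 255, 255), (0, 0, 0)], [(255, 0, 0)]]
    print(convert_all(images))
```

In the second case, adding workers or machines scales the job almost linearly, since the tasks never need to talk to each other. That is the kind of workload we mean by massively parallel.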
While there are a good number of distributed processing frameworks (Hadoop, Spark, Storm, StarCluster, etc.), we couldn’t find an easily consumable parallel processing framework that can be deployed and used readily.
So we built Banyan, a framework that is agile, cluster agnostic, massively parallel, shared nothing, reliable and fast. The name comes from the banyan tree: self-sustaining, branched out, and held up by supports that work in parallel. Our prime inspiration came from our experience using StarCluster, built by MIT, and the framework matured as we took on more challenging problems.
We will be posting our success stories about Banyan in this blog, and how it can be a part of your success story in your business too. Follow us on LinkedIn to stay tuned.
Reach out to us by emailing growbanyan [at] latentview.com