A Parallel Processing Compute Framework

A Parallel Paradigm

A full 90 percent of all the data in the world has been generated over the last two years. Internet-based companies are awash with data that can be grouped and utilized.

– Excerpt from a ScienceDaily article, May 22, 2013

No doubt! Here are a few more facts to consider.

  1. As of July 2011, the Hubble Space Telescope had made over 1,000,000 observations. Those 21 years’ worth of observations have produced nearly 50 terabytes of data, and the orbiting observatory generates more than 80 gigabytes of data each month.
  2. When the Sloan Digital Sky Survey began in 2000, its telescope in New Mexico collected more data in its first few weeks than had been amassed in the entire history of astronomy. Now, a decade later, its archive contains a whopping 140 terabytes of information.
  3. By 2011, U.S. healthcare organizations had generated 150 exabytes (150 billion gigabytes) of data. Kaiser Permanente alone might have as much as 44 petabytes of patient data just from its electronic health record (EHR) system.
  4. Every day, an estimated 2.5 quintillion bytes of data are generated.


Sources are many – climate data, media, surveys, websites – but the fundamental requirements are the same: restructuring this data, and, to do that, sufficient processing capacity in terms of both infrastructure and platform software.

To address the infrastructure requirements, the cloud has equipped us with the necessary pay-as-you-go services to harness and process the data. But when it comes to platform software and frameworks, we need both distributed and parallel computing paradigms. For example, counting the words in a novel is a distributed task, while converting a million color images to black and white is a massively parallel task.
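The image-conversion case can be sketched with a toy example using Python's standard multiprocessing module (this is an illustration of the massively parallel pattern, not Banyan's own API): each pixel converts to grayscale independently of every other pixel, so the work maps cleanly onto a pool of workers with no coordination between them.

```python
from multiprocessing import Pool

def to_grayscale(pixel):
    # Standard luminance formula. Each pixel converts independently of
    # all others, which is what makes the job "embarrassingly parallel".
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)

def convert_image(pixels, workers=4):
    # Fan the independent per-pixel conversions out across worker
    # processes; no worker ever needs another worker's result.
    with Pool(processes=workers) as pool:
        return pool.map(to_grayscale, pixels)

if __name__ == "__main__":
    image = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
    print(convert_image(image))
```

Contrast this with the word-count case, where partial counts from each machine must be shuffled and merged – the coordination step that distributed frameworks like Hadoop exist to handle.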

While there are a good number of distributed processing frameworks (Hadoop, Spark, Storm, StarCluster, etc.), we couldn’t find an easily consumable parallel processing framework that can be deployed and used readily.

So we built one: Banyan (like the tree – self-sustaining, branched out, a support system growing in parallel). It is agile, cluster agnostic, massively parallel, shared-nothing, reliable, and fast. Our prime inspiration came from our experience using StarCluster, built at MIT, and the framework matured as we took on more challenging problems.

We will be posting our success stories about Banyan on this blog, along with how it can become part of your business’s success story too. Follow us on LinkedIn to stay tuned.

Reach out to us by email at growbanyan [at]  


  1. Which type of algorithm should be used in Banyan for processing the data sets?

    • shifu

      January 20, 2015 at 11:06 am

      Any kind of algorithm can be custom written and used in Banyan.
      Banyan is more of a framework that can run any unit of code in parallel across multiple machines independently, without the overhead required by distributed processing frameworks.



© 2017 BANYAN
