This is a guest post by Russell Jurney, a technologist and serial entrepreneur. His new startup, Cloud Stenography, will launch later this year. The article is an extension of a simple question on Twitter about the importance of MapReduce. Some subjects take much more than 140 characters.
The Technical Situation in Brief
The advent of the personal computer and the VisiCalc
spreadsheet were the foundation for a revolution in computing, business
and life whereby normal people could carry out sophisticated
accounting, analysis and forecasting to inform their decisions to
arrive at more positive outcomes. As Moore’s law
has progressed and processors have become faster, and computers
inter-networked, large volumes of highly granular data have been
collected. Analysis of terabyte datasets on the same level as a
spreadsheet has been limited by the disparity of acceleration between
processor speed and computer I/O (input/output) operations. Intel has
produced ever faster processor clock speeds without comparable gains in
disk, RAM or bus speeds. Put simply: We have cheap and numerous computing
resources and abundant data, but bringing those resources to bear on
that data to generate real value from it has proven exceedingly
difficult.
The widespread use of relational databases
to access data in pre-defined static relationships has also limited our
ability to discover and infer new and unique relationships among data.
Dynamic analysis of large volumes of data in relational databases
requires exhaustive pre-calculation of indexes and summaries of data
for each relationship, and scaling relational databases to handle large
datasets is a complex, painful and expensive process. As a result,
business intelligence systems relying on relational databases are
prohibitively complex and expensive. Other methods of raw parallel computation, such as MPI,
are exceedingly difficult. Such ‘smart kid only’ technologies have
significant barriers to entry for mere mortals. In fact,
multi-threaded, shared-memory computation in languages like C++ is
considered one of the most difficult, arcane areas of computer
science, leading to entire languages aimed at making concurrency easier.
MapReduce As the Way Forward
In order to extract value from large piles of data, we must escape
the bounds of I/O by going parallel and having many processors work on
the data at once, without grinding our development to a halt dealing
with complex algorithms and frameworks. MapReduce and platforms
that implement it satisfy this requirement for a surprisingly broad set
of problems. MapReduce is a simple way to process data in parallel
among many commodity machines. You are already familiar with the power of MapReduce in your daily use of it - it is the pattern pioneered by Google to bring you the effective search on which we now all depend.
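To make the pattern concrete, here is a minimal single-machine sketch in Python. It is not Hadoop's API or Google's implementation; the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) and the sample lines are invented for illustration. A real platform would run the map and reduce steps on many machines and handle the grouping step over the network.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def map_reduce(records, mapper, reducer):
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


def word_mapper(line):
    # Emit (word, 1) for every word in the line.
    for word in line.lower().split():
        yield word, 1


def count_reducer(word, counts):
    # Add up the 1s emitted for this word.
    return sum(counts)


# Hypothetical input, standing in for a large collection of documents.
lines = ["the quick brown fox", "the lazy dog", "the fox"]
word_counts = map_reduce(lines, word_mapper, count_reducer)
```

Because each call to the mapper sees one record and each call to the reducer sees one key, neither needs to know how many machines are involved, which is what makes the pattern easy to spread across commodity hardware.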
MapReduce is the design pattern
that in combination with recent developments in cloud computing and
cheap, plentiful broadband will bring us spreadsheet-style analysis of
vast amounts of data ill suited to traditional database management
systems in both scale and structure. MapReduce offers a cost-effective
way for any business to harness massive amounts of computational power
in the cloud for short periods of time to perform complex computations
on large volumes of data that would be prohibitively expensive and time
consuming on an individual machine, or that would require the
construction of a data center to handle.
The Business Impact
What does this mean for your business? Knowledge of MapReduce has
spread beyond Google, and it is now used by an increasing number of
companies to extract value from web-scale data. Facebook, Yahoo, Cloudera and many others have embraced MapReduce in the form of Apache Hadoop, the platform around which most open discussion of MapReduce has occurred. As a result, a new generation of startups
is rising that will take advantage of MapReduce to bring the same power
that Google pioneered on search to bear on a variety of datasets. New
opportunities exist by ‘thinking big’ and extracting value from
ever-increasing streams and volumes of data.
Example 1: Proving Global Warming
What does this really mean? It means that developers will have a
clear way to reduce vast datasets to scales they can work with to
extract information to inform your decisions. In this example from Cloudera, Hadoop and Pig are used to query a 138GB log of weather history for the last 100 years from the National Climatic Data Center to reduce that vast data to a scale the developer is comfortable working with. The result is this chart:
As a pile of data, the NCDC log informs nothing. When queried via map/reduce using Hadoop and Pig,
we arrive at an informative chart that shows us an important trend.
Would that chart inform a discussion about global warming? If you could
get such clear visualizations of all the minutiae of your business
critical to your success, would it inform your decisions? Can you log
and mine more data to streamline your operations?
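The kind of reduction described in this example can be sketched in a few lines of Python. This is not the Cloudera Pig script and does not read the NCDC format; the readings below are invented, and `summarize` and `merge` are hypothetical names. The sketch shows one detail that matters when averaging in parallel: each chunk of the log reports a (sum, count) pair per year rather than an average, because partial averages cannot be combined correctly but partial sums and counts can.

```python
def summarize(chunk):
    """Map side: reduce one chunk of (year, temperature) readings
    to a per-year (sum, count) pair."""
    partials = {}
    for year, temp in chunk:
        total, count = partials.get(year, (0.0, 0))
        partials[year] = (total + temp, count + 1)
    return partials


def merge(a, b):
    """Reduce side: combine the per-year (sum, count) pairs of two chunks."""
    merged = dict(a)
    for year, (total, count) in b.items():
        prev_total, prev_count = merged.get(year, (0.0, 0))
        merged[year] = (prev_total + total, prev_count + count)
    return merged


# Invented readings, split into two chunks as if stored on two machines.
chunk_a = [(1901, 10.0), (1902, 11.0), (1902, 13.0)]
chunk_b = [(1901, 12.0), (1902, 15.0)]

totals = merge(summarize(chunk_a), summarize(chunk_b))
means = {year: total / count for year, (total, count) in totals.items()}
```

The resulting `means` dictionary holds one number per year, which is the scale at which a chart or a spreadsheet becomes practical.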
Example 2: A Supercomputer for Every Biologist
When Amazon S3, EC2 and MapReduce via Hadoop are applied to the RMAP algorithm of genetic analysis, thanks to the work of one grad student, the result is a point-click supercomputer for every biologist who wants one, in the form of CloudBurst for Amazon Elastic MapReduce.
Now any biologist who wants a supercomputer for this kind of genetic
analysis can have one by the hour, and it's as easy as point-click. More
map/reduce genetic analysis algorithms are sure to follow. That's
revolutionary.
Conclusion
We are constrained in our strategies by what we imagine possible.
MapReduce and cloud computing open broad possibilities and business
opportunities by placing a usable supercomputer by the hour in the
hands of every startup that wants one. There is no problem which you
lack the processing power to solve; it's just a question of whether the
hourly cost is profitable. That's a profound change from being bound to
one machine. As a result of this shift, smaller companies can attack
'bigger' problems without a large up-front investment in hardware or
software infrastructure.
A new renaissance in computing is coming that will be comparable to
the business adoption of the personal computer and VisiCalc, and
MapReduce will drive it.
LRMC is right more often than other ranking methods and effective at sorting out the top teams in the later rounds. Here is Sokol and Nemhauser's presentation that highlights the power of the methodology.
I used LRMC to pick my Final Four this year. Louisville, Memphis, Pittsburgh, and North Carolina were the result. Below is a pure play LRMC bracket up to the Final Four. LRMC puts Memphis and North Carolina in the final with the Tar Heels prevailing... shudder the thought. So I had to go with my heart and my hope once LRMC delivered the Final Four.
Having been weaned on Louisville basketball at Freedom Hall, I took the Cards over Memphis in the semis and again over North Carolina in the final, 76 - 68. Terrence Williams gets the most outstanding player.
What is Logistic Regression Markov Chain (LRMC), you may ask? It is a tool that can be used to help with selecting and seeding the NCAA Tournament field. Or, if the NCAA does not want to use it for that, perhaps you can use it to win your office pool.
According to this article LRMC has predicted 30 of the past 36 Final Four teams correctly. That's pretty impressive. This year it predicted both the Final Four and Kansas as champion of the tournament.
Something to keep in mind when putting together your brackets next March. And I am looking forward to next year. On a SportsCenter broadcast Dicky V had the Louisville Cardinals coming out of the gate as number three.
I bought a MacBook Air. Perhaps not the most rational decision in the world. But Paul Stamatiou (his review) walking into a meeting with one pushed me over the edge. I purchased a gently used 1.6 GHz model on eBay for less than $1,500.
I asked my readers if I should get a MacBook Pro or the Air when it was launched. Surprisingly, most of the hackers who responded suggested getting the MacBook. The plain MacBook sans the Pro. And it indeed has the best performance/price ratio. That would be the most rational choice for many.
But an equal number of people suggested I go with the Air. My friend Joe Reger opined: "I've seen you out and about with a smaller laptop and have to believe that the portability and form factor is important to you." Joe knows me well. What it came down to is I did not want to increase the weight and footprint of what I carry around all the time. Given that I have a primary machine that I use for the heavy lifting of photo editing and mostly use a laptop for Internet communications, web apps, and Office, the Air has plenty of juice for me.
While I was staying up late last night contemplating if the MacBook Air was worthy, I was also watching/listening to Steve Jobs's keynote address. If you don't have the 90 minutes to spare, this clip by Veronica Belmont hits all the highlights in about a minute.
I purchased my PowerBook G4 1.0 12" back in 2004 after it was discontinued in April of that year. It's about the size of an 8.5 by 11 sheet of paper and easily weighs less than five pounds. It was packed with everything I wanted in a laptop. Two USB ports, a FireWire port, a modem port, Ethernet port, Wi-Fi, and Bluetooth. At the time the 40GB hard drive seemed huge for an ultra-portable. It was a deal at about $1,200. Somewhere along the line I maxed out the DRAM at 1.25GB. Great machine. It is my seventh laptop and the best one I have ever owned. It literally is my notebook. I carry it almost everywhere.
But that 40GB drive and 1.25GB of RAM are starting to get a little long in the tooth. So I have been waiting. And waiting. Waiting until today, when Apple would introduce its new product lineup. Let me tell you, I have been jonesing for a new laptop since last summer. Have the money set aside to buy it. I was hoping, really really hoping that Steveo would introduce the worthy replacement to my little friendly G4.
And when I first laid eyes on the Air I thought that he did. It is a stunning, stunning piece of technology. Deserves a spot in MoMA. It is beautiful. Fake Steve would be proud.
But is it functional?
Some folks have claimed that it's a little expensive, but the price point does not bother me.
Some folks have claimed that it is underpowered, but the specs do not bother me (I have a 2.4GHz 4GB memory iMac for video/photo editing and game on a console).
What bothers me is that to make the Air beautiful they removed all the holes and seams. It needs holes and seams. No Ethernet port? One USB port (which is claimed by the Ethernet adapter they sell for $29) (and a big BTW, I have never understood the whole Apple dongle thing, Steve must only like certain types of holes)? And a battery that cannot be replaced by the user.
Yes, the Air comes with a battery that cannot be replaced by the user. Perhaps another Apple first. The cost of the battery is not that important. I am on my third battery on the G4. They routinely last about a year. No big deal. Go online, order a new one when it gets a little weak. Have a spare. Take it on a trip. But to have to take the machine into Apple and get it replaced? Don't know the details yet, but it sounds like I am without a laptop for a few days. Not acceptable.
So here I sit. With money to buy a new machine. A little disappointed. Not knowing what to do. The Air is lacking in very important features. The Pro is a larger form factor than I prefer.
So that begs the question: Air or Pro? What would you do?
Dear Lance,
Thank you for being a member of .Mac! Your .Mac membership is set to renew on September 27, 2007 PDT. Your credit card will be charged the day before your .Mac membership anniversary date and your account will renew for another year.
As you know, we've been enhancing .Mac to make your connected life even easier. .Mac makes it easy to share what you create with iLife '08 and publish with iWeb - Photocasts, blogs, podcasts, and other web pages. Your .Mac account includes 10 GB of combined iDisk and email storage, with options to upgrade to more. Your iDisk is accessible directly from the Mac OS X Finder and from a browser on any Internet-connected Mac or Windows PC. Backup 3 makes safeguarding your valuable files easy and convenient with features including one-step backup of photos, movies, and music. .Mac Groups lets you bring the groups you belong to online to share messages, calendars, group files and web pages. And .Mac provides a steady stream of discounts and member benefits on Mac-related software and services. Be assured, there's more to come in the year ahead.
Please take a minute to review your account settings. If you want to change any of the details regarding your account, click here to update your Renewal Settings first.
I want to cancel my account but cannot do so via the "click here" link. iTarded.
Why do I want to cancel? "Because it is not possible to simply change your .Mac member name." iTarded.
"If you activated or renewed your .Mac account within the past 30 days through the .Mac website with a credit card, you can cancel your .Mac account for a prorated refund." So what exactly do I do when you send me an email with no option to cancel? Let them bill me and then cancel? iTarded.
There is a big big difference between being a hardware company and a service provider. Not sure if Apple gets that just yet.
The free iPhone that I tweeted about last week arrived yesterday. The box is so beautiful that I might just have to look at it for a while. And I might as well because I have to figure out some things before I can activate it.
I have to come out of the closet and admit that I am still running OS X 10.3 on my PowerBook G4. Yes, it is a bit long in the tooth. But I did not want to upgrade the OS when I heard of heating problems from others when they did so.
But the point is OS X 10.4.10 is a system requirement to use the iPhone. I have an OS that is just over two years EOL and can't run an iPhone on it. Forget the fact I need a computer to use a phone, how iTarded is that? Well about as much as calling a new computing platform a phone.
This line of thought leads to Apple's sloppy branding, which others have noted. Panther, Tiger, Leopard. Who has the time or inclination to remember point release associations? And yes, we are talking about something that is defined as a point release preventing me from using the iPhone. Which is not really a phone, I mean after nearly two months on the market I have yet to see someone actually talking on the thing. But I digress.
Can you believe that Apple has much better support of Microsoft's OS than their own? I can sync the iPhone with my seven year old XP box currently running as a backup server but cannot use a two year old Apple OS X on a laptop I use every day. Why?
An obvious forced upgrade. So I need to either go out and buy a whole new machine (which I want anyway but am waiting on the next upgrade cycle including OS X 10.5). Or go out and buy an OS that is going to be retired in less than three months.
It seems my options are to choose one of the above, take it to the Apple store for activation (in which case it won't sync, which makes my Treo a preferred device), or wait. Any advice on how to activate with XP and then switch the phone to fully functional with OS X 10.3, or otherwise work around this?
Then again, the box is so beautiful that I might just have to look at it for a while.
Unreal. I remember a time when PC Mag would not even write about Macs. Now they are not just covering them. PC Mag is giving Apple desktops and laptops Editors' Choice awards. Both the iMac and MacBook Pro have garnered the award and are sitting as top rated products on the PC Mag site.
Guess that little Intel move was the right thing to do.
The opinions expressed here are mine and mine alone (with the exception of comments by others of course). They do not represent the opinion or position of any other person or entity. All postings adhere to my personal values.