Would you pay to sleep under my kitchen table? No? Why not?
A year after the brief success of Airbnb during the 2007 Industrial Design Conference, Chesky and Gebbia were $40,000 in debt and looking for a way to bail themselves out. An opportunity arose in the form of the quadruple over-attended 2008 Democratic National Convention in Colorado. The traffic pulled by the event, combined with an equally unorthodox marketing stint involving selling their own hand-crafted boxes of “Obama O’s” and “Cap’n McCain’s,” managed to pull them out of their debt. Airbnb was on its feet.
Still, despite their success during the DNC, the company was met with a lot of criticism. Like many other startups, Airbnb was posted on the developer community site Hacker News. The title of the thread was “Sleep under my kitchen table at Inauguration” and the responses of the HN community were, for the most part, sceptical and dismissive. The thread only had a meagre 25 comments and the overall consensus was that the service posed a safety risk to homeowners.
But Hacker News was wrong about Airbnb. There is no ignoring how Airbnb’s popularity has exploded since then. It’s predicted that this year Airbnb will cross the $20 billion mark–meaning people generally don’t have any aversions to strangers occupying their homes for a weekend after all.
Many developers turn to sites like Hacker News to test the waters and get feedback on their beta products. The results are often mixed, and an onslaught of negative reviews can discourage anyone. But should startups be discouraged by negative early reviews? Should they heed the wisdom of a specialized crowd in predicting the success of their businesses?
The wisdom of the crowd
Is a crowd wiser than an individual? The wisdom of the crowd theory, explored by James Surowiecki in his book The Wisdom of Crowds, holds that the collective opinion of a crowd is more accurate than the opinion of an individual. A classic experiment testing the theory is to have individuals look at a jar full of jellybeans and guess how many are inside. The individual guesses range from close to wildly off-base, but when the guesses are averaged, the result is startlingly accurate.
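The averaging effect can be illustrated with a small simulation. This is a minimal sketch with made-up numbers, not data from any real experiment: the jar size, the number of guessers and the spread of the guesses are all assumptions chosen for illustration, and the guessers' errors are assumed to be unbiased and independent.

```python
import random
import statistics


def simulate_guesses(true_count, n_guessers, spread, seed=0):
    """Return n_guessers noisy guesses centred on true_count.

    Assumption: errors are Gaussian and unbiased. Real crowds may share a bias.
    """
    rng = random.Random(seed)
    # Clamp at zero, since nobody guesses a negative number of jellybeans.
    return [max(0.0, rng.gauss(true_count, spread)) for _ in range(n_guessers)]


TRUE_COUNT = 850  # hypothetical number of jellybeans in the jar
guesses = simulate_guesses(TRUE_COUNT, n_guessers=500, spread=300)

# Error of the averaged guess versus the average error of a single guesser.
crowd_estimate = statistics.mean(guesses)
crowd_error = abs(crowd_estimate - TRUE_COUNT)
typical_individual_error = statistics.mean([abs(g - TRUE_COUNT) for g in guesses])

print(f"crowd estimate: {crowd_estimate:.0f} (error {crowd_error:.0f})")
print(f"average individual error: {typical_individual_error:.0f}")
```

The averaged guess does well here because the individual errors point in different directions and cancel out. If every guesser leaned the same way, averaging would not remove that shared error.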
In a very basic way, this concept can be applied to testing the market for a new product or service. Startups looking to gauge their prospective market can draw on the crowd’s collective responses in order to determine whether the reception of their product will be generally positive or negative.
Many companies have used Delphi panels to try and predict product revenues. This involves asking a specialized panel to make predictions about a product or market trend. The effectiveness of this method lies in the anonymity of the responses, meaning that panelists can offer their honest opinions without needing to worry about the repercussions. But what happens when opinions are shared publicly for other “panelists” to see?
Hacker News, a specialized crowd
News websites are a great example that showcases the crowd theory in action. We decided to use Hacker News as a case study to test the wisdom of the crowd theory. We chose HN for a couple of reasons. First, because it’s a perfect example of a specialized audience offering their opinions of a product in one aggregated place. Developers use HN as a place to post betas of their products, or stories about newly released products, so that the savvy community can weigh in with their criticisms, questions and predictions. Second, because archived threads allow us to compare the community’s initial responses to a product with the current success of each company.
The threads we decided to examine were all for companies that have become generally successful since their original HN posts: Dropbox, Airbnb, Codecademy, Instacart, Quora, Meteor, Zenefits, Stripe, Heroku. The only exception is the recently deceased Homejoy, which scaled quickly but ultimately failed.
We chose posts that were either a “Show HN” post by the founders, a launch post, or something along the same lines, like an article from Techcrunch. Whenever there were similar posts, we picked the earliest post with the most points and comments. Here is the list of all the posts that were used in this analysis:
|Thread title|URL|Company|
|---|---|---|
|My YC app: Dropbox – Throw away your USB drive (getdropbox.com)|https://news.ycombinator.com/item?id=8863|Dropbox|
|Pathjoy (YC S10) Offers Affordable Housecleaning With Easy Web Booking (techcrunch.com)|https://news.ycombinator.com/item?id=4660834|Homejoy (then Pathjoy)|
|Show HN: Review my startup: Quora, a topic based question and answer site|https://news.ycombinator.com/item?id=1197146|Quora|
|Show HN: Codecademy.com, the easiest way to learn to code|https://news.ycombinator.com/item?id=2901156|Codecademy|
|Sleep under my kitchen table at Inauguration (airbedandbreakfast.com)|https://news.ycombinator.com/item?id=426120|Airbnb|
|Instacart (YC S12) wants to be Amazon with 1 hour delivery|https://news.ycombinator.com/item?id=4325317|Instacart|
|Zenefits (YC W13) Gives Startups A One-Stop Shop For Employee Benefits|https://news.ycombinator.com/item?id=5242681|Zenefits|
|Stealth Payment Startup Stripe Backed By PayPal Founders (techcrunch.com)|https://news.ycombinator.com/item?id=2380911|Stripe|
|Innovative New Rails Host: Online IDE, Web Console, Instantly Live (heroku.com)|https://news.ycombinator.com/item?id=78069|Heroku|
We wanted to determine the percentage of positive comments versus the percentage of negative comments that each product or service received in its initial HN thread. We used a natural language algorithm (Jacob Perkins’s Sentiment Analysis with Python NLTK) to categorize the first 100 comments in each thread automatically as either Positive, Negative or Neutral. Then, in order to obtain a smaller and more concentrated sample for analysis, we manually categorized the top comment for each thread.
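The tallying step described above can be sketched roughly as follows. This is not the classifier used in the analysis: `toy_classifier` is a hypothetical keyword-based stand-in so that the example runs without any trained model, and the cue words and sample comments are invented for illustration. Any function that maps a comment to one of the three labels could be passed in its place.

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral")


def toy_classifier(comment):
    """Hypothetical stand-in for a trained sentiment classifier.

    It only counts a few hand-picked cue words. A real classifier would be
    trained on labelled text instead.
    """
    positive_cues = {"love", "great", "useful", "awesome"}
    negative_cues = {"risk", "unsafe", "never", "pointless"}
    words = set(comment.lower().split())
    pos = len(words & positive_cues)
    neg = len(words & negative_cues)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def sentiment_breakdown(comments, classify=toy_classifier, limit=100):
    """Classify the first `limit` comments and return the percentage per label."""
    sample = comments[:limit]
    if not sample:
        return {label: 0.0 for label in LABELS}
    counts = Counter(classify(c) for c in sample)
    return {label: 100 * counts[label] / len(sample) for label in LABELS}


# Made-up comments for illustration only, not taken from any HN thread.
example_thread = [
    "I love this, really useful",
    "This seems like a safety risk",
    "I would never pay for that",
    "How does billing work?",
]
print(sentiment_breakdown(example_thread))
# {'positive': 25.0, 'negative': 50.0, 'neutral': 25.0}
```

Keeping the classifier as a parameter separates the counting logic from the choice of model, so the same breakdown can be recomputed with a different classifier or a different comment limit.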
With the exception of Homejoy and Dropbox, the majority of the sentiments towards each company were virtually the same in both the top comment and the first 100 comments in the thread. In the case of Homejoy, the top comment was positive but the entire thread (25 comments total) was 52% negative, while Dropbox had a negative top comment but 62% positive comments in the entire thread (62 comments total).
We are therefore working on the assumption that the top comments, having the most points and the most replies, reflect the collective opinion of the HN crowd. (We’re not entirely certain how HN’s complex comment ranking system works, but we assume the algorithm places the most upvoted and most responded-to comment first in the thread.)
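The comparison between a thread's top comment and its overall sentiment can be expressed as a simple check. The sketch below uses only the figures stated above for the two exceptions, Homejoy and Dropbox. The function names and the data layout are our own illustrative choices, and the rest of each thread's split is left out because it is not given here.

```python
def majority_label(percentages):
    """Return the label that holds more than half of the comments, or None."""
    for label, share in percentages.items():
        if share > 50:
            return label
    return None


def top_comment_agrees(top_label, percentages):
    """True if the top comment's sentiment matches the thread's majority."""
    return majority_label(percentages) == top_label


# Only the stated figures are used; the remaining split is not filled in.
threads = {
    "Homejoy": {"top": "positive", "shares": {"negative": 52}},
    "Dropbox": {"top": "negative", "shares": {"positive": 62}},
}

for name, thread in threads.items():
    print(name, top_comment_agrees(thread["top"], thread["shares"]))
# Homejoy False
# Dropbox False
```

A thread with no label above 50% has no majority under this definition, so the check returns False for it whatever the top comment says.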
We examined the comments in all of the threads combined to determine whether responses on Hacker News tended towards positivity or negativity. When looking at the total comments and the top comments in each thread, the ratio of positive to negative to neutral sentiments was virtually the same in both samples.
What we found was that the comments were generally negative and that the top comment tended to reflect this sentiment. About half of the total comments in each thread were negative, about a third were positive, and the rest were neutral or tangential and off-topic. There appeared to be a crowd bias towards the more tech-savvy services like Codecademy and Dropbox, which were met with a majority of positive reviews, while more general services like Airbnb, Quora, and Instacart were met with primarily negative reviews.
Meteor had the most comments in its thread: 322. Of the first 100 comments, 22% were positive, 58% were negative, and 20% were neutral.
Many were concerned that the tool would place too many restraints on what they could do. There were several tangential threads about whether or not Meteor could actually be considered revolutionary, with multiple commenters comparing it to Ruby on Rails.
There were many commenters waxing poetic about the days-gone-by of from-scratch programming.
But despite the apprehension of early reviewers, Meteor has seen success, with MDG recently revealing plans to expand their services further. Meteor’s structured design ended up being one of its assets, with users appreciating its friendliness towards new developers and smart packages.
On the other end of the spectrum, Codecademy was met with a positive majority–responses that accurately predicted its success today. In 2014, the site had over 24 million users.
After Meteor, Codecademy had the most comments in its thread–232–and 787 points. This is not a surprising response, since HN is a developer community. But where Meteor was criticized by many for being too constraining, too obscured by magic, Codecademy was commended even in its beta form for its accessibility for beginners.
While many of the comments provided constructive criticism, the general consensus of the HN crowd was that there was value in the service and they wanted to see where it went.
Then there is the case of Homejoy (originally Pathjoy). In the original HN thread, the top comment was positive–however, the comments in the thread were generally negative. As many commenters predicted, Homejoy failed to turn a profit. The thread drew a modest 25 comments, and the responses ranged from lukewarm to negative. The main issue that posters had with the service was that the low rates Homejoy touted would not be enough to keep the business afloat.
Ultimately, the problem ended up being one of both homeowner security and cost-efficiency. Homejoy’s services proved useful for getting homeowners in touch with pros for the first time, but once homeowners found a pro they trusted, they often hired them offline. This created a major leakage that negatively impacted the churn rate of their business. Also, Homejoy pushed for fast growth without strengthening their core business model; they hired on pros as independent contractors, rather than Homejoy employees. They couldn’t afford to accommodate the demands from pros for employee classification and the associated wage and benefits.
Three companies, three different responses on Hacker News, and three ultimate outcomes. The HN crowd correctly predicted the success of some companies while being completely off about others. What does this tell us about the wisdom of a specialized crowd?
When are specialized crowds wrong?
Certain companies have succeeded despite a sceptical specialized audience. Why is that? The first and most obvious reason is that there is simply not enough diversity in the audience for the wisdom of the crowd theory to hold true. Members of the HN crowd are likely to think along the same lines (with some exceptions, of course).
Aldo Matteucci provides a pointed, if blunt, explanation for why a specialized audience might be wrong:
“Why are experts not that smart? Because experts tend to be and think alike, and thus do not reflect maximum diversity of opinions; they tend to be internally inconsistent and poor at calibrating their position – in short, they are overconfident. In a group they tend to decide by authority (group-think), which makes dissent within the group improbable – conformity and bias rather than challenge is the result. Finally, past performance is never a predictor of future success – they may have just been lucky.”
Because HN comments are posted publicly for others to read, there is the opportunity for new commenters to be influenced by the already existing comments in the thread. The more clout a poster has, the more influence they are likely to have over other commenters.
Paul Graham also provides some insight into why certain products and services are poorly received, despite their later success. He argues that when a startup presents a product or service that is a simple solution to a problem, presented in an informal way and offered in its beta form, the reception tends to be overwhelmingly negative. People are more likely to respond positively to complex solutions because they are more impressive.
The fact that Quora’s beta form used Facebook as the sole method of signing in was immediately the primary deterrent for a large portion of the crowd. Many felt that there were already too many applications that required Facebook logins, while others simply didn’t want to share their personal information.
In a public forum, one person’s negative comment can set off a chain of subsequent negative comments. In a community as large as Hacker News, there is always the possibility that a highly-ranked negative post will influence the responses of other posters–this is what Graham refers to as dilution, something that he worked to avoid in his creation of Hacker News.
This implies that the top comment carries a greater weight than other comments–or, at the very least, that this comment will definitely be read, while subsequent comments further down the page may not. A bias is created that could influence the opinions of other members of the crowd. Because many posters on HN covet their karma rankings, they may be less inclined to express an opinion that is against the majority, for fear of tarnishing their reputation within the community.
Why do some crowd favourites fail?
What about the cases where an audience loves a product and thinks it will succeed, only to see it tank?
Graham’s essay on why smart people have bad ideas may be ten years old, but the points he makes still ring true. The mistake that many startups make, he argues, is that they hinge their work on a cool idea while neglecting to give the necessary attention to the fact that startups are, well, businesses.
To quote Graham:
“Going into business is like a hang-glider launch: you’d better do it wholeheartedly, or not at all. The purpose of a company, and a startup especially, is to make money. You can’t have divided loyalties […] What I mean is, if you’re starting a company that will do something cool, the aim had better be to make money and maybe be cool, not to be cool and maybe make money.”
It’s understandable that developers would want to invest the majority of their time and energy into making an innovative, high quality product. Where many startups fail is in their efforts to market their product even while they’re working out the kinks.
Everpix is a prime example of a company that had a cool idea that users loved, but that still failed due to a lack of effective marketing. Pierre-Olivier Latour and Kevin Quennesson wanted a more modern photo app than outdated platforms like iPhoto or Lightroom, so they built one. It was well-liked among users but ultimately failed to turn a profit. The reason for the company’s failure towards the end of 2013 was, from a business standpoint, fairly straightforward: not enough time spent on growth and expansion, a weak first pitch deck and a failure to pit themselves against their mammoth competitors.
The truth of a successful startup is a hard ethical pill for many developers to swallow: unless making money is your first priority, it’s probably not going to happen by accident.
So, taking all of this into consideration, should startups heed the wisdom of the crowd?
The HN crowd, or any expert community, doesn’t have any greater chance of predicting the success of a startup than a more general crowd.
Where Codecademy succeeded was in its identification of a gap in the market for an interactive online coding curriculum–and HN recognized that. However, HN did not recognize that what may have appeared too limiting a design to them would in fact prove to be one of Meteor’s major selling points. That being said, in the case of Homejoy, the HN crowd did identify the major factors that fed into the company’s failure: unsustainable pricing, lack of need for the service, and employment politics.
The HN crowd also tended to react negatively towards services that posed a perceived threat to user security. Companies like Instacart and Quora that required personal information, such as Facebook profiles, for access were met with reactions ranging from apprehension to flat-out rejection.
Certain companies also received more comments in their threads than others–indicating a greater interest–and these tended to be the more hardcore tech services.
Such a specialized community is a great place to get feedback on the technical details of your product, but not to validate your ideas. The responses from people in these communities tend towards scepticism or outright negativity, even for products they use.
The people you should be listening to are, first and foremost, users of your product–preferably, paying users. These are the people who are committed to your product and who have an investment in its success. Their subscription is the greatest indicator of their approval and their feedback is the greatest indicator of user experience. Maintaining a dialogue with your customers will always be more valuable than looking for affirmation from an outlying public.
It’s a tough market for businesses and without a careful and constant attention to best business practices, as well as creative solutions to problems that arise, a cool and cutting-edge product can still fail to turn a profit.
Edit: Here’s an infographic of our findings.