I had written a tweet storm (linked below) on this topic earlier. This blog is a longer version of it and also addresses some questions people raised.
One problem that stumps many new PMs is measuring the impact of small features.
E.g., how do you justify changes to the orders screen, slight changes to the text in some obscure corner of the product, or purely informational changes?
What if you added small delight features, like a fancy error message or a funny wait timer?
While you would LOVE to A/B test everything and see whether these changes positively affect the core metric, unless a large number of users hit those scenarios you will not be able to get statistically significant results.
Companies with billions of users can test even the minutest things (e.g., Google allegedly tested 41 shades of blue, and Uber can test its custom fonts), but few companies have the scale needed to observe a statistically significant result.
So how do you make a case for these changes, and if you do make them, how do you measure them?
One way, of course, is to simply not measure success (or measure it, but not expect a result). Think of these as paper cuts: each one is annoying, but no single one will kill you. The idea is to go with what you deem right and not get into analysis-paralysis mode. You see a paper cut and you fix it.
But relying too much on gut has two major disadvantages:
Experience: Unfortunately, developing a good “gut reaction” is tricky and requires a lot of past experience to draw on. “Gut” is just the culmination of a lot of data you have already seen.
What is not measured is not rewarded: Unless you are running your own startup or have a significant stake in one, you want to make sure you get recognised and rewarded for your work. If there is no way to measure something, there is no real way to recognise and reward it. All it may get you is a pat on the back. It is also deeply unsatisfying because you have NO idea whether you are actually adding value.
One simple way of measuring these changes is a “constant holdback”.
Whenever you roll out such an experiment, instead of rolling it out to 100% of users, roll it out to only 95% of your users. Do the same for all other small changes, with the same 5% always excluded.
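Keeping “the same 5%” excluded across every rollout needs a stable, deterministic assignment. A common way to get that is to hash each user ID with one salt shared by all experiments, so no per-experiment randomness can reshuffle who is held back. The sketch below assumes string user IDs; the salt, the function names and the 5% constant are hypothetical, not the API of any particular experimentation tool.

```python
import hashlib

# Hypothetical settings for illustration only.
HOLDBACK_PERCENT = 5                     # share of users who never get any of the small changes
HOLDBACK_SALT = "constant-holdback-v1"   # ONE salt shared by every experiment, never per experiment


def bucket(user_id: str, salt: str = HOLDBACK_SALT) -> int:
    """Map a user ID to a stable bucket in [0, 100) by hashing it with the shared salt."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def in_holdback(user_id: str) -> bool:
    """True for the same ~5% of users every time, in every experiment."""
    return bucket(user_id) < HOLDBACK_PERCENT


def sees_change(user_id: str, change_enabled: bool = True) -> bool:
    """A small change ships to everyone except the constant holdback group."""
    return change_enabled and not in_holdback(user_id)


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(100_000)]
    held_back = sum(in_holdback(u) for u in users)
    print(f"{held_back / len(users):.2%} of users are in the constant holdback")
```

Using a per-experiment salt instead would hold back a different 5% for each change, and no group of users would be free of all of them, which defeats the purpose.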
As experiments pile up, the combined effect of fixing multiple paper cuts should start to show.
You would potentially see the holdback group having meaningfully different (worse) metrics than everyone else.
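To judge whether the gap between the two groups is real or just noise, one standard option for a conversion-style metric is a two-proportion z-test. The sketch below is only that: the conversion counts are made up, and it assumes each user is an independent observation. With a 95/5 split, the small holdback group is what mostly limits how small a difference you can detect.

```python
import math


def two_proportion_z_test(conv_rest: int, n_rest: int, conv_holdback: int, n_holdback: int):
    """Two-sided z-test for a difference in conversion rate between two groups.

    Returns (rate difference, z statistic, two-sided p-value).
    """
    p_rest = conv_rest / n_rest
    p_holdback = conv_holdback / n_holdback
    pooled = (conv_rest + conv_holdback) / (n_rest + n_holdback)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_rest + 1 / n_holdback))
    if std_err == 0:
        raise ValueError("both groups have a 0% or 100% rate; the test is undefined")
    z = (p_rest - p_holdback) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p_rest - p_holdback, z, p_value


# Hypothetical counts: 95,000 users with all the small changes vs 5,000 in the holdback.
diff, z, p = two_proportion_z_test(10_450, 95_000, 500, 5_000)
print(f"difference = {diff:.2%}, z = {z:.2f}, p = {p:.3f}")
```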
But how do you know which experiment worked?
That is the point: you potentially will not. You will see the cumulative effect, not the specific ones.
What if some experiments are actually harmful?
The idea is not to have only super-positive experiments but to know whether you have been directionally right. The trick is to use this for mundane, obvious changes you need to make anyway rather than for large features. If you are directionally right, you could eventually see a nice bump.
Can I do this for large new features?
If you can run a control vs. treatment A/B test for a feature, I would advise against using this holdback method as the primary means of testing. You do not want large experiments in the mix that can completely change the overall direction of the result.
But I would argue that this holdback is useful even for large features once you have tested them. After your A/B experiment, you can consider a holdback for large features as well. The reason is that not all changes are purely additive. E.g., if one feature raises conversions by 2% and another by 3%, it does not follow that the long-term combined effect will be 5%. A constant holdback tells you the cumulative effect of all the large changes you have made.
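Even the simplest models do not give exactly 5%. The sketch below compares two toy assumptions: relative lifts that just add up, and independent lifts that compound on each other. Both are simplifications; features that overlap or cannibalize each other can land below either number, and only something like the holdback measures the real combined value.

```python
def additive_lift(lifts):
    """Toy assumption 1: relative lifts simply add up."""
    return sum(lifts)


def compounded_lift(lifts):
    """Toy assumption 2: each lift applies independently on top of the previous ones."""
    total = 1.0
    for lift in lifts:
        total *= 1 + lift
    return total - 1


lifts = [0.02, 0.03]  # the 2% and 3% example from the text
print(f"additive:   {additive_lift(lifts):.2%}")    # 5.00%
print(f"compounded: {compounded_lift(lifts):.2%}")  # 5.06%
```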
Do not forget to delete the holdback eventually so that all your users can see the same improved experience.
I have been hearing versions of this question a lot. It comes up on Twitter and sometimes even in Musk’s filings. There are three premises to this argument that I hope to address in this article:
Premise 1: Twitter is using a non-standard metric
Premise 2: It is a bad metric
Premise 3: Twitter is not measuring it right
Premise 1: Twitter is using a non-standard metric
Let’s be clear: there is NO standard usage metric that you are supposed to report. The government mandates that public companies disclose financial data in certain formats, but not how a company measures usage. Every company decides what its most important measures are and reports them. Even a basic definition like what counts as “active” can vary from company to company based on its footprint.
E.g., Snapchat only looks at people who opened its app (annual report):
“We define a DAU as a registered Snapchat user who opens the Snapchat application at least once during a defined 24-hour period.”
From Snapchat’s annual report
Whereas Pinterest (annual report) accounts for all kinds of actions, including web visits:
We define a monthly active user as an authenticated Pinterest user who visits our website, opens our mobile application or interacts with Pinterest through one of our browser or site extensions, such as the Save button, at least once during the 30-day period ending on the date of measurement
From Pinterest’s annual report
Companies, especially large ones, routinely create their own combination metrics that make most sense to them.
E.g., you don’t care much about how many times the Uber app was opened; you care about how many users actually took a trip.
Uber has its own metric called Monthly Active Platform Consumers (MAPCs):
Monthly Active Platform Consumers. MAPCs is the number of unique consumers who completed a Mobility or New Mobility ride or received a Delivery order on our platform at least once in a given month, averaged over each month in the quarter. While a unique consumer can use multiple product offerings on our platform in a given month, that unique consumer is counted as only one MAPC. We use MAPCs to assess the adoption of our platform and frequency of transactions, which are key factors in our penetration of the countries in which we operate.
From Uber’s annual report
Similarly, Facebook, a direct competitor of Twitter, is also introducing a new metric called Daily Active People (annual report):
Family metrics represent our estimates of the number of unique people using at least one of Facebook, Instagram, Messenger, and WhatsApp
From Facebook’s annual report
Premise 2: It is a BAD metric
In short: Twitter’s mDAU metric is the number of people it can show ads to. Twitter removes suspected bots, spam accounts, and accounts that only post via APIs from the mDAU count (see more details).
This is an ABSOLUTELY GOLD metric. If you are in the business of selling ads, making sure you only show ads to real humans is an extremely important measure.
Every company would have some measure of this. Twitter just chose to disclose it and make it their key metric. It is a strong signal that they are in the business of selling ads.
Twitter could absolutely measure and disclose overall spam accounts too, but that does not take away from the validity of the mDAU metric.
I also keep hearing that Twitter removes bots and spam from its calculation of mDAU. Do you know who else does that? Pinterest. Here is a direct quote from their annual report:
We regularly deactivate false, spam and malicious automation accounts that violate our terms of service, and exclude these users from the calculation of our MAU metrics;
From Pinterest’s annual report
Pinterest’s active user metric seems functionally equivalent to Twitter’s monetizable daily active users. I see nothing inherently bad in this metric.
Another point to note: Facebook and Snapchat do not seem to remove spam accounts from their daily active user counts. Facebook does disclose how many of its active users it suspects are spam, but I did not find anything similar in Snapchat’s filings.
There is no consistency or rule on what a “daily active user” is.
Premise 3: Twitter is not measuring it right
Here is the process Twitter follows: a set of AI/ML algorithms automatically removes as many accounts as possible that it suspects are spam. Twitter then samples 100 accounts per day (about 9,000 per quarter) from the remaining accounts and has human reviewers rate whether each one is spam (triple-checked, I read somewhere).
They do this every day to get a trend, and find that the share of spam accounts that pass through their filters is under 5%.
There are three objections I hear in this regard:
Objection 1: The Sample size is too small
How representative a sample is depends far more on how it is selected than on how big the population is. For estimating a proportion, precision depends on the absolute sample size, not on what fraction of the population you sample. A single day’s random sample of 100 gives a rough estimate, and Twitter pools about 9,000 over a quarter.
E.g., CNN’s 2020 presidential election exit poll had a sample size of just 15,590.
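To put numbers on this: for a simple random sample, the usual 95% margin of error for an estimated proportion shrinks with the sample size n, and the population size only enters through a finite-population correction that is close to 1 whenever the population dwarfs the sample. The sketch assumes a spam rate near 5% and uses an illustrative population of 200 million; it does not model how Twitter actually draws its sample, which is not public.

```python
import math
from typing import Optional


def margin_of_error(p_hat: float, n: int, population: Optional[int] = None, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion from a simple random sample.

    If a population size is given, the finite population correction is applied.
    """
    moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
    if population is not None:
        moe *= math.sqrt((population - n) / (population - 1))
    return moe


for n in (100, 9_000):
    print(f"n = {n:>5}: +/-{margin_of_error(0.05, n):.2%}")

# Hypothetical population of 200 million: the correction barely changes anything.
print(f"n =  9000, N = 200M: +/-{margin_of_error(0.05, 9_000, population=200_000_000):.2%}")
```

That works out to roughly ±4.3 percentage points for 100 accounts versus about ±0.45 for 9,000.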
Objection 2: They do manual review
Of course they do. They have already used their AI/ML algorithms to filter out all the spam they could, and manual review is the last step. In fact, even Facebook uses manual reviewers to tag spam.
Facebook calls them “violating accounts” (another non-standardized name):
We define “violating” accounts as accounts which we believe are intended to be used for purposes that violate our terms of service, including bots and spam.
From Meta’s annual report
It goes on to explain how they determine if an account is violating
Such estimation is based on an internal review of a limited sample of accounts, and we apply significant judgment in making this determination. For example, we look for account information and behaviors associated with Facebook and Instagram accounts that appear to be inauthentic to the reviewers
From Meta’s annual report
Objection 3: It can’t be 5%
Do you know how much TOTAL spam Facebook claims it has? 3%. NO kidding.
“we estimated that approximately 3% of our worldwide MAP consisted solely of violating accounts”
From Meta’s annual report
Assuming no one is lying, if Facebook can have only 3% of all its users as spam, Twitter’s 5% spam after spam filters does not seem off.
I do suspect, though, that I may have missed something, and that Facebook also removes suspected spam accounts before calculating “violating accounts”, which would make it potentially functionally equivalent to Twitter’s mDAU.
Also remember, mDAU is NOT a user-facing metric. Your own experience is immaterial. Twitter can have 20% spam overall and still have only 5% spam in mDAU.
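A toy funnel shows how both numbers can be true at once. All inputs are hypothetical: 1,000 active accounts, 20% of them spam, and filters that keep 80% of the spam out of mDAU. The function name and parameters are illustrative, not anything Twitter has published.

```python
def spam_share_of_mdau(total_accounts: float, spam_rate: float, filter_catch_rate: float,
                       human_monetizable_rate: float = 1.0) -> float:
    """Share of spam inside mDAU for a toy funnel (all inputs hypothetical).

    spam_rate: share of all active accounts that are spam
    filter_catch_rate: share of spam accounts the filters keep out of mDAU
    human_monetizable_rate: share of real users that count as monetizable
    """
    spam = total_accounts * spam_rate
    humans = total_accounts - spam
    spam_in_mdau = spam * (1 - filter_catch_rate)
    humans_in_mdau = humans * human_monetizable_rate
    return spam_in_mdau / (spam_in_mdau + humans_in_mdau)


# 20% of all accounts are spam; the filters keep 80% of them out of mDAU.
share = spam_share_of_mdau(1_000, spam_rate=0.20, filter_catch_rate=0.80)
print(f"spam share of mDAU: {share:.1%}")  # 4.8%
```

Changing `filter_catch_rate` or `human_monetizable_rate` moves the result, which is why the mDAU figure says little about what you see in your own feed.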
I will leave you with this diagram to chew on
So NO, mDAU is not really that non-standard, nor is it especially bad, nor does Twitter’s disclosed methodology seem shady.
There could obviously be something deeply wrong with Twitter’s count if they are hiding something, but I have not been able to find any information about that.
Want to know about a really BAD metric that’s super popular and also appears in annual reports? Read why NPS scores are useless.
Ever since Elon Musk raised concerns about spam accounts on Twitter, tons of Twitter experts, tech media outlets, and “social media analyst companies” have been saying that the claim in Twitter’s filing that fewer than 5% of its users are spam is wrong, and that their own estimates are much, much higher.
The only problem: Twitter did not exactly make that claim, and as usual the tech media chose to ignore that, deliberately I believe (more on that later in the article).
Let’s first look at the filing that everyone keeps referring to. Here is the exact line from the filing:
The actual claim is:
Average of false or spam accounts during the fourth quarter of 2021 represented fewer than 5% of our mDAU during the quarter.
Let’s define the terms:
DAU: Daily Active User
mDAU: Monetizable Daily Active User
The “m” is super important. So what is the difference? I would love to think it is industry-specific terminology that most people do not get, but Twitter actually defines it in its annual report for anyone who bothers to read.
We define mDAU as people, organizations, or other accounts who logged in or were otherwise authenticated and accessed Twitter on any given day through twitter.com, Twitter applications that are able to show ads, or paid Twitter products, including subscriptions
So what Twitter is saying is that, of the people they could have shown ads to, fewer than 5% were spam according to their estimates.
This implies you have to exclude any accounts that tweet using systems where no ads can be shown.
For example, you likely would not see an ad if you used an API to post a tweet, and this may extend to third-party clients that let you post. E.g., I sometimes use Roam (my notes app) to post directly.
Twitter’s APIs allow you to post 200 tweets in a span of 15 minutes.
This changes a lot:
Lots of bots and spam accounts likely use automated scripts and APIs to post. They would never be on a surface where they can be shown ads, hence they are non-monetizable and not counted.
Real users tweeting through certain clients (or automated scripts like IFTTT) may not be counted.
Any account that is spam, or even likely spam, may be tagged as such by the ad engine, removed from potential monetization, and hence not counted. Twitter even mentions this in its filing, in the same paragraph as the 5% claim:
After we determine an account is spam, malicious automation, or fake, we stop counting it in our mDAU, or other related metrics
So possibly a large swath of accounts labeled as potentially spam or fake never get to see an ad, and hence are not counted.
Not every potential spam account is deleted, possibly because there can be a lot of false positives. Lots of real people behave like bots, and the ad engine may have stricter rules.
Fun thought exercise: If you behave like a bot, do you get ad free twitter?
A good visualisation of this would be something like this
So fake accounts on Twitter could be 20% or even 50%; if they are not being monetized, they are not counted.
The main claim, in some sense, is: if an advertiser spends money to reach users on Twitter, fewer than 5% of those users would be fake.
This is an advertiser-facing metric, not a user-facing one. Your own experience is not what is being measured.
Now, coming back to how it gets reported. Remember the Reuters screenshot I shared above? In the subheading they do make that distinction, indicating that they know the difference but chose NOT to mention it in the main headline.
This is repeated across many articles on various tech media sites. Either they ignore it and assume mDAU = DAU (which is incompetence), or they bury it in the text, which I think is not very ethical.
This distinction is so important that it needs to be called out in the MAIN headline.
Also read how other tech companies have similar metrics:
Not exactly. “There are 5% fake users on the platform” is very different from “of the people who can be shown ads, only 5% are fake”. To verify this claim, you would need to:
Know who was monetized
Take a sample of these monetized users
Define and agree on the principles of what counts as a spam/fake account
See what percentage of these users fit that definition
This is why it’s almost impossible to verify this claim without access to Twitter’s internal systems.
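The four steps can be sketched as code, which also makes clear where the internal data comes in. Everything here is hypothetical: `monetized_ids` stands in for the list of monetized accounts that only Twitter has, `is_spam` stands in for the agreed definition applied by reviewers, and the usage example runs on a synthetic population with a 4% spam rate purely to exercise the code.

```python
import math
import random
from typing import Callable, Sequence, Tuple


def estimate_spam_share(monetized_ids: Sequence[str], is_spam: Callable[[str], bool],
                        sample_size: int, seed: int = 0) -> Tuple[float, float, float]:
    """Sample monetized accounts, apply an agreed spam definition, return (share, low, high).

    The interval is a normal-approximation 95% interval, clipped to [0, 1].
    """
    rng = random.Random(seed)
    sample = rng.sample(monetized_ids, sample_size)  # simple random sample without replacement
    spam_count = sum(1 for account in sample if is_spam(account))
    p_hat = spam_count / sample_size
    half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Synthetic stand-ins: 100,000 "monetized" accounts, every 25th one (4%) labeled spam.
accounts = [f"acct-{i}" for i in range(100_000)]


def synthetic_review(account_id: str) -> bool:
    """Hypothetical stand-in for a human reviewer applying the agreed definition."""
    return int(account_id.split("-")[1]) % 25 == 0


share, low, high = estimate_spam_share(accounts, synthetic_review, sample_size=9_000)
print(f"estimated spam share: {share:.2%} (95% CI {low:.2%} to {high:.2%})")
```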
The share of spam accounts severely affects users and has a negative effect on user experience. This absolutely needs to be addressed, but the claim Twitter is making is not about a user-facing metric; it is about an advertiser-facing one.
The big question that needs an answer is: what percentage of Twitter’s daily active users are in the monetizable bucket? But even that is not directly relevant to the 5% claim.
It’s very much possible that Twitter is lying, or that it counts every DAU as monetizable, or that its spam engines are too lenient, but we would need internal data to know that.
I, for one, do not suspect Twitter of doing anything shady.
It looks like Twitter’s ex-head of security became a whistleblower (source) and revealed a lot of details about its security practices and also its spam accounting.
Keeping the security bits aside, it seems that even the whistleblower, who would typically be very antagonistic toward Twitter, more or less confirmed that what Twitter was reporting all along was correct:
Spam in mDAU consists ONLY of the accounts that slip through the existing spam filters
Twitter, Zatko’s disclosure claims, actually considers bots to be a part of a category of millions of “non-monetizable” users that it does not report. The 5% bots figure that Twitter shares publicly is essentially an estimate, based on human review, of the number of bots that slip through into the company’s automated count of monetizable daily active users, the disclosure states. So while Twitter’s 5% of mDAU bots figure may be useful in indicating to advertisers the number of fake accounts that might see but be unable to interact with their ads, the disclosure alleges that it does not reflect the full scope of fake and spam accounts on the platform.
Executives are incentivized to avoid counting spam bots as mDAU, because mDAU is reported to advertisers, and advertisers use it to calculate the effectiveness of ads. If mDAU includes spam bots that do not click through ads to buy products, then advertisers conclude the ads are less effective, and might shift their ad spending away from Twitter to other platforms with higher perceived effectiveness.
However there are many millions of active accounts that are not considered “mDAU,” either because they are spam bots, or because Twitter does not believe it can monetize them. These millions of non-mDAU accounts are part of the median user’s experience on the platform. And for this vast set of non-mDAU active accounts, Musk is correct: Twitter executives have little or no personal incentive to accurately “detect” or measure the prevalence of spam bots.
Twitter announced a new, proprietary, opaque metric they called “mDAU” or “Monetizable Daily Active Users,” defined as valid user accounts that might click through ads and actually buy a product. From Twitter’s perspective, “mDAU” was an improvement because it could internally define the mDAU formula, and thereby report numbers that would reassure shareholders and advertisers. Executives’ bonuses (which can exceed $10 million) are tied to growing mDAU.
Unless you’re a Twitter engineer responsible for calculating mDAU, you probably wouldn’t know what Agrawal is talking about. He is not saying that fewer than 5% of all accounts on the platform are spam. He’s saying, more or less, that Twitter starts with all the accounts on the platform, tries to automatically put all the human accounts that could be convinced by advertisers to buy products (but no spam accounts) into mDAU, and then uses humans to estimate the error rate of spam accounts that nevertheless slip through into mDAU. And naturally, Twitter “can’t share” its special sauce for determining mDAU.
Even though it’s written in a very antagonistic fashion, what Zatko is saying should be music to the ears of advertisers and Twitter’s BD teams.
It says that Twitter took great care to make sure ads were not shown to suspected fake users and voluntarily removed them from its monetizable pool. It further claims that exec comp was tied to growing this specific metric rather than the “vanity metric” DAU.
This is a GOOD thing. Anyone who works in ad tech or marketing would tell you that.
Sure, Twitter can do more to fight spam, and sure, spam makes the user experience worse, but there is currently no evidence that Twitter lied in its SEC filing.
Some tips I follow to manage my personal finances in India (tweet thread)
If someone suggests an insurance + investment combo opportunity… RUN. NEVER EVER mix insurance and investments; it makes no financial sense. So many of us are stuck paying premiums for junk policies like Jeevan Anand just because a neighbourhood uncle convinced our parents it was a good deal. Should we start a “Jeevan Anand Peedit Sammelan” (a Jeevan Anand victims’ convention)?
Remember to benchmark: whenever someone talks about “great returns”, always benchmark them against peers. E.g., in the last year NIFTY 50 gave 42% returns; almost everything did well after the COVID market crash. I am pretty sure some agents are trying to sell you policies quoting an awesome 20% return last year. Ideally, ignore last year’s returns anyway.
Fee-only financial planners ONLY. Avoid your bank’s wealth managers at all costs. The incentives of anyone who makes money from your investing are not aligned with yours. More often than not, they will try to sell you what earns them the maximum commission rather than what benefits you.
Direct mutual funds only: there is no real advantage to paying commission to a broker. Direct mutual funds not only give better returns, they are now super accessible. Just open an account with Zerodha (referral link) and start investing there. I liquidated all my MF holdings once they crossed the exit-load threshold and moved to Zerodha.
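Direct plans do better because their expense ratio is lower: no distributor commission is built into it. A rough sketch of how that gap compounds, with every input hypothetical (a ₹10 lakh lump sum, 12% gross annual return, 0.5% vs 1.5% expense ratio, 20 years) and a simplified model that subtracts the expense ratio from each year’s return:

```python
def final_value(principal: float, gross_return: float, expense_ratio: float, years: int) -> float:
    """Grow a lump sum, deducting the annual expense ratio from the return (simplified model)."""
    return principal * (1 + gross_return - expense_ratio) ** years


principal, gross, years = 1_000_000, 0.12, 20          # hypothetical inputs (INR, 12%, 20 years)
direct = final_value(principal, gross, 0.005, years)   # assumed direct-plan expense ratio
regular = final_value(principal, gross, 0.015, years)  # assumed regular-plan expense ratio
print(f"direct:     {direct:,.0f}")
print(f"regular:    {regular:,.0f}")
print(f"difference: {direct / regular - 1:.1%}")
```

With these assumptions the direct plan ends up roughly 20% larger; plug in the actual expense ratios of the funds you are comparing.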
No point buying an apartment: I personally see no point in plonking a huge sum of money into an apartment, especially in places like Bangalore.
– The rise in real estate prices is more or less correlated with stock market growth.
– Rental yield is just 2-3%.
– The houses you would want to live in would not have an EMI similar to the rent (as some people claim). You just end up paying “rent” to the bank, it is super expensive, and you are locked in.
– There is also the cost of not being mobile.
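To see why the EMI usually won’t match the rent, here is the standard EMI formula for an amortizing loan next to a 2.5% rental yield. Every input is hypothetical (a ₹1 crore flat, 20% down payment, 8.5% interest over 20 years), and the sketch ignores appreciation, tax benefits and the opportunity cost of the down payment.

```python
def monthly_emi(principal: float, annual_rate: float, years: int) -> float:
    """Standard amortizing-loan EMI: P * r * (1 + r)^n / ((1 + r)^n - 1), with monthly r and n."""
    r = annual_rate / 12
    n = years * 12
    if r == 0:
        return principal / n
    growth = (1 + r) ** n
    return principal * r * growth / (growth - 1)


price = 10_000_000                                    # hypothetical 1 crore INR apartment
loan = 0.8 * price                                    # assumed 20% down payment
emi = monthly_emi(loan, annual_rate=0.085, years=20)  # assumed 8.5% for 20 years
rent = price * 0.025 / 12                             # 2.5% rental yield, within the 2-3% above
print(f"EMI ~ INR {emi:,.0f} per month, rent ~ INR {rent:,.0f} per month")
```

With these assumptions the EMI comes to roughly ₹69,000 a month against rent of about ₹21,000, more than three times as much.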
The only reason to buy a house is emotional, which obviously cannot be priced in. If owning a house makes you feel good and has a positive effect on your self-image, that’s your call. To me, at the moment, it makes no sense. I may eventually buy one when I have enough to spare.
Book I recommend
I typically recommend Let’s Talk Money (Amazon ref link) by Monika Halan. It is simple and good enough for most people. Personal finance should be simple; you are not managing a hedge fund.
It’s a fun exercise, and people’s comments help me learn as well. I am also a strong believer in putting your money where your mouth is, hence when I say I like company X, it carries more conviction if I actually hold that stock.
This is obviously NOT investment advice of any sort. It’s just me tracking how my personal portfolio evolves over time. There will probably be NO financial estimates.
This also does not include my employee stock options from Google and Microsoft (I haven’t sold a single MSFT share ever). It covers only my vested holdings.
July 29 2021
Palantir remains a big bet, as before. With China-US tensions escalating, I think Palantir has a chance of becoming an even more important company.
Moderna’s growth is primarily due to growth in the stock price itself. I bought it as soon as vaccines became available. mRNA is a watershed moment in vaccine development, and it was an almost mindless decision to double down on the pioneer.
Twitter, as usual, remains an all-time favorite. I think it is extremely undervalued, and it seems to have recently been shipping at an incredible pace. It’s a pure product company, something I hope I understand :), and I like what I see. (Twitter as identity and social capital management)
Sold most of my Snapchat after stellar earnings. Will enter again.
Sold a lot of Clover to take money off the table during the short squeeze, and entered again when the price dropped. Will hold long now.
My cash holdings are down to <2%, primarily because I made some short-term investments when the market went down significantly.
Small wild bets:
Didi, because why not.
Still hold a bit of AMC
Plan: more cash. Target ~15%
June 11 2021
Biggest change: CASH: <1% → 17%
Clover got short squeezed, so YAAY
Uber FTW as always
Palantir, Moderna, Twitter: conviction still stands
Feb 26 2021
Doubled down on Twitter, which is now my BIGGEST holding
Doubled down on Palantir; it’s becoming an extremely important company
Exited most of Apple. It feels like it may be a while before M1 sales show up. I also have mutual funds in India that invest in Apple, so I may not need such a significant bet. Not to mention, despite great results the stock didn’t show any excitement. I obviously don’t understand the stock market.
Added more Clover. It looks like a sound company.
Jan 25 2021
Reduced my stake in Tesla; I wanted to book some profits. I had opened a US stock account primarily because I wanted to buy Tesla stock 🙂, which was a good decision in hindsight and paid off handsomely.
Sold most of Uber to book some profits at 42 (I should have held on, but I had bought a bunch at 16, 30, and 38).
Added Moderna. It’s an almost mindless investment, not just because of COVID vaccines but because mRNA is possibly the future of vaccination. It’s like buying the Amazon of the future; that’s how vaccines will be designed.
I am hoping Apple’s earnings will surprise everyone 🙂. Their new processor is a game changer.
Risky bet: Clover Health. In Chamath we trust... sometimes 🙂
June 30 2020
This was my holding last year when I started posting.
Largest holding first
Norwegian Cruise Line was a wild bet, primarily because it seemed like the most stable cruise line with deep pockets. Big pandemic recovery bet 🙂
Slack and Tesla: long-time favorites
Uber stock is still held in my Morgan Stanley employee account; I have not sold a single share.
While I speak from a product manager’s perspective, this is generally true for anyone who wants to be intellectually curious.
It’s easy to be a believer; it gives you a sense of purpose and a sense of comfort. But if you really want to succeed, you need to be a bit of a skeptic.
And skepticism can be taught. Here is a simple trick I suggest:
Developing a habit of skepticism
Whenever you read or hear something interesting, something that catches your eye, especially something that makes claims, think of one small fact you can verify. I typically add a small note, “really?”, in my personal notes.
Go ahead and try to substantiate it. It could be something very simple: e.g., if someone says a new study shows a COVID vaccine is very effective, you can just check whether the study exists and makes that claim. It could be even simpler: e.g., a startup says its market is all taxpayers in India, which is Y million people. Just try to find that data: is it accurate?
Slowly, you start questioning the interpretation of those facts. E.g., in the COVID vaccine case above, you could now try to substantiate what “good” means. The research report may say 80% efficacy. Is 80% good? How does it compare to other vaccines? Are there any specific things that have been missed? Which age group, which demographic, which variant?
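One way to start on “Is 80% good?” is to recall what efficacy measures: the relative drop in attack rate between the vaccinated and placebo groups, 1 - (attack rate among vaccinated / attack rate among placebo). The trial numbers below are hypothetical.

```python
def vaccine_efficacy(cases_vaccinated: int, n_vaccinated: int, cases_placebo: int, n_placebo: int) -> float:
    """Efficacy = 1 - (attack rate among vaccinated / attack rate among placebo)."""
    attack_vaccinated = cases_vaccinated / n_vaccinated
    attack_placebo = cases_placebo / n_placebo
    return 1 - attack_vaccinated / attack_placebo


# Hypothetical trial: 10,000 people per arm, 20 cases vs 100 cases.
ve = vaccine_efficacy(20, 10_000, 100, 10_000)
print(f"efficacy: {ve:.0%}")  # 80%
```

Here 80% efficacy comes from 0.2% of vaccinated people getting sick versus 1% in the placebo group. It does not mean 20% of vaccinated people fall ill, which is exactly the kind of misreading the “really?” habit catches.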
Eventually you start questioning the whole premise of the argument itself.
At the highest level, you move on to the very motivations driving the argument.
Most things you evaluate will turn out to be correct, but that is not the point. You are not trying to find malice, just building a muscle for questioning.
With enough practice, you will start seeing patterns. You will develop a “gut” for what to check and verify.
Every industry, and every individual, has a pattern of what they tend to overlook or exaggerate. It could be personal bias or just a generally accepted “industry practice” (see the Uselessness of NPS Score article for an example of how a generally accepted industry practice is not necessarily accurate).
With this, you are not trying to be cynical, just skeptical.
A very good exercise might even be to treat this very article with skepticism:
Have I defined skepticism correctly?
Can skepticism be taught, or is it just genetic?
Who said PMs need to be skeptics. Is there some kind of qualitative or quantitative evidence to support that claim?
After I wrote this article, a perfect opportunity to demonstrate this came along. I started seeing WhatsApp forwards and tweets claiming that up to 40% of Apple workers intend to leave for lack of fully remote work, or that 90% of Apple employees want indefinite remote work.
While I am all for flexibility and do believe fully remote work is here to stay, the 40%/90% numbers seemed way too high.
So I decided to look a bit deeper. Thankfully, one of the newspapers itself posted all the details, including its own skepticism:
This data does exist. It was collected from an employee survey done at Apple.
It was done by Apple Employees themselves
It was done in a Slack group specifically meant for people invested in remote work. DUH!!!
Without even trying too hard, you can see that only 36% of employees in a group specifically about remote work talked about resigning.
If I was teaching a class on bias, I would use this as a perfect example.
You can draw absolutely NO conclusion from this about what Apple employees want in aggregate. While I do give media sites points for publishing the survey details, I hold them accountable for publishing it in the first place, knowing full well that this data has no validity.
It leads to absurd headlines and unwarranted conclusions among people who trust them.
It’s like me running a survey in a “board game lovers” internal group about how important it is to have board games in the break room: I may get 90% yes. That does not mean 90% of people in my company want board games in the break room.
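The arithmetic behind the board-game example: the company-wide answer is a weighted average, dominated by the people outside the group whom the survey never asked. All numbers below are hypothetical, including the 20% yes-rate assumed for non-members, which is precisely the figure a self-selected survey cannot give you.

```python
def company_wide_yes_share(group_size: int, group_yes_rate: float,
                           company_size: int, outside_yes_rate: float) -> float:
    """Combine a self-selected group's answers with everyone outside it (weighted average)."""
    outside_size = company_size - group_size
    yes_votes = group_size * group_yes_rate + outside_size * outside_yes_rate
    return yes_votes / company_size


# Hypothetical: 300 board-game fans in a 10,000-person company; 90% of the fans say yes,
# and we assume 20% of everyone else would.
share = company_wide_yes_share(300, 0.90, 10_000, 0.20)
print(f"company-wide yes: {share:.1%}")  # 22.1%
```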
NO company is going to lose 40% of its employees just because it does not offer fully remote work. Ironically, the most accurate data about the pulse of the organisation is probably with the company itself, and its survival depends on it. No one would risk losing 40% of their employees.
You can obviously dig further, look at the specific questions, and see whether they already had bias built in. Surveys are not easy, and results can vary widely based on how you ask a question.
Motivation: Now you can ask why these news sites published these results knowing full well that they are widely inaccurate and biased. What is the motivation?
Some time back I tweeted a thread about why I dislike NPS as a measure of success. It led to a fair amount of discussion and debate, so I decided to dig a bit deeper and write a longish post about it. While I have tried to give it some structure, each section is more or less self-contained, so you can jump directly to a specific section if you are already familiar with the rest. I have tried to give enough context wherever possible.
I have also linked to sources wherever applicable, so feel free to follow them and do your own due diligence when in doubt.
Disclaimer: I may add more details and address specific questions and criticisms that come my way. Please let me know if I have misrepresented or missed something.
What is Net Promoter Score AKA NPS
Net Promoter Score is a widely used measure of customer loyalty today. Its claim to fame is its utter simplicity: it can measure customer loyalty with just one question.
To calculate Net Promoter Score, you ask a statistically significant number of your customers to answer a single question:
How likely is it that you would recommend [brand or company X] to a friend or colleague?
Ask them to select from a scale of 0-10 (some organisations use a slightly different scale, but 0-10 is the standard one), where 0 means not at all likely and 10 means extremely likely.
Anyone who chooses 0-6 is considered a Detractor
Anyone who chooses 9-10 is a Promoter
Anyone who chooses 7-8 is a Passive: counted in the total number of responses, but neither a Promoter nor a Detractor
Net Promoter Score = % of Promoters - % of Detractors
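The calculation as a small function, assuming the standard 0-10 scale (the function name is just for illustration). Passives count toward the total but toward neither group, which pulls the score toward 0.

```python
from typing import Iterable


def net_promoter_score(scores: Iterable[int]) -> float:
    """NPS = % promoters (9-10) minus % detractors (0-6); passives (7-8) only count in the total."""
    scores = list(scores)
    if not scores:
        raise ValueError("need at least one response")
    if any(not 0 <= s <= 10 for s in scores):
        raise ValueError("scores must be on the 0-10 scale")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


responses = [10, 10, 9, 8, 7, 6, 5, 2, 9, 10]
print(net_promoter_score(responses))  # 20.0
```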
The basic claim of NPS is that it can reliably measure customer loyalty, and that if a company focuses its efforts on increasing NPS, it will see healthier growth.
These two specific claims are important to keep in mind: 1) it reliably measures loyalty (better than other scores), and 2) it is correlated with company growth (see more details in the NPS origin story section).
NPS origin story
While the internet is full of how to use NPS, when to use NPS, and why to use NPS, before we get to those questions it is necessary to understand how NPS even came into the picture.
The origin story of anything reveals a lot about the motivations, without the burden of the muddled history between then and now.
NPS was invented by Fred Reichheld, a consultant at Bain & Company. It was introduced to the world in this HBR article.
The basic idea came about when they saw a car rental company use a very simple method to increase customer loyalty. The company, Enterprise Rent-A-Car, simply asked customers two questions:
– Quality of the rental experience
– Likelihood that they would rent again
The company then counted only those customers who gave it the highest scores on both questions. All its outlets were then asked to optimize for this specific score. The belief was that this would inspire the sales agents to do better and increase customer loyalty.
Fred wanted to make this system even simpler and see whether it could be reduced to just one question.
The interesting point to note here is that the intent was not to find a great predictor of company growth or loyalty, but rather to find one question. The aim itself was simplicity.
Process of finding the One Question
20 questions were drawn from the Loyalty Acid Test survey
The test was administered to customers in the following industries:
Cable and telephony
Internet service providers
Then they asked each participant to describe a specific instance when they had actually referred the company to someone. If no such instance was available, they waited 6-12 months and asked again. The data from about 4,000 customers was enough to create 14 case studies that established a link between survey responses and actual referrals.
The top-ranking question was far and away the most effective across industries:
How likely is it that you would recommend [company X] to a friend or colleague?
Two questions were effective predictors in certain industries:
How strongly do you agree that [company X] deserves your loyalty?
How likely is it that you will continue to purchase products/services from [company X]?
Other questions, while useful in a particular industry, had little general applicability:
How strongly do you agree that [company X] sets the standard for excellence in its industry?
How strongly do you agree that [company X] makes it easy for you to do business with it?
If you were selecting a similar provider for the first time, how likely is it that you would choose [company X]?
How strongly do you agree that [company X] creates innovative solutions that make your life easier?
How satisfied are you with [company X’s] overall performance?
Link between NPS score and company growth
Next, they looked for a correlation between customers' NPS and the companies' actual growth.
In airlines, a strong correlation existed between the “would recommend” question and average company growth.
Similar results held in the car rental business.
“Would recommend” was irrelevant for database software or computer systems, because customers had limited choice and the senior execs who made the buying decision were not among the people surveyed. For such industries, “sets the standard of excellence” and “deserves your loyalty” were far more predictive.
NPS was also not a predictor of growth for local telephone and cable TV companies, because they were near monopolies. Their growth was determined by how fast the population in their area increased.
Who uses NPS today
Pretty much everyone. As of 2020, about two-thirds of Fortune 1000 companies seem to use some version of NPS. One simple experiment: search for the term “How likely” in your inbox.
Good things about NPS
It is very simple to measure and benchmark.
It's a single question, and it is used by many industry players to benchmark against the competition and internally.
It's easy to digest at almost every level of abstraction.
High completion rate
With users being inundated by all kinds of brands seeking their attention, it is much easier to get them to answer one question rather than several. In fact, the paper “Assessing treatment outcomes using a single question”, which measured NPS among patients, found that the NPS question consistently had the highest completion rate (96.5%). I would also now assume it has become so common that users almost expect this question and are willing to answer it.
It defines loyalty in an interesting way:
While loyalty may traditionally be defined by retention, LTV, and other metrics, these can miss out on word of mouth. NPS attempts to target that specifically, in a somewhat loose but interesting way.
Customer Loyalty Definition in Original NPS system
Customer loyalty can be defined as a customer's willingness to stick with a provider even if it is not offering the best possible rate in a particular transaction. Think of it like this: “Sure, you may be charging me more today, but I know you have done great work in the past and generally give me a good rate, so I will stick with you even though cheaper options may be available.”
Customer loyalty is also more than just retention, because some people may be retained only because they cannot move out due to inertia or exit barriers, e.g. with monopoly players or prepaid plans.
Loyal customers may also not be repeat purchasers, e.g. when they have outgrown the service. You may no longer buy a Pulsar bike because you no longer ride a bike, but you would recommend it to your nephew when he is considering one.
NPS claims and how they measure up
NPS is a slightly indirect metric, because instead of asking whether people are satisfied with the product or service, we are asking whether they would recommend it to someone else. It's not exactly a measure of a customer's own experience with the brand.
If you are introducing a new kind of measurement, it needs to be better at something than the existing systems. It should either help you uncover a specific issue or measure something unique.
Survey metrics are also predictors or proxies of tangible business outcomes such as churn, growth, complaints, etc. A metric with no business outcome is plainly a vanity metric.
So let's dive into whether NPS measures up.
NPS as a better predictor of growth
Let’s look at the claims made about NPS in its original research. There are multiple leaps of faith in it. The way I read the original article is:
Among all the questions tested, the answer to the NPS question seems to be the most highly correlated with actual referrals in some industries
Higher NPS seems to be correlated to higher growth rate irrespective of company size
Based on this methodology, claims have been made that NPS is the best predictor of company growth. The big issue is that even the original article made no real comparison of how company growth correlates with NPS versus with other survey methods.
Also, even though the question talks about loyalty in a very roundabout fashion, the research does not actually make a claim about it. There may or may not be any correlation between NPS and user retention.
This research was not even reproducible
It is not reproducible
This is perhaps the BIGGEST issue with Net Promoter Score. The biggest claim about NPS was that it is the single best predictor of growth, but this 2007 paper found no support for that claim when its authors tried to replicate Reichheld's study.
Not surprisingly, they found that NPS performed as well, or as poorly, as the customer satisfaction index at predicting growth.
There seems to be no real statistical backing for NPS, and according to the paper, even Reichheld acknowledges that.
NPS as a tool to benchmark competition
A lot of literature talks about using NPS as a benchmark against the competition, between different departments, different franchises, etc. A lot of fanfare is made about how a company's NPS is through the roof, which company in a specific industry has the highest NPS, etc.
The problem is that this question has so many variables that it's unfair to compare. It can never be an apples-to-apples comparison.
Instead of simply asking whether customers are happy with the service, we ask “Would you recommend X to your friends and coworkers?”. There are many more variables to consider when trying to answer this question:
Do I think it’s worth it for my friend: Hobbies, cost, personal interest, my own closeness to the friend
Do I even discuss this with my friends and coworkers
I hate it, but this specific friend may like it
More variables = more errors.
E.g. when the NHS introduced NPS, it found that only about 40% of the variation in NPS scores was explained by overall satisfaction, whereas the rest was explained by other factors, such as whether the patient underwent a hip replacement or a knee replacement.
The NPS difference between patients who underwent hip replacement (71) and knee replacement (49) was stark, making it impossible to benchmark them against each other.
A bad action item for the hospital would be to target the same NPS across all services.
If a hospital cannot even benchmark within its own departments, it’s useless to try and benchmark to other hospitals.
It's also very dependent on the services availed and the demographics of the users. E.g. when people compare the NPS of Uber vs Ola, they fail to mention whether it's the exact same mix of users. Did the users take a similar number of trips, the same kinds of vehicles, pool vs non-pool, etc.? Without that, comparing the NPS of Uber and Ola is not really a benchmark, and is potentially worse than just running satisfaction surveys. It's not a complication worth introducing. This is why brand NPS figures are not really benchmarks.
For internal benchmarks, some companies, especially in ecommerce, go overboard and try to attach an NPS to each product and service, which again seems unnecessary. It is no longer a measure of loyalty but just feedback on the product, which may be better collected directly via ratings and reviews.
E.g. Myntra asks for an NPS tied to a specific product. But even if I rate it low, they may have no action item, because they do not control the product itself. Many more questions would need to be asked to even understand my response. A better question would simply have been whether I like the product, which would not only be more direct but would also feed their rating system.
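The mix problem can be made concrete with the hip (71) and knee (49) NPS figures from the NHS example. Because NPS is a difference of two percentages taken over all respondents, a population's overall NPS is simply the mix-weighted average of its segments' NPS. The sketch below assumes two hypothetical hospitals whose patients score exactly those segment values (i.e. identical care) and which differ only in their patient mix; the function name and the mix shares are made up.

```python
def blended_nps(segment_nps, mix):
    """Overall NPS of a population made up of segments.

    segment_nps: {segment: NPS of that segment}
    mix: {segment: share of respondents in that segment}, shares sum to 1
    %promoters and %detractors are both averages over respondents, so the
    overall NPS is the mix-weighted average of the segment NPS values.
    """
    if abs(sum(mix.values()) - 1) > 1e-9:
        raise ValueError("mix shares must sum to 1")
    return sum(segment_nps[s] * share for s, share in mix.items())


segment_nps = {"hip": 71, "knee": 49}   # NHS figures quoted in the text
hospital_a = {"hip": 0.8, "knee": 0.2}  # hypothetical patient mix
hospital_b = {"hip": 0.2, "knee": 0.8}  # hypothetical patient mix

print(round(blended_nps(segment_nps, hospital_a), 1))  # 66.6
print(round(blended_nps(segment_nps, hospital_b), 1))  # 53.4
```

Same care, yet a gap of about 13 points in headline NPS, purely from who happened to be treated. The same arithmetic applies to Uber vs Ola with different trip and vehicle mixes.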
NPS as a better loyalty metric
This is another way some people use NPS, probably because the question talks about referrals. Loyalty here is defined as the user's willingness to recommend.
NPS tries to force-fit people into specific boxes of promoters and detractors, ignoring any reasoning behind the user's response.
There are many reasons you might not recommend a specific product to someone. The original NPS article itself mentioned that a loyal customer may not be a repeat customer because they have outgrown the product, yet would happily recommend it to someone who has not.
By the same logic, a loyal customer may not recommend the product to friends who are not its target customers. NPS would classify these loyal customers as detractors.
Loyalty is very fluid, and detractors and promoters are not the rigid buckets NPS tries to put them in.
You personally may hate the product, but if you were to suggest a product to someone, you would play matchmaker and take their individual needs and circumstances into consideration.
This 2019 survey found that 52% of users who actively discouraged others from using a brand had also actively recommended it. You can be a promoter and a detractor at the same time, depending on who you are talking to or how your last experience with the brand went.
To make matters even more complicated, NPS is not even an accurate predictor of users' own measurable behaviour, such as repeat purchases, churn rates, etc.
The purest measure of loyalty, in my opinion, is customers actually spending money to buy your product. I criticise my bank a lot, but despite many, many alternatives I have stuck with them for 15 years. By every definition, I am a loyal customer they would want.
Another argument for NPS is that it's less susceptible to manipulation, but NPS has the same pitfalls, and anyone who owns an NPS goal can use the same old tactics to improve it.
Just like satisfaction surveys, NPS can be manipulated. Simple techniques include:
Asking the user to rate you right after a good interaction, e.g. as soon as an order is delivered or a ticket is resolved. At this point you are no longer trying to find and fix issues; you are simply trying to get the score. While this technique may make sense for Play Store reviews and YouTube videos, where ratings and likes are a social signal to other users, it is counterproductive for NPS, unless the NPS is being collected as a vanity metric (e.g. for a pitch deck or a presentation to leadership).
Incentivising the user: e.g. give away your software product on a free trial and watch almost every customer give you high NPS ratings. The score is meaningless and may have no correlation with your growth.
I also ran a very unscientific survey on Twitter and LinkedIn to find out whether companies set NPS targets, and whether the person responsible for the target also controlled things like when the NPS survey was sent and how to pacify the user. Here are the results.
As per the survey above, ~30% of respondents said their company has NPS goals and the owner of the goal optimises things like when to send the survey, and in some cases even customer incentives. You get what you optimise for, and in this case my hypothesis is that the system is designed to make the NPS go up, not necessarily the loyalty.
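The timing trick works through plain selection bias: the customers are the same, only who gets asked changes. Below is a minimal sketch on a hypothetical interaction log (all numbers invented), under the simplifying assumption that each customer would give the same rating no matter when they were asked.

```python
def nps(scores):
    """NPS from 0-10 ratings: % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


# Hypothetical log: (did the interaction go well?, rating this customer would give)
interactions = [
    (True, 10), (True, 9), (True, 8), (True, 9), (False, 3),
    (False, 5), (False, 6), (True, 7), (False, 2), (True, 10),
]

all_customers = [rating for _, rating in interactions]
# "Ask right after a delivery or a resolved ticket": only good interactions get surveyed
after_good_only = [rating for went_well, rating in interactions if went_well]

print(nps(all_customers))              # 0.0
print(round(nps(after_good_only), 1))  # 66.7
```

Nothing about the product or the customers changed between the two numbers; only the survey trigger did.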
You get what you optimise for
This may be the reason why even companies with really high NPS go bankrupt.
NPS and other big misses
It's arbitrary and ignores all cultural nuances
There is no clear reason why someone who answers 6 and someone who answers 7 fall into different buckets, while 5 and 6 do not. It is also unclear why only the difference between promoters and detractors should matter. What if we just tried to increase the average score?
It can actually hide real improvements. E.g. moving a large chunk of users from 1 to 5 has no effect on the NPS.
It also ignores cultural nuances. E.g. if you travel by Uber in the US vs India, you may see a huge difference in your ratings. Anecdotally, I have seen my ratings drop in India and rise in the US. I presume there is a cultural difference here: in India a low rating is 1 star, whereas in the US it's 4 stars.
I read a comment on some blog that put it well: NPS is just lots of numbers disguised as maths.
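Here is a minimal sketch of that blind spot, comparing NPS with a plain average on a hypothetical panel of 100 users (all numbers invented) in which the 40 unhappiest users move from 1 to 5.

```python
def nps(scores):
    """NPS from 0-10 ratings: % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


def average(scores):
    return sum(scores) / len(scores)


# Hypothetical panel of 100 users, before and after a fix
before = [1] * 40 + [8] * 30 + [10] * 30
after = [5] * 40 + [8] * 30 + [10] * 30  # 40 users move from 1 to 5

print(nps(before), nps(after))          # -10.0 -10.0
print(average(before), average(after))  # 5.8 7.4
```

The average moves by 1.6 points while NPS does not move at all, because a 5 is still inside the 0-6 detractor bucket.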
It is more noise than signal
What do you do with NPS? One common theme is that you work towards increasing it by using it as a north star, but that is not a good reason to ask this question in the first place. There is no evidence that it is better than working to optimise other tangible metrics.
It’s not a single question
While its whole USP is the single question, you invariably need more information as soon as users rate it below 7, defeating the whole purpose of simplicity.
Loyalty is multidimensional
While NPS seems to acknowledge that loyalty is multidimensional, it tries to collapse it into the single dimension of word of mouth.
It's probably not for your industry
This is less of an NPS issue and more of marketers abusing NPS because of its perceived simplicity.
In the original paper, NPS was not found to be a predictor of growth in industries such as computer databases.
Remember, the ONLY thing it was supposed to do was predict whether you will grow; without that correlation the score is more or less useless.
Sales is complex, and in any industry with high inertia, top-down decision making, or monopolistic players, NPS is not even applicable. This makes me wonder why so many startups are obsessed with it.
It’s also possible that NPS should not even be a goal.
E.g. in the NHS paper I referred to, the difference in NPS among patients was not due to actual patient care and recovery. Perhaps NPS is not even the right measure for hospitals.
NPS seems to be an arbitrary score with little statistical backing, and it is not even valid for many industries. While it can be one tool in your armoury alongside many other signals, over-reliance on it for making decisions is not prudent.
NPS is popular perhaps because it is simple, but this reminds me of the phenomenon of bikeshedding.
Bikeshedding: if a committee were to design a nuclear power plant, it might spend far more time than necessary discussing the bike shed: its colour, its position, and its capacity. The reason is that bike sheds are easy to design and everyone can have an opinion on them. In corporate life we sometimes spend a lot of time on bikeshedding activities simply because our minds automatically gravitate towards the simple things first.
NPS to me sometimes sounds like the bikeshed of the user research world.
As a startup or company, I would be more worried about actual referrals, customer churn/retention, and cost of acquisition than about NPS.
A low NPS may be a sign of something wrong, but it's likely also showing up in other survey questions, so NPS may not be adding any value.
NPS may be simple, but not necessarily useful
As a product manager, I become very suspicious when some startup or product touts high NPS scores with little else to back it up.
As an investor, I would ideally ignore the NPS score, or give it very little weight unless it is backed by actual metrics. It is easy to manipulate, and if it's rewarded, it would be in any company's best interest to figure out how to get better scores.
As some of you who follow me on Twitter know, I am a fan of short-form videos. Recently I decided to give the short video apps popular in India a try. I tried Roposo, MX TakaTak, Moj, and Chingari. After a few initial hiccups I was able to use all the platforms.
I am bullish about the Bharat/India story and believe we are primed for India-first innovations.
My plan was simple: Upload the same video on all platforms and get a sense of onboarding process, the speed, community, India specific features etc. I did not expect any major traffic. I also set my language as Hindi because I wanted to experience the whole deal.
The video I used was me reciting Shiv Kumar Batalvi's Punjabi poetry. It had no music, was not following any trends, and was in Punjabi. I had low expectations.
I uploaded my video on all the platforms and waited. Every short video platform typically shows your video to some users and, based on their interaction, may decide to show it to more people. Since the user base IMO is fairly similar across all these apps, I was not surprised to see more or less the same response everywhere: a few hundred views and a few likes on every platform except Chingari. On Chingari I had 10,000+ views and hundreds of likes, but curiously 0 comments.
After the initial euphoria died down (hey, maybe I went viral), the 0 comments started to bother me. How are these views counted? How are these likes assigned?
To test my hypothesis that these views were not really earned by me, I uploaded a completely blank video. To my utter surprise, it followed the same trajectory. Similar number of likes and views after almost the same time. I tried uploading more blank or nonsensical videos, just to see if somehow some video deviated from the pattern, but failed to find any. I literally had a video with a 5 second black screen. I found no way to not get views and likes.
To my annoyance, my empty videos had slightly higher views and likes than my poetry. Am I net negative 🙂
The videos eventually settled at 250K views and ~17K likes. It was super uncanny. I also hit 1M views. Am I a genius influencer?
So I did an experiment. I created another account on a different phone and uploaded a similar video on both my primary account and the new account: same title, no effects, no music, nothing. I took a screenshot every few hours for 2 days, a total of 8 readings, and here are the results. The videos were literally a few seconds of my pillow, titled “bilkul kuch nahi” (translation: absolutely nothing).
Just to make it clearer, let's plot the above values on a chart. The line charts for the two videos were so close that one line hid the other, so I had to make a bar chart.
Unfortunately I could not take readings at hourly basis, because work and life 🙂
Since views do seem to flatten around 250K, I suspect that if I kept going it would not be such a straight line and there would be a decay.
I then plonked these values into a linear regression analyser (even though on a longer scale it may not be linear) and here are the results. It was not a perfect match, but was close.
Note: Chingari does not tell you exactly how many likes or views a video has; it rounds them off, e.g. 2.5k instead of 2543. This makes regression difficult.
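To see how much precision the rounding throws away, here is a small sketch of a hypothetical helper that turns a displayed counter such as "2.5k" into the range the true count could lie in. The app's actual rounding rule is an assumption here: the code assumes round-to-nearest on the last displayed digit, and if the app truncates instead, the true value lies in [shown, shown + step).

```python
from decimal import Decimal

MULTIPLIERS = {"": 1, "k": 1_000, "m": 1_000_000}


def count_bounds(label):
    """Return (low, high) bounds for an abbreviated counter like '2.5k' or '1M'.

    Assumes round-to-nearest on the last displayed digit (an assumption,
    not documented app behaviour).
    """
    text = label.strip().lower()
    suffix = text[-1] if text[-1] in ("k", "m") else ""
    digits = Decimal(text[: len(text) - len(suffix)])
    multiplier = MULTIPLIERS[suffix]
    shown = digits * multiplier
    # One unit in the last displayed digit, e.g. 0.1k = 100 for '2.5k'
    step = Decimal(1).scaleb(digits.as_tuple().exponent) * multiplier
    return float(shown - step / 2), float(shown + step / 2)


print(count_bounds("2.5k"))  # (2450.0, 2550.0)
print(count_bounds("1M"))    # (500000.0, 1500000.0)
```

A reading like "2.5k" only pins the count to a window of about 100, and the window grows with the counter, so every point fed into a regression carries that much noise.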
I was also curious to see how likes vs views were behaving. Looks like they were a perfect match
The formula was:
Likes on a video = 71.71084 × (views in thousands) + 3.34218
To see if this formula holds up, I looked at all the videos on my account and tried to predict the number of likes based on number of views. It was pretty close
Unless they change something, this seems like a replicable phenomenon. If you are a creator, I would like to know whether you see the same numbers for dummy videos (good videos would, of course, have engagement).
Why is this important
A lot of Bharat users are using short videos as a source of fame and may decide to pursue a career as an influencer based on the traction they are getting.
They may also be using this to decide whether to spend more time creating. Knowing whether their fame is real is an important question. A sudden burst of views may also make people rate the app better, as is evident from this Play Store comment.
Views and engagement are also key metrics for any advertiser and media company. A very high number may encourage a company to invest heavily in that platform and ignore others. This happened with CollegeHumor, which pivoted to Facebook-native video because of the insane engagement it was getting. The engagement numbers turned out to be overestimated.
Now, I am NOT exactly sure why this is happening. Is this a new-user boost? Perhaps a quirk of their targeting algorithm? Maybe there are some users who like everything? Is this the normal trajectory for every video that has no engagement?
There are potentially alternate explanations and I would love to hear them.
What I can say is that reaching millions of views on the Chingari app at the moment seems to be child's play. Just upload almost anything and it might grow as per the formulas I presented before. Even my own channel of useless videos has 1M+ views in a few days.
Do apps boost new users?
In the startup and product management world we call it a First run experience, and considerable effort is spent to make that as pleasant as possible.
Some companies do end up doing something special for new users. This may include a boost in views by showing their videos to more people, but I have not seen this level of boosting. This is also assuming that the numbers are all real, i.e. actual people saw the videos and liked them. If the numbers are not real, it is unethical and there is no excuse.
Update: This does not seem to be a new-user effect. On my first account, which had 1M views, I uploaded 9 blank videos last night. ALL of them have similar views and likes, and the progression is still happening as predicted. Unless there is a different definition of a new user, this phenomenon does not seem to be limited to new users.
Things I did not check
When does the graph flatten: I presume it starts on day 3.
Why I am excited about NFTs: a small thread (lots of flex ahead)
About a decade ago I started recording my poetry and mixing it with music. I had some success and was semi-famous for a short while. I racked up about 1M views on YouTube and many more unofficially on other platforms.
I went up to 10K subscribers on YouTube and 5K fans on Facebook, and was consistently getting traction on my poetry website. One of my poems was also among the top viewed videos on YouTube in India for the day/week. My work was also shared by a few famous people from TV and media.
So I should have made some money, right? I did try. I became a YT partner, I tried Spotify, I tried selling on Bandcamp, and I even tried sponsored posts on my website. My work was also available on Amazon Music, iTunes and what not.
I made a few hundred dollars from YouTube, sold 2 copies on Bandcamp, made a few iTunes sales, and got listens on Spotify that were too few to justify the yearly cost of $15. I made enough money to afford a cup of coffee, once a month 🙂
It was not that I had no backers. But in that old world, every backer was the same. They could buy my music on iTunes, listen on Spotify, or share it on Facebook. But there was no easy way of “owning it”, especially if they were not in India.
NFTs bring not only a method, but also a culture of trying to OWN the art you like in a borderless fashion. I would have absolutely tried NFTs at that time and would have needed one influential backer to make a decent payout. And that backer could have been anywhere in the world.
And this is what excites me about the artists of today and tomorrow. You no longer need a very large audience or need to get “discovered”. You need a small set of backers, without having to worry about so many logistics. We had Patreon and Substack; NFTs are just the next level.
Yes there are flaws, but I am absolutely bullish on the tech and rooting for its success.