Generative AI has gotten very good at summarizing whatever is publicly available on the internet. Requests like "Pull together our public reviews and tell me what people complain about" or "Summarize the industry chatter for me" get reasonable answers today.
So the question naturally comes up: do we still need to run surveys?
Short answer: yes, and arguably more than before. The value of first-party data (data you collect directly from your own customers) goes up as AI commoditizes the analysis of public data. This article maps the boundary between what AI can pick up and what only a survey can capture, and reframes what customer research is for in the AI era.
What AI can now pick up easily
In the last few years, AI has made it cheap to aggregate and analyze:
- Public review sites — Google, Amazon, Yelp, Tripadvisor
- Public social posts — X, Instagram, Facebook
- Media articles and blogs — trade press, personal blogs, company blogs
- Comparison and ranking articles — the "10 best XYZ" pages
- Search result summaries — Google AI Overviews, Perplexity
- YouTube comments and video transcripts
- Reddit, Q&A forums
All of this is published on the open internet, which means AI systems can readily crawl, retrieve, and summarize it.
Ask an AI to "summarize what people are saying about our product," and it will sweep these sources and give you something usable. That part is real.
The catch: what AI structurally cannot reach
There is information AI cannot access, no matter how good the model gets. It is the voice of your customers as they would speak to you directly.
1. The silent majority
The customers who write reviews, post on social media, or leave comments are a tiny minority of your total base. As rough benchmarks:
- People who write product reviews → 2–5% of customers
- People who mention you on social media → fewer than that
- People who contact official support → 5–10% of unhappy customers
What you can observe on the internet is the loudest few percent with strong opinions. The moderate customer, the satisfied-but-quiet customer, the unhappy-but-disengaged customer — this silent majority lives outside everything AI can aggregate.
2. Answers to the questions you care about
Public reviews cover whatever the reviewer wants to talk about. They rarely answer the questions actually shaping your roadmap, such as "Why did they pick Feature A over Feature B?" or "Which improvement should we ship first?"
A survey can ask exactly those questions. AI working from public data cannot put your question to your customers for you.
3. The unpublished context
Things customers rarely put in a public review:
- The specific reason they chose a competitor
- The alternatives they considered and ruled out
- The concrete details of how they use your product
- The real reason behind their willingness to recommend
- The real reason for churning
These things are uncomfortable to write where colleagues, vendors, and competitors can read them. They tend to come out only when you ask directly, in a private or anonymous setting.
4. Change over time
"How has satisfaction shifted in the last three years?" — that question cannot be answered from public reviews alone:
- Old reviews exist, but separating "still true today" from "outdated" is hard
- You cannot follow the same customer over time (you do not know who wrote what)
- Tracking change by segment is essentially impossible
Only by asking the same customers repeatedly, on your own schedule, can you see the trajectory.
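To make this concrete, here is a minimal Python sketch of what asking the same customers repeatedly buys you. It assumes a hypothetical export of survey responses keyed by customer ID. The wave labels, segments, and 0–10 satisfaction scores are invented for illustration. Because every answer is tied to a known customer, the sketch can compute each customer's own change between two waves and then average that change by segment. Anonymous public reviews do not support this kind of comparison.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical survey responses: (customer_id, wave, segment, satisfaction 0-10).
# IDs, waves, segments, and scores are illustrative assumptions, not real data.
responses = [
    ("c1", "2023Q1", "SMB", 6), ("c1", "2024Q1", "SMB", 8),
    ("c2", "2023Q1", "SMB", 7), ("c2", "2024Q1", "SMB", 6),
    ("c3", "2023Q1", "Enterprise", 9), ("c3", "2024Q1", "Enterprise", 9),
    ("c4", "2024Q1", "Enterprise", 4),  # new respondent, no baseline
]

def segment_change(responses, before, after):
    """Average within-customer score change per segment, counting only
    customers who answered in both waves (a simple panel comparison)."""
    by_customer = defaultdict(dict)  # customer_id -> {wave: score}
    segment_of = {}                  # customer_id -> last segment seen
    for cid, wave, segment, score in responses:
        by_customer[cid][wave] = score
        segment_of[cid] = segment
    deltas = defaultdict(list)       # segment -> list of score changes
    for cid, waves in by_customer.items():
        if before in waves and after in waves:
            deltas[segment_of[cid]].append(waves[after] - waves[before])
    return {seg: mean(d) for seg, d in deltas.items()}

print(segment_change(responses, "2023Q1", "2024Q1"))
```

The sketch compares only customers who answered in both waves. This separates a real shift in sentiment from a change in who happened to respond. A new respondent like `c4` appears in the latest wave but has no baseline, so it is left out of the trend.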
5. Your own metrics
Custom scores (a customer success index, activation depth, recommendation × purchase frequency, whatever your business model needs) cannot be derived from public data. They have to be measured directly. Defining business-specific metrics and applying them to your own customers requires data you collect yourself.
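As an illustration, the sketch below computes one possible version of a "recommendation × purchase frequency" score. The `Customer` fields, the 0–10 recommendation scale, and the name `advocacy_value` are all hypothetical. What matters is that the metric joins a survey answer with your own purchase records for the same customer. Neither input exists in public data.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    customer_id: str
    recommend_score: int        # survey answer, assumed 0-10 "how likely to recommend"
    purchases_per_month: float  # assumed to come from your own order records

def advocacy_value(c: Customer) -> float:
    """Hypothetical custom metric: recommendation x purchase frequency.
    The 0-10 score is rescaled to 0-1 so the result reads in purchases/month."""
    return (c.recommend_score / 10) * c.purchases_per_month

customers = [
    Customer("c1", 9, 4.0),
    Customer("c2", 3, 6.0),
    Customer("c3", 10, 0.5),
]

# Rank customers by the custom metric, highest first.
for c in sorted(customers, key=advocacy_value, reverse=True):
    print(c.customer_id, round(advocacy_value(c), 2))
```

In this toy data, `c2` buys most often but is unlikely to recommend, so it ranks below `c1`. A score built only from purchases would miss that difference, and so would one built only from public sentiment.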
AI analysis vs. surveys: complementary, not competing
These are not rival approaches. They cover different ground.
| Dimension | AI on public data | Survey (first-party data) |
|---|---|---|
| Coverage | The vocal few | Everyone you want to ask |
| Questions | Decided by the writer | Decided by you |
| Volume | Massive (tens of thousands+) | By design (hundreds to thousands) |
| Context | Shallow | Goes as deep as you ask |
| Time series | Snapshot | Tracked over time |
| Your custom metrics | Impossible | Native |
The healthy operating model is:
- Use AI to read the industry-wide signal (broad and shallow)
- Use surveys to get your customers on record (narrow and deep)
- Cross-reference both when you make a decision
The deeper point — why first-party data is gaining relative value
Step back and look at what AI adoption is actually doing to the information landscape.
Observation 1: Differentiation has moved to "what data you own"
AI can analyze public information, but anything ChatGPT can read, your competitor can also read, so it cannot give you an edge. Conversely, first-party data that only you hold cannot be analyzed by AI on anyone else's behalf. That is the new source of advantage.
Observation 2: Public information is being optimized for AI
What happened to SEO content (heavily optimized for Google, with a falling signal-to-noise ratio) is starting to happen to reviews and word-of-mouth. "Reviews crafted to be cited by AI" and "posts written so AI will not push back on them" already exist. The purity of what is on the internet is declining.
Observation 3: Direct contact with customers becomes the last clean signal
Email exchanges, surveys, customer interviews, support conversations — these are one-to-one channels with very little noise. In an era where AI is averaging out everything else, the channels where you talk to your customer directly become more valuable, not less.
The risk of not collecting first-party data
In the AI era, an organization that does not collect first-party data ends up:
- Making decisions from the same public sources its competitors use
- Blind to the silent majority of its customer base
- Unable to define its own metrics
- Dependent on AI vendors to interpret reality
- Treating customers as "subjects to observe" instead of partners to engage
The organization that builds the muscle for first-party data, on the other hand:
- Owns insight competitors do not have access to
- Accumulates facts that cannot be argued away by AI
- Engages customers actively rather than passively
- Makes higher-precision decisions
Surveys deserve a higher priority, not a lower one
"AI exists, so surveys are obsolete" is exactly backwards. The healthier conclusion is:
- Run them more often
- Invest in question design quality
- Use open-text questions deliberately (AI is great at analyzing those after you collect them)
- Tie each round to prior rounds so you can track change
- Slice by segment to dig deeper
This survey muscle is precisely what becomes a competitive moat in the AI era.
Three principles for putting first-party data to work
1. Ask what you actually need to know
Do not run a survey that confirms what is already on the internet. Ask only what your own customers can tell you. Cut everything else.
2. Run on a schedule
A continuous survey program is worth far more than a single-shot study. A monthly or quarterly cadence is what turns a survey into a data asset.
3. Commit to the loop
Collecting answers and then forgetting them is no better than "I scrolled the internet a bit." The value shows up only when you close the loop: analyze → decide → ship → re-measure.
Repoan's stance
Repoan is built as a tool for collecting first-party data in the AI era:
- AI-generated questions to raise the quality of your survey design
- AI analysis of open-text answers to make first-party data faster to read
- Continuous-survey dashboards to help you accumulate over time
- Segment-level drill-down to use your data multi-dimensionally
- Branded delivery to keep the direct-contact channel pristine
The thesis is consistent: precisely because AI can analyze everything in public, you need a private data asset AI cannot touch on your competitor's behalf.
Summary
First-party data in the AI era:
- AI can analyze almost any public data now — that is real
- But the silent majority, your specific questions, unpublished context, change over time, and custom metrics are all outside AI's reach
- Competitive advantage is shifting toward "what data you own"
- Public information is getting optimized for AI and losing signal density
- Direct customer contact is becoming the last clean channel
The more AI averages out public information, the more valuable the data only you have becomes. Surveys are not a legacy method — they are becoming the core mechanism of differentiation.