Crowd Documentation

Crowds create more useful documentation than official sources.

Instead of relying on official documentation, developers have been indirectly documenting APIs themselves through a process called crowd documentation, by publishing blog posts and curating questions and answers about APIs on Stack Overflow.

In new research, we collected 1,316 days of Android developers' browser history and found 9,234 visits to Stack Overflow, compared with 2,547 visits to the site that hosts the official documentation for Android.

Moreover, we found that:

  • Developers may be getting as much as 50% of their documentation from Stack Overflow.
  • More examples can be found on Stack Overflow than in the official documentation guide.
  • In web searches, Stack Overflow questions are visited 2x-10x more often than official documentation.
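
As a rough illustration of how visit counts like these could be tallied from a browser history, here is a minimal Python sketch. It is not the analysis used in the study: the function names, the category labels, and the host `docs.example.org` (a placeholder standing in for the official documentation site) are all assumptions made for the example.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical mapping from hostnames to the categories being compared.
# "docs.example.org" is a placeholder, not the real documentation host.
SITE_CATEGORIES = {
    "stackoverflow.com": "stack_overflow",
    "docs.example.org": "official_docs",
}


def categorize(url):
    """Return the category for a URL, or None if its host is not tracked."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return SITE_CATEGORIES.get(host)


def count_visits(urls):
    """Count visits per category across an iterable of visited URLs."""
    counts = Counter()
    for url in urls:
        category = categorize(url)
        if category is not None:
            counts[category] += 1
    return counts


def share(counts, category):
    """Fraction of tracked visits that fall into the given category."""
    total = sum(counts.values())
    return counts[category] / total if total else 0.0


if __name__ == "__main__":
    # Made-up history entries, for illustration only.
    history = [
        "https://stackoverflow.com/questions/1",
        "https://www.stackoverflow.com/questions/2",
        "https://docs.example.org/reference/view",
        "https://news.example.com/",
    ]
    counts = count_visits(history)
    print(counts, share(counts, "stack_overflow"))
```

Matching on the parsed hostname rather than on a substring of the URL avoids counting pages that merely mention a site in their path or query string.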

A typical day for a developer:

Understanding and harnessing the crowd.

API designers may want insight into the “hot spots” that are problematic for developers, or into “gaps” in the crowd's coverage. For example, we observed that not many developers talked about accessibility or DRM in Android.

We have a treemap visualization tool that displays the coverage and usage data of API elements.
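
To make the idea concrete, a treemap needs hierarchical data in which each node's size is the sum of its children. The Python sketch below shows one way per-class counts could be rolled up into a package tree. It is an assumption about the data shape, not a description of the tool itself, and the function name `build_tree` and the counts in the usage example are invented for illustration.

```python
def build_tree(class_counts):
    """Nest fully qualified class names into a package tree.

    Each node stores the summed count of everything beneath it, which is
    the value a treemap would use to size that node's rectangle.
    """
    root = {"name": "root", "value": 0, "children": {}}
    for qualified_name, count in class_counts.items():
        node = root
        node["value"] += count
        for part in qualified_name.split("."):
            node = node["children"].setdefault(
                part, {"name": part, "value": 0, "children": {}}
            )
            node["value"] += count
    return root


if __name__ == "__main__":
    # Illustrative counts only; these are not measured values.
    tree = build_tree({
        "android.app.Activity": 10,
        "android.app.Service": 4,
        "android.view.View": 6,
    })
    for name, child in tree["children"]["android"]["children"].items():
        print(name, child["value"])
```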

Finally, we can even find ways to automatically generate documentation from the crowd's efforts.

Crowd's coverage of Android classes on Stack Overflow. Click here for interactive version.

Crowd Voice

Aggregate customer feedback from social media channels.

User reviews, complaints, suggestions, and bug reports highlight an application's flaws and performance problems. Tracking this feedback helps developers detect and fix those issues as early as possible.

The project leverages several machine learning approaches to assist in aggregating feedback and alerting developers to it.


Crowd APIs assembled from Stack Overflow snippets and GitHub repositories.

The software development community has been steadily creating software and tools that allow developers to coordinate on increasingly large scales. One example of an emergent form of crowd programming comes from the language that everyone loves to hate: JavaScript. Rather than having a rich standard API, JavaScript essentially has a “crowd API” assembled from Stack Overflow snippets and GitHub repositories.

The goal of the web dna project is to understand the connections and mechanisms of this ecosystem. The steps we are taking are as follows:

  • Build a corpus of JavaScript libraries from package managers (e.g., Bower) and GitHub repositories.
  • Detect usage of those JavaScript libraries (and their versions) on websites.
  • Analyze relations of libraries, usage characteristics, and lifetime.
  • Understand propagation and discovery behaviors of developers.
  • Understand motivations and mechanisms of creators.
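
The detection step could be approached in many ways. As one hedged sketch in Python (not the project's actual method), the code below collects `<script src="...">` references from a page and guesses a library name and version from each filename. It assumes a `name-version.min.js` naming convention, which many sites do not follow, and the names `ScriptCollector`, `parse_library`, and `detect_libraries` are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed convention: "<name>-<version>" or "<name>.<version>", where the
# version has at least two numeric parts (e.g. "jquery-1.11.0").
VERSIONED_NAME = re.compile(r"^(?P<name>.+?)(?:[-.](?P<version>\d+(?:\.\d+)+))?$")


class ScriptCollector(HTMLParser):
    """Collect the src attribute of every external <script> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def parse_library(src):
    """Return (name, version or None) guessed from a script URL, or None."""
    filename = urlparse(src).path.rsplit("/", 1)[-1]
    if not filename.endswith(".js"):
        return None
    stem = filename[: -len(".js")]
    if stem.endswith(".min"):
        stem = stem[: -len(".min")]
    match = VERSIONED_NAME.match(stem)
    if match is None:
        return None
    return match.group("name"), match.group("version")


def detect_libraries(html):
    """List the (name, version) guesses for all external scripts in a page."""
    collector = ScriptCollector()
    collector.feed(html)
    guesses = (parse_library(src) for src in collector.sources)
    return [guess for guess in guesses if guess is not None]


if __name__ == "__main__":
    page = (
        '<script src="/js/jquery-1.11.0.min.js"></script>'
        '<script src="https://cdn.example.com/underscore.js"></script>'
        "<script>var inline = true;</script>"
    )
    print(detect_libraries(page))
```

Filename matching misses libraries that are bundled, renamed, or concatenated, so a fuller approach would likely compare file contents against the corpus built in the first step.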

Crowd Learning

Learning from the mistakes of millions.

NEW! In proposal stage. Idea: Leverage instrumentation in online programming environments to characterize the mistakes and learning processes of students when programming. Applications for intelligent tutoring, interactive recommendation systems, explanatory visualizations.

Let me know if you're interested in this!