Webzeitgeist: Design Mining the Web

Ranjitha Kumar, Arvind Satyanarayan, Cesar Torres, Maxine Lim, Salman Ahmad, Scott R. Klemmer, Jerry O. Talton

Advances in data mining and knowledge discovery have transformed the way Web sites are designed. However, while visual presentation is an intrinsic part of the Web, traditional data mining techniques ignore render-time page structures and their attributes. This paper introduces design mining for the Web: using knowledge discovery techniques to understand design demographics, automate design curation, and support data-driven design tools. This idea is manifest in Webzeitgeist, a platform for large-scale design mining comprising a repository of over 100,000 Web pages and...

Learning Structural Semantics for the Web

Maxine Lim, Ranjitha Kumar, Arvind Satyanarayan, Cesar Torres, Jerry O. Talton, Scott R. Klemmer
Stanford University CSTR '2013 PDF

Researchers have long envisioned a Semantic Web, where unstructured Web content is replaced by documents with rich semantic annotations. Unfortunately, this vision has been hampered by the difficulty of acquiring semantic metadata for Web pages. This paper introduces a method for automatically "semantifying" structural page elements: using machine learning to train classifiers that can be applied in a post-hoc fashion. We focus on one popular class of semantic identifiers: those concerned with the structure—or information architecture—of a page. To determine the...

Class Projects

Structural Learning for Web Design (2012) Maxine Lim, Arvind Satyanarayan, Cesar Torres. Machine Learning | Andrew Ng | Final Project.
Recursive neural networks (RNNs) have been successful for structured prediction in domains such as language and image processing. These techniques imposed structure onto sentences or images for more effective learning. However, in domains such as Web design, structure is explicitly embedded in the Document Object Model, so structured prediction can be done using the natural hierarchy of Web pages.
Paper | Poster |
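As a rough illustration of the idea in the Structural Learning abstract above (composing representations along the page's own hierarchy rather than imposing one), here is a minimal sketch. It is not the project's model: the tag features, the weights `W` and `B`, the mean-pooling of children, and the helper names `parse_dom` and `encode` are all hypothetical choices made for this example, and the weights are fixed rather than learned.

```python
import math
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """One element of the parsed page tree."""
    def __init__(self, tag):
        self.tag = tag
        self.children = []


class DomBuilder(HTMLParser):
    """Builds a simple element tree; assumes reasonably well-formed HTML."""
    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and len(self.stack) > 1:
            self.stack.pop()


def parse_dom(html):
    """Parse an HTML string and return the synthetic root node."""
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


DIM = 2
# Hypothetical, untrained weights: each row maps [own features ; pooled children].
W = [[0.5, -0.3, 0.8, 0.1],
     [0.2, 0.7, -0.4, 0.6]]
B = [0.0, 0.1]


def tag_features(tag):
    """Toy per-element features (assumption): heading-like, navigation-like."""
    return [1.0 if tag in {"h1", "h2", "h3"} else 0.0,
            1.0 if tag in {"a", "nav"} else 0.0]


def encode(node):
    """Bottom-up recursive encoding that follows the DOM hierarchy."""
    kids = [encode(child) for child in node.children]
    if kids:
        pooled = [sum(k[i] for k in kids) / len(kids) for i in range(DIM)]
    else:
        pooled = [0.0] * DIM
    x = tag_features(node.tag) + pooled
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W, B)]


if __name__ == "__main__":
    page = "<div><h1>Title</h1><nav><a href='#'>Home</a></nav></div>"
    print(encode(parse_dom(page)))
```

In a trained version, the weights would be fit so that each node's vector predicts a structural label; here they only show how information flows from leaves to the root.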
Charlotte - Visualizing Web Design (2012) Cesar Torres, Maxine Lim, Victoria Flores. Data Visualization | Jeffrey Heer | Interactive Data Project.
Exploring a design space generally requires manual inspection of many design examples. Visualizing design data in aggregate can make this process more efficient, but because design data lies in a high-dimensional feature space, selecting the important elements to view is challenging. This work presents Charlotte, a system that enables exploration of Web designs represented in 1,713 dimensions by applying the concept of data portraits and generating visualizations that capture groups of pages with respect to a selected set of design principles. Charlotte demonstrates that meaningful patterns and trends in the design data can be explored by using these principles to inform data-driven portraits.