Files in this item



application/pdf: Jones_MCameron.pdf (4MB)
(no description provided)


Title: Remix and reuse of source code in software production
Author(s): Jones, M. Cameron
Director of Research: Twidale, Michael B.
Doctoral Committee Chair(s): Downie, J. Stephen
Doctoral Committee Member(s): Twidale, Michael B.; Smith, Linda C.; Karahalios, Karrie G.
Department / Program: Library & Information Science
Discipline: Library & Information Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Software informatics; Remix programming; Clone analysis; Source code; Programming by Google
Abstract: The means of producing information and the infrastructure for disseminating it are constantly changing. The web mobilizes information in electronic formats, making it easier to copy, modify, remix, and redistribute. This has changed how information is produced, distributed, and used. People are not just consuming information; they are actively producing, remixing, and sharing information, using the web as a platform for creativity and production. This is true of software development as well. Programmers and researchers who study software development frequently remark that programmers copy and paste code. Although this practice is widely acknowledged, it is rarely studied directly or explicitly accounted for in models of software development. However, this attitude is changing as software becomes more ubiquitous and software development practice shifts away from the formal models of software engineering towards a post-modernist perspective. This study explores how source code snippets in programming books and on the web are changing software development practice. By examining program source code using clone detection algorithms, this study provides a comprehensive view of code copying across 6,190 PHP-language applications. These data are used to explore the concept of a "remix" method of software production, where software and systems are built out of copied and pasted snippets of code. These findings are contrasted against both traditional models of information production coming from informetrics (e.g., authorship, citation analysis) and models from software engineering (e.g., the Lego Hypothesis). Explanations for observed phenomena are discussed borrowing metaphors from linguistics, which provide a richer explanation of copy-paste programming than that offered by the Lego Hypothesis. The focus and findings of this study ultimately point to a pressing demand for further research centered on the notion of software as information.
Software and software repositories hold a large amount of information about how software is produced and how it is used, adapted, and maintained. Software informatics is proposed as an organizing label to study the science of information, practice, and communication around software. It studies the individual, collaborative, and social aspects of software production and use, spanning multiple representations of software from design, to source code, to application.
Issue Date: 2011-01-14
Rights Information: Copyright 2010 M. Cameron Jones. Some Rights Reserved. This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
Date Available in IDEALS: 2011-01-14
Date Deposited: 2010-12
