With the advent of the Web, Web link analysis has become one method for delineating the communication structure among scholars and organizations. This pilot study examines the Web space of the Library and Information Science (LIS) field using social network analysis techniques. In- and out-link data were collected with a hyperlink crawler, and the directed hyperlinks between LIS Web sites were analyzed to explore the structure of the LIS Web space. The study also investigated which Web sites play a central role in communication within that space, and how the Web sites depend on one another. As an exploratory study, it did not cover the entire breadth of LIS-related Web sites.
2. RESEARCH QUESTIONS
This study attempts to address three research questions in relation to Web space analysis of the LIS field:
1) Which sites will be located in the central region of the LIS Web space, as measured by link analysis?
2) What types of dependency patterns emerge among the Web sites?
3.1 Data collection
Twenty-four Web sites were selected, including relevant scholarly and professional organizations, government institutions, and American LIS schools. For scholarly and professional organizations, representative organizations were chosen, such as the American Library Association (ALA), the American Society for Information Science & Technology (ASIS&T), and the International Federation of Library Associations and Institutions (IFLA). Fourteen LIS schools accredited by ALA were also selected. Table 1 lists the 24 research objects in this study, along with the ID (code), set in italics, that represents each institution's Web site.
Table 1. The list of 24 research objects
In- and out-link data for the 24 Web sites were collected using a Web crawler designed to collect hyperlinks to external domains. The data collection was conducted in February 2009. The crawling depth was set to the fourth level. Although four levels of depth do not necessarily reflect the whole structure of a Web site, this depth was considered acceptable for an initial investigation of the topic.
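The crawler used in the study is not described in detail, so the following is only a minimal sketch of a depth-limited crawl that counts hyperlinks to external domains. The names `LinkParser`, `crawl_external_links`, and the injected `fetch` callable are hypothetical, and treating the start page as level 1 is an assumption about how the four levels were counted.

```python
from collections import Counter, deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl_external_links(start_url, fetch, max_depth=4):
    """Breadth-first crawl of one site, down to max_depth levels.

    `fetch` is a hypothetical callable that maps a URL to its HTML text
    (or None if the page cannot be retrieved). Returns a Counter mapping
    each external host to the number of hyperlinks pointing at it.
    """
    home = urlparse(start_url).netloc
    external = Counter()
    seen = {start_url}
    queue = deque([(start_url, 1)])  # assumption: the start page is level 1
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            target = urljoin(url, href)
            host = urlparse(target).netloc
            if not host:
                continue  # skip mailto:, javascript:, and similar links
            if host != home:
                external[host] += 1  # out-link to an external domain
            elif depth < max_depth and target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return external
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP client. Running it once per site and keeping only the hosts that belong to the 24 selected sites would yield one row of the directed link matrix.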
3.2 Analysis techniques
For this pilot study, two approaches were identified.
1) In-link and Out-link Visualization: First, to understand the overall Web space structure of the LIS field, the in- and out-link data were visualized using Netdraw software (http://www.analytictech.com/Netdraw/netdraw.htm), which was developed for social network studies.
2) Degree Centrality Test: For more detailed investigation of the characteristics of each node, a degree centrality test was included. Centrality in network analysis can be used to quantify an individual node's prominence, influence or dependency using in-degree and out-degree measures. In-degree refers to the number of in-links, whereas out-degree refers to the number of out-links. In network analysis, in-degree represents how prestigious a node is because it indicates how many links a node receives from other nodes.
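As an illustration of the in-degree and out-degree definitions above, the sketch below reads both measures off a directed link matrix. The function name `degree_counts`, the node labels, and the toy values are hypothetical and are not data from this study.

```python
def degree_counts(nodes, links):
    """Return (in_degree, out_degree) dictionaries for a directed link matrix.

    links[i][j] is the number of hyperlinks from nodes[i] to nodes[j].
    Self-links on the diagonal are ignored.
    """
    n = len(nodes)
    # Out-degree: sum across a row (links the node sends).
    out_degree = {
        nodes[i]: sum(links[i][j] for j in range(n) if j != i) for i in range(n)
    }
    # In-degree: sum down a column (links the node receives).
    in_degree = {
        nodes[j]: sum(links[i][j] for i in range(n) if i != j) for j in range(n)
    }
    return in_degree, out_degree


# Toy example: one organizational site and two school sites.
toy_nodes = ["org", "school_a", "school_b"]
toy_links = [
    [0, 1, 0],  # org links once to school_a
    [5, 0, 2],  # school_a links five times to org, twice to school_b
    [4, 1, 0],  # school_b links four times to org, once to school_a
]
toy_in, toy_out = degree_counts(toy_nodes, toy_links)
```

In this toy matrix the organizational node receives far more links than it sends, while each school node sends more than it receives.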
An in- and out-link matrix was created using the external link crawler. The total number of hyperlinks (in-links and out-links) among the twenty-four nodes, up to the fourth level of each site, was 7,898. Table 2 summarizes the descriptive statistics of the asymmetric in- and out-link matrix. The average number of hyperlinks between two nodes was 14.31, with a standard deviation of 41.39. The ala node had the largest number of in-links and out-links, totaling 2,156 and 1,147 links, respectively. This reveals that the ALA site is the most frequently linked with others in the LIS Web space. The second largest node was loc, with 1,247 in-links and 1,122 out-links. Third was asis&t, with a total of 1,096 in- and out-links. The matrix revealed that the average number of in- and out-links was larger for organizational nodes than for school nodes. The average number of links of each organizational node was 474.13, whereas the average for each school node was 212.65. The totals also show that the organizational nodes have more in-links (6,683) than out-links (4,222), while the school nodes have more out-links (3,676) than in-links (1,215).
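The reported mean can be checked by arithmetic: 24 nodes give 24 × 23 = 552 ordered pairs, and 7,898 / 552 ≈ 14.31. The sketch below reproduces that check and shows how the mean and standard deviation of an asymmetric matrix might be computed. It assumes the diagonal is excluded (which is consistent with the reported mean) and uses the population standard deviation, which may differ from what the original software reported. The toy matrix is not data from this study.

```python
from math import sqrt


def off_diagonal_stats(matrix):
    """Mean and population standard deviation over all off-diagonal cells."""
    n = len(matrix)
    cells = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    mean = sum(cells) / len(cells)
    variance = sum((x - mean) ** 2 for x in cells) / len(cells)
    return mean, sqrt(variance)


# Consistency check on the reported figures:
# 7,898 hyperlinks spread over 24 * 23 ordered pairs of distinct nodes.
reported_mean = round(7898 / (24 * 23), 2)

# Toy 3-node matrix for illustration only.
toy_matrix = [[0, 3, 1], [2, 0, 0], [0, 6, 0]]
toy_mean, toy_sd = off_diagonal_stats(toy_matrix)
```

A standard deviation almost three times the mean, as reported in the text, indicates that a few pairs of sites account for a large share of the links.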
Table 2. Descriptive statistics of in- and out-link asymmetric matrix data
Figure 1 shows the visualization of the links among the twenty-four nodes. For the layout of nodes, the spring embedding algorithm, which provides an easily legible layout based on the concept of node repulsion, was applied with equal edge lengths. Since the spring embedding algorithm treats distance as dissimilarity between nodes, similarity among nodes can be interpreted from the output. In this figure, red nodes indicate organizational nodes and blue nodes indicate school nodes, and the size of a node reflects its number of in-links. Each arrow indicates a link and its direction, and the width of the line reflects the frequency of links between two nodes.
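To make the idea of spring embedding concrete, the sketch below implements a small force-directed layout in the style of Fruchterman and Reingold: every pair of nodes repels, and linked pairs also attract. This is a generic illustration and not Netdraw's implementation. The function name `spring_embed`, the constants, and the choice to treat links as undirected for layout purposes are all assumptions.

```python
import math
import random


def spring_embed(nodes, edges, iterations=200, seed=0):
    """Return {node: [x, y]} from a simple force-directed layout.

    nodes is a list of labels; edges is an iterable of (u, v) pairs.
    Direction and link frequency are ignored in this sketch.
    """
    rng = random.Random(seed)
    pos = {v: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for v in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # preferred length of every edge
    linked = {frozenset(edge) for edge in edges}
    step = 0.1  # maximum movement per iteration, reduced over time
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist  # repulsion between every pair
                if frozenset((u, v)) in linked:
                    force -= dist * dist / k  # attraction along a link
                fx = dx / dist * force
                fy = dy / dist * force
                disp[u][0] += fx
                disp[u][1] += fy
                disp[v][0] -= fx
                disp[v][1] -= fy
        for v in nodes:
            length = math.hypot(disp[v][0], disp[v][1])
            if length > 0.0:
                move = min(length, step)  # cap the movement at the step size
                pos[v][0] += disp[v][0] / length * move
                pos[v][1] += disp[v][1] / length * move
        step *= 0.98
    return pos
```

Because linked nodes settle near the preferred edge length while unlinked nodes only repel, heavily interlinked sites tend to end up near one another, which is why proximity in such a layout can be read as similarity.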
The most notable finding was that most organizational nodes, such as ala, asis&t, oclc, loc and ifla, were located at the center of the space, whereas most school sites, such as slis.cua, ci.fsu, slis.kent, lis.simmons, sis.utk and is.pitt, were located at the periphery. A relatively distinct core/peripheral structure was apparent, in which the core area consisted of organizational sites and the peripheral area consisted of school sites. The most prominent node was ala, which had the largest node size and the most frequent connections with other nodes. This suggests the ALA site serves as a core communicator in the LIS Web space. In addition, the ala node sends links to and receives links from all other nodes, and the lines for those links are relatively thick. Furthermore, the four representative LIS-related organizations, ala, asis&t, oclc and loc, were located in close proximity to one another. Also, archives and saa, which both relate to the archives field, are located adjacent to each other.
Figure 1. A linkage visualization of the Web structure of the LIS field in America
To investigate the dependency patterns in the LIS Web space, Freeman's degree centrality index values were calculated for each node. A higher in-degree centrality indicates that a node receives more attention from other nodes, whereas a higher out-degree centrality implies that a node is more likely to be dependent on other nodes in a Web space. Table 3 shows the degree centrality in the LIS Web space. Regarding in-degree centrality, the top eight nodes turned out to be organizational sites. Consistent with the descriptive analyses, the organizational sites receive higher numbers of in-links from other sites. Most organizational nodes also have higher in-degree centrality than out-degree centrality. Conversely, most of the school sites have higher out-degree centrality than in-degree centrality. This pattern implies that the school sites are dependent on other sites in the Web space. In particular, the slis.indiana node ranks third in out-degree centrality but fifteenth in in-degree centrality, which indicates high dependency. This degree centrality analysis confirms the dependency propensity of school sites on organizational sites.
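The comparison of in-degree and out-degree ranks described above can be sketched as follows. The helper names `degree_ranks` and `dependent_nodes` are hypothetical, the values are toy data rather than the figures from Table 3, and reading "out-degree greater than in-degree" as a sign of dependency follows the interpretation given in the text.

```python
def degree_ranks(in_degree, out_degree):
    """Return {node: (in_rank, out_rank)}, where rank 1 is the highest degree."""

    def ranks(scores):
        # Sort by descending score; ties are broken alphabetically by node name.
        ordered = sorted(scores, key=lambda node: (-scores[node], node))
        return {node: position for position, node in enumerate(ordered, start=1)}

    in_rank = ranks(in_degree)
    out_rank = ranks(out_degree)
    return {node: (in_rank[node], out_rank[node]) for node in in_degree}


def dependent_nodes(in_degree, out_degree):
    """Nodes that send more links than they receive, sorted by name."""
    return sorted(node for node in in_degree if out_degree[node] > in_degree[node])


# Toy degrees for illustration only.
toy_in_degree = {"org_a": 90, "org_b": 40, "school_a": 10, "school_b": 5}
toy_out_degree = {"org_a": 30, "org_b": 20, "school_a": 60, "school_b": 25}

toy_ranks = degree_ranks(toy_in_degree, toy_out_degree)
toy_dependent = dependent_nodes(toy_in_degree, toy_out_degree)
```

A node such as `school_a` in the toy data, which ranks first in out-degree but third in in-degree, shows the same kind of gap the text reports for slis.indiana.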
Table 3. Freeman's degree centrality measures
This study investigated the Web space of the LIS field using social network analysis techniques. The findings show that the organizational sites have more in- and out-links than school sites. The visualized network diagram of the LIS Web space shows that most organizational sites are located near the center, whereas most school sites are positioned in peripheral areas. The centrality analyses also reveal that organizational sites have higher prominence than school sites, while school sites show a higher dependency propensity than organizational sites.
The findings of this study yield some insights for Web site evaluation, especially on how to incorporate the Web space structure of a specific field when evaluating the importance of Web sites in that field. How a specific Web site is situated in a Web space suggests a dynamic approach to evaluating the importance or influence of Web sites in relation to other sites. Also, centrality measures used in social network analysis, such as degree, closeness and betweenness centrality, are useful for evaluating Web sites.
This study also has its limitations. First, even though the sample represents a selection of organizations and schools from the LIS field, the number of research objects could be enlarged to provide a more complete picture of the LIS Web space. Second, the data did not represent the entirety of each site because of the limit on crawling depth. For a more complete analysis, the crawl could go deeper to reflect the full content of the Web sites. Based on this pilot study, further research that applies Web space analysis to a larger dataset is anticipated.