Digital information and data
Digital information and data play complex roles in research in the humanities and social sciences (SWR 2003; Arzberger, Schroeder et al. 2004; Boonstra, Breure et al. 2004). This creates particular challenges for the application of e-research methods and techniques, especially if complex and fuzzy data sets are involved (e.g. visual data, music, complex texts). The increased availability of digital resources, data and collections, partly the result of the digitisation of cultural heritage and of administrative databases, promises to open up more possibilities for comparative research. There may be more scope for interdisciplinary research that is based on the combination of data from very different types of sources. Questions that until recently could only be dealt with in a speculative way may now be approached by data-oriented empirical research. Re-use of data may become more prominent (SWR 2003). The capacity to process and visualise huge datasets is moreover expected to create additional opportunities for empirical research with the help of new computational research methods. In short, both in the humanities and in the social sciences new objects of research, which we call “epistemic objects” (Rheinberger 1997), will emerge. This development is parallel to the creation of new experimental arrangements in e-science.
The research in this theme will address the question of what characteristics these new epistemic objects will and should have, and how they may reconfigure scholarly research. What type of questions will be foregrounded and which questions may become less central? Which assumptions are built into the new epistemic objects and how may they influence the boundaries between scientific specialties? We will also pay attention to the specificity of qualitative data, which are often fuzzier and less easy to standardise. This also influences the development of research traditions to share qualitative data for comparative (re-)analysis (Wouters and Schröder 2003).
The Studio research will strive to complement existing research into scientific and scholarly data and data standards by focusing on the epistemic and social role of data and data sources in the humanities and social sciences. Purely technical research into data and metadata formats is the domain of expertise of computer and data science departments in the universities. Where a joint effort seems fruitful, we will seek cooperative research with research teams in information and computer science (e.g. CWI and the Telematics Institute). In the area of informatics for the humanities, we will seek collaboration with humanities computing research groups in the Netherlands and abroad, and with the R&D departments of data archives and repositories.
To provide a sharper focus on the particularities of data handling in the social sciences and humanities (Hockey 2000; SWR 2003; Boonstra, Breure et al. 2004), the research in this theme will maintain a firm comparative perspective with the natural and technical sciences. This will also enable the Studio researchers to be alert to new developments in data science and technology. For example, in those fields that have undertaken major digitisation projects, how does e-research change the way data are conceptualised, handled and shared? And how do disciplinary communities organise their work around digitised data: do practices become standardised, or do field differences persist? In this respect, the comparison of the development of data initiatives in the humanities with “data grids” in the social sciences seems relevant.
The data theme will also pay specific attention to the issue of data sharing and data sharing policies. This research is based on the completed Nerdi projects on data sharing (Wouters 2000; Beaulieu 2003; Wouters and Schröder 2003; Arzberger, Schroeder et al. 2004). The emergence of e-research creates specific tensions for data sharing, partly because it may no longer be clear who has control over the data sets. Increased attention to data sharing, also in the framework of the organisation of new data archives in the social sciences and humanities, may create tensions with established research practices and routines that are often not oriented to data sharing. The Studio will therefore not only study data sharing but also resistance to data sharing.
The flood of Web data poses a new challenge to social science and cultural analysis, one which cuts across the divide between quantitative and qualitative data. The Studio will organise a Webometrics Collaboratory within the theme Data and Digital Information to enable the rapid mobilisation of existing international expertise in this area.
The last decade has witnessed an increase in quantitative methods using Web data and in sophisticated quantitative analyses of the structure of the Web and the internet (Ebeling and Feistel 1990; Adamic 1999; Watts 1999; Albert and Barabasi 2002; Scharnhorst 2003). This has even led to the establishment of a new field in the information sciences, “webometrics” (Almind and Ingwersen 1997; Rousseau 1997; Boudourides, Sigrist et al. 1999; Björneborn and Ingwersen 2001). Web data can be used to analyse the internet and the Web as a complex information space in which communication patterns emerge and self-organise (Leydesdorff 2002). Webometrics can also be used to study changes in existing institutional structures (by means of hyperlink analysis) and the emergence of new institutional structures and infrastructures. Changes in scientific production and communication can be studied in so far as they can be represented in Web-based indicators. We expect that webometrics will also contribute to our understanding of the emergence of new forms of Web-based scientific communication and collaboration, such as those related to e-journals, collaboratories, online databases, file sharing and collaborative simulations. Indicators developed on the basis of Web data can have both an evaluative and a descriptive role. In this collaboratory, they should primarily provide insights into the nature of knowledge production in e-research.
The research in this theme builds further on recent European research projects in webometrics, in particular on WISER and EICSTES. It will extend the research questions in these projects toward a “reflexive webometrics”. It aims to develop novel methods for automated data gathering (with open source web crawlers, commercial software, Web page annotation schemes, and search engine tools) and to contribute to the development of professional standards for observing the dynamic Web. We expect that this will lead to analytic tools that can be used by other researchers in the social sciences and humanities without the need for additional programming expertise (Thelwall 2001; Thelwall 2002). We expect that these methods will be particularly successful if they are intimately related to qualitative and quantitative content analysis of Web phenomena. For instance, hyperlink network analysis has revealed interesting topological features when the Web is described in graph-theoretical terms. It is, however, still far from clear how these graph-theoretical structures can be interpreted. An important aspect of future research in webometrics will be the development of dynamic observation that takes account of the self-organising and fluid nature of the Web as a medium. New insights from complexity theory into the description of complex structures will have to be taken into account in this research.
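To make concrete what a topological feature of a hyperlink network looks like, the following is a minimal sketch in Python, not a tool of the Studio or of the projects named above. It assumes that a crawl has already produced a list of (linking site, linked site) pairs; the site names and the helper functions `in_degrees` and `degree_distribution` are hypothetical and serve illustration only. The sketch counts the inlinks each site receives and then tabulates how many sites share each inlink count, which is the in-degree distribution commonly reported in structural studies of the Web.

```python
from collections import Counter

# Hypothetical crawl result: each pair is (linking site, linked site).
links = [
    ("a.example", "b.example"),
    ("a.example", "c.example"),
    ("b.example", "c.example"),
    ("d.example", "c.example"),
    ("c.example", "a.example"),
]


def in_degrees(edges):
    """Count inlinks per site; sites that receive no links get 0."""
    # Start every site that appears anywhere in the edge list at zero.
    counts = Counter({site: 0 for edge in edges for site in edge})
    # Add one for every link pointing at a target site.
    counts.update(target for _, target in edges)
    return dict(counts)


def degree_distribution(degrees):
    """Map each in-degree value to the number of sites that have it."""
    return dict(Counter(degrees.values()))


indegree = in_degrees(links)
distribution = degree_distribution(indegree)

for site, count in sorted(indegree.items()):
    print(site, count)
```

Such a count says nothing by itself about why sites link to one another, which is why the interpretation of these structures remains tied to content analysis of the pages involved.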
E-research is not only about data; it is also about collaboration. It is expected to facilitate new forms of large-scale collaboration and more collaboration across the boundaries of disciplines and specialties (Finholt 2002; Walsh and Maloney 2002; Berman, Fox et al. 2003). Many e-science pilots are moreover the fruit of intense cooperation between academia and industry.
The humanities and social sciences are a particularly interesting area in which to study the development of scientific and scholarly collaboration because the variation in forms of collaboration and non-collaboration is so large (Fry 2003b). Virtually every possible configuration is practised in one field or another. The spectrum runs from the traditional lone scholar working in a decidedly low-tech environment to the tight, industrially organised group in which each PhD student and postdoc solves a particular problem. This means that comparative fieldwork in the humanities and social sciences is very rewarding. The same is true for the forms in which scholars and researchers choose to communicate their work and results to larger audiences. The Studio research in this theme will focus on the way the new media interact with forms of collaboration and communication. It moreover aims to support scholars in building new forms of collaboration (e.g. collaboratories) and communication (e.g. new Web site conceptions).
A key issue concerns the ways in which the dynamics of collaboration are affected by mediation through new media and digital networks. How does the technological possibility intersect with the traditional human need for communication? The implications of collaborative work for the resulting knowledge products will also be studied. Are forms of knowledge affected by the way they need to be communicated? Which types of intellectual work seem amenable to virtualisation and digitisation? How does audience variation across disciplines shape collaborative practices and the integration of e-research? Does the organisation of research change, as units within a field become more dependent or more specialised? An interesting question is in what ways the dynamics of very large-scale collaboration differ from those of more modest networks and how this affects the development of scientific collaboration in the humanities and social sciences. An important question is also how e-research shapes the traditional boundary between informal and formal communication across fields. Answers to these questions affect the way we think about the design of tools for collaboration. This is, for example, relevant to the construction of collaborative analytic instruments.
Within this theme, the use of the Web as a means of representation and collaboration will be a specific focus. This will intersect with the work in the methodological focus on Web archiving. The creation and dynamics of collaboratories will be a priority within this theme. The collaboratories of the Studio themselves will be monitored in order to draw lessons about their dynamics.
In e-research, digital infrastructures and emergent institutions play a crucial role (Bowker and Star 1999). Collaboratories, research infrastructures or the lack thereof, digital libraries, digital repositories and collections, and new venues for scholarly publication directly influence the extent to which scholars in the humanities and social sciences can effectively make use of new research possibilities. Given the recent emergence of e-research, the consequences of the accompanying institutional rearrangement are not yet well understood. It is therefore relevant to understand the specificities of institution building in the humanities and social sciences. The theme Virtual Institutions will explore which institutional arrangements are conducive to the humanities and social sciences.
An important question is how textual infrastructures affect the textual practices of researchers and scholars. Does it make a difference that digital infrastructures are also forms of writing? Standardisation and ordering of these infrastructures, such as computer interoperability or database standards, have a tremendous impact on the work of scholars. To what extent can scholars influence these processes if the standards are implemented at a higher level of organisation (such as the university or a data repository)? For example, how does infrastructure sustain various levels of formalisation and circulation of knowledge and information?
In this theme, specific attention will be paid to the systems of accountability in universities and research institutes. The way universities and research institutes have organised their systems of quality control and accountability may have a profound effect on knowledge creation because of its impact on the criteria of scientific and scholarly quality and integrity. Does e-research go together with new ways of assessing research performance and output? In what ways do new research practices create problems for existing peer review and visitation procedures? How will individual careers be judged in very large-scale collaborative research institutes and networks? How do the forms of knowledge evolve in e-research, and are particular practices hampered by the way researchers are being assessed? And in what ways are internet-based information systems being used by the institutions of accountability?