Tutorial 6: Natural Language Processing Tools for the Digital Humanities

Half-day tutorial
Time: Sunday, June 19, 1:00 – 4:30 p.m.
Room: Meyer Library 220 (Flex Classroom)

Christopher Manning

Topics: natural language processing, semantic analysis, text analysis, content analysis
Keywords: natural language processing, software tools

Large and ever-increasing amounts of text are now available digitally from many sources. Beyond raw text, there are also growing troves of text annotated with various kinds of metadata and analysis. This data opens new opportunities in the humanities for different kinds of analysis at different scales, some of which blur the boundary between the traditional analytical and critical methods of the humanities and the empirical, quantitative approaches common in the social sciences. Since texts are central to the humanities, a key opportunity lies in “text mining”: using computers to analyze texts. It is here that tools from Natural Language Processing have much to offer. Over the last two decades, the field of Natural Language Processing has itself refocused on processing and analyzing the huge amounts of available digital speech and text, partly through new probabilistic and machine learning methods. This has produced a range of robust methods and tools for text processing, many of which are within reach of the ambitious practitioner and are often freely available as open-source software.

This tutorial will survey what you can do with digital texts, starting from word counts and working up through deeper forms of analysis: collocations, named entities, parts of speech, constituency and dependency parses, the detection of relations, events, and semantic roles, coreference resolution, and clustering and classification for purposes such as theme, genre, and sentiment analysis. It will give a high-level, not-too-technical presentation of what these tools do and how they work, along with concrete information on which tools exist, how they are used, what options they offer, examples of their use, and some idea of their reliability, their limitations, and whether they can be customized. The emphasis will be on what techniques exist and what you can and can’t do with them. The aim is to help participants envision how these tools might be employed in humanities research.
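As an illustration of the first two steps named above (word counts and collocations), here is a minimal sketch using only the Python standard library. It is not material from the tutorial itself: the function names tokenize, word_counts, and pmi_collocations are hypothetical, the regular-expression tokenizer is deliberately crude, and pointwise mutual information (PMI) is just one of several association measures used to find collocations.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Crude tokenizer (an assumption for illustration): lowercase the text and
    # keep runs of letters, allowing one internal apostrophe (e.g. "can't").
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_counts(tokens):
    # Frequency of each word type in the token list.
    return Counter(tokens)


def pmi_collocations(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ); pairs seen fewer than
    min_count times are skipped, since PMI is unreliable for rare pairs.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    n_bigrams = max(n - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bigrams
        p_x = unigrams[w1] / n
        p_y = unigrams[w2] / n
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    # Highest-scoring (most strongly associated) pairs first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Usage on a tiny hypothetical sample text.
sample = "New York is big. I love New York. New York has many people and many people love it."
tokens = tokenize(sample)
print(word_counts(tokens).most_common(3))
print(pmi_collocations(tokens))
```

On real corpora, the open-source toolkits this tutorial surveys offer more careful tokenization and a choice of association measures; the sketch only shows the underlying idea of comparing how often words co-occur with how often they would co-occur by chance.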

