Large-scale text analysis using generative language models: A case study in discovering public value expressions in AI patents
Title: | Large-scale text analysis using generative language models: A case study in discovering public value expressions in AI patents |
---|---|
Format: | Journal Article |
Publication Date: | March 2024 |
Published In: | Quantitative Science Studies |
Publisher: | MIT Press |
Description: | We put forward a novel approach using a generative language model (GPT-4) to produce labels and rationales for large-scale text analysis. The approach is used to discover public value expressions in patents. Using text (5.4 million sentences) for 154,934 US AI patent documents from the United States Patent and Trademark Office (USPTO), we design a semi-automated, human-supervised framework for identifying and labeling public value expressions in these sentences. A GPT-4 prompt is developed that includes definitions, guidelines, examples, and rationales for text classification. We evaluate the labels and rationales produced by GPT-4 using BLEU scores and topic modeling, finding that they are accurate, diverse, and faithful. GPT-4 achieved an advanced recognition of public value expressions from our framework, which it also uses to discover unseen public value expressions. The GPT-produced labels are used to train BERT-based classifiers and predict sentences on the entire database, achieving high F1 scores for the 3-class (0.85) and 2-class classification (0.91) tasks. We discuss the implications of our approach for conducting large-scale text analyses with complex and abstract concepts. With careful framework design and interactive human oversight, we suggest that generative language models can offer significant assistance in producing labels and rationales. |
Ivan Allen College Contributors: | S. Pelaez, P. Shapira |
External Contributors: | Gaurav Verma, Barbara Ribeiro |
Citation: | Pelaez, S., Verma, G., Ribeiro, B., & Shapira, P. (2024). Large-scale text analysis using generative language models: A case study in discovering public value expressions in AI patents. Quantitative Science Studies, 1-26. |