Language Meets Tech in Georgia Tech’s Linguistics Program
To the casual observer, linguistics might seem to be a field obsessed with words. And it is. But there’s nearly as much math and science as humanities in the field’s DNA, making the scientific study of language in the School of Modern Languages a natural stop for some STEM-focused students intrigued by the subject’s rigor and its potential to give them an edge in a tricky job market.
"I think of it less as a way to stand out and more as something that opens up work I couldn't do otherwise," said computer science major Daniel Huang, who is pursuing a Minor in Linguistics. "It’s relatively easy to train a model with basic coding experience, but few people have training in how to interpret results from a linguistic standpoint, changing the angle from which I’d work on a problem and opening new opportunities for me."
For instance, Huang combined his computing prowess with the linguistics skills he’s learned in the School to understand the evolution of the invented language Toki Pona. The language has just 130 words, making it an ideal candidate to study how language changes, Huang said.
He built automated scripts to clean and filter millions of Discord messages in the language, then used linguistic theory to build a custom parser that could navigate Toki Pona’s fluid word classes, ultimately mapping shifts in usage among the language’s users. Huang is presenting a paper on his findings at a linguistics conference this summer.
It’s this sort of interdisciplinary approach that can help set computing students apart in a tumultuous job market, linguists in the School say.
"Language is not the same as numbers in a spreadsheet. It is ambiguous, contextual, variable, and tied to real speakers and communities, and it helps students ask better questions about AI and other formal systems," said Bo Kyoung Kim, a linguist and lecturer in the School’s linguistics program.
For Kelly Smith (BS CS 2024, MS CS 2025), who now works at Visa as a software engineer, linguistics served as a vital differentiator when she was competing against a sea of identical resumes.
"Having a genuine interest outside of the typical technical curriculum often gave interviewers something memorable to ask about," Smith said.
But it also gave her practical skills.
"Computer science taught me how to build systems; linguistics helped me think more deeply about the people who use them."
Understanding Context, Fixing Hallucinations
For all the discussion about how artificial intelligence has shaken up the market for coders, there’s been little analysis of how linguistics may offer a crucial differentiator for students interested in working with AI.
“If your code is going to process language as its input/output, it's helpful to understand how language works,” said Assistant Professor Lelia Glass, director of linguistics. “For example, the difference between word types and word tokens, the difference between inflected forms and lemmas, the fact that language can be ambiguous, that kids speak with a higher-pitched voice than adults; that different people have different accents; and so on. All this knowledge is important for designing tools that work with language data.”
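The first two distinctions Glass mentions can be made concrete with a few lines of Python. The sketch below is only an illustration: the sample sentence and the hand-written `LEMMAS` table are assumptions invented for this example, not material from the linguistics program, and a real project would rely on a proper lemmatizer rather than a lookup table.

```python
import re
from collections import Counter

# Toy sentence invented for illustration; it is not from the article.
TEXT = "The dogs chased the dog that was chasing the cats"

# Hypothetical hand-written lemma table covering only this sentence.
# A real system would use a morphological analyzer or lemmatizer.
LEMMAS = {
    "dogs": "dog",
    "cats": "cat",
    "chased": "chase",
    "chasing": "chase",
    "was": "be",
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

tokens = tokenize(TEXT)   # every running word, repeats included
types = set(tokens)       # distinct word forms
# Map each inflected form to its lemma, falling back to the form itself.
lemma_counts = Counter(LEMMAS.get(t, t) for t in tokens)

print(len(tokens), "tokens")        # "the" is counted each time it occurs
print(len(types), "types")          # "dog" and "dogs" are separate types
print(len(lemma_counts), "lemmas")  # "dog" and "dogs" share one lemma
```

Which of the three counts matters depends on the question being asked: token counts measure how much text there is, type counts measure how varied the word forms are, and lemma counts group inflected forms that a dictionary would list under one entry.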
While large language models that power services such as ChatGPT, Claude, and Gemini are trained on language data, many of the big advancements in AI have been fueled by mathematics. But now, as developers seek to tackle the problem of hallucinations and extend AI’s utility to languages that are smaller or whose complexity eludes AI, the value of linguistics is coming back to prominence.
Where math built AI’s remarkable fluency, such systems continue to struggle with facts. Hallucinations happen in part because math-based models are essentially complex statistical probability engines. Engineering reliability into these systems, along with the ability to understand context, social cues, and even tone, demands more than accumulating additional data, linguists say. They argue it requires teaching models to distinguish between the statistical probability of a sentence and its factual accuracy or context. And designing the frameworks to do that is exactly where linguists come in.
Software engineers with linguistics training can also help extend AI’s utility to smaller languages that have fewer written texts, or to languages in which tone or other complications make it particularly hard for AI to understand them. With English, developers have a vast supply of texts on which to train AI. But even with some languages spoken by millions, such as Swahili, AI systems either struggle to penetrate the market or fail to grasp cultural and linguistic nuance.
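The gap between a sentence's statistical probability and its factual accuracy can be seen even in a very small model. The sketch below is an illustration only: it trains a simple bigram model, which is far cruder than the large language models discussed here, on an invented four-sentence corpus in which a common misconception outnumbers the correct fact. The corpus and the function names are assumptions made for this example.

```python
from collections import Counter

# Invented toy corpus: the false claim appears three times, the true one once.
CORPUS = [
    "the capital of australia is sydney",
    "the capital of australia is sydney",
    "the capital of australia is sydney",
    "the capital of australia is canberra",
]

def train_bigrams(sentences):
    """Count how often each word follows each preceding word."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()  # <s> marks the sentence start
        contexts.update(words[:-1])         # every word that has a successor
        bigrams.update(zip(words, words[1:]))
    return contexts, bigrams

def sentence_probability(sentence, contexts, bigrams):
    """Multiply the maximum-likelihood probability of each word given the previous one."""
    words = ["<s>"] + sentence.split()
    probability = 1.0
    for prev, cur in zip(words, words[1:]):
        if contexts[prev] == 0:
            return 0.0  # unseen context: this toy model applies no smoothing
        probability *= bigrams[(prev, cur)] / contexts[prev]
    return probability

contexts, bigrams = train_bigrams(CORPUS)
p_false = sentence_probability("the capital of australia is sydney", contexts, bigrams)
p_true = sentence_probability("the capital of australia is canberra", contexts, bigrams)

print("false but frequent:", p_false)
print("true but rare:     ", p_true)
```

The model rates the false sentence as more likely because it has only seen how often words follow one another, not whether the resulting claim is correct (Canberra is Australia's capital). More data of the same kind would not change that ranking.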
In some cases, AI is actively causing problems. With languages like Greenlandic or Inuktitut, poor machine translations are flooding the web and finding their way back into AI training data, according to MIT Technology Review.
This is exactly the type of problem Zach Hopton (ALIS, Psychology, 2021), who is now pursuing a Ph.D. in computational linguistics at the University of Zurich, is working to solve. Hopton recently helped build a translation tool for Romansh, a regional Swiss language. To make up for the lack of massive training data, Hopton used his linguistic training to break down Romansh words into their building blocks, called morphological chunks, explicitly revealing the shared roots between Romansh and high-resource languages such as Italian.
"The goal is to find creative solutions for situations when not much data is available, and relying on linguistic knowledge is often a useful tool for that," Hopton said.
He added that beyond building systems, linguists are vital for evaluating them.
“Linguists are particularly good at finding these systems' breaking points,” Hopton said.
Expanding the School’s Linguistic Impact
The School of Modern Languages is expanding its ability to bridge gaps like these, producing graduates competent not only in linguistic theory but also in multiple languages, according to Glass.
“My goal is to establish a flourishing symbiosis in language and technology while taking advantage of our unique situation in the School of Modern Languages to really emphasize the languages dimension of linguistics — namely, training linguists who are actually competent in multiple languages of the world,” she said.
Part of that plan involves a potential bachelor’s degree in linguistics that could be easily paired with computer science as a double major, Glass said.
The linguistics team is also expanding, hiring a scholar currently at Carnegie Mellon University who uses computational, acoustic, behavioral, psycholinguistic, and neurolinguistic methods to study language.
The School has also had success exposing high school students to linguistics by hosting the North American Computational Linguistics Open Competition (NACLO).
“I initiated NACLO at Georgia Tech in 2021, and we have been hosting this annual event for high school students engaging in computational linguistics competition since then. We have seen increasing numbers of participants in this event year by year,” said Assistant Professor Hongchen Wu. “These events inspire and motivate students to pursue careers in language-related fields and contribute to the development of innovative solutions to language-related challenges.”
For Glass, the linguistics program’s expansion represents a shift in how universities should think about the boundaries of education. The integration of linguistics and computer science challenges the frequently unhelpful division between the liberal arts and STEM disciplines, she said.
"We don't need to classify it as humanities or STEM when I think it combines the best parts of both: the rigorous methods of STEM and the humanistic aim of understanding our own experience in the social world," Glass said.
As AI continues to scale, the developers who will truly stand out are the ones who recognize that beneath every data set is a human voice, these linguists say.
"For STEM students especially, I think that kind of training is valuable," Kim said. "It helps them build technologies, analyze data, and communicate with people in ways that are more precise, inclusive, and attentive to human complexity."
Interested in Linguistics?
Georgia Tech's School of Modern Languages offers a 15-credit Minor in Linguistics and a 12-credit undergraduate certificate, both in partnership with the School of Interactive Computing and the School of Psychology. Linguistics courses offered by the School range from an introductory class that satisfies the humanities requirement to more advanced classes on language processing and on language and computers.
The Minor in Linguistics teaches students to understand the physical structures and mental processes involved in human language; the psychological, neurobiological, and societal factors that enable humans to acquire, use, comprehend, and produce language; and the application of computer science to the analysis and synthesis of language and speech.
The undergraduate certificate is an excellent complement to the study of computer science, psychology, or a specific foreign language. It is designed for undergraduates interested in theoretical or applied linguistics, culture, psychology, or language technology, and for students pursuing careers in language education, educational design, language technology, and human-computer interaction.
Past research opportunities have involved examining the decline of Southern accents, while a current Vertically Integrated Project is looking at how young people perceive and interpret emotional timbre in human voices compared to AI-generated voices. Get involved!