When computer scientists analyze social networks, they focus mainly on individual users, the relationships or interactions that connect them, and the timing of those interactions. Shaobin Xu believes there’s a rich source of untapped data to mine for additional insights: the text that flows among users.
Xu’s research examines two sources of text: online content and the citation networks that form when academics acknowledge one another’s work. His investigations lie at the intersection of social network analysis and natural language processing, the branch of artificial intelligence that helps computers understand, interpret, and manipulate human language. Links between actors in a social network are particularly revealing because they point to who is influencing whom. That is powerful information for viral marketing and for understanding how social attitudes and policies evolve, such as the growing acceptance and subsequent legalization of same-sex marriage in the U.S.
Xu and his fellow researchers look to a decidedly old-fashioned medium to understand the flow of modern information. In the pre-intellectual-property days of the 19th century, newspaper and magazine editors across the country unabashedly cribbed and printed each other’s content. Xu has been involved with the Viral Texts Project, which studies this reprinting. The project is a joint endeavor among the University’s computer scientists, English scholars, and historians, and a great example of the collaborative spirit that attracted him to Northeastern.
With three internships at Google under his belt, Xu hopes to eventually conduct research there or at another large and innovative technology company.