We’ve all wished that some people on social media would use a fake news detector before sharing, and now an NJIT computer science student has built one that works pretty well.
Natalia Smith, a junior from Newark, said her fake news detector application has performed with up to 90 percent accuracy in evaluating COVID-related tweets for truthfulness.
Over time, her software learned that tweets such as “vaccines contain biochips” are probably fake, while those stating “FDA authorizes Pfizer vaccine” are probably true. She concluded that unverified accounts produced a 50/50 split of true and false COVID tweets, while verified accounts had 70 percent true posts compared to 30 percent false ones.
Smith explained that the work began when she sought a summer research project for her role in NJIT’s McNair program, which helps underprivileged students work toward graduate-level education. Machine learning interests her, so she connected with Ying Wu College of Computing Prof. James Geller, interim chair of the Department of Data Science, who has done his own important work on COVID terminology. Geller introduced her to Ph.D. student Chih-Yuan (Alen) Li. From him she learned about a popular data science modeling technique called Bidirectional Encoder Representations from Transformers, or BERT. And yes, she said, there’s also one called ERNIE.
“It takes an entire sentence and it would extract information from both the left and right side of each word. That allows the model to learn word meanings based on context,” Smith explained.
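To make the idea of reading "both the left and right side of each word" concrete, here is a toy Python sketch. It is not BERT and not Smith's code: the function name `bidirectional_contexts` is invented for illustration, and it simply lists the words on either side of each word. A real BERT model combines that two-sided context through learned attention rather than plain word lists.

```python
def bidirectional_contexts(sentence):
    """Return (left_words, word, right_words) for every word in the sentence.

    Toy illustration only: whitespace splitting stands in for BERT's
    subword tokenizer, and no model is trained or queried here.
    """
    tokens = sentence.lower().split()
    return [(tokens[:i], tokens[i], tokens[i + 1:]) for i in range(len(tokens))]


# Example tweet text taken from the article.
for left, word, right in bidirectional_contexts("FDA authorizes Pfizer vaccine"):
    print(f"{word!r}: left={left} right={right}")
```

Each word is paired with everything before it and everything after it, which is the kind of context a one-directional, left-to-right model would not see in full.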
Smith wrote her own software, largely in Python, to train the model and tally the results. She chose a 100,000-tweet dataset that she found on Kaggle.com, Google’s data science community. She divided it into two files, one for verified accounts and one for unverified accounts, and then fed in several thousand tweets at a time from each file.
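The workflow described here (split by account type, process in batches, tally the outcome) might look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the record fields `text` and `verified`, the helper names, and the keyword-based `placeholder_classifier`, which merely stands in for a trained model. It is not Smith's actual code or the real dataset's layout.

```python
from collections import Counter
from itertools import islice


def split_by_verification(tweets):
    """Separate tweet records into (verified, unverified) lists."""
    verified = [t for t in tweets if t["verified"]]
    unverified = [t for t in tweets if not t["verified"]]
    return verified, unverified


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def tally(labels):
    """Return the share of each label, e.g. {'true': 0.7, 'fake': 0.3}."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


def placeholder_classifier(text):
    """Stand-in for a trained model (hypothetical keyword rule)."""
    return "fake" if "biochips" in text.lower() else "true"


# Two illustrative records built from the article's example tweets.
sample = [
    {"text": "FDA authorizes Pfizer vaccine", "verified": True},
    {"text": "vaccines contain biochips", "verified": False},
]

verified, unverified = split_by_verification(sample)
for name, group in (("verified", verified), ("unverified", unverified)):
    labels = []
    for batch in batches(group, 2):  # the article mentions several thousand per batch
        labels.extend(placeholder_classifier(t["text"]) for t in batch)
    print(name, tally(labels))
```

Batching keeps memory use bounded when the files are large, and tallying per group is what would produce per-account-type shares like those reported in the article.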
Smith added that her near-term goals include publishing the fake news detector online and adapting her research for an online satire detector. Satire becomes a legitimate problem when people misunderstand news that is fake by design, intended for humor and social commentary rather than for malice. Surely, it will be popular among dry-witted Highlanders (but don’t call her Shirley).
Smith’s long-term goal is to work in backend web development, machine learning, or both, while going for an as-yet undetermined advanced degree.
Geller said he’s confident that she can achieve a doctorate. “Natalia is mature beyond her age and beyond her stage in her academic career. She has a natural understanding of what it takes to do successful research,” he said. Whether she heads for a Ph.D. or not, he said, “She has a great future in computing ahead of her. She formulated this hypothesis and supported it by data, without any guidance from me. That is really impressive.”