GenAI for Good: The Misinformation and Veracity Engine
By Calvin Nguyen, Samantha Lin, and Dr. Arsanjani
Our Info
Calvin Nguyen: LinkedIn | GitHub
Samantha Lin: LinkedIn | GitHub
Abstract
The objective of our product, Chenly Insights, is to help users combat misinformation using factuality factors and microfactors. As misinformation becomes easier to spread through deep fakes and large social networks, this project empowers users by giving them a veracity (truthfulness) score for the news they receive through a Mesop interface. A user links or uploads a news article of their choice. Then two different types of AI, generative and predictive, judge the article on the appropriate factuality factors. The scores are aggregated and shown to the user, along with other visualizations related to the article. In addition, these scores, along with metadata about the article, are uploaded to a vector database that the generative AI can use. This vector database is continuously updated with new information from fact-check and news websites. Users can also converse with the AI and ask questions about each article and why a particular score was given. Finally, we identified the prompting types and prompt adjustments that yield the highest accuracy.
Dataset
We utilized multiple data sources to both establish the ground truth and understand factors that indicate misinformation. Generally, the more context and information you give a large language model, the more nuanced and thoughtful its judgements on a piece of text can be. The data is as follows:
- LIAR-PLUS Dataset: A publicly available dataset containing labeled statements from news sources with contextual information. After preprocessing, we have a dataset of 10,234 rows and 15 columns, with a key truthfulness label spanning six levels from “pants-on-fire” to “true”. Primarily used in our vector database and for predictive AI training.
- PolitiFact Fact Checks: A website that fact-checks articles and statements made by leaders. Used to help establish the ground truth on a topic.
- Snopes Fact Checks: A website that fact-checks articles and claims made on the web.
- User-Selected PDFs: Article links or PDFs that are analyzed by the engine and assigned a truthfulness label.
- SERP API Search Results: Google search results based on the title of the user-selected article, which provide the veracity engine with more information (a lookup sketch follows this list).
- Factuality Factors: These factors determine the veracity of an article; we analyze four of them for each article: sensationalism, stance detection, social credibility, and naive realism.
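As a concrete illustration of the SERP API step above, the sketch below shows how related articles could be fetched by title. It assumes the `google-search-results` Python client and a hypothetical `SERPAPI_KEY` environment variable; it is not necessarily our exact production query.

```python
# Minimal sketch of the SERP API lookup, assuming the google-search-results
# client and a hypothetical SERPAPI_KEY environment variable.
import os
from serpapi import GoogleSearch

def search_related_articles(article_title: str, num_results: int = 5) -> list[dict]:
    """Query Google via SERP API for articles related to the uploaded title."""
    search = GoogleSearch({
        "q": article_title,
        "num": num_results,
        "api_key": os.environ["SERPAPI_KEY"],
    })
    results = search.get_dict().get("organic_results", [])
    # Keep only the fields the veracity engine needs as grounding context.
    return [
        {"title": r.get("title"), "link": r.get("link"), "snippet": r.get("snippet")}
        for r in results
    ]
```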
Flowchart
Our flowchart in Lucidchart showcases the data journey throughout this process.
Generative AI Methods
Tool: Google Gemini 1.5 Pro 002
Where and how we used it:
- Extracting key attributes from the user-uploaded text (e.g., speaker, context, title) to support predictive AI modeling.
- Evaluating sensationalism and political stance via structured prompts.
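The attribute-extraction step above could look roughly like the sketch below. It assumes the `google-generativeai` SDK with JSON output enabled; the attribute list, prompt wording, and placeholder API key are illustrative, not our exact implementation.

```python
# Minimal sketch of attribute extraction with Gemini; prompt wording and
# attribute names are illustrative.
import json
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # hypothetical placeholder
model = genai.GenerativeModel("gemini-1.5-pro-002")

def extract_attributes(article_text: str) -> dict:
    """Ask Gemini for the metadata the predictive models need (speaker, context, title)."""
    prompt = (
        "Extract the speaker, context, and title from the article below. "
        "Respond with a JSON object using exactly those keys.\n\n" + article_text
    )
    response = model.generate_content(
        prompt,
        generation_config={"response_mime_type": "application/json"},
    )
    return json.loads(response.text)
```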
Prompting techniques and their objectives:
- Normal prompting: asks a baseline question to get a general response from Gemini for a given factuality factor.
- Chain of Thought (CoT): asks the model to explain its response, allowing it to reason toward a more accurate answer for a given factuality factor.
- Fractal Chain of Thought (FCoT): asks the model to iterate, explaining its response and reasoning and checking what it missed in previous iterations, to produce a more well-rounded and accurate response for a given factuality factor (see the sketch after this list).
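The sketch below illustrates how the three prompting styles could be assembled for a single factuality factor. It assumes the `google-generativeai` SDK and a 0-5 scoring scale mapped to the LIAR-PLUS labels; the prompt wording is illustrative rather than our exact production prompts.

```python
# Minimal sketch of the three prompting styles for one factuality factor.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # hypothetical placeholder
model = genai.GenerativeModel("gemini-1.5-pro-002")

def build_prompt(article: str, factor: str, technique: str) -> str:
    base = (
        f"Rate the {factor} of the following article on a 0-5 scale "
        f"(0 = pants-on-fire, 5 = true).\n\nArticle:\n{article}\n"
    )
    if technique == "normal":
        return base + "Respond with only the score."
    if technique == "cot":
        return base + "Think step by step, explain your reasoning, then give the score."
    if technique == "fcot":
        return base + (
            "Iterate three times: reason about the score, review what the previous "
            "iteration missed, and refine the answer. Give the final score."
        )
    raise ValueError(f"unknown technique: {technique}")

# Example call for one factor with FCoT prompting.
score_text = model.generate_content(
    build_prompt("...article text...", "sensationalism", "fcot")
).text
```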
Additional information we provide to Gemini in our prompts:
- SERP API: search for related articles using the SERP API and use the results as a “ground truth” for Gemini to refine its evaluations.
- RAG via ChromaDB: retrieve the top three related documents saved from websites such as PolitiFact and Snopes (collected via HTML parsing) so Gemini gains extra information on similar topics (a retrieval sketch follows this list).
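A minimal sketch of the ChromaDB retrieval step is shown below, assuming a persistent collection already populated with scraped PolitiFact and Snopes fact checks; the collection name and formatting are illustrative.

```python
# Minimal sketch of retrieval-augmented context from ChromaDB.
import chromadb

client = chromadb.PersistentClient(path="./chenly_vectordb")  # hypothetical path
collection = client.get_or_create_collection("factchecks")

def retrieve_context(article_title: str, k: int = 3) -> str:
    """Return the top-k related fact checks as a single context block for the prompt."""
    hits = collection.query(query_texts=[article_title], n_results=k)
    docs = hits["documents"][0]
    return "\n\n".join(f"Related fact check {i + 1}:\n{doc}" for i, doc in enumerate(docs))

# The retrieved context is then prepended to the Gemini prompt as grounding material.
```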
Predictive AI Methods
Tools: Python, HuggingFace, XGBoost, Pandas, PyTorch
We built and trained different models for different factuality factors:
- Naive Realism: used the spaCy TextBlob component and a HuggingFace sentiment analysis model to calculate confidence and subjectivity, then trained an XGBoost decision tree on these features (sketched after this list).
- Social Credibility: cleaned the LIAR-PLUS dataset, one-hot encoded the speakers, context, and party affiliation, and then built and trained a neural network model.
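As an illustration, the naive-realism pipeline from the list above might look like the following sketch. It uses plain TextBlob as a stand-in for the spaCy TextBlob component and a default HuggingFace sentiment model; the feature set and hyperparameters are assumptions, not our exact configuration.

```python
# Minimal sketch of the naive-realism classifier: two features per statement
# (sentiment confidence and subjectivity) fed to an XGBoost classifier.
import numpy as np
from textblob import TextBlob
from transformers import pipeline
from xgboost import XGBClassifier

sentiment = pipeline("sentiment-analysis")

def featurize(text: str) -> list[float]:
    """Sentiment-model confidence plus TextBlob subjectivity (0 = objective, 1 = subjective)."""
    confidence = sentiment(text[:512])[0]["score"]        # rough truncation for long articles
    subjectivity = TextBlob(text).sentiment.subjectivity
    return [confidence, subjectivity]

def train_naive_realism(texts: list[str], labels: list[int]) -> XGBClassifier:
    """texts: LIAR-PLUS statements; labels: six-level truthfulness encoded as 0-5."""
    X = np.array([featurize(t) for t in texts])
    y = np.array(labels)
    clf = XGBClassifier(n_estimators=200, max_depth=4)
    clf.fit(X, y)
    return clf
```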
Website
We created a product called Chenly Insights, a front-facing website where users can upload articles and have them graded on their perceived veracity. It uses the predictive AI and generative AI methods listed here. If you would like to see our Figma link for the product and the poster, here it is: Link. Our design philosophies for the website are listed below:
- Name: The name is based on the Chinese and Vietnamese words for truth, as we want our users to feel like they are discovering the truth with our product.
- Color Scheme: We chose a blue-to-green color scheme with gradients, as it communicates trustworthiness and veracity. Additionally, we went with white instead of black for our backgrounds, as our model is not a black box: we surface all of the prompts and decisions the model has made and help users understand them through our deep analysis page.
- Pages:
- Home: Here, you can read snippets about our various sections and see example outputs of our tool on different types of articles.
- About Us: Here, you can learn more about the team behind the project, our reasons for building it, and find our report.
- Prompt Testing: See the various prompting techniques and adjustments, along with examples of the questions we ask our generative AI. Additionally, you can upload your own article and grade it here.
- Pipeline Explanation: An explanation of our pipeline, along with a Lucidchart diagram of it.
- Chenly Insights: The main tool of our product. Here, you pick the prompting type, the adjustments you would like, and the factuality factors. After that, upload an article and submit it for analysis. This process takes 1-2 minutes because we send prompts to the generative AI and run the predictive AI. Afterwards, you get results about your article, with a rating from pants-on-fire to true (a score-aggregation sketch follows this list). From there, you can choose to run a deep analysis and find more information about the article.
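For illustration, the sketch below shows one way the per-factor scores could be rolled up into the six-level rating shown to the user. Equal weighting and the 0-5 scale are assumptions, not necessarily the exact aggregation used by the engine.

```python
# Minimal sketch of aggregating per-factor scores into a six-level label.
LABELS = ["pants-on-fire", "false", "barely-true", "half-true", "mostly-true", "true"]

def overall_label(factor_scores: dict[str, float]) -> str:
    """factor_scores maps each factuality factor to a 0-5 score."""
    avg = sum(factor_scores.values()) / len(factor_scores)
    return LABELS[min(5, round(avg))]

print(overall_label({
    "sensationalism": 2.0,
    "stance_detection": 3.5,
    "social_credibility": 4.0,
    "naive_realism": 3.0,
}))  # -> "half-true"
```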
Results
- Accuracy table for the two predictive AI methods on different factuality factors
- Prompting types with adjustments and their MSE
- Bar chart of scores given by humans vs. prompting techniques on each article
Discussion
In our project, we followed previous work's suggestion of combining predictive and generative artificial intelligence to detect misinformation. We successfully developed a simple hybrid system that scores articles' veracity, a significant step forward in addressing the complex challenge of misinformation in the digital age. Anyone can run this product using our system.
There are a couple of issues with the current process, though. The MSE depends heavily on the human scores, which are based on a sample size of only 4. Ideally, we would like a team of 20 expert human graders to grade each article on each factuality factor to obtain an article's true score.
Our hypothesis that FCoT with VB and SERP would achieve the lowest MSE was not supported; normal prompting with VB and SERP performed best. There are a couple of possible reasons. Looking at the bar graph, FCoT struggled to judge our Onion article accurately because of its satire, despite doing well on all the other articles. Additionally, FCoT graded each article more critically and provided more reasoning tied to the microfactors, nuances that the human graders may not have picked up on.
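For reference, the MSE reported above is the mean squared error between human-assigned scores and a prompting technique's scores on the same articles; the sketch below uses placeholder values, not our measured data.

```python
# Minimal sketch of the evaluation metric: MSE between human and model scores.
def mse(human: list[float], model: list[float]) -> float:
    return sum((h - m) ** 2 for h, m in zip(human, model)) / len(human)

human_scores = [3.0, 1.0, 4.0, 0.0]   # placeholder human scores on four articles
fcot_scores = [2.5, 1.5, 4.0, 2.0]    # placeholder FCoT + VB + SERP outputs
print(mse(human_scores, fcot_scores))  # 1.125
```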
Future Direction
- Combine generative and predictive AI for a single factuality factor using agents like CrewAI.
- Adjust prompting techniques to help LLMs achieve more human-like scoring through longer and more specific prompts for FCoT.
- Expand to more data sources, such as the Washington Post Fact Checker, for our vector database.
- Make the website live via Google Cloud with user-provided API keys, and run scheduled jobs to perform automated scraping for our database.
- Implement more factuality factors into our model to grade articles more critically.
Data Ethics
Data:
- The LIAR-PLUS dataset has been shown to contain high-quality data according to the UC Berkeley Library.
- PolitiFact and Snopes are also high-quality sources approved by the UC Berkeley Library.
- Given this, we are confident that the data used within our project is ethical and accurate.
System:
- Our system will not always provide the most accurate representation of veracity.
- We warn users within the system that the information and scores they receive will not be 100% correct.
- Users should treat the results as one piece of information to weigh alongside their own judgment.
Acknowledgements
Calvin and Samantha thank Dr. Arsanjani for his mentorship and guidance throughout this project, and the other groups within section B01 for debugging help and coding advice. We would also like to thank the rest of the capstone group (David Sun, Eric Gu, Eric Sun, Jade Zhou, Luran Zhang, and Yiheng Yuan) for bouncing ideas around and keeping our group accountable.
References
- Jiang, Bohan, Zhen Tan, Ayushi Nirmal, and Huan Liu. 2024. “Disinformation Detection: An Evolving Challenge in the Age of LLMs.” arXiv preprint arXiv:2309.15847. [Link]
- Qi, P., W. Hsu, and M. L. Lee. 2024. “SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection.” arXiv preprint arXiv:2403.03170. https://ar5iv.labs.arxiv.org/html/2403.03170 (accessed Nov. 3, 2024).
- Arsanjani, Ali. “Alternus Vera.” https://alternusvera.wordpress.com/veracity-vectors-for-disinformation-detection (accessed 2024).
- Tariq60. 2018. “LIAR-PLUS.” October. [Link]