Felix Parker, Kristen Nixon, Sonia Jindal
This project develops an interpretable system for detecting misinformation on Twitter. We train models that use a tweet's content and metadata to classify it as misleading or not misleading, output a corresponding confidence score, and provide several interpretations of each prediction. We construct a new dataset for this purpose from a subset of the Twitter Community Notes dataset and additional news-related tweets.
To run our system, first install the required packages listed in requirements.txt. Then run the scripts in this repository in the following order:
Data Processing:
- data/community-notes/community_notes.jl
- data/community-notes/fetch_tweets.py
- data/community-notes/format_tweets.py
- data/news-tweets/fetch_news_tweets.py
- data/news-tweets/format_tweets.py
- data/combined/combine_datasets.jl
- data/combined/generate_splits.py
- data/twitter-users/get_users.py
- data/twitter-users/format_users.py
Models:
- models/engagementscore/engagement-model.py
- models/userscore/user-model.py
- models/linkscore/fetch-linkscores.py
- models/linkscore/link-model.py
- models/textscore/textscore_train.py
- models/textscore/textscore_inference.py
- ?
User Study:
- userstudy/data/fetch_tweets.py
- userstudy/data/format_tweets.py
- userstudy/backend.py
- userstudy/analysis/database_to_csv.py
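
The run order above can be sketched as a small driver script. This is a minimal sketch and not part of the repository: the names `build_command` and `run_pipeline` are hypothetical, and it assumes that `julia` is on the PATH, that each script can be launched from the repository root, and that no script needs command-line arguments. The unnamed "?" step is left out, and the User Study scripts are omitted because `userstudy/backend.py` may be a long-running service (an assumption). It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys
from pathlib import Path

# Script order copied from the lists above; the unnamed "?" step is left out.
DATA_PROCESSING = [
    "data/community-notes/community_notes.jl",
    "data/community-notes/fetch_tweets.py",
    "data/community-notes/format_tweets.py",
    "data/news-tweets/fetch_news_tweets.py",
    "data/news-tweets/format_tweets.py",
    "data/combined/combine_datasets.jl",
    "data/combined/generate_splits.py",
    "data/twitter-users/get_users.py",
    "data/twitter-users/format_users.py",
]
MODELS = [
    "models/engagementscore/engagement-model.py",
    "models/userscore/user-model.py",
    "models/linkscore/fetch-linkscores.py",
    "models/linkscore/link-model.py",
    "models/textscore/textscore_train.py",
    "models/textscore/textscore_inference.py",
]

# Assumption: Julia scripts run with a "julia" executable on PATH;
# Python scripts run with the interpreter executing this driver.
INTERPRETERS = {".py": sys.executable, ".jl": "julia"}


def build_command(script):
    """Return the command (as an argument list) that launches one script."""
    suffix = Path(script).suffix
    if suffix not in INTERPRETERS:
        raise ValueError(f"no interpreter known for {script!r}")
    return [INTERPRETERS[suffix], script]


def run_pipeline(scripts, dry_run=True):
    """Run the scripts in order; stop at the first failing script."""
    commands = [build_command(script) for script in scripts]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    run_pipeline(DATA_PROCESSING + MODELS, dry_run=True)
```

Calling `run_pipeline(DATA_PROCESSING + MODELS, dry_run=False)` would execute the steps for real. If the scripts read or write files through relative paths, they may need to be started from their own directories instead of the repository root.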