This project built a machine learning model to predict fake news related to Covid-19. The model was applied to Covid-19 tweets from three countries (the United States, the United Kingdom and India) to detect fake news in each country. The project also applied topic modelling to identify the dominant topics in fake news in each country.
This project contributes to knowledge in the fields of communication and health care. The machine learning model it built predicts Covid-19 misinformation and can be used to detect fake news on Twitter. In addition, the study deepens our understanding of the dominant topics of Covid-19 misinformation on social media and how they differ by country. The results can help detect and prevent the spread of misinformation on social media.
None
The RCF developed a strong awareness of the opportunities and experiences involved in research computing -- something the student was completely unaware of previously.
The student involved learned to use a high-performance computing cluster and to request the resources each job needed. The student also learned to run batch jobs when dealing with high volumes of data. He plans to organize his code and share it publicly so that more people can benefit from this experience.
None.
None.
None.
As mentioned previously, this project can help detect and prevent the spread of misinformation on social media, which may reduce the potential negative impact of social media on society.
The student working on this project learned state-of-the-art natural language processing algorithms, learned to use a GPU cluster, and learned to run batch jobs. However, some jobs still took more than 24 hours to run. A better approach needs to be developed to handle data of this volume in the future.
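One possible direction for the long-running jobs noted above is to split the tweets into fixed-size chunks so that each batch job processes only one slice. The sketch below is illustrative and is not the project's actual code: the helper name `chunked`, the chunk size, and the stand-in data are all assumptions.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Stand-in data for illustration only; real input would be the collected tweets.
tweets = [f"tweet {i}" for i in range(10)]

# Each chunk could be handed to a separate batch job (chunk size is an assumption).
chunks = list(chunked(tweets, 4))
print([len(c) for c in chunks])
```

Because `chunked` consumes its input lazily, it would also work on a file read line by line, so the full dataset never has to be held in memory at once.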
The project trained a model to predict fake news and applied it to Covid-19 tweets collected between March 2020 and May 2022 in three countries (USA, UK and India). The topic modelling results show that the dominant topics in fake news in the US are Covid Symptom, Politics, Covid Treatment, and Cases/Lock-down, while the dominant topics in real news are Mask Mandate/Social Distancing, Covid Statistic, and Politics. The model has trouble distinguishing between fake news and real news in the India dataset because limited training data was available for that country.
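As a rough illustration of how predicted labels can be summarized by country, the sketch below counts the most frequent terms per (country, label) group. This is a simple term-frequency summary, not the topic model used in the project. The function name `top_terms`, the stopword list, and the toy rows are assumptions made up for the example and are not project data.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {"the", "a", "is", "of", "and", "to", "in", "for"}


def top_terms(rows, k=3):
    """Return the k most frequent non-stopword terms per (country, label)."""
    counts = defaultdict(Counter)
    for country, label, text in rows:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts[(country, label)].update(t for t in tokens if t not in STOPWORDS)
    return {key: [term for term, _ in c.most_common(k)] for key, c in counts.items()}


# Made-up rows for illustration: (country, predicted label, tweet text).
rows = [
    ("US", "fake", "miracle cure cure for covid"),
    ("US", "fake", "cure found"),
    ("US", "real", "mask mandate and mask rules"),
    ("UK", "real", "lockdown cases cases"),
]

result = top_terms(rows, k=2)
for key, terms in sorted(result.items()):
    print(key, terms)
```

A full topic model groups co-occurring words into topics rather than ranking single terms, so this summary is only a quick sanity check on what each group of tweets is about.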