Vector representation with a finance corpus

Project Information

bash, batch-jobs, cluster, deep-learning, distributed-computing, hpc, hpc-getting-started, machine-learning, parallelism, programming, programming-for-hpc, python, research-facilitation, ssh
Project Status: In Progress
Project Region: CAREERS
Submitted By: Gaurav Khanna
Project Email: maydogdu@ric.edu
Project Institution: Rhode Island College
Anchor Institution: CR-University of Rhode Island
Project Address: Rhode Island

Students: Ritesh Bachhar

Project Description

This project generates word vector representations from two corpora, a general-purpose corpus and a finance corpus, using the GloVe implementation. GloVe is an unsupervised learning algorithm for obtaining vector representations of words. The work involves extracting text from two sets of documents to build the two corpora, training GloVe on each corpus, and generating the resulting vector representations. These representations will then be used to analyze the impact of a domain-specific corpus on the vectors.
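
One possible way to compare the two sets of vectors is to check how much a word's nearest neighbors differ between them. The sketch below is illustrative only and is not the project's method: the function names, the toy vectors, and the choice of a Jaccard overlap score are all assumptions. It assumes vectors stored in the plain-text layout written by the reference GloVe implementation (one word per line followed by its space-separated components).

```python
import math


def load_vectors(lines):
    """Parse 'word v1 v2 ...' lines into a dict of word -> list of floats."""
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(word, vectors, k=2):
    """Return the k words most similar to `word`, excluding the word itself."""
    scored = [
        (cosine(vectors[word], vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


def neighbor_overlap(word, vectors_a, vectors_b, k=2):
    """Jaccard overlap of the word's k nearest neighbors in the two spaces."""
    a = set(nearest_neighbors(word, vectors_a, k))
    b = set(nearest_neighbors(word, vectors_b, k))
    return len(a & b) / len(a | b)


# Toy two-dimensional vectors, invented purely for illustration.
general = load_vectors([
    "bond 0.9 0.1", "glue 0.8 0.2", "tie 0.85 0.15",
    "yield 0.1 0.9", "treasury 0.2 0.8",
])
finance = load_vectors([
    "bond 0.1 0.9", "glue 0.9 0.1", "tie 0.8 0.2",
    "yield 0.15 0.85", "treasury 0.12 0.88",
])

print(nearest_neighbors("bond", general))
print(nearest_neighbors("bond", finance))
print(neighbor_overlap("bond", general, finance))
```

A low overlap for a word suggests that its usage differs between the two corpora. With real vectors, the comparison would normally be restricted to the vocabulary shared by both corpora.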

This project will require storage space for the large corpora and computing power to train GloVe on them. A computing platform such as URI’s HPC or MGHPCC will be used for these tasks. The student facilitator will help the project PI set up the computational workflow in an HPC environment, i.e., develop and test the job submission scripts and set up the required software and data on the chosen computational resource.
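
As a rough illustration of what such a job submission script might look like, the sketch below generates one from Python. It rests on two assumptions that the description does not confirm: that the cluster uses the Slurm scheduler, and that GloVe is the reference C implementation, whose `vocab_count`, `cooccur`, `shuffle`, and `glove` tools run in sequence. The function name, file names, build directory, and resource values are hypothetical placeholders.

```python
def render_glove_job(corpus_path, out_prefix, threads=8, mem_gb=16, hours=4,
                     vector_size=50, build_dir="GloVe/build"):
    """Return the text of a Slurm batch script that trains GloVe on one corpus.

    Assumes Slurm and the reference GloVe tools built under `build_dir`;
    every default value here is a placeholder, not a recommendation.
    """
    vocab = f"{out_prefix}.vocab.txt"
    cooc = f"{out_prefix}.cooccurrence.bin"
    shuf = f"{out_prefix}.cooccurrence.shuf.bin"
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name=glove-{out_prefix}",
        f"#SBATCH --cpus-per-task={threads}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --time={hours:02d}:00:00",
        f"#SBATCH --output={out_prefix}.%j.log",
        "",
        "set -euo pipefail",
        "",
        "# 1. Count word frequencies to build the vocabulary.",
        f"{build_dir}/vocab_count -min-count 5 -verbose 2 < {corpus_path} > {vocab}",
        "# 2. Accumulate word-word co-occurrence counts.",
        f"{build_dir}/cooccur -memory 4.0 -vocab-file {vocab} -window-size 15 "
        f"-verbose 2 < {corpus_path} > {cooc}",
        "# 3. Shuffle the co-occurrence records before training.",
        f"{build_dir}/shuffle -memory 4.0 -verbose 2 < {cooc} > {shuf}",
        "# 4. Train the vectors.",
        f"{build_dir}/glove -save-file {out_prefix}.vectors -threads {threads} "
        f"-input-file {shuf} -x-max 10 -iter 15 -vector-size {vector_size} "
        f"-binary 2 -vocab-file {vocab} -verbose 2",
    ]
    return "\n".join(lines) + "\n"


# One script per corpus; the corpus file names are hypothetical.
for name in ("general", "finance"):
    script = render_glove_job(f"{name}_corpus.txt", name)
    with open(f"train_{name}.sh", "w") as handle:
        handle.write(script)
```

Each generated file could then be submitted with `sbatch train_general.sh` or `sbatch train_finance.sh`. Using the same parameters for both corpora keeps the resulting vectors comparable.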
