Prepare Resources
chariot can prepare the resources needed for NLP research, specifically datasets and pretrained word vectors.
Download NLP Dataset
chariot works together with chazutsu, an NLP dataset downloader.
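import chazutsu
from chariot.storage import Storage
# ROOT_DIR is a placeholder for the path to your project root
storage = Storage.setup_data_dir(ROOT_DIR)
# Download the Movie Review polarity dataset into the "raw" data directory
r = chazutsu.datasets.MovieReview.polarity().download(storage.data_path("raw"))
# Show the first three rows of the training data
r.train_data().head(3)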
   polarity                                             review
0         0  synopsis : an aging master art thief , his sup...
1         0  plot : a separated , glamorous , hollywood cou...
2         0  a friend invites you to a movie . this film wo...
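The example above passes storage.data_path("raw") as the download target. As a rough mental model only, a storage helper of this kind can be pictured as a thin wrapper that resolves paths under a data directory inside the project root. The sketch below is not chariot's Storage class: the class name DataDirs, the "data" folder name, and the default subdirectories are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


class DataDirs:
    """Hypothetical stand-in for a storage helper; not chariot's Storage."""

    def __init__(self, root):
        self.root = Path(root)

    def setup(self, subdirs=("raw", "external")):
        # Create <root>/data/<subdir> for each name; existing folders are kept.
        for name in subdirs:
            (self.root / "data" / name).mkdir(parents=True, exist_ok=True)
        return self

    def data_path(self, relative=""):
        # Resolve a path under <root>/data without touching the filesystem.
        return self.root / "data" / relative


# Usage with a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    dirs = DataDirs(tmp).setup()
    raw_dir = dirs.data_path("raw")
    raw_exists = raw_dir.is_dir()
    glove_path = dirs.data_path("external/glove.6B.200d.txt")
```

Keeping downloads under "raw" and third-party resources under "external" separates untouched source data from anything derived from it, which is why a single path-resolving helper is convenient.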
Download Pretrained Word Vector
chariot can download and load pretrained word vectors by working with chakin.
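from chariot.storage import Storage
from chariot.transformer.vocabulary import Vocabulary
storage = Storage("path/to/project/root")
# Download the word vectors through chakin
vec_path = storage.chakin(name="GloVe.6B.200d")
# Load a vocabulary from a file and build its embedding from the downloaded vectors
vocab = Vocabulary.from_file("path/to/vocabulary")
embedding = vocab.make_embedding(storage.data_path("external/glove.6B.200d.txt"))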
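To make the idea of building an embedding from a word vector file concrete, here is a minimal sketch in plain Python. It is not chariot's implementation of make_embedding. It assumes the GloVe text format (one word per line followed by space-separated floats), and the function names load_vectors and make_embedding_matrix, the zero-vector fallback for missing words, and the sample values are all hypothetical.

```python
import io


def load_vectors(lines, wanted):
    """Parse GloVe-style lines ('word v1 v2 ...') for the words in `wanted`."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], parts[1:]
        if word in wanted:
            vectors[word] = [float(v) for v in values]
    return vectors


def make_embedding_matrix(vocab, vectors, dim):
    """Row i holds the vector for vocab[i]; words without a vector stay all-zero."""
    return [list(vectors.get(word, [0.0] * dim)) for word in vocab]


# Tiny stand-in for a GloVe file (made-up 3-dimensional values).
sample = io.StringIO("the 0.1 0.2 0.3\nmovie 0.4 0.5 0.6\nunused 0.7 0.8 0.9\n")
vocab = ["<unk>", "the", "movie"]
vectors = load_vectors(sample, set(vocab))
matrix = make_embedding_matrix(vocab, vectors, dim=3)
```

Only the words present in the vocabulary are kept, so the matrix has one row per vocabulary entry regardless of how large the vector file is. Row order follows the vocabulary, which lets a token index be used directly as a row index.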