Prepare Resources
chariot can prepare resources for NLP research: specifically, datasets and pretrained word vectors.
Download NLP Dataset
chariot works with chazutsu, an NLP dataset downloader, to fetch datasets.
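import chazutsu
from chariot.storage import Storage

# ROOT_DIR is the path to your project root
storage = Storage.setup_data_dir(ROOT_DIR)
# Download the movie review polarity dataset into the "raw" data directory
r = chazutsu.datasets.MovieReview.polarity().download(storage.data_path("raw"))
r.train_data().head(3)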
   polarity  review
0  0         synopsis : an aging master art thief , his sup...
1  0         plot : a separated , glamorous , hollywood cou...
2  0         a friend invites you to a movie . this film wo...
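The calls in this document pass names such as "raw" and "external/glove.6B.200d.txt" to storage.data_path. The sketch below illustrates one way such names could map to directories on disk. It is only an assumption for illustration: the `data` parent directory and the `data_path` helper here are hypothetical and are not chariot's implementation.

```python
from pathlib import Path
import tempfile


def data_path(root, target=""):
    # Hypothetical helper (not chariot's API): resolve a location
    # under <root>/data, which is an assumed layout.
    return Path(root) / "data" / target


# Use a temporary directory as a stand-in for the project root.
root = tempfile.mkdtemp()

# Directory that would receive downloaded datasets.
raw_dir = data_path(root, "raw")
raw_dir.mkdir(parents=True, exist_ok=True)

# File location that would hold a downloaded word vector.
vector_file = data_path(root, "external/glove.6B.200d.txt")
print(raw_dir)
print(vector_file)
```

Keeping downloaded datasets and external resources in separate subdirectories makes it easy to tell which files can be fetched again and which were produced locally.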
Download Pretrained Word Vector
chariot can load pretrained word vectors by working with chakin.
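from chariot.storage import Storage
from chariot.transformer.vocabulary import Vocabulary

storage = Storage("path/to/project/root")
vec_path = storage.chakin(name="GloVe.6B.200d")  # download word vector
vocab = Vocabulary.from_file("path/to/vocabulary")
# Build the embedding for the vocabulary from the downloaded vectors
embedding = vocab.make_embedding(storage.data_path("external/glove.6B.200d.txt"))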
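To illustrate what building an embedding from a pretrained vector file involves, here is a minimal sketch in plain Python. It assumes the GloVe text format (one word per line followed by space-separated values) and is not chariot's implementation of make_embedding. The function name `build_embedding`, the zero-vector fallback for unknown words, and the sample numbers are all hypothetical.

```python
import io


def build_embedding(vector_file, vocab, dim):
    """Return one vector per vocabulary word, in vocabulary order."""
    index = {word: i for i, word in enumerate(vocab)}
    # Words missing from the vector file keep an all-zero vector
    # (an assumed fallback, chosen for simplicity).
    matrix = [[0.0] * dim for _ in vocab]
    for line in vector_file:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if word in index and len(values) == dim:
            matrix[index[word]] = [float(v) for v in values]
    return matrix


# Tiny stand-in for a GloVe text file (made-up numbers, 3 dimensions).
sample = io.StringIO("movie 0.1 0.2 0.3\nfilm 0.4 0.5 0.6\n")
embedding = build_embedding(sample, ["film", "unknown", "movie"], dim=3)
for row in embedding:
    print(row)
```

Row i of the result corresponds to word i of the vocabulary, so the matrix can be indexed directly with the integer ids a vocabulary assigns to tokens.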