How to format TSV files to use with torchtext?




This is how I'm formatting the file:


Jersei N
atinge V
média N
. PU

Programe V
...



The first string on each line is the lexical item and the second is its POS tag. However, the empty line (which I use to mark the end of a sentence) causes AttributeError: 'Example' object has no attribute 'text' when I run the following code:




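# Legacy torchtext API (torchtext.data / torchtext.datasets)
from torchtext import data

src = data.Field()
trg = data.Field(sequential=False)
# TabularDataset treats every line of the file as one example
mt_train = data.TabularDataset(
    path='/path/to/file.tsv', format='tsv',
    fields=[('text', src), ('labels', trg)])
src.build_vocab(mt_train)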



What is the proper way to indicate EOS to torchtext?





@kmario23 done!
– Bledson
Jul 3 at 17:43





You could replace the empty line with 2 TABs.
– Danny_ds
Jul 3 at 17:49







@Danny_ds The error is gone, but it messed up my text/labels. For example, '' (the empty string) now appears as a label.
– Bledson
Jul 3 at 23:55







@Bledson Yes, that's normal. To avoid parsing errors, a TSV file has to have the same field count on every line. Do you need the empty lines?
– Danny_ds
Jul 4 at 0:02





@Danny_ds Yes. I need to split the sentences because the RNN is fed with batches of them.
– Bledson
Jul 4 at 1:32




1 Answer



The following code reads the TSV file the way I formatted it:


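from torchtext import data, datasets
# text and labels are Fields; these definitions are assumed, not from the original post
text, labels = data.Field(), data.Field()
mt_train = datasets.SequenceTaggingDataset(path='/path/to/file.tsv',
                                           fields=(('text', text),
                                                   ('labels', labels)))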



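It turns out that SequenceTaggingDataset treats an empty line as the sentence separator, so each example holds a whole sentence (a list of tokens and a parallel list of tags) rather than a single line.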
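To illustrate what splitting on empty lines means in practice, here is a minimal sketch in plain Python. It is not torchtext's actual implementation: read_tagged_sentences is a hypothetical helper, and it assumes each non-empty line holds exactly one token and one tag separated by a single tab.

```python
from typing import Iterable, List, Tuple


def read_tagged_sentences(lines: Iterable[str]) -> List[Tuple[List[str], List[str]]]:
    """Group 'token<TAB>tag' lines into sentences, splitting on blank lines.

    Hypothetical helper for illustration only; assumes one tab per line.
    """
    sentences = []
    tokens, tags = [], []
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line closes the current sentence (if there is one).
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        token, tag = line.split("\t")
        tokens.append(token)
        tags.append(tag)
    # Flush the last sentence when the file does not end with a blank line.
    if tokens:
        sentences.append((tokens, tags))
    return sentences


# Sample data in the same layout as the file in the question.
sample = "Jersei\tN\natinge\tV\nmédia\tN\n.\tPU\n\nPrograme\tV\n"
sentences = read_tagged_sentences(sample.splitlines())
for words, pos in sentences:
    print(words, pos)
```

Each resulting pair corresponds to one example with a text sequence and a labels sequence, which is the shape an RNN tagger needs when it is fed batches of sentences.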







