How to format TSV files to use with torchtext?

This is how I'm formatting the file:
Jersei N
atinge V
média N
. PU
Programe V
...
The first string on each line is the lexical item and the second is its POS tag. But the empty line (which I'm using to mark the end of a sentence) gives me the error AttributeError: 'Example' object has no attribute 'text' when running the following code:
src = data.Field()
trg = data.Field(sequential=False)
mt_train = datasets.TabularDataset(
    path='/path/to/file.tsv',
    fields=(src, trg))
src.build_vocab(mt_train)
What is the proper way to indicate EOS to torchtext?
You could replace the empty line with 2 TABs.
– Danny_ds
Jul 3 at 17:49
@Danny_ds the error is gone, but it messed up my text/labels. '' (empty string) appears as a label, for example.
– Bledson
Jul 3 at 23:55
@Bledson Yes, that's normal. To avoid parsing errors, a tsv has to have the same field count on every line. Do you need the empty lines?
– Danny_ds
Jul 4 at 0:02
@Danny_ds Yes, I need to split sentences, as the RNN is fed batches of them.
– Bledson
Jul 4 at 1:32
1 Answer
The following code reads the TSV file the way I formatted it:
mt_train = datasets.SequenceTaggingDataset(path='/path/to/file.tsv',
                                           fields=(('text', text),
                                                   ('labels', labels)))
It turns out that SequenceTaggingDataset properly identifies an empty line as the sentence separator.
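For intuition, here is a minimal pure-Python sketch of what treating an empty line as a sentence separator amounts to. It is not torchtext's implementation; the function name read_tagged_sentences is hypothetical, and it assumes each non-empty line holds exactly one token and one tag separated by a single TAB.

```python
from typing import Iterable, List, Tuple

# One sentence is a pair: (list of tokens, list of tags).
Sentence = Tuple[List[str], List[str]]


def read_tagged_sentences(lines: Iterable[str], separator: str = "\t") -> List[Sentence]:
    """Group 'token<separator>tag' lines into sentences, splitting on blank lines.

    Hypothetical helper for illustration only; assumes exactly two
    columns on every non-empty line.
    """
    sentences: List[Sentence] = []
    tokens: List[str] = []
    tags: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence (if there is one).
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        token, tag = line.split(separator)
        tokens.append(token)
        tags.append(tag)
    # Flush the last sentence when the input does not end with a blank line.
    if tokens:
        sentences.append((tokens, tags))
    return sentences


# The sample from the question, with TABs between the columns.
sample = "Jersei\tN\natinge\tV\nmédia\tN\n.\tPU\n\nPrograme\tV\n"
sentences = read_tagged_sentences(sample.splitlines())
print(sentences)
```

Each sentence comes back as a pair of parallel lists, which is roughly the shape a sequence-tagging model expects: one example per sentence rather than one example per line.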
@kmario23 done!
– Bledson
Jul 3 at 17:43