This is the Hunglish CDROM, featuring a sentence-aligned Hungarian-English
parallel corpus of about 54.2 million words in 2.07 million sentences, a
monolingual corpus of about 46.4 million words in 3.00 million sentences,
and ancillary software.
The CD contains:
- The data files sorted by genre
- Source code for the ancillary software
- Language resources for Hungarian
The following copyright applies to all source code and executables
on this CD:
This work is licensed under the Creative Commons Attribution 2.0 License.
The raw text files in the data directory are public domain.
Papers and documentation contained on the CD are copyright of their
respective authors.
The CD was produced jointly by the Media Research and Education Center
at the Budapest University of Technology and Economics (Dániel Varga,
Péter Halácsy, András Kornai, László Németh, and Viktor Trón) and the
Corpus Linguistics Department at the Hungarian Academy of Sciences
Institute of Linguistics (Tamás Váradi, Bálint Sass, Gergő Bottyán,
Enikő Héja, Ágnes Gyarmati, Ágnes Mészáros, and Dávid Labundy), who
contributed all the monolingual source material and most of the
bilingual source files in the magazine section.
The Hunglish project is supported by an
ITEM grant from the Hungarian Ministry of
Informatics and Communication. András Aklán (BUTE) provided
effective project management for the production process and Mike Maxwell (LDC)
advised us on the structure of the CD and found many bugs. We thank Magyar Telekom Rt. for infrastructure