[kanchilug] Fwd: [FTC] FWD: [INFITT GB] We have released 100's of hours of ASR speech data in Kannada & Tamil on OpenSLR

  • From: Shrinivasan T <tshrinivasan@xxxxxxxxx>
  • To: kanchilug@xxxxxxxxxxxxx, "puduvailug@xxxxxxxxxxxxx" <puduvailug@xxxxxxxxxxxxx>, ViluppuramGLUG@xxxxxxxxxxxxxxxx, "mailinglist@xxxxxxxxx" <mailinglist@xxxxxxxxx>
  • Date: Sat, 6 Aug 2022 11:22:17 +0530

---------- Forwarded message ---------
From: Muthu A <ezhillang@xxxxxxxxx>
Date: Sat, Aug 6, 2022, 10:41 AM
Subject: [FTC] FWD: [INFITT GB] We have released 100's of hours of ASR
speech data in Kannada & Tamil on OpenSLR
To: ThamiZha! - Free Tamil Computing(FTC) <
freetamilcomputing@xxxxxxxxxxxxxxxx>


FWD email below:

On Wednesday, August 3, 2022 at 06:21:50 PM GMT+2, Ramakrishnan Angarai
Ganesan <agrkrish@xxxxxxxxx> wrote:


Dear Tamil Language Technology enthusiasts,

The IISc-MILE Tamil ASR Corpus is a transcribed speech corpus for training
ASR systems for the Tamil language. It contains 152 hours of read-speech data
collected from 531 speakers from different cities of Tamil Nadu, recorded in
a noise-free environment with high-quality USB microphones.

The corpus is split into train and test folders, and each folder contains
two subfolders named audio_files and trans_files. The folder "audio_files"
contains .wav recordings (16 kHz, 16-bit, mono, PCM format). The folder
"trans_files" contains .txt files with the UTF-8 encoded transcription
corresponding to each audio file.
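
For anyone who wants to load the data, here is a minimal Python sketch of how the folder layout described above could be walked. It is only an illustration, not part of the corpus release: the function names are made up for this example, and it assumes that each transcript has the same base name as its audio file (for example, a hypothetical utt1.wav paired with utt1.txt), which should be checked against the actual download.

```python
import wave
from pathlib import Path


def pair_corpus_files(split_dir):
    """Yield (wav_path, transcript_text) pairs for one split (train or test).

    Assumption: each transcript in trans_files has the same base name as
    its recording in audio_files. Verify this against the real corpus.
    """
    split_dir = Path(split_dir)
    audio_dir = split_dir / "audio_files"
    trans_dir = split_dir / "trans_files"
    for wav_path in sorted(audio_dir.glob("*.wav")):
        txt_path = trans_dir / (wav_path.stem + ".txt")
        if not txt_path.exists():
            # Skip recordings without a matching transcript.
            continue
        # Transcripts are stated to be UTF-8 text.
        yield wav_path, txt_path.read_text(encoding="utf-8").strip()


def matches_stated_format(wav_path):
    """Return True if the file is 16 kHz, 16-bit, mono, as the post states."""
    with wave.open(str(wav_path), "rb") as wav:
        return (
            wav.getframerate() == 16000
            and wav.getsampwidth() == 2  # 2 bytes per sample = 16 bit
            and wav.getnchannels() == 1
        )
```

Calling pair_corpus_files on the extracted train folder (whatever path you unpacked it to) gives the pairs one at a time, and matches_stated_format can be used to spot-check a few recordings before training.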

Download Tamil speech data from: http://www.openslr.org/127/
Download Kannada speech data from: http://www.openslr.org/126/

Both corpora are published by the Medical Intelligence and Language
Engineering (MILE) Lab, Department of Electrical Engineering, Indian
Institute of Science, Bangalore, India. The collection of the Kannada
corpus was funded by the Department of Kannada and Culture, Government of
Karnataka.

You can cite the data using the following BibTeX entries:

@misc{mile_1,
doi = {10.48550/ARXIV.2207.13331},
url = {https://arxiv.org/abs/2207.13331},
author = {A, Madhavaraj and Pilar, Bharathi and A G, Ramakrishnan A},
title = {Subword Dictionary Learning and Segmentation Techniques for
Automatic Speech Recognition in Tamil and Kannada},
publisher = {arXiv},
year = {2022},
}

@misc{mile_2,
doi = {10.48550/ARXIV.2207.13333},
url = {https://arxiv.org/abs/2207.13333},
author = {A, Madhavaraj and Pilar, Bharathi and A G, Ramakrishnan A},
title = {Knowledge-driven Subword Grammar Modeling for Automatic Speech
Recognition in Tamil and Kannada},
publisher = {arXiv},
year = {2022},
}

Regards
Ram

