Text-to-speech

Posted by Amnon on November 26, 2011 · 9 Comments

As a direct result of this question on the Processing forum, I’ve been looking into a way to make use of Google’s text-to-speech web service. The text-to-speech functionality is linked to Google Translate, but the web service itself is still somewhat under the radar, so there is no documentation and no API for easy use. Of course this does not mean it’s impossible! 😉 In this blog post I will share the code snippets to send a String and download the resulting mp3 from inside Processing. Then I will show how to type the String in real time and play back the sound at runtime. These are basic examples that show you how it’s done; many more interesting possibilities arise from there. As it turns out, the key to getting something back from the web service is to pose as a browser, by sending a browser-like User-Agent header with the request. Some pure Java trickery is required to facilitate the back-and-forth, and most of it will seem unfamiliar to the beginning Processing coder. But hey, whatever works! Processing = Java, so why not take advantage of it? 😀

Let me start with the most basic code snippet, where you input a String and the corresponding mp3 is downloaded to the sketch directory. I haven’t tested all of them, but I believe the languages that you can use are English (en), Spanish (es), French (fr), German (de) and Italian (it). The snippet supports this out of the box: the first argument is the text to speak, the second is the language code. So when you run this code you will end up with an mp3 of your spoken text, pretty cool.

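import java.net.*; // URL, URLConnection, MalformedURLException
import java.io.*;  // FileOutputStream and the stream classes

void setup() {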
  googleTTS("Google text to speech is awesome!", "en"); // en, es, fr, de, it
  exit();
}

void googleTTS(String txt, String language) {
  String u = "http://translate.google.com/translate_tts?tl=";
  u = u + language + "&q=" + txt;
  u = u.replace(" ", "%20"); // replace spaces by %20
  try {
    URL url = new URL(u);
    try {
      URLConnection connection = url.openConnection();
      // pose as webbrowser
      connection.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; .NET CLR 1.2.30703)");
      connection.connect();
      InputStream is = connection.getInputStream();
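      // create a file named after the text, inside the sketch directory
      // (the sketchPath(String) method also compiles in Processing 3+, where sketchPath is no longer a field)
      File f = new File(sketchPath(txt + ".mp3"));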
      OutputStream out = new FileOutputStream(f);
      byte buf[] = new byte[1024];
      int len;
      while ((len = is.read(buf)) > 0) {
        out.write(buf, 0, len);
      }
      out.close();
      is.close();
      println("File created for: " + txt); // report back via the console
    } catch (IOException e) {
      e.printStackTrace();
    }
  } catch (MalformedURLException e) {
    e.printStackTrace();
  }
}
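
The request URL in googleTTS() only escapes spaces, and the mp3 is named after the raw text, so characters such as & or ? can break either one. Here is a minimal sketch of one way to prepare both, in plain Java (it could live in a .java tab of the sketch). The class name TtsRequest and the helpers buildUrl() and fileNameFor() are made up for illustration, and whether the web service accepts every encoded character is an assumption that is untested here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class, not part of Processing or of the original sketch.
class TtsRequest {
  // Endpoint copied from the sketch above.
  static final String ENDPOINT = "http://translate.google.com/translate_tts";

  // Build the request URL with the text percent-encoded.
  static String buildUrl(String txt, String language) {
    try {
      // URLEncoder writes spaces as '+'; swap in %20 to match the original sketch.
      // A literal '+' in the text is already encoded as %2B, so this swap is safe.
      String q = URLEncoder.encode(txt, "UTF-8").replace("+", "%20");
      return ENDPOINT + "?tl=" + language + "&q=" + q;
    } catch (UnsupportedEncodingException e) {
      // Every Java runtime is required to support UTF-8.
      throw new IllegalStateException(e);
    }
  }

  // Derive a file name from the text: keep letters, digits, '_' and '-',
  // drop everything else and turn runs of whitespace into a single '_'.
  static String fileNameFor(String txt) {
    String base = txt.trim()
                     .replaceAll("[^\\p{L}\\p{N} _-]", "")
                     .replaceAll("\\s+", "_");
    if (base.isEmpty()) {
      base = "speech"; // fallback when nothing usable is left
    }
    return base + ".mp3";
  }

  public static void main(String[] args) {
    // prints http://translate.google.com/translate_tts?tl=en&q=Is%201%20%26%201%20%3D%202%3F
    System.out.println(buildUrl("Is 1 & 1 = 2?", "en"));
    // prints Is_1_1_2.mp3
    System.out.println(fileNameFor("Is 1 & 1 = 2?"));
  }
}
```

In googleTTS() you would then pass TtsRequest.buildUrl(txt, language) to new URL(...) and use TtsRequest.fileNameFor(txt) for the File, so the download and any later loading of the mp3 agree on the name.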

Now let’s see if you can actually play back that mp3 in Processing at runtime (as was the original question). Can we? Yes we can! 😀 The following sketch makes use of the Minim library that comes bundled with Processing, so all these code snippets run in Processing without installing additional libraries or other files. The way it works is that you type the String while the sketch is running. Use the keyboard as you normally would: backspace deletes a character, delete clears the whole String, and so on. The number of characters is limited to 100, because I believe that is the limit of the web service. Once you have the String you want, press enter. The file is then downloaded and played back; this part of the code is based on the playback example that comes with Minim. The download happens inside keyPressed(), so the sketch pauses until the file has arrived; for more advanced sketches, the use of threads is of course recommended. In the code example you can keep typing and playing different Strings if you want. An interesting trick is to comment out or remove the line that closes the player (line 36). If you do that, you can layer sound upon sound. Also useful to remember is that, given the way the code works, all of the spoken texts are saved to the hard disk. So once you quit the program, you will still have all the mp3s inside the sketch directory.

You can just copy-paste this code into Processing’s PDE and press RUN. Hope it’s useful to someone! 🙂

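import ddf.minim.*;
import java.net.*; // URL, URLConnection, MalformedURLException
import java.io.*;  // FileOutputStream and the stream classes
AudioPlayer player;
Minim minim;
String s = "Type here";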
void setup() {
  size(1280, 720);
  minim = new Minim(this);
  textAlign(CENTER, CENTER);
  textSize(50);
  stroke(255);
}

void draw() {
  background(0);
  text(s, 0, 0, width, height);
  if (player != null) {
    translate(0, 250);
    for(int i = 0; i < player.left.size()-1; i++) {
      line(i, 50 + player.left.get(i)*50, i+1, 50 + player.left.get(i+1)*50);
      line(i, 150 + player.right.get(i)*50, i+1, 150 + player.right.get(i+1)*50);
    }
  }
}

void keyPressed() {
  if (keyCode == BACKSPACE) {
    if (s.length() > 0) {
      s = s.substring(0, s.length()-1);
    }
  } else if (keyCode == DELETE) {
    s = "";
  } else if (keyCode == ENTER) {
    googleTTS(s, "en");
    if (player != null) { player.close(); } // comment this line to layer sounds
    player = minim.loadFile(s + ".mp3", 2048);
    player.loop();
    s = "";
  } else if (keyCode != SHIFT && keyCode != CONTROL && keyCode != ALT && s.length() < 100) {
    s += key;
  }
}

void googleTTS(String txt, String language) {
  String u = "http://translate.google.com/translate_tts?tl=";
  u = u + language + "&q=" + txt;
  u = u.replace(" ", "%20");
  try {
    URL url = new URL(u);
    try {
      URLConnection connection = url.openConnection();
      connection.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; .NET CLR 1.2.30703)");
      connection.connect();
      InputStream is = connection.getInputStream();
      File f = new File(sketchPath(txt + ".mp3")); // sketchPath(String) also compiles in Processing 3+
      OutputStream out = new FileOutputStream(f);
      byte buf[] = new byte[1024];
      int len;
      while ((len = is.read(buf)) > 0) {
        out.write(buf, 0, len);
      }
      out.close();
      is.close();
      println("File created: " + txt + ".mp3");
    } catch (IOException e) {
      e.printStackTrace();
    }
  } catch (MalformedURLException e) {
    e.printStackTrace();
  }
}

void stop() {
  if (player != null) { player.close(); } // player is still null if nothing was played
  minim.stop();
  super.stop();
}
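
The sketch caps the input at 100 characters because that is believed to be the limit of the web service. For longer texts, one option is to cut the text into pieces that each stay under the limit and request them one after another. The following is a rough sketch of that idea in plain Java; the class TtsChunker and its split() method are hypothetical names, and the limit of 100 is the same unverified assumption as in the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of Processing or Minim.
class TtsChunker {
  // Split text into pieces of at most maxLen characters,
  // breaking at whitespace wherever possible.
  static List<String> split(String text, int maxLen) {
    List<String> chunks = new ArrayList<String>();
    StringBuilder current = new StringBuilder();
    for (String word : text.trim().split("\\s+")) {
      // A single word longer than the limit has to be cut hard.
      while (word.length() > maxLen) {
        if (current.length() > 0) {
          chunks.add(current.toString());
          current.setLength(0);
        }
        chunks.add(word.substring(0, maxLen));
        word = word.substring(maxLen);
      }
      if (word.isEmpty()) {
        continue; // happens for empty input
      }
      // Start a new chunk when the word plus a separating space would not fit.
      if (current.length() > 0 && current.length() + 1 + word.length() > maxLen) {
        chunks.add(current.toString());
        current.setLength(0);
      }
      if (current.length() > 0) {
        current.append(' ');
      }
      current.append(word);
    }
    if (current.length() > 0) {
      chunks.add(current.toString());
    }
    return chunks;
  }

  public static void main(String[] args) {
    // prints [one two, three, four]
    System.out.println(split("one two three four", 9));
  }
}
```

Each piece from TtsChunker.split(s, 100) could then be handed to googleTTS() in turn. Playing the resulting mp3s back to back is left out here; it would need some way to detect when one player has finished before starting the next.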

Filed under Experiments · Tagged with code snippet, creative coding, example, processing.org, sound, source code, text-to-speech, tts

Comments

9 Responses to “Text-to-speech”

lac_777 says:

November 27, 2011 at 12:57 pm

Thank you so much! This works perfectly!

artbothack says:

February 18, 2012 at 5:19 pm

thank you very much for this great code
I have a question:
if I don’t want to record the mp3 file but just play it what can I do? …
- Amnon says:
  
  February 18, 2012 at 5:30 pm
  
  Either write your own implementation of Minim AudioRecording to circumvent the saving to and loading from a file. Or as a workaround delete the file after it’s loaded.
  - artbothack says:
    
    February 18, 2012 at 6:15 pm
    
    thank you for your response
    like you, I think it’s easier to delete the file after playing it
    
    I found this too :
    
    http://kahimyang.info/kauswagan/HowtoBlogs.xhtml?b=757
    
    but I am not able to translate it for Processing 8_(

Babaorom says:

December 21, 2012 at 11:01 pm

When I test this sketch, I get this error: “Cannot find a class or a type named ‘URL’” for line 50: URL url = new URL(u);
Could you help me please.
Thank you.
- Amnon says:
  
  December 21, 2012 at 11:52 pm
  
  That’s because you are apparently running the code in Processing 2.0b7, and in this version all imports that aren’t covered in the Processing reference were removed. To solve this you can add these import statements to make it run:
  
  import java.net.*;
  import java.io.*;

Trackbacks

Anniversary Competition Aftermath: And The Winners Are… « Retrode.org says:

December 8, 2012 at 7:52 am

[…] I used this tutorial for the text-to-speech part: https://amnonp5.wordpress.com/2011/11/26/text-to-speech/ […]

Phase 4 Documentation-Group 9_iFit | arielthemis says:

June 16, 2013 at 3:06 pm

[…] https://amnonp5.wordpress.com/2011/11/26/text-to-speech/ […]

Midterm Project – Updates | Physical Computing S15 says:

March 30, 2015 at 4:41 am

[…] above connections link and used googles’ text to speech for added interactivity, I used this TTS code and adjusted it to my use. I also used Audacity to record some audio files for personalized […]
