Text-to-speech

As a direct result of this question on the Processing forum, I’ve been looking into a way to use Google’s text-to-speech webservice. The text-to-speech functionality is linked to Google Translate, but the webservice itself is still somewhat on the down low, so there is no documentation and no API for easy use. Of course this does not mean it’s impossible! ;-) In this blog post I will share the code snippets to send a String and download the resulting mp3 from inside Processing. Then I will show how to type the String in real time and play back the sound at runtime. These are basic examples that show how it’s done; many more interesting possibilities arise from there.

As it turns out, the key to getting something back from the webservice is to pose as a browser! Some pure Java trickery is required to facilitate the back-and-forth, and most of it will seem unfamiliar to the beginning Processing coder. But hey, whatever works! Processing = Java, so why not take advantage of it? :D

Let me start with the most basic code snippet, where you input a String and the corresponding mp3 is downloaded to the sketch directory. The first String is the text to be spoken, the second String is the language code. I haven’t tested all of them, but I believe the languages you can use are English (en), Spanish (es), French (fr), German (de) and Italian (it). So when you run this code you will end up with an mp3 of your spoken text, pretty cool.

import java.io.*;  // File and stream classes
import java.net.*; // URL, URLConnection, URLEncoder

void setup() {
  googleTTS("Google text to speech is awesome!", "en"); // en, es, fr, de, it
  exit();
}

// Requests the spoken version of txt and saves it as "<txt>.mp3" in the sketch folder
void googleTTS(String txt, String language) {
  try {
    // URL-encode the text so that not only spaces but also &, ? and accents survive
    String u = "http://translate.google.com/translate_tts?tl=" + language
             + "&q=" + URLEncoder.encode(txt, "UTF-8");
    URLConnection connection = new URL(u).openConnection();
    // pose as a web browser
    connection.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; .NET CLR 1.2.30703)");
    connection.connect();
    InputStream is = connection.getInputStream();
    // create a file named after the text, inside the sketch folder
    OutputStream out = new FileOutputStream(sketchPath(txt + ".mp3"));
    byte[] buf = new byte[1024];
    int len;
    // copy the response to the file, one buffer at a time
    while ((len = is.read(buf)) > 0) {
      out.write(buf, 0, len);
    }
    out.close();
    is.close();
    println("File created for: " + txt); // report back via the console
  } catch (IOException e) { // also covers MalformedURLException
    e.printStackTrace();
  }
}
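
A note on the text itself: it ends up in two places, the query string of the URL and the name of the mp3 file, and both have characters they can’t handle (think of & or ? in a URL, or / in a file name). Below is a minimal sketch of two helpers for that, in plain Java so it can be tried outside Processing. The names TtsNames, buildTtsUrl and safeFileName are made up for illustration and are not part of the snippet above, and replacing unsafe file name characters with an underscore is just one possible choice.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class TtsNames {
  // Hypothetical helper: builds the request URL with the text fully URL-encoded
  static String buildTtsUrl(String txt, String language) {
    try {
      return "http://translate.google.com/translate_tts?tl=" + language
           + "&q=" + URLEncoder.encode(txt, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is always available, so this should never happen
      throw new RuntimeException(e);
    }
  }

  // Hypothetical helper: keeps letters, digits, spaces, dots, underscores and
  // hyphens, and replaces every other character with an underscore
  static String safeFileName(String txt) {
    return txt.replaceAll("[^\\p{L}\\p{N} ._-]", "_") + ".mp3";
  }

  public static void main(String[] args) {
    System.out.println(buildTtsUrl("Rock & roll?", "en"));
    System.out.println(safeFileName("Rock & roll?"));
  }
}
```

For the text "Rock & roll?" this gives a URL ending in q=Rock+%26+roll%3F and the file name "Rock _ roll_.mp3".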

Now let’s see if you can actually play back that mp3 in Processing at runtime (as was the original question). Can we? Yes we can! :D The following makes use of the Minim library that automatically comes with Processing, so all these code snippets run in Processing without installing additional libraries or other files.

The way it works is that you type the String while the sketch is running. Use the keyboard as you normally would: backspace deletes a character, delete clears the whole String, and so on. The number of characters is limited to 100, because I believe that is the limit of the webservice. Once you have the String you want, press enter. The file is then downloaded and played back; this part of the code is based on the playback example that comes with Minim. The download happens inside keyPressed(), so the sketch pauses while it waits for the webservice, which is why for more advanced sketches the use of threads is of course recommended.

In the code example you can keep typing and playing different Strings if you want. An interesting trick is to comment out or remove the line that closes the player (line 36). If you do that, you can layer sound upon sound. Also useful to remember is that, given the way the code works, all of the spoken texts are saved to the hard disk. So once you quit the program, you will still have all the mp3s inside the sketch directory.

You can just copy-paste this code into Processing’s PDE and press RUN. Hope it’s useful to someone! :)

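import ddf.minim.*;
import java.io.*;  // File and stream classes used by googleTTS()
import java.net.*; // URL classes used by googleTTS()
AudioPlayer player;
Minim minim;
String s = "Type here"; // the text typed so far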
void setup() {
  size(1280, 720);
  minim = new Minim(this);
  textAlign(CENTER, CENTER);
  textSize(50);
  stroke(255);
}

void draw() {
  background(0);
  text(s, 0, 0, width, height);
  if (player != null) {
    translate(0, 250);
    for(int i = 0; i < player.left.size()-1; i++) {
      line(i, 50 + player.left.get(i)*50, i+1, 50 + player.left.get(i+1)*50);
      line(i, 150 + player.right.get(i)*50, i+1, 150 + player.right.get(i+1)*50);
    }
  }
}

void keyPressed() {
  if (keyCode == BACKSPACE) {
    if (s.length() > 0) {
      s = s.substring(0, s.length()-1);
    }
  } else if (keyCode == DELETE) {
    s = "";
  } else if (keyCode == ENTER && s.length() > 0) { // nothing to speak if the String is empty
    googleTTS(s, "en");
    if (player != null) { player.close(); } // comment this line to layer sounds
    player = minim.loadFile(s + ".mp3", 2048);
    player.loop(); // repeats the clip; use player.play() to hear it only once
    s = "";
  } else if (key != CODED && keyCode != ENTER && s.length() < 100) { // skip shift, arrows etc.
    s += key;
  }
}

// Requests the spoken version of txt and saves it as "<txt>.mp3" in the sketch folder
void googleTTS(String txt, String language) {
  try {
    // URL-encode the text so that not only spaces but also &, ? and accents survive
    String u = "http://translate.google.com/translate_tts?tl=" + language
             + "&q=" + URLEncoder.encode(txt, "UTF-8");
    URLConnection connection = new URL(u).openConnection();
    // pose as a web browser
    connection.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; .NET CLR 1.2.30703)");
    connection.connect();
    InputStream is = connection.getInputStream();
    // create a file named after the text, inside the sketch folder
    OutputStream out = new FileOutputStream(sketchPath(txt + ".mp3"));
    byte[] buf = new byte[1024];
    int len;
    // copy the response to the file, one buffer at a time
    while ((len = is.read(buf)) > 0) {
      out.write(buf, 0, len);
    }
    out.close();
    is.close();
    println("File created: " + txt + ".mp3");
  } catch (IOException e) { // also covers MalformedURLException
    e.printStackTrace();
  }
}

void stop() {
  if (player != null) { player.close(); } // no player exists until the first text is spoken
  minim.stop();
  super.stop();
}
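
The sketch simply stops accepting characters at 100, the presumed limit of the webservice. For longer texts, one possible workaround (an assumption, not something the sketch above does) would be to cut the text into pieces of at most 100 characters at word boundaries and request each piece separately. A minimal sketch in plain Java; the class name TtsChunks and the method split are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

class TtsChunks {
  // Splits text into chunks of at most maxLen characters, breaking at whitespace.
  // A single word longer than maxLen is cut into maxLen-sized pieces.
  static List<String> split(String text, int maxLen) {
    List<String> chunks = new ArrayList<String>();
    StringBuilder current = new StringBuilder();
    for (String word : text.trim().split("\\s+")) {
      if (word.isEmpty()) continue; // happens only for empty input
      while (word.length() > maxLen) {
        // flush what we have, then hard-cut the oversized word
        if (current.length() > 0) {
          chunks.add(current.toString());
          current.setLength(0);
        }
        chunks.add(word.substring(0, maxLen));
        word = word.substring(maxLen);
      }
      if (word.isEmpty()) continue;
      // start a new chunk if adding " " + word would exceed the limit
      if (current.length() > 0 && current.length() + 1 + word.length() > maxLen) {
        chunks.add(current.toString());
        current.setLength(0);
      }
      if (current.length() > 0) current.append(' ');
      current.append(word);
    }
    if (current.length() > 0) chunks.add(current.toString());
    return chunks;
  }

  public static void main(String[] args) {
    // a limit of 9 instead of 100 keeps the demo readable
    System.out.println(split("one two three four", 9));
  }
}
```

Each chunk could then be handed to googleTTS() in turn, which would also produce one mp3 per chunk in the sketch directory.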

 

Comments
4 Responses to “Text-to-speech”
  1. lac_777 says:

    Thank you so much! This works perfekt!

  2. artbothack says:

    thank you very much for this great code
    I have a question:
    if I don’t want to record the mp3 file but just play it what can I do? …
