QuickTime for Java: A Developer's Notebook/Audio Media


This is the first of three chapters dealing with specific media types. Video will be covered in Chapter 8, and several other kinds of media—including things you might not have thought of as media, such as text and time codes—will be covered in Chapter 9.

It's possible that you've never thought of QuickTime as being the engine for audio-only applications—the ubiquity of QuickTime's .mov file format probably makes it more readily recognized as a video standard. But QuickTime's support for audio has been critical to many applications. For example, the fact that QuickTime was already ported to Windows made bringing iTunes and its music store over to Windows a lot easier.

In fact, iTunes is probably responsible for getting QuickTime onto a lot more Windows machines than it would have reached otherwise. So, I'll begin with a few labs that are particularly applicable to the MP3s and AACs collected by iTunes users.


Reading Information from MP3 Files

If you've ever listened to an MP3 music file—and at this point, who hasn't?—you've surely appreciated the fact that useful information like artist, song title, album title, etc., is stored inside the file. Not only does this make it convenient to organize your music, but also, when you move a song from one device to another, this metadata travels with it.

The most widely accepted standard for doing this is the ID3 standard, which puts this metadata into parts of the file that are not interpreted as containing audio data—MP3s arrange data in frames, and ID3 puts metadata between these frames. ID3 tags typically are found at the beginning of a file, which makes them stream-friendly, although some files tagged with earlier versions of the standard have the metadata at the end of the file.

Note

Visit http://www.id3.org/ to learn more about ID3.

When QuickTime imports an MP3 file, it reads ID3 tags and makes them available to your program through the movie's user data, allowing you to display the tags to the user, or use them in any other way you see fit.

How do I do that?

Once you open an MP3 as a movie, you need to get at the user data, which contains the imported ID3 tags. Fortunately, it's wrapped as an object called UserData:

UserData userData = movie.getUserData( );

The user data is something of a grab bag of data that you can read from and write to freely. Items are keyed by FOUR_CHAR_CODEs, and the contents aren't required to adhere to any particular standard or format (after all, you're free to write whatever you like in user data). For example, QuickTime Player writes a "WLOC" entry that stores the window location last used for the movie.
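To make the keying concrete, here's a plain-Java sketch (no QTJ required, my own illustration) of how a FOUR_CHAR_CODE packs four 8-bit characters into an int; this is what QTJ's QTUtils.toOSType( ) does for you:

```java
public class FourCC {
    // Pack four 8-bit characters into a big-endian int, QuickTime-style.
    public static int toOSType (String s) {
        return ((s.charAt(0) & 0xFF) << 24)
             | ((s.charAt(1) & 0xFF) << 16)
             | ((s.charAt(2) & 0xFF) << 8)
             |  (s.charAt(3) & 0xFF);
    }

    public static void main (String[] args) {
        // "WLOC" is the window-location entry mentioned above
        System.out.println (Integer.toHexString (toOSType ("WLOC")));
        // "\u00A9alb" (the copyright sign is 0xA9) matches
        // kUserDataTextAlbum = 0xA9616C62 from the next example
        System.out.println (Integer.toHexString (toOSType ("\u00A9alb")));
    }
}
```

Run, this prints 574c4f43 and a9616c62, which is why the constants in Example 7-1 look like large hex numbers.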

Apple has a standard set of keys that you can use to retrieve the data parsed from an MP3's ID3 tags. Because these are text values, you use UserData's getTextAsString( ) method to pull them out. getTextAsString( ) takes three arguments: the type you're requesting; an index to indicate whether you want the first, second, etc., instance of that type; and a region tag that's irrelevant in the ID3 case.

Example 7-1 shows a basic exercise of this technique, getting the UserData object and asking for album, artist, creation date, and song title information.

Note

Run this example from the downloadable book code with ant run-ch07-id3tagreader.

Example 7-1. Retrieving ID3 metadata

package com.oreilly.qtjnotebook.ch07;
 
import quicktime.*;
import quicktime.std.*;
import quicktime.std.movies.*;
import quicktime.std.movies.media.*;
import quicktime.io.*;
import java.util.*;
 
import com.oreilly.qtjnotebook.ch01.QTSessionCheck;
 
public class ID3TagReader extends Object {
 
  /* these values are straight out of Movies.h
  */
  final static int  kUserDataTextAlbum            = 0xA9616C62; /*'©alb' */
  final static int  kUserDataTextArtist           = 0xA9415254; /*'©ART' */
  final static int  kUserDataTextCreationDate     = 0xA9646179; /*'©day' */
  final static int  kUserDataTextFullName         = 0xA96E616D; /*'©nam' */
 
  /* This array maps all the tag constants to human-readable strings
   */
  private static final Object[  ][  ] TAG_NAMES = {
      {new Integer (kUserDataTextAlbum), "Album"},
      {new Integer (kUserDataTextArtist),"Artist" },
      {new Integer (kUserDataTextCreationDate), "Created"},
      {new Integer (kUserDataTextFullName), "Full Name"}
  };
 
  private static final HashMap TAG_MAP =
      new HashMap(TAG_NAMES.length);
  static {
      for (int i=0; i<TAG_NAMES.length; i++) {
          TAG_MAP.put (TAG_NAMES[i][0],
                       TAG_NAMES[i][1]);
      }
  }
 
  public static void main (String[  ] args) {
      new ID3TagReader( );
      System.exit(0);
  }
 
  public ID3TagReader( ) {
      try {
          QTSessionCheck.check( );
          QTFile f = QTFile.standardGetFilePreview (null);
          OpenMovieFile omf = OpenMovieFile.asRead(f);
          Movie movie = Movie.fromFile (omf);
          // get user data
          UserData userData = movie.getUserData( );
          dumpTagsFromUserData(userData);
      } catch (Exception e) {
          e.printStackTrace( );
      }
  }
 
  protected static void dumpTagsFromUserData (UserData userData) {
      // try for each key in TAG_MAP
      Iterator it = TAG_MAP.keySet( ).iterator( );
      while (it.hasNext( )) {
          Integer key = (Integer) it.next( );
          int tag = key.intValue( );
          String tagName = (String) TAG_MAP.get(key);
          try {
              String value =
                  userData.getTextAsString (tag,
                                            1,
                                            IOConstants.langUnspecified);
              System.out.println (tagName + ": " + value);
          } catch (QTException qte) {  } // no such tag
      }
  }
}

When run, this dumps the found tags to standard out, as seen in the following console output:

cadamson% ant run-ch07-id3tagreader
Buildfile: build.xml
 
run-ch07-id3tagreader:
   [java] Album: Arthur Or The Decline And Fall Of The British Empire
   [java] Full Name: Victoria
   [java] Artist: The Kinks

What just happened?

The application sets up some static values for keys it is interested in and maps them to human-readable names. For example, the FOUR_CHAR_CODE "©alb" is mapped to "Album."

The program prompts the user to select an MP3 file and imports it as a movie, from which it gets a UserData object. In dumpTagsFromUserData( ), it calls getTextAsString() to attempt to get a value for each known tag. If successful, it writes the key and value to the console. If a given tag is absent from the user data, QuickTime throws an exception, which this program quietly ignores.

QuickTime has an important and disappointing limitation: it does not import tags written in non-Western scripts. For example, here's the output when I run the application against an MP3 whose "artist" tag is written in Japanese kana:

cadamson% ant run-ch07-id3tagreader
Buildfile: build.xml
 
run-ch07-id3tagreader:
   [java] Album: COWBOY BEBOP O.S.T.1
   [java] Created: 1998
   [java] Full Name: SPACE LION

Because the artist ("Yoko Kanno" in romaji) is written in non-Western characters, QuickTime doesn't attempt to import it, and thus there's no artist item to retrieve from the user data.

What about...

...other tags? A big list of metadata tags is defined in the native API's Movies.h file. Unfortunately, these aren't in the StdQTConstants classes, or anywhere else in QTJ, so you have to define your own constants for them. Table 7-1 is the list of supported values.

Table 7-1. Audio metadata tag constants

Constant name                                 Hex value     4CC
kUserDataTextAlbum                            0xA9616C62    ©alb
kUserDataTextArtist                           0xA9415254    ©ART
kUserDataTextAuthor                           0xA9617574    ©aut
kUserDataTextChapter                          0xA9636870    ©chp
kUserDataTextComment                          0xA9636D74    ©cmt
kUserDataTextComposer                         0xA9636F6D    ©com
kUserDataTextCopyright                        0xA9637079    ©cpy
kUserDataTextCreationDate                     0xA9646179    ©day
kUserDataTextDescription                      0xA9646573    ©des
kUserDataTextDirector                         0xA9646972    ©dir
kUserDataTextDisclaimer                       0xA9646973    ©dis
kUserDataTextEncodedBy                        0xA9656E63    ©enc
kUserDataTextFullName                         0xA96E616D    ©nam
kUserDataTextGenre                            0xA967656E    ©gen
kUserDataTextHostComputer                     0xA9687374    ©hst
kUserDataTextInformation                      0xA9696E66    ©inf
kUserDataTextKeywords                         0xA96B6579    ©key
kUserDataTextMake                             0xA96D616B    ©mak
kUserDataTextModel                            0xA96D6F64    ©mod
kUserDataTextOriginalArtist                   0xA96F7065    ©ope
kUserDataTextOriginalFormat                   0xA9666D74    ©fmt
kUserDataTextOriginalSource                   0xA9737263    ©src
kUserDataTextPerformers                       0xA9707266    ©prf
kUserDataTextProducer                         0xA9707264    ©prd
kUserDataTextProduct                          0xA9505244    ©PRD
kUserDataTextSoftware                         0xA9737772    ©swr
kUserDataTextSpecialPlaybackRequirements      0xA9726571    ©req
kUserDataTextTrack                            0xA974726B    ©trk
kUserDataTextWarning                          0xA977726E    ©wrn
kUserDataTextWriter                           0xA9777274    ©wrt
kUserDataTextURLLink                          0xA975726C    ©url
kUserDataTextEditDate1                        0xA9656431    ©ed1


Also, instead of requesting specific keys from the user data, can I just tour what's in there? Yes: use UserData.getNextType( ) to discover the types of items in the user data. This method takes an int for the last discovered type (use 0 on the first call) and returns the next type after that one; when it returns 0, there are no more types to discover. Given a type, you can get its data with getTextAsString( ), but because you can't know that a discovered piece of user data necessarily represents textual data, it might be safer to call getData( ), which returns a QTHandle, from which you can get a byte array with getBytes( ).
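This cursor-style protocol is easy to get wrong, so here's a plain-Java mock of the loop shape. UserDataTour and its contents are hypothetical stand-ins; in real code you'd call UserData.getNextType( ) and getData( ) instead:

```java
import java.util.ArrayList;
import java.util.List;

public class UserDataTour {
    // Hypothetical stand-in for a UserData holding three typed items.
    static int[] types = { 0x574C4F43, 0xA9616C62, 0xA9415254 };

    // Mimics UserData.getNextType( ): given the last type seen
    // (0 to start), return the next one, or 0 when there are no more.
    public static int getNextType (int lastType) {
        if (lastType == 0)
            return types[0];
        for (int i = 0; i < types.length - 1; i++)
            if (types[i] == lastType)
                return types[i + 1];
        return 0;  // no more types to discover
    }

    public static void main (String[] args) {
        // the loop shape you'd use against the real UserData
        List<Integer> found = new ArrayList<Integer>();
        int type = 0;
        while ((type = getNextType (type)) != 0)
            found.add (type);
        System.out.println (found.size() + " types found");
    }
}
```

The important detail is the sentinel: keep feeding the last result back in until you get 0.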

Note

This technique is a lot like the "Discovering All Installed Components" lab in Chapter 4.

Reading Information from iTunes AAC Files

If you read the last lab and thought about how ID3 metadata is imported into a QuickTime movie's UserData, you might well expect that the same thing would be true of AAC files created by iTunes: .m4a files for songs "ripped" by the user and .m4p files sold by the iTunes Music Store. In fact, because these files use an MPEG-4 file format that is itself based on QuickTime, you might think that using the same user data scheme would be a slam dunk.

But...you'd be wrong.

These AAC files do put the metadata in the user data, but they do so in a way that resists straightforward retrieval via QuickTime. Fortunately, it's not too hard to get the values out with some parsing.

Note

Buckle up, this one is rough.

How do I do that?

For once, theory needs to come before code—you need to see the format to understand how to parse it. Here's a /usr/bin/hexdump of an iTunes Music Store AAC file from my collection, Toto Dies.m4p:

0000b010  00 3d 5f 3c 00 3d 7d 5e  00 3d 9a fb 00 03 18 da  |.=_<.=}^.=......|
0000b020  75 64 74 61 00 03 18 d2  6d 65 74 61 00 00 00 00  |udta....meta....|
0000b030  00 00 00 22 68 64 6c 72  00 00 00 00 00 00 00 00  |..."hdlr........|
0000b040  6d 64 69 72 61 70 70 6c  00 00 00 00 00 00 00 00  |mdirappl........|
0000b050  00 00 00 03 11 9b 69 6c  73 74 00 00 00 21 a9 6e  |......ilst...!.n|
0000b060  61 6d 00 00 00 19 64 61  74 61 00 00 00 01 00 00  |am....data......|
0000b070  00 00 54 6f 74 6f 20 44  69 65 73 00 00 00 24 a9  |..Toto Dies...$.|
0000b080  41 52 54 00 00 00 1c 64  61 74 61 00 00 00 01 00  |ART....data.....|
0000b090  00 00 00 4e 65 6c 6c 69  65 20 4d 63 4b 61 79 00  |...Nellie McKay.|
0000b0a0  00 00 24 a9 77 72 74 00  00 00 1c 64 61 74 61 00  |..$.wrt....data.|
0000b0b0  00 00 01 00 00 00 00 4e  65 6c 6c 69 65 20 4d 63  |.......Nellie Mc|
0000b0c0  4b 61 79 00 03 0e 76 63  6f 76 72 00 03 0e 6e 64  |Kay...vcovr...nd|
0000b0d0  61 74 61 00 00 00 0d 00  00 00 00 ff d8 ff e0 00  |ata.............|
0000b0e0  10 4a 46 49 46 00 01 01  01 02 f9 02 f9 00 00 ff  |.JFIF...........|

Granted, this is not easy to read, but I'll bet you can pick out the artist (Nellie McKay) and the song title ("Toto Dies"), so you know this is the relevant section of the file. In fact, you also might notice the string "udta"...sounds a little like "user data," doesn't it?

At work here is the QuickTime file format and its concept of atoms, which are tree-structured pieces of data used to describe a movie, its contents, and its metadata. Without going too deeply into the details—there's a whole book on the format—each atom consists of 4 bytes of size, a 4-byte type, and then data. Atoms contain either data or other atoms, but not both. The 4 bytes before "udta", 0x000318da, indicate the size of all the user data. The first child is an atom called "meta". Because its size is 0x000318d2, just 8 less than the size of "udta", the "meta" atom is clearly the only child of "udta".
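To make the size/type layout concrete, here's a plain-Java sketch (my own illustration, not the book's code) that reads an atom header out of a byte array:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class AtomHeader {
    // An atom starts with a 4-byte big-endian size, then a 4-byte type.
    // Mask to a long so sizes above 0x7FFFFFFF don't go negative.
    public static long size (byte[] b, int off) {
        return ByteBuffer.wrap (b, off, 4).getInt() & 0xFFFFFFFFL;
    }

    public static String type (byte[] b, int off) {
        return new String (b, off + 4, 4, StandardCharsets.ISO_8859_1);
    }

    public static void main (String[] args) {
        // the "udta" header from the hexdump: 00 03 18 da 75 64 74 61
        byte[] b = { 0x00, 0x03, 0x18, (byte) 0xda, 'u', 'd', 't', 'a' };
        System.out.println (size (b, 0) + " " + type (b, 0));
    }
}
```

Run against those eight bytes, this prints 202970 udta, i.e., the 0x000318da size and "udta" type called out above.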

Unfortunately, because this is user data, the contents don't have to adhere to any published standard, and they don't. The first thing after "meta" should be the 4-byte size of its first child atom, but the value is 0x00000000—an illegal "no size" value—so, a normal QuickTime parser would ignore the contents of "meta".

Funny thing is, although these contents aren't real QuickTime atoms, they're awfully close. Start with the stuff that's obviously the metadata and work backward: "Toto Dies" is preceded by an 8-byte pad (0x00000001 and 0x00000000), and before that is "data" and a 4-byte number. That number, 0x00000019, is the size of itself, plus "data", plus the 8-byte pad, plus the string "Toto Dies." And just before that, you'll find the string "©nam", preceded by a 4-byte size. Better yet, "©nam" is one of the constants defined in Movies.h for metadata tagging.

Note

See the previous lab for a list of QuickTime's metadata tags.

Dig further and you'll find that there's a run of these tag-name/data structures, each of which has the structure discovered earlier:

  • Full size: 4 bytes
  • Type: 4 bytes
  • Contents size: 4 bytes
  • "data": 4 bytes
  • Unknown: 8 bytes
  • Value: variable number of bytes (size is implicit from the earlier size fields)
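Given that layout, extracting a value from one of these blocks is simple arithmetic. Here's a plain-Java sketch (again my own illustration; Example 7-2 below does the real work against QuickTime) that pulls the value out of a single tag block:

```java
import java.nio.charset.StandardCharsets;

public class TagBlock {
    // Layout per the list above: size(4) type(4) contentsSize(4)
    // "data"(4) unknown(8), then (size - 24) bytes of value.
    public static String value (byte[] b, int off) {
        int size = ((b[off]     & 0xFF) << 24) | ((b[off + 1] & 0xFF) << 16)
                 | ((b[off + 2] & 0xFF) << 8)  |  (b[off + 3] & 0xFF);
        return new String (b, off + 24, size - 24, StandardCharsets.ISO_8859_1);
    }

    public static void main (String[] args) throws Exception {
        // rebuild the "(c)nam"/"Toto Dies" block from the hexdump:
        // 00 00 00 21 | a9 6e 61 6d | 00 00 00 19 | "data" | 8 bytes | value
        java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
        out.write (new byte[] { 0, 0, 0, 0x21, (byte) 0xa9, 'n', 'a', 'm',
                                0, 0, 0, 0x19, 'd', 'a', 't', 'a',
                                0, 0, 0, 1,    0, 0, 0, 0 });
        out.write ("Toto Dies".getBytes (StandardCharsets.ISO_8859_1));
        System.out.println (value (out.toByteArray(), 0));
    }
}
```

This prints Toto Dies, recovering the song title from the raw bytes shown in the hexdump.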

The run of metadata blocks exists within a single pseudo-atom parent called "ilst". So, this analysis provides a strategy for getting iTunes AAC metadata:

  1. Get the user data.
  2. Look for a user data item called "meta" and get it as a byte array.
  3. Inside this array, find "ilst".
  4. Start reading 8-byte blocks as possible size/type combinations. If the type is known as a metadata type, skip past the 24 bytes of junk (the 8-byte pad, the "data", etc.) and read the String.

The sample program in Example 7-2 implements this strategy.

Note

Run this example with ant run-ch07-aactagreader.

Example 7-2. Retrieving iTunes AAC metadata

package com.oreilly.qtjnotebook.ch07;
 
import quicktime.*;
import quicktime.std.*;
import quicktime.std.movies.*;
import quicktime.std.movies.media.*;
import quicktime.io.*;
import quicktime.util.*;
import java.util.*;
import java.math.BigInteger;
 
import com.oreilly.qtjnotebook.ch01.QTSessionCheck;
 
public class AACTagReader extends Object {
 
  /* these values are straight out of Movies.h
  */
  final static int  kUserDataTextAlbum            = 0xA9616C62; /*'©alb' */
  final static int  kUserDataTextArtist           = 0xA9415254; /*'©ART' */
  final static int  kUserDataTextCreationDate     = 0xA9646179; /*'©day' */
  final static int  kUserDataTextFullName         = 0xA96E616D; /*'©nam' */
 
  /* This array maps all the tag constants to human-readable strings
  */
  private static final Object[  ][  ] TAG_NAMES = {
      {new Integer (kUserDataTextAlbum), "Album"},
      {new Integer (kUserDataTextArtist),"Artist" },
      {new Integer (kUserDataTextCreationDate), "Created"},
      {new Integer (kUserDataTextFullName), "Full Name"}
  };
 
  private static final HashMap TAG_MAP =
      new HashMap(TAG_NAMES.length);
  static {
      for (int i=0; i<TAG_NAMES.length; i++) {
          TAG_MAP.put (TAG_NAMES[i][0],
                       TAG_NAMES[i][1]);
      }
  }
 
  public static void main (String[  ] args) {
      new AACTagReader( );
      System.exit(0);
  }
 
  public AACTagReader( ) {
      try {
          QTSessionCheck.check( );
          QTFile f = QTFile.standardGetFilePreview (null);
          OpenMovieFile omf = OpenMovieFile.asRead(f);
          Movie movie = Movie.fromFile (omf);
          // get user data
          UserData userData = movie.getUserData( );
          dumpTagsFromUserData(userData);
      } catch (Exception e) {
          e.printStackTrace( );
      }
  }
 
  protected void dumpTagsFromUserData (UserData userData)
      throws QTException {
      int metaFCC = QTUtils.toOSType("meta");
      QTHandle metaHandle = userData.getData (metaFCC, 1);
      System.out.println ("Found meta");
      byte[  ] metaBytes = metaHandle.getBytes( );
 
      // locate the "ilst" pseudo-atom, ignoring first 4 bytes
      int ilstFCC = QTUtils.toOSType("ilst");
      PseudoAtomPointer ilst = findPseudoAtom (metaBytes, 4, ilstFCC);
      if (ilst == null) {
          System.out.println ("No ilst found");
          return;
      }
 
      // iterate over the pseudo-atoms inside the "ilst",
      // printing the name and value of each tag type we know
      int off = ilst.offset + 8;
      while (off < metaBytes.length) {
          PseudoAtomPointer atom = findPseudoAtom (metaBytes, off, -1);
          if (atom == null)
              break; // no more plausible pseudo-atoms
          String tagName = (String) TAG_MAP.get (new Integer(atom.type));
          if (tagName != null) {
              // if we match a type, read everything after byte 24
              // which skips size, type, size, 'data', 8 junk bytes
              byte[  ] valueBytes = new byte [atom.atomSize - 24];
              System.arraycopy (metaBytes,
                                atom.offset+24,
                                valueBytes,
                                0,
                                valueBytes.length);
              String value = new String (valueBytes);
              System.out.println (tagName + ": " + value);
          } // if tagName != null
          off = atom.offset + atom.atomSize;
      }
  }
 
  /** find the given type in the byte array, starting at
      the start position.  Returns the offset within the
      byte array that begins this pseudo-atom.  a helper method
      to dumpTagsFromUserData( ).
      @param bytes byte array to search
      @param start offset to start at
      @param type type to search for.  if -1, returns first
      atom with a plausible size
   */
  private PseudoAtomPointer findPseudoAtom (byte[  ] bytes,
                                            int start,
                                            int type) {
      // read size, then type
      // if size is bogus, forget it, increment offset, and try again
      int off = start;
      boolean found = false;
      while ((! found) &&
             (off < bytes.length-8)) {
          // read 32 bits of atom size
          // use BigInteger to convert bytes to long
          // (instead of signed int)
          byte sizeBytes[  ] = new byte[4];
          System.arraycopy (bytes, off, sizeBytes, 0, 4);
          BigInteger atomSizeBI = new BigInteger (sizeBytes);
          long atomSize = atomSizeBI.longValue( );
 
          // don't bother if the size would take us beyond end of
          // array, or is impossibly small
          if ((atomSize > 7) &&
              (off + atomSize <= bytes.length)) {
              byte[  ] typeBytes = new byte[4];
              System.arraycopy (bytes, off+4, typeBytes, 0, 4);
              int aType = QTUtils.toOSType (new String (typeBytes));
 
              if ((type == aType) ||
                  (type == -1))
                  return new PseudoAtomPointer (off, (int) atomSize, aType);
              else
                  off += atomSize;
                                  
          } else {
              System.out.println ("bogus atom size " + atomSize);
              // well, how did this happen?  increment off and try again
              off++;
          }
      } // while
      return null;
  }
 
  /** Inner class to represent atom-like structures inside
      the meta atom, designed to work with the byte array 
      of the meta atom (i.e., just wraps pointers to the 
      beginning of the atom and its computed size and type)
   */
  class PseudoAtomPointer {
      int offset;
      int atomSize;
      int type;
      public PseudoAtomPointer (int o, int s, int t) {
          offset=o;
          atomSize=s;
          type=t;
      }
      
  }
 
}

When run with Toto Dies.m4p, the output to the console looks like this:

cadamson% ant run-ch07-aactagreader
Buildfile: build.xml
 
run-ch07-aactagreader:
   [java] Found meta
   [java] Full Name: Toto Dies
   [java] Artist: Nellie McKay
   [java] Album: Get Away from Me
   [java] Created: 2004-02-10T08:00:00Z

Note

The "album" and "created" data didn't appear in the earlier hexdump because in the file they occur after the cover art data, which is several kilobytes long.

What just happened?

The program gets the UserData, gets its "meta" atom as a byte array, and looks for the "ilst" pseudo-atom. If it finds one, it skips ahead 8 bytes (over "ilst" and its size) and goes into a loop of discovering and parsing potential pseudo-atoms.

To parse, you look at the first 4 bytes and consider whether it's a plausible size—in other words, whether it's big enough to contain data, but small enough to not run past the end of the byte array. If so, interpret the next 4 bytes as a FOUR_CHAR_CODE type and check against the list of known metadata types. If it matches one of the known types, you've got a valid piece of metadata, which this program simply writes to standard out.

What about...

...combining this with the MP3 approach of the previous lab so that there's just one codebase? A good strategy for that would be to get the UserData and look for a "meta" atom. If you get one, assume you have iTunes AAC and do the previous parsing. If not, assume you have an MP3, and start asking for the various metadata types with UserData.getTextAsString( ), as in the previous lab.

Providing Basic Audio Controls

Most audio applications provide some basic audio controls to allow the user to customize the sound output to suit his environment. The MovieController provides a volume control, but you can do better than that: you can control balance, bass, and treble with simple method calls.

How do I do that?

The AudioMediaHandler class provides the methods setBalance( ) and setSoundBassAndTreble( ), so it's just a matter of getting the handler object. The key is to remember that:

  • Movies have tracks.
  • Tracks have exactly one Media each.
  • Each Media has a MediaHandler.

Iterate over the movie's tracks to get each track's media and handler. To figure out whether a given track is audio, you can use a simple instanceof to see if the handler is an AudioMediaHandler.

setBalance( ) takes a float, which ranges from -1.0 (all the way to the left) to 1.0 (all the way to the right), with 0 representing equal balance.

setSoundBassAndTreble( ) is interesting because its native version is officially undocumented. As it turns out, you pass in ints for bass and treble, where 0 is normal, -256 is minimum bass or treble, and 256 is maximum.
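Because the two calls use different numeric ranges, it's worth isolating the scaling in helpers. A small sketch (my own hypothetical helpers, not part of QTJ) of the conversions Example 7-3 performs inline:

```java
public class AudioRanges {
    // setBalance( ) wants a float in -1.0..1.0; an int slider with a
    // -1000..1000 range divides down to that.
    public static float sliderToBalance (int sliderValue) {
        return sliderValue / 1000f;
    }

    // setSoundBassAndTreble( ) wants ints in -256..256; clamp for safety.
    public static int clampBassTreble (int value) {
        return Math.max (-256, Math.min (256, value));
    }

    public static void main (String[] args) {
        System.out.println (sliderToBalance (500));   // halfway to the right
        System.out.println (clampBassTreble (300));   // pinned to the maximum
    }
}
```

Keeping the ranges in one place makes it harder to accidentally hand setBalance( ) an unscaled slider value.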

Note

Well, the native version is undocumented. For once, the Javadocs have the useful info.

Example 7-3 provides a simple GUI to exercise these methods.

Note

Run this example with ant run-ch07-basicaudiocontrolsplayer.

Example 7-3. Providing balance, bass, and treble controls

package com.oreilly.qtjnotebook.ch07;
 
import quicktime.*;
import quicktime.std.*;
import quicktime.std.movies.*;
import quicktime.std.movies.media.*;
import quicktime.app.view.*;
import quicktime.io.*;
 
import java.awt.*;
import javax.swing.*;
import javax.swing.event.*;
 
import com.oreilly.qtjnotebook.ch01.QTSessionCheck;
 
public class BasicAudioControlsPlayer extends Frame 
  implements ChangeListener {
 
  JSlider balanceSlider, trebleSlider, bassSlider;
 
  AudioMediaHandler audioMediaHandler;
 
  public static void main (String[  ] args) {
      try {
          QTSessionCheck.check( );
          Frame f= new BasicAudioControlsPlayer( );
          f.pack( );
          f.setVisible(true);
      } catch (QTException qte) {
          qte.printStackTrace( );
      }
  }
 
  public BasicAudioControlsPlayer ( ) throws QTException {
      super ("Basic Audio Controls");
      // prompt for audio file
      QTFile file = QTFile.standardGetFilePreview(null);
      OpenMovieFile omf = OpenMovieFile.asRead (file);
      Movie movie = Movie.fromFile (omf);
      MovieController controller = new MovieController (movie);
      // get AudioMediaHandler for first audio track
      for (int i=1; i<=movie.getTrackCount( ); i++) {
          Track t = movie.getTrack(i);
          Media m = t.getMedia( );
          MediaHandler mh = m.getHandler( );
          if (mh instanceof AudioMediaHandler) {
              audioMediaHandler = (AudioMediaHandler) mh;
              break;
          }
      }
      if (audioMediaHandler == null) {
          System.out.println ("No audio track");
          System.exit(-1);
      }
      // add controller to GUI
      setLayout (new BorderLayout( ));
      Component comp =
          QTFactory.makeQTComponent(controller).asComponent( );
      add (comp, BorderLayout.NORTH);
      // build balance, treble, bass controls in a panel
      Panel controls = new Panel(new GridLayout (3,2));
      controls.add (new JLabel ("Balance"));
      balanceSlider = new JSlider (-1000, 1000, 0);
      balanceSlider.addChangeListener (this);
      controls.add (balanceSlider);
      controls.add (new JLabel ("Treble"));
      trebleSlider = new JSlider (-256, 256, 0);
      trebleSlider.addChangeListener (this);
      controls.add (trebleSlider);
      controls.add (new JLabel ("Bass"));
      bassSlider = new JSlider (-256, 256, 0);
      bassSlider.addChangeListener (this);
      controls.add (bassSlider);
      add (controls, BorderLayout.SOUTH);
  }
 
  public void stateChanged (ChangeEvent ev) {
      Object source = ev.getSource( );
      try {
          if (source == balanceSlider) {
              // balance
              float newBal =
                  (float) (balanceSlider.getValue( ) / 1000f);
              audioMediaHandler.setBalance (newBal);
          } else {
              // bass & treble
              audioMediaHandler.setSoundBassAndTreble (
                                    bassSlider.getValue( ),
                                    trebleSlider.getValue( ));
          }
 
 
      } catch (QTException qte) {
          qte.printStackTrace( );
      }
  }
}

When run, the program asks the user to select a file to play, and then shows a GUI, as seen in Figure 7-1.

Figure 7-1. Balance, treble, and bass controls


What just happened?

The key to this example is the use of Swing JSliders, which can be configured with appropriate bounds for the features they represent. For example, the bass and treble sliders run in a -256 to 256 range, with 0 as a default:

trebleSlider = new JSlider (-256, 256, 0);

The balance slider needs to pass a float between -1 and 1, but JSliders work with ints, so it uses a range of -1000 to 1000, which is scaled to an appropriate float before calling setBalance( ):

balanceSlider = new JSlider (-1000, 1000, 0);

All the sliders share a ChangeListener implementation that reads the new value from the affected JSlider and makes a corresponding call to the AudioMediaHandler.

Providing a Level Meter

Many audio applications also provide a graphical "level meter," which is an on-screen display of the loudness or softness of certain frequencies within the audio. In QuickTime Player, this is shown as a set of bars on the right side of the control bar, as seen in Figure 7-2.

Figure 7-2. Audio level meter in QuickTime Player


The intensity of lower frequencies, like bass, is shown in the leftmost columns, while higher frequencies are to the right.

How do I do that?

AudioMediaHandler provides two key methods: setSoundEqualizerBands( ) to set up monitoring and getSoundEqualizerBandLevels( ) to actually get the data. setSoundEqualizerBands( ) indicates which frequencies you want to monitor for your graphical display. These are passed in the form of a MediaEQSpectrumBands object, which is built up by constructing it with the number of bands you intend to monitor, then repeatedly calling setFrequency( ) to indicate which frequency a given band will monitor.

Note

Unfortunately, most of the level-metering methods are officially undocumented.

As the audio plays, you can repeatedly call getSoundEqualizerBandLevels( ), which returns an array of ints, one per band, representing the measured levels.

Example 7-4 creates a basic audio level meter in an AWT Canvas.

Note

Run this example with ant run-ch07-levelmeterplayer.

Example 7-4. Providing an audio level meter

package com.oreilly.qtjnotebook.ch07;
 
import quicktime.*;
import quicktime.std.*;
import quicktime.std.movies.*;
import quicktime.std.movies.media.*;
import quicktime.app.view.*;
import quicktime.io.*;
 
import java.awt.*;
import java.awt.event.*;
import javax.swing.*;
 
import com.oreilly.qtjnotebook.ch01.QTSessionCheck;
 
public class LevelMeterPlayer extends Frame { 
 
  // bands used by apple sndequalizer example; equivalent to qt player's
  // http://developer.apple.com/samplecode/sndequalizer/sndequalizer.html
  int[  ] EQ_LEVELS = {
      200,
      400,
      800,
      1600,
      3200,
      6400,
      12800,
      21000
  };
  static final Dimension meterMinSize =
      new Dimension (300, 150);
  LevelMeter meter;
  AudioMediaHandler audioMediaHandler;
 
  public static void main (String[  ] args) {
      try {
          QTSessionCheck.check( );
          Frame f= new LevelMeterPlayer( );
          f.pack( );
          f.setVisible(true);
      } catch (QTException qte) {
          qte.printStackTrace( );
      }
  }
 
  public LevelMeterPlayer ( ) throws QTException {
      super ("Basic Audio Controls");
      // prompt for audio file
      QTFile file = QTFile.standardGetFilePreview(null);
      OpenMovieFile omf = OpenMovieFile.asRead (file);
      Movie movie = Movie.fromFile (omf);
      MovieController controller = new MovieController (movie);
      // get AudioMediaHandler for first audio track
      for (int i=1; i<=movie.getTrackCount( ); i++) {
          Track t = movie.getTrack(i);
          Media m = t.getMedia( );
          MediaHandler mh = m.getHandler( );
          if (mh instanceof AudioMediaHandler) {
              audioMediaHandler = (AudioMediaHandler) mh;
              break;
          }
      }
      if (audioMediaHandler == null) {
          System.out.println ("No audio track");
          System.exit(-1);
      }
      // add controller to GUI
      setLayout (new BorderLayout( ));
      Component comp =
          QTFactory.makeQTComponent(controller).asComponent( );
      add (comp, BorderLayout.NORTH);
      // add level meter to GUI
      meter = new LevelMeter( );
      add (meter, BorderLayout.SOUTH);
      // set up repainting timer
      Timer t = new Timer (50, new ActionListener( ) {
              public void actionPerformed (ActionEvent ae) {
                  meter.repaint( );
              }
          });
      t.start( );
  }
 
  class LevelMeter extends Canvas {
      public Dimension getPreferredSize( ) { return meterMinSize; }
      public Dimension getMinimumSize( ) { return meterMinSize; }
      public LevelMeter( ) throws QTException {
          MediaEQSpectrumBands bands =
              new MediaEQSpectrumBands (EQ_LEVELS.length);
          for (int i=0; i<EQ_LEVELS.length; i++) {
              bands.setFrequency (i, EQ_LEVELS[i]);
          }
          // register the fully populated bands, then turn on metering
          audioMediaHandler.setSoundEqualizerBands (bands);
          audioMediaHandler.setSoundLevelMeteringEnabled (true);
      }
 
      public void paint (Graphics g) {
          int gHeight = this.getHeight( );
          int gWidth = this.getWidth( );
 
          // draw baseline
          g.drawLine (0, gHeight, gWidth, gHeight);
          try {
              if (audioMediaHandler != null) {
                  int[] levels =
                     audioMediaHandler.getSoundEqualizerBandLevels(
                                                  EQ_LEVELS.length);
                  int maxHeight = gHeight - 1;
                  int barWidth = gWidth / levels.length;
                  int segInterval = gHeight / 20;
                  for (int i=0; i<levels.length; i++) {
                      // calculate height of each set of boxes,
                      // proportional to level
                      float levPct = ((float)levels[i]) / 255.0f;
                      // math is a little weird here; y axis has 0 at top,
                      // but we have 0 at bottom of this graph
                      int barHeight = (int) (levPct * maxHeight);
                      // draw the bar as set of 0-20 rectangles
                      int barCount = 0;
                      for (int j=maxHeight;
                           j > (maxHeight - barHeight);
                           j-=segInterval) {
                          switch (barCount) {
                          case 20:
                          case 19: 
                          case 18:
                              g.setColor (Color.red);
                              break;
                          case 17:
                          case 16:
                          case 15:
                              g.setColor (Color.yellow);
                              break;
                          default:
                              g.setColor (Color.green);
                          }
                          g.fillRect (i * barWidth,
                                      j - segInterval,
                                      barWidth - 1,
                                      segInterval - 1);
                          barCount++;
                      }
                  }
 
              }
          } catch (QTException qte) {
              qte.printStackTrace( );
          }
          
      }
 
  }
}

When run, this example provides the graphical level display shown in Figure 7-3.

Figure 7-3. Frequency bands displayed as a level meter


What just happened?

This example sets up levels that, according to a demo in the native API, correspond to the same frequency bands metered by QuickTime Player:

  int[] EQ_LEVELS = {
      200,
      400,
      800,
      1600,
      3200,
      6400,
      12800,
      21000
  };

When the user opens a movie, the program finds the AudioMediaHandler of the first audio track and calls setSoundEqualizerBands() with these bands. Then it creates an instance of the LevelMeter inner class, along with a Swing Timer to repaint the level meter every 50 milliseconds.

When the repaint calls the meter's paint() method, it divides its width by the number of bands to figure out how wide each bar should be. The height takes a little more work: the returned levels are in the range 0 to 255, so the program calculates a "level percent" float by dividing by 255, then multiplying this by the height of the component. With the height and width of each frequency band, the component can draw a set of boxes, up to that height, to represent the band's level.

What about...

...the values passed in for the frequencies, and how many of them you can pass in? Unfortunately, with no documentation for this feature, there's only trial and error to fall back on. One thing I've found is that you can have only 10 bands—you can pass in as many frequencies as you want, and you'll get that many elements back in the int array returned by getSoundEqualizerBandLevels( ), but only the first 10 will have nonzero values.

Building an Audio Track from Raw Samples

As I've said many times before: movies have tracks, tracks have media, media have samples. But what are these samples? In the case of sound, they indicate how much voltage should be applied to a speaker at an instant of time. By itself, a sample is meaningless, but as a speaker is repeatedly excited and relaxed, it creates waves of sound that move through the air and can be picked up by the ear.

So, why would you want to do this? One plausible scenario is that you have code that generates this uncompressed pulse code modulation (PCM) data, like a decoder for some format that QuickTime doesn't support. By writing the raw samples to an empty movie, you can expose it to QuickTime and then play it, export it to QT-supported formats, and use other QuickTime-related functions.

How do I do that?

SoundMedia inherits an addSample( ) method from the Media class. This can be used to pack samples into a Media, which in turn can be added to a Track, which then can be added to a Movie.

But what values do you provide to create an audible sound? The example shown in Example 7-5 creates a square wave at a constant frequency. A square wave is one in which the voltage is either fully on or completely off. To create a 1000-hertz (Hz) tone, you write samples to alternate between full voltage and zero voltage, 1,000 times per second. Figure 7-4 shows a graph of sample values for the square wave.

Note

Run this example with ant run-ch07-audiosamplebuilder.

Example 7-5. Building audio media by adding samples

package com.oreilly.qtjnotebook.ch07;
 
import quicktime.*;
import quicktime.std.*;
import quicktime.std.movies.*;
import quicktime.std.movies.media.*;
import quicktime.io.*;
import quicktime.util.*;
 
import com.oreilly.qtjnotebook.ch01.QTSessionCheck;
 
public class AudioSampleBuilder extends Object {
 
static final int SAMPLING = 44100;
static final byte[] ONE_SECOND_SAMPLE = new byte [SAMPLING * 2];
static final int FREQUENCY = 262;
 
public static void main (String[] args) {
  try {
      QTSessionCheck.check( );
 
      QTFile movFile = new QTFile (new java.io.File("buildaudio.mov"));
      Movie movie =
          Movie.createMovieFile(movFile,
                        StdQTConstants.kMoviePlayer,
                        StdQTConstants.createMovieFileDeleteCurFile |
                        StdQTConstants.createMovieFileDontCreateResFile);
      
      System.out.println ("Created Movie");
      
      // create an empty audio track
      int timeScale = SAMPLING; // 44100 units per second
      Track soundTrack = movie.addTrack (0, 0, 1);
                                         
      System.out.println ("Added empty Track");
      
      // create media for this track
      Media soundMedia = new SoundMedia (soundTrack,
                                         timeScale);
      System.out.println ("Created Media");
 
      // add samples
      soundMedia.beginEdits( );
      
      // see native docs for other format consts
      int format = QTUtils.toOSType ("NONE");
 
      SoundDescription soundDesc = new SoundDescription(format);
      System.out.println ("Created SoundDescription");
 
      soundDesc.setNumberOfChannels(1);
      soundDesc.setSampleSize(16);
      soundDesc.setSampleRate(SAMPLING);
 
      for (int i=0; i<5; i++) {
          // build the one-second sample
          QTHandle mediaHandle =  buildOneSecondSample (i);
          
          soundMedia.addSample(mediaHandle, // QTHandleRef data,
                               0, // int dataOffset,
                               mediaHandle.getSize( ), // int dataSize,
                               1, // int durationPerSample,
                               soundDesc, // SampleDescription sampleDesc,
                               SAMPLING, // int numberOfSamples,
                               0 // int sampleFlags)
                               );
      }
 
      // finish editing and insert media into track
      soundMedia.endEdits( );
      System.out.println ("Ended edits");
      soundTrack.insertMedia (0, // trackStart
                              0, // mediaTime
                              soundMedia.getDuration( ), // mediaDuration
                              1); // mediaRate
      System.out.println ("inserted media");
 
      // save up 
      System.out.println ("Saving...");
      OpenMovieFile omf = OpenMovieFile.asWrite (movFile);
      movie.addResource (omf,
                         StdQTConstants.movieInDataForkResID,
                         movFile.getName( ));
      System.out.println ("Done");
 
      System.exit(0);
 
  } catch (QTException qte) {
      qte.printStackTrace( );
  }
} // main
 
/** Fill ONE_SECOND_SAMPLE with two-byte samples, according
  to some scheme (like square wave, sine wave, etc.)
  then wrap with QTHandle
 */
public static QTHandle buildOneSecondSample (int inTime)
  throws QTException {
  // convert inTime to sample count (i.e., how many samples
  // past 0 we are)
  int wavelengthInSamples = SAMPLING / FREQUENCY;
  int halfWavelength = wavelengthInSamples / 2;
  int sample = inTime * SAMPLING;
  for (int i=0; i<SAMPLING*2; i+=2) {
      int offset = sample % wavelengthInSamples;
      // square wave - bytes are either 7fff or 0000
      if (offset < halfWavelength) {
          ONE_SECOND_SAMPLE[i] = (byte) 0x7f;
          ONE_SECOND_SAMPLE[i+1] = (byte) 0xff;
      } else {
          ONE_SECOND_SAMPLE[i] = (byte) 0x00;
          ONE_SECOND_SAMPLE[i+1] = (byte) 0x00;
      }
      sample ++;
  }
  return new QTHandle (ONE_SECOND_SAMPLE);
}
}

When run, this creates a five-second, audio-only movie file called buildaudio.mov. Open it in QuickTime Player or an equivalent (like the level meter player from the previous lab) to listen to the file.

Note

Square waves are not easy on the ears. Turn down your speakers or headphones before you play this file.

What just happened?

Two constants at the beginning define important values. SAMPLING is the number of samples to be played every second. This example uses 44,100, which is the same as on a compact disc.

Tip

An important consideration for choosing a sampling frequency is the Nyquist-Shannon Sampling Theorem, which states that you need to sample at a rate double the highest frequency you want to capture. So, a sampling rate of 44,100 will properly reproduce frequencies less than 22,050 Hz. Given that human hearing typically ranges from 20 to 20,000 Hz, this effectively covers any humanly audible sound.

The FREQUENCY constant is the frequency of the sound wave to be produced. This example uses 262, which is approximately middle C on a piano.

Note

To be more precise, middle C is approximately 261.625565 Hz.
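That value comes from the equal-tempered pitch relationship (an aside I'm adding here, not something the book derives): each semitone multiplies the frequency by the twelfth root of 2, and middle C sits nine semitones below the A440 reference. A quick sketch of the arithmetic:

```java
public class MiddleCSketch {
    public static void main(String[] args) {
        // middle C is 9 semitones below A440 in equal temperament
        double middleC = 440.0 * Math.pow(2.0, -9.0 / 12.0);
        System.out.println(middleC); // ~261.6256
    }
}
```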

To start writing samples, you need a SoundMedia object and a place to put your data. The example does this by:

  1. Creating a new Movie with createMovieFile( ). Using this approach—instead of the no-arg Movie constructor—has the benefit of indicating where the samples are to be stored.
  2. Adding a new track to the movie, with no size, and a volume of 1 (full volume).
  3. Creating a new SoundMedia object. This constructor takes the track the media is associated with and a time scale for the media. In this case, 44,100 is a good choice because then each sample will correspond to one unit of the media's time scale. You could use higher values, but not lower ones, because a sample can't be expressed as less than one unit of the time scale.
  4. Calling beginEdits() on the media to indicate that the program will be making changes to the media.

Most of the rest of the code in the example has to do with setting up the call to addSample() , which is somewhat tricky. The method takes seven arguments:

  • A QTHandleRef that points to the data to be added
  • An offset into the handle
  • The size of the data to be inserted
  • The durationPerSample—how much time the sample represents, in the media's time scale
  • A SampleDescription to describe the data in the handle
  • The number of samples being added with this call
  • Behavior flags

Note

See Chapter 2 for more on time scales.

The first thing to do is to create a SampleDescription that can be reused on every call to addSample( ). To do this, create a SoundDescription object. The constructor takes a "format" FOUR_CHAR_CODE, which for uncompressed data is "NONE".

Tip

Other valid formats are defined in "QuickTime API Reference: Sound Formats" on Apple's developer site.

Next, you customize the SampleDescription object with some setter methods to indicate the number of channels, the size of each sample in bits, and the sampling frequency. For this example, I used one channel and 16 bits per sample. This means that when the byte array with the data is parsed, QuickTime will take the data 2 bytes at a time and assume it to be a 16-bit value. If there were two channels, there would be 4 bytes per sample: two 2-byte samples, one for each speaker.
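The byte layout follows directly from those settings: each sample frame occupies channels × (bits per sample ÷ 8) bytes. A minimal sketch of that arithmetic (helper name is mine):

```java
public class FrameSizeSketch {
    // bytes occupied by one sample frame across all channels
    static int bytesPerFrame(int channels, int bitsPerSample) {
        return channels * (bitsPerSample / 8);
    }

    public static void main(String[] args) {
        System.out.println(bytesPerFrame(1, 16)); // mono, 16-bit: 2 bytes
        System.out.println(bytesPerFrame(2, 16)); // stereo, 16-bit: 4 bytes
    }
}
```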

You might expect that you'd then simply loop through, adding one sample at a time to the Media and creating one second of audio every 44,100 times through the loop. Although this is legal, the resulting file won't actually play. The problem is that QuickTime wants you to put audio data in larger and more manageable chunks. To quote from the native AddMediaSample docs:

You should set the value of this parameter so that the resulting sample size represents a reasonable compromise between total data retrieval time and the overhead associated with input and output. [ . . . ] For a sound media, choose a number of samples that corresponds to between 0.5 and 1.0 seconds of sound. In general, you should not create groups of sound samples that are less than 2 KB in size or greater than 15 KB.

So, in this example, I've created a byte array to represent one second of samples, which is filled in a method called buildOneSecondSample( ). This method figures out where the waveform is at each sample time and writes either 0x7fff or 0x0000 to each 2-byte pair. Because the "NONE" format assumes signed shorts, 0x7fff is the maximum, not 0xffff.

With the byte array filled, you can wrap it with a QTHandle, and you're ready to call addSample( ) . The call looks like this:

soundMedia.addSample(mediaHandle, // QTHandleRef data,
                  0, // int dataOffset,
                  mediaHandle.getSize( ), // int dataSize,
                  1, // int durationPerSample,
                  soundDesc, // SampleDescription sampleDesc,
                  SAMPLING, // int numberOfSamples,
                  0 // int sampleFlags)
                  );

Once you're done adding samples, it's cleanup time. You use endEdits() to tell the Media you're done editing, then actually put the media into the track with Track.insertMedia() , which tells the track what parts of the media object to use and where it goes relative to the track's time scale. Finally, the movie is written to disk with the curiously named Movie.addResource( ) .

What about...

...some other kind of wave because hearing that square wave is really unpleasant? A sine wave offers a nicer alternative, because it is much more like a naturally occurring sound. Figure 7-5 shows what its waveform looks like.

The following alternate implementation of buildOneSecondSample( ) produces a sine wave. I didn't want to put it in the preceding example, which is already complicated enough without adding trigonometry:

public static QTHandle buildOneSecondSample (int inTime)
  throws QTException {
  // convert inTime to sample count (i.e., how many samples
  // past 0 we are)
  int wavelengthInSamples = SAMPLING / FREQUENCY;
  int sample = inTime * SAMPLING;
  double twoPi = 2 * Math.PI;
  double radiansPerSample = twoPi / wavelengthInSamples;
  // each sample should be one n/th of twoPi
 
  for (int i=0; i<SAMPLING*2; i+=2) {
      int offset = sample % wavelengthInSamples;
      // sine wave
      double angle = offset * radiansPerSample;
      double sine = Math.sin (angle);
      // sines are -1<x<1; scale to run from 0 to 0x7fff
      double heightD = (sine + 1) * (0x7fff / 2);
      // narrow to a 16-bit value
      short height = (short) heightD;
      // pack this into the array as two bytes, big-endian
      ONE_SECOND_SAMPLE [i] = (byte) ((height & 0xff00) >> 8);
      ONE_SECOND_SAMPLE [i+1] = (byte) (height & 0xff);
      sample ++;
  }
  return new QTHandle (ONE_SECOND_SAMPLE);
}

This implementation calculates the width of a wavelength in samples, then divides one full cycle—2π radians—by that width to get the angle for each call to Math.sin( ). The returned values are then translated so that instead of running from -1.0 to 1.0, they run from 0 to 0x7fff.
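You can verify the range of that mapping without any QuickTime code. One detail worth noticing (my observation, not the book's): 0x7fff / 2 is integer division, yielding 16383, so the actual peak sample value is 32766 rather than 32767—close enough for this purpose. A standalone sketch, using the same sampling rate and frequency as the example:

```java
public class SineRangeSketch {
    public static void main(String[] args) {
        int wavelengthInSamples = 44100 / 262;  // same values as the example
        double radiansPerSample = (2 * Math.PI) / wavelengthInSamples;
        int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
        for (int offset = 0; offset < wavelengthInSamples; offset++) {
            double sine = Math.sin(offset * radiansPerSample);
            // same scaling as buildOneSecondSample()
            short height = (short) ((sine + 1) * (0x7fff / 2));
            min = Math.min(min, height);
            max = Math.max(max, height);
        }
        System.out.println(min + " .. " + max); // 0 .. 32766
    }
}
```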

It's also worth noting that the middle C sine wave is pretty hard to hear over basic computer speakers. You might have better results with a frequency of 440, which is the A above middle C.
