tools/sph2pipe_v2.5/0readme.1st 14.9 KB
8dcb6dfcb   Yannick Estève   first commit
  
  README File for "sph2pipe"
  --------------------------
  
  1. Introduction
  
  The "sph2pipe" program was created by the Linguistic Data Consortium
  to provide greater flexibility and ease of use for SPHERE-formatted
  digital audio data.  It is equivalent in most respects to the related
  utility "sph_convert", but each of these tools provides some abilities
  that the other does not.  Here is a brief summary of the similarities
  and differences.
  
  Both sph_convert and sph2pipe will:
  
   - work on all Microsoft Windows systems, via the "MS-DOS" command
     line prompt
   - read any SPHERE-formatted data file and convert it to Microsoft
     RIFF ("WAV") format, Sun/Java AU format, Mac AIFF format or raw
     (headerless) format
   - automatically uncompress SPHERE files that have been compressed
     using the "shorten" algorithm (often used in LDC speech corpora)
   - allow demultiplexing of two-channel waveform data, to output one
     or the other channel alone
   - allow conversion of the sample data to 16-bit linear PCM or to
     8-bit mu-law encoding, regardless of the input sample encoding
  
  Only sph_convert can:
  
   - run on older (pre-OSX) Macintosh systems, via the old Mac-style GUI
   - do multiple file conversions in a single run (sph2pipe only does
     one file at a time); there are two methods for doing "batches":
        * treat all files in a chosen directory that match a
          user-specified file-name pattern, or 
        * treat all files in all subdirectories under a chosen
          base directory 
     In either case, sph_convert converts all SPHERE files and copies
     (or bypasses) all non-SPHERE files.
  
  Only sph2pipe can:
  
   - run on UNIX systems (should also work on MacOS X, via its unix
     shell/command-line interface, using the "Terminal" utility)
   - provide SPHERE-formatted output as well as RIFF, AU, AIFF and raw
   - handle raw sample data as input, using a SPHERE header stored in a
     separate file.
   - trim off the beginning and/or end of the input data, to output just
     a user-specified segment based on either time or sample offsets
     (sph_convert always outputs the entire file)
   - write the output data to stdout, for redirection to any named file,
     or to a pipeline process (sph_convert always writes the data to a
     new file, with a name derived automatically from the input file)
   - support input and output of A-law speech data
  
  When installed on MS Windows or MacOS, these tools will produce RIFF
  output files by default; when compiled for UNIX systems (Linux, Solaris,
  etc), sph2pipe will output SPHERE format by default.  In any case, the
  user has the option to specify what format is desired -- any machine can
  be used to generate any kind of output.  (Well, a Mac that is running
  OS-9 or older cannot produce SPHERE output, but we haven't heard any
  requests for that...)
  
  Sph2pipe will not work on older Mac systems because the notion of a
  pipeline command did not exist on Macs prior to OS X.  Of course, it is
  possible to create custom-edited RIFF/AIFF/AU/raw files using sph2pipe
  on unix or wintel, then copy those files to an older Mac; but the
  combination of sph_convert and any of several waveform editing tools for
  Macs can provide all the functionality of sph2pipe, and then some.
  
  The "shorten" speech compression technique, used in the LDC's
  publication of many speech corpora, was developed by Tony Robinson,
  originally at Cambridge University; "shorten" is available from
  SoftSound, Inc. (http://www.softsound.com/Shorten.html).  The algorithm
  and source code for uncompressing "shortened" speech data are included
  here by permission of Tony Robinson and SoftSound, Inc.
  
  People who have used the original "shorten" package (dating from the
  mid-1990s) will find that sph2pipe is much more flexible, because of
  the range of options available for controlling output.  UNIX users who
  are familiar with the NIST SPHERE utilities "w_decode" and "w_edit"
  will find that sph2pipe runs faster and is easier to use, especially
  when extracting a subset of data from a compressed file: in this case
  sph2pipe alone handles a job that would require both w_decode and
  w_edit, and works a lot quicker (and also avoids a nasty bug in the
  sphere_2.6a package that can arise when you try to run w_decode and
  w_edit together in a pipeline).
  
  Note that sph2pipe and sph_convert are NOT able to do sample-rate
  conversion.  If you need this, try the "SoX" package -- see
  under "License" below for more information about SoX.
  
  
  2. Installation
  
  Wintel users can simply download the executable file (sph2pipe.exe) that
  has been precompiled for MS Windows/DOS systems, and start using it.
  (You can download the source files too, if you have your own C compiler
  and want to customize the program for your needs.)  UNIX and MacOS X
  users are advised to compile the program from the source code.
  
  To build from sources, download "sph2pipe_v2.4.tgz", and do this:
  
   -- if you have the Gnu version of tar (standard on linux):
  
       tar xzf sph2pipe_v2.4.tgz
  
   -- otherwise (with Wintel systems or non-Gnu versions of tar):
  
       gzip -c -d sph2pipe_v2.4.tgz | tar xf -
  
   -- then:
  
       cd sph2pipe_v2.4
  
       gcc -o sph2pipe *.c -lm     ## on unix
   or
       gcc -o sph2pipe.exe *.c -lm ## on wintel, using the djgpp compiler
  
  That's it -- no configuration scripts, makefiles or special libraries
  are needed (the source code consists of just 3 *.c files, and 3 *.h
  files; the standard math library is needed for compilation).  Put the
  resulting "sph2pipe" executable in your path and start using it.  If you
  don't have gcc, try whatever C compiler you do have; you might need to
  change a few details in sph_convert.h, but we hope the code is generic
  enough (POSIX compliant) to work anywhere.
  
  
  3. Usage
  
  The command line syntax is:
  
   sph2pipe [-h hdr] [-t|-s b:e] [-c 1|2] [-p|-u|-a] [-f fmt] infile [outfile]
  
     -h hdr -- treat the input file as raw (headerless) sample data, and
           read header information from a separate file, given as the
           "hdr" argument; the "hdr" must contain a valid SPHERE header
           that correctly describes the nature of the input sample data
           ("hdr" may contain actual sample data as well, which will be
           ignored).  If the output format is "sph", the SPHERE header
           in "hdr" will be written first, with appropriate adjustments
           where needed.  (When this option is not used, "infile" must
           begin with a valid SPHERE header.)
  
     -t b:e -- output only the portion of waveform data that lies
               between the stated beginning and ending points, given in
               seconds, as positive real numbers; "b" defaults to
               start-of-file, "e" defaults to end-of-file -- so the
                following usages are valid:
  
  	     "-t :10.05"  (output first 10.05 sec, skip the rest)
  	     "-t 4:"      (skip first 4 sec, output the rest)
  	     "-t 4:10.05" (output 6.05 sec, starting at 4 sec in)
  
     -s b:e -- output only the portion of waveform data that lies
  	     between the stated beginning and ending points, given in
  	     samples as positive integers; "b" defaults to
  	     start-of-file, "e" defaults to end-of-file -- so the
  	     following usages are valid:
  
  	     "-s :32000"     (output first 32K samples, skip the rest)
  	     "-s 8000:"      (skip first 8K samples, output the rest)
  	     "-s 8000:32000" (output 24K samples, starting at 8K in)
  
     -c 1 or -c 2 -- output only the first or second channel, in case
  	           input is two-channel (has no effect if input is
  	           single channel); default is to output all channels
  
     -p -- force 16-bit PCM output, in case input is something else (has
           no effect if input is already 16-bit PCM)
  
     -u -- force 8-bit mu-law output, in case input is 16-bit pcm (has
  	 no effect if input is already mu-law)
  
     -a -- force 8-bit a-law output, in case input is 16-bit pcm (has
  	 no effect if input is already a-law)
  
  	 The -p, -u and -a options are ignored if "-f aif" is used,
  	 because AIFF only supports PCM samples.  When none of these
  	 three is specified, the default behavior is to leave original
  	 sample format "as is" (or to force PCM if using "-f aif")
  
     -f fmt -- selects the output header format; "fmt" can be:
  	 rif (or wav) -- default for Wintel & Mac systems
  	 aif (or mac) -- similar to rif, but more Mac-ish...
  	 sph -- SPHERE format, default on unix systems
  	 au  -- common on Sun/Java/NeXT
  	 raw -- i.e. headerless
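
  The -t and -s options select a region in different units (seconds
  versus samples).  As a rough sketch only, the hypothetical helper
  below (not part of sph2pipe) turns a "-t" range into the matching
  "-s" range.  It assumes you already know the sample rate, that a
  sample offset is simply seconds times sample rate, and that rounding
  to the nearest sample is acceptable; sph2pipe's own rounding may
  differ.

```shell
# t_to_s RANGE RATE -- hypothetical helper, not part of sph2pipe.
# Converts a "b:e" range in seconds into a "b:e" range in samples,
# assuming offset = seconds * RATE, rounded to the nearest sample.
# An empty "b" or "e" is passed through empty (start/end of file).
t_to_s() {
    awk -v range="$1" -v rate="$2" 'BEGIN {
        n = split(range, part, ":")
        b = (part[1] == "") ? "" : sprintf("%d", part[1] * rate + 0.5)
        e = (n < 2 || part[2] == "") ? "" : sprintf("%d", part[2] * rate + 0.5)
        print b ":" e
    }'
}
```

  Under those assumptions, for a file sampled at 8000 Hz,
  "sph2pipe -s $(t_to_s 4:10.05 8000) infile" should select about the
  same region as "sph2pipe -t 4:10.05 infile".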
  
  If only one file name is given on the command line, output is written
  to stdout (i.e. for redirection via "> output.file", or for input to a
  pipeline).  If a second file name is given, output is written directly
  to a file with this name, and not to stdout; if the named output file
  already exists and contains data, its contents will be overwritten
  (replaced) by the sph2pipe output.
  
  If the output format is RIFF, AU, AIFF or SPH, a fully specified and
  correct file header is written first (*).  When writing via stdout to
  a pipeline, a downstream process can behave exactly as it would for a
  valid disk file in the target format (except that "seek()" does not
  work on stdin, of course).
  
  (*) Note: for SPHERE-formatted output, sph2pipe will eliminate the
  "sample_checksum" field, since this cannot be given a correct value
  prior to processing and writing the output data.  Also, when
  converting PCM input to mu-law or a-law, sph2pipe removes the
  "sample_byte_format" header field, which defines the byte order for
  16-bit sample data.  Apart from these two circumstances, the output
  sphere header retains all information in the original input header,
  along with appropriate changes, where necessary, to the sample_count,
  channel_count, sample_coding, sample_n_bytes, sample_byte_format and
  sample_sig_bits fields, making the header information consistent with
  the data being written.
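
  To see which header fields a given file carries (before or after
  conversion), the text header can be printed with ordinary shell tools.
  The sketch below assumes the usual NIST SPHERE layout: the first line
  is "NIST_1A", the second line is the header size in bytes (typically
  1024), each field is a "name -type value" line, and the fields end
  with "end_head".  The function names are hypothetical, and "head -c"
  is assumed to be available (it is in GNU and BSD userlands).

```shell
# sph_header FILE -- print the text header of a SPHERE file, up to and
# including the "end_head" line.  Assumes line 2 holds the header size.
sph_header() {
    size=$(head -n 2 "$1" | tail -n 1 | tr -d ' ')
    head -c "$size" "$1" | sed '/^end_head/q'
}

# sph_field FILE NAME -- print the value of one header field, for
# example sample_count or sample_coding (first word of the value only).
sph_field() {
    sph_header "$1" | awk -v name="$2" '$1 == name { print $3; exit }'
}
```

  For example, "sph_field output.sph sample_checksum" should print
  nothing for a file written by sph2pipe, since that field is removed.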
  
  A useful benefit provided by pipeline operation is the ability to
  "compose" a single output file by concatenating any number of input
  files, or pieces of one or more input files.  For instance, to combine
  all the speech data in one directory into a single file for signal
  analysis (using bash as the command-line shell, which is available for
  wintel systems as well as for unix):
  
     $ for i in *.sph; do
     > sph2pipe -f raw "$i" >> allsph.raw
     > done
  
  Or, to put together a set of excerpts that you want to play back
  during your next PowerPoint presentation:
  
     sph2pipe -f raw -t 0:1 empty.sph > silence.raw
     sph2pipe -f sph -t 0:1 empty.sph > slideshow.sph
     sph2pipe -f raw -t 15.5:18.2 example1.sph >> slideshow.sph
     cat silence.raw >> slideshow.sph
     sph2pipe -f raw -t 300:305.5 example2.sph >> slideshow.sph
     cat silence.raw >> slideshow.sph
     sph2pipe -f raw -t 1832:1838 example3.sph >> slideshow.sph
     cat silence.raw >> slideshow.sph
     ...
     sph2pipe -f wav slideshow.sph > slideshow.wav
  
  Note the use of "raw" format to concatenate waveform data (we don't
  want file headers to be interspersed with the speech).  Also, in the
  second example, the sphere header that is initially created for
  "slideshow.sph" will be "numerically" correct only in reference to the
  initial one-second chunk; as more segments are appended to this file,
  the "sample_count" field in the header will be further and further
  from the truth.  But this doesn't matter -- at the final stage, when
  this file is converted to RIFF, sph2pipe will notice the discrepancy
  between the "sample_count" value in the header and the actual size of
  the file, and will automatically correct the sample_count to be
  consistent with the file size.
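
  The arithmetic behind such a correction can be sketched as follows.
  This illustrates the relationship between file size and sample count
  for uncompressed data; it is not taken from the sph2pipe source, and
  the function name is hypothetical.

```shell
# implied_sample_count FILE_BYTES HEADER_BYTES CHANNELS BYTES_PER_SAMPLE
# Hypothetical helper: samples per channel implied by the file size,
# valid for uncompressed sample data only.
implied_sample_count() {
    echo $(( ($1 - $2) / ($3 * $4) ))
}
```

  As a worked case, assume 8 kHz, single-channel, 16-bit PCM data and a
  1024-byte header.  After the one-second lead-in (16000 bytes), the
  2.7-second first excerpt (43200 bytes) and one second of silence
  (16000 bytes), slideshow.sph would hold 76224 bytes, so
  "implied_sample_count 76224 1024 1 2" gives 37600 samples (4.7
  seconds), while the header would still claim 8000.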
  
  There are important rules to follow when combining segments from
  multiple files.  If you happen to violate any of these rules, the
  resulting output will certainly come out sounding wrong (sometimes
  painfully so):
  
  (1) be sure that all the input files have the same sampling rate.
  (2) be sure to append data using a consistent number of channels:
         always a single channel, or always two channels.
  (3) it's a good idea to specify "-p" on all runs -- or "-u" or "-a" on
         all runs -- to guarantee that the output file will have the same
         sample coding throughout, no matter what the original sample
         codings may have been in the source files
  
  When combining data from files in any single LDC corpus, these issues
  normally won't pose any problem: within a given corpus, all files tend
  to have the same properties.
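
  When the files come from mixed sources, rules (1) and (2) can be
  checked before concatenating.  The sketch below is one possible check,
  with hypothetical function names; it assumes each file starts with a
  standard SPHERE text header containing "sample_rate" and
  "channel_count" fields and ending with "end_head".  Rule (3) is not
  checked, since "-p", "-u" or "-a" takes care of it at conversion time.

```shell
# hdr_field FILE NAME -- value of one SPHERE header field; stops
# reading at "end_head", so the sample data is never scanned.
hdr_field() {
    LC_ALL=C awk -v name="$2" '
        $1 == name       { print $3; exit }
        $1 == "end_head" { exit }' "$1"
}

# same_format FILE... -- succeed only if every file reports the same
# sample_rate and channel_count as the first file.
same_format() {
    ref="$(hdr_field "$1" sample_rate)/$(hdr_field "$1" channel_count)"
    for f in "$@"; do
        cur="$(hdr_field "$f" sample_rate)/$(hdr_field "$f" channel_count)"
        if [ "$cur" != "$ref" ]; then
            echo "mismatch: $f is $cur, expected $ref" >&2
            return 1
        fi
    done
}
```

  A run such as "same_format *.sph && echo ok" would then report the
  first file that differs.  A difference in channel count alone can
  still be handled by using "-c 1" or "-c 2" on every sph2pipe run.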
  
  
  4. Version specific information
  
  This version will only convert one sphere file in one run, and must
  read that file directly from disk or cdrom (it does not accept input
  via stdin, because it must be able to do "fseek()" on the input file).
  Handling bunches of files is easily done on both unix and wintel
  systems using generic tools like the unix "bash" shell, the unix
  "find" utility, and/or the Perl or Python scripting languages; fully
  capable ports of all these tools are available for wintel systems.
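
  As one possible way to do this with "find" and a POSIX shell, the
  sketch below converts every *.sph file under a directory tree.  The
  function name, the SPH2PIPE variable and the flat output layout are
  assumptions made for the example; only the "-p", "-f wav" and
  "infile outfile" usage comes from the syntax described above.  File
  names containing newlines are not handled.

```shell
# convert_tree SRC_DIR DST_DIR -- hypothetical batch wrapper.
# Converts every *.sph file under SRC_DIR to a 16-bit PCM RIFF file
# with the same base name in DST_DIR.  Set SPH2PIPE to use a binary
# that is not on your path.
convert_tree() {
    mkdir -p "$2" || return 1
    find "$1" -type f -name '*.sph' | while IFS= read -r in; do
        out="$2/$(basename "$in" .sph).wav"
        # Passing the output name lets sph2pipe write the file itself,
        # so no shell redirection is involved.
        "${SPH2PIPE:-sph2pipe}" -p -f wav "$in" "$out" ||
            echo "failed: $in" >&2
    done
}
```

  For example, "convert_tree /corpus/speech ./wav" (hypothetical paths)
  would leave one .wav file per input file in ./wav; files with the
  same base name in different subdirectories would overwrite each
  other under this flat layout.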
  
   Version History:
  
   - Version 2.0 was the first "public" release; it did not support a-law
   sample coding, AU or AIFF output formats, the "-h hdrfile" option, or
   the "-s|-t bgn:end" options.  It contained a significant bug that arose
   when converting some 16-bit PCM sphere files to ulaw output.
  
   - Version 2.1 provided a fix for the pcm-to-ulaw bug.
  
   - Version 2.2 added the options for AU and AIFF output formats.
  
   - Version 2.3 added the "-s|-t" options to select regions for output
   based on sample or time offsets, and also added the "-h" option for
   using "stand-off" sphere headers with raw sample data files.
  
   - Version 2.4 added support for a-law sample coding, and added a
   thorough test suite, allowing end users to verify their installation;
   there were some minor bug fixes involving the "-h" option; the README
   file has also been revised to bring various URLs up to date.
  
   - Version 2.5 added the ability to include an output file name as a
   command line argument; this was done to avoid concerns on MS-Windows
   systems about some command-line shells that impose "text-mode"
   alterations to data when running commands with redirection or pipes.
  
  
  5. License
  
  Various portions of source code from Tony Robinson's "shorten-2.0"
  package are used here by permission of Tony Robinson and SoftSound,
  Inc. <http://www.softsound.com> -- these portions are found in the file
  "shorten_x.c"; please note the copyright information in that file.  By
  agreement with Tony Robinson and SoftSound, Inc, the Linguistic Data
  Consortium (LDC) grants permission to copy and use this software for the
  purpose of reading "shorten"-compressed speech data provided in NIST
  SPHERE file format by the LDC or others.  SoftSound provides useful
  tools for audio compression and other signal processing tasks.
  
  Other portions of source code (in particular the "writeRIFFHeader" and
  "writeAIFFHeader" functions in "file_headers.c", and the "alaw2pcm"
  conversion function) were adapted from the "SoX" package, a valuable
  open-source tool maintained primarily by Chris Bagwell, with assistance
  from many others (http://sox.sourceforge.net/).  We gratefully
  acknowledge the value provided by all contributors to SoX; sph2pipe
  would have been much harder to write without this resource.  We
  recommend that you use SoX if you need to do sample-rate conversion on
  audio data.