0readme.1st
14.9 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
README File for "sph2pipe"
--------------------------
1. Introduction
The "sph2pipe" program was created by the Linguistic Data Consortium
to provide greater flexibility and ease of use for SPHERE-formatted
digital audio data. It is equivalent in most respects to the related
utility "sph_convert", but each of these tools provides some abilities
that the other does not. Here is a brief summary of the similarities
and differences.
Both sph_convert and sph2pipe will:
- work on all Microsoft Windows systems, via the "MS-DOS" command
line prompt
- read any SPHERE-formatted data file and convert it to Microsoft
RIFF ("WAV") format, Sun/Java AU format, MAC AIFF format or raw
(headerless) format
- automatically uncompress SPHERE files that have been compressed
using the "shorten" algorithm (often used in LDC speech corpora)
- allow demultiplexing of two-channel waveform data, to output one
or the other channel alone
- allow conversion of the sample data to 16-bit linear PCM or to
8-bit mu-law encoding, regardless of the input sample encoding
Only sph_convert can:
- run on older (pre-OSX) Macintosh systems, via the old Mac-style GUI
- do multiple file conversions in a single run (sph2pipe only does
one file at a time); there are two methods for doing "batches":
* treat all files in a chosen directory that match a
user-specified file-name pattern, or
* treat all files in all subdirectories under a chosen
base directory
- in either case, convert all SPHERE files and copy (or bypass) all
non-SPHERE files
Only sph2pipe can:
- run on UNIX systems (should also work on MacOS X, via its unix
shell/command-line interface, using the "Terminal" utility)
- provide SPHERE-formatted output as well as RIFF, AU, AIFF and raw
- handle raw sample data as input, using a SPHERE header stored in a
separate file.
- trim off the beginning and/or end of the input data, to output just
a user-specified segment based on either time or sample offsets
(sph_convert always outputs the entire file)
- write the output data to stdout, for redirection to any named file,
or to a pipeline process (sph_convert always writes the data to a
new file, with a name derived automatically from the input file)
- support input and output of A-law speech data
When installed on MS Windows or MacOS, these tools will produce RIFF
output files by default; when compiled for UNIX systems (Linux, Solaris,
etc), sph2pipe will output SPHERE format by default. In any case, the
user has the option to specify what format is desired -- any machine can
be used to generate any kind of output. (Well, a Mac that is running
OS-9 or older cannot produce SPHERE output, but we haven't heard any
requests for that...)
Sph2pipe will not work on older Mac systems because the notion of a
pipeline command did not exist on Macs prior to OS X. Of course, it is
possible to create custom-edited RIFF/AIFF/AU/raw files using sph2pipe
on unix or wintel, then copy those files to an older Mac; but the
combination of sph_convert and any of several waveform editing tools for
Macs can provide all the functionality of sph2pipe, and then some.
The "shorten" speech compression technique, used in the LDC's
publication of many speech corpora, was developed by Tony Robinson,
originally at Cambridge University; "shorten" is available from
SoftSound, Inc. (http://www.softsound.com/Shorten.html). The algorithm
and source code for uncompressing "shortened" speech data are included
here by permission of Tony Robinson and SoftSound, Inc.
People who have used the original "shorten" package (dating from the
mid-1990's) will find that sph2pipe is more much flexible, because of
the range of options available for controlling output. UNIX users who
are familiar with the NIST SPHERE utilities "w_decode" and "w_edit"
will find that sph2pipe runs faster and is easier to use, especially
when extracting a subset of data from a compressed file: in this case
sph2pipe alone handles a job that would require both w_decode and
w_edit, and works a lot quicker (and also avoids a nasty bug in the
sphere_2.6a package that can arise when you try to run w_decode and
w_edit together in a pipeline).
Note that sph2pipe and sph_convert are NOT able to do sample-rate
conversion. If you have a need for this, try the "SoX" package -- see
under "Licensing" below for more information about SoX.
2. Installation
Wintel users can simply download the executable file (sph2pipe.exe) that
has been precompiled for MS Windows/DOS systems, and start using it.
(You can download the source files too, if you have your own C compiler
and want to customize the program for your needs.) UNIX and MacOS X
users are advised to compile the program from the source code.
To build from sources, download "sph2pipe_v2.4.tgz", and do this:
-- if you have the Gnu version of tar (standard on linux):
tar xzf sph2pipe_v2.4.tgz
-- otherwise (with Wintel systems or non-Gnu versions of tar):
gzip -c -d sph2pipe_v2.4.tgz | tar xf -
-- then:
cd sph2pipe_v2.4
gcc -o sph2pipe *.c -lm ## on unix
or
gcc -o sph2pipe.exe *.c -lm ## on wintel, using the djgpp compiler
That's it -- no configuration scripts, makefiles or special libraries
are needed (the source code consists of just 3 *.c files, and 3 *.h
files; the standard math library is needed for compilation). Put the
resulting "sph2pipe" executable in your path and start using it. If you
don't have gcc, try whatever C compiler you do have; you might need to
change a few details in sph_convert.h, but we hope the code is generic
enough (POSIX compliant) to work anywhere.
3. Usage
The command line syntax is:
sph2pipe [-h hdr] [-t|-s b:e] [-c 1|2] [-p|-u|-a] [-f typ] infile [outfile]
-h hdr -- treat the input file as raw (headerless) sample data, and
read header information from a separate file, given as the
"hdr" argument; the "hdr" must contain a valid SPHERE header
that correctly describes the nature of the input sample data
("hdr" may contain actual sample data as well, which will be
ignored). If the output format is "sph", the SPHERE header
in "hdr" will be written first, with appropriate adjustments
where needed. (When this option is not used, "input" must
begin with a valid SPHERE header.)
-t b:e -- output only the portion of waveform data that lies
between the stated beginning and ending points, given in
seconds, as positive real numbers; "b" defaults to
start-of-file, "e" defaults to end-of-file -- so the
following usages are valid:
"-t :10.05" (output first 10.05 sec, skip the rest)
"-t 4:" (skip first 4 sec, output the rest)
"-t 4:10.05 (output 6.05 sec, starting at 4 sec in)
-s b:e -- output only the portion of waveform data that lies
between the stated beginning and ending points, given in
samples as positive integers; "b" defaults to
start-of-file, "e" defaults to end-of-file -- so the
following usages are valid:
"-s :32000" (output first 32K samples, skip the rest)
"-s 8000:" (skip first 8K samples, output the rest)
"-s 8000:32000 (output 24K samples, starting at 8K in)
-c 1 or -c 2 -- output only the first or second channel, in case
input is two-channel (has no effect if input is
single channel); default is to output all channels
-p -- force 16-bit PCM output, in case input is something else (has
no effect if input is already 16-bit PCM)
-u -- force 8-bit mu-law output, in case input is 16-bit pcm (has
no effect if input is already mu-law)
-a -- force 8-bit a-law output, in case input is 16-bit pcm (has
no effect if input is already a-law)
The -p, -u and -a options are ignored if "-f aif" is used,
because AIFF only supports PCM samples. When none of these
three is specified, the default behavior is to leave original
sample format "as is" (or to force PCM if using "-f aif")
-f fmt -- selects the output header format; "fmt" can be:
rif (or wav) -- default for Wintel & Mac systems
aif (or mac) -- similar to rif, but more Mac-ish...
sph -- SPHERE format, default on unix systems
au -- common on Sun/Java/Next
raw -- i.e. headerless
If only one file name is given on the command line, output is written
to stdout (i.e. for redirection via "> output.file", or for input to a
pipeline). If a second file name is given, output is written directly
to a file with this name, and not to stdout; if the named output file
already exists and contains data, its contents will be overwritten
(replaced) by the sph2pipe output.
If the output format is RIFF, AU, AIFF or SPH, a fully specified and
correct file header is written first (*). When writing via stdout to
a pipeline, a downstream process can behave exactly as it would for a
valid disk file in the target format (except that "seek()" does not
work on stdin, of course).
(*) Note: for SPHERE-formatted output, sph2pipe will eliminate the
"sample_checksum" field, since this cannot be given a correct value
prior to processing and writing the output data. Also, when
converting PCM input to mu-law or a-law, sph2pipe removes the
"sample_byte_format" header field, which defines the byte order for
16-bit sample data. Apart from these two circumstances, the output
sphere header retains all information in the original input header,
along with appropriate changes, where necessary, to the sample_count,
channel_count, sample_coding, sample_n_bytes, sample_byte_format and
sample_sig_bits fields, making the header information consistent with
the data being written.
A useful benefit provided by pipeline operation is the ability to
"compose" a single output file by concatenating any number of input
files, or pieces of one or more input files. For instance, to combine
all the speech data in one directory into a single file for signal
analysis (using bash as the command-line shell, which is available for
wintel systems as well as for unix):
$ for i in *.sph; do
> sph2pipe -f raw $i >> allsph.raw
> done
Or, to put together a set of excerpts that you want to play back
during your next PowerPoint presentation:
sph2pipe -f raw -t 0:1 empty.sph > silence.raw
sph2pipe -f sph -t 0:1 empty.sph > slideshow.sph
sph2pipe -f raw -t 15.5:18.2 example1.sph >> slideshow.sph
cat silence.raw >> slideshow.sph
sph2pipe -f raw -t 300:305.5 example2.sph >> slideshow.sph
cat silence.raw >> slideshow.sph
sph2pipe -f raw -t 1832:1838 example3.sph >> slideshow.sph
cat silence.raw >> slideshow.sph
...
sph2pipe -f wav slideshow.sph > slideshow.wav
Note the use of "raw" format to concatenate waveform data (we don't
want file headers to be interspersed with the speech). Also, in the
second example, the sphere header that is initially created for
"slideshow.sph" will be "numerically" correct only in reference to the
initial one-second chunk; as more segments are appended to this file,
the "sample_count" field in the header will be further and further
from the truth. But this doesn't matter -- at the final stage, when
this file is converted to RIFF, sph2pipe will notice the discrepancy
between the "sample_count" value in the header and the actual size of
the file, and will automatically correct the sample_count to be
consistent with the file size.
There are important rules to follow when combining segments from
multiple files. If you happen to violate any of these rules, the
resulting output will certainly come out sounding wrong (sometimes
painfully so):
(1) be sure that all the input files have the same sampling rate.
(2) be sure to append data using a consistent number of channels,
always a single channel, or always two channels
(3) it's a good idea to specify "-p" on all runs -- or "-u" or "-a" on
all runs -- to guarantee that the output file will have the same
sample coding throughout, no matter what the original sample
codings may have been in the source files
When combining data from files in any single LDC corpus, these issues
normally won't pose any problem: within a given corpus, all files tend
to have the same properties.
4. Version specific information
This version will only convert one sphere file in one run, and must
read that file directly from disk or cdrom (it does not accept input
via stdin, because it must be able to do "fseek()" on the input file).
Handling bunches of files is easily done on both unix and wintel
systems using generic tools like the unix "bash" shell, the unix
"find" utility, and/or the Perl or Python scripting languages; fully
capable ports of all these tools are available for wintel systems.
Version History:
- Version 2.0 was the first "public" release; it did not support a-law
sample coding, AU or AIFF output formats, the "-h hdrfile" option, or
the "-s|-t bgn:end" options. It contained a significant bug that arose
when converting some 16-bit PCM sphere files to ulaw output.
- Version 2.1 provided a fix for the pcm-to-ulaw bug.
- Version 2.2 added the options for AU and AIFF output formats.
- Version 2.3 added the "-s|-t" options to select regions for output
based on sample or time offsets, and also added the "-h" option for
using "stand-off" sphere headers with raw sample data files.
- Version 2.4 added support for a-law sample coding, and added a
thorough test suite, allowing end users to verify their installation;
there were some minor bug fixes involving the "-h" option; the README
file has also been revised to bring various URL's up to date.
- Version 2.5 added the ability to include an output file name as a
command line argument; this was done to avoid concerns on MS-Windows
systems about some command-line shells that impose "text-mode"
alterations to data when running commands with redirection or pipes.
5. License
Various portions of source code from Tony Robinson's "shorten-2.0"
package are used here by permission of Tony Robinson and SoftSound,
Inc. <http://www.softsound.com> -- these portions are found in the file
"shorten_x.c"; please note the copyright information in that file. By
agreement with Tony Robinson and SoftSound, Inc, the Linguistic Data
Consortium (LDC) grants permission to copy and use this software for the
purpose of reading "shorten"-compressed speech data provided in NIST
SPHERE file format by the LDC or others. SoftSound provides useful
tools for audio compression and other signal processing tasks.
Other portions of source code (in particular the "writeRIFFHeader" and
"writeAIFFHeader" functions in "file_headers.c", and the "alaw2pcm"
conversion function) were adapted from the "SoX" package, a valuable
open-source tool maintained primarily by Chris Bagwell, with assistance
from many others (http://sox.sourceforge.net/). We gratefully
acknowledge the value provided by all contributors to SoX; sph2pipe
would have been much harder to write without this resource. We
recommend that you use SoX if you need to do sample-rate conversion on
audio data.