Roger B. Dannenberg
- External Control
- Preferences
- Unit Generator Design
- The Ugen Abstract Class
- Memory Management
- Instruments in Serpent
- End-of-Sound Actions
- Phase Vocoder and Pitch Shifting
This section describes protocols and interaction between a client and Arco, mainly the Audioio class that implements audio IO.
See also Internal States in audioio.cpp
.
Initially, the audio thread is run by the host, which must call
arco_thread_poll
periodically. If audio is being processed, the
audio thread control is safely handed off to audio callbacks, and
calls to arco_thread_poll
will return immediately.
Arco receives /arco/devinf
: gets return address and sends info
strings about audio devices, terminated by empty string.
Arco sends to reply address: info strings about audio devices.
Arco receives /arco/reset
: shuts down audio thread, deletes all
Ugens, sets reply service (here, we assume it is "actl"), sends
confirmation. Do this first, even before /arco/open
.
Arco sends /actl/reset
: confirms audio stream closed and Ugens are
deleted. Client can send /arco/open
again.
Client creates Zero and Thru Ugens for INPUT_ID
and PREV_OUTPUT_ID
with:
/arco/zero/new ZERO_ID
/arco/bzero/new ZEROB_ID
/arco/thru/new INPUT_ID input_channels ZERO_ID
/arco/thru/new PREV_OUTPUT_ID output_channels ZERO_ID
Note that audio input and previous output are represented by Thru
Ugens whose outputs are written directly by the Audioio callback, and
their real_run
methods are never called.
Arco receives /arco/open
: opens audio IO, specifying devices,
channels, latency and buffer size (in frames). (Control or reply
service is also specified, but by convention it should be the same
service passed to /arco/reset
. Here, we assume /actl
.
Arco sends /actl/starting
: gives actual device ids, actual channel
counts, actual latency and actual buffer size (in frames).
Arco receives /arco/close
: close audio stream, but keep Ugens in
place.
Arco sends /actl/closed
: confirms audio stream closed.
Preferences within Arco source code are confusing enough that you will need this to understand the code:
There are several sets of preference variables with different prefixes:
p_*
-- private to prefs module; values correspond to .arco prefs filearco_*
-- Used for getting preferences from the user interactively.actual_*
-- Parameters determined after devices were actually opened. These values are displayed in interfaces as the "Current value" of any parameter.
These prefixes are applied to each of the following suffixes. Here is what they mean:
*_in_id
-- PortAudio input device number*_out_id
-- PortAudio output device number*_in_name
-- PortAudio input device name (or substring thereof)*_out_name
-- PortAudio output device name (or substring thereof)*_in_chans
-- input channel count*_out_chans
-- output channel count*_buffer_size
-- samples per audio buffer (not necessarily the same as the Arco block length, BL, as Arco may need to compute multiple blocks to fill the buffer on each audio callback).*_latency
-- output latency in ms (the output buffer should be about this long, or if double-buffered, each buffer should be about this long. The input buffer should also be this long, but we assume it is empty when the output buffer is full, so input buffers do not contribute additional latency.)
General information flow:
actual_*
values are provided by PortAudio when devices are opened. They are used by the audio callback to interpret the audio stream.p_*
values are initialized from the preferences file.arco
arco_*
values are initialized fromp_*
values, and when the user sets validarco_*
values, they are used to updatep_*
values. (Setting to -1 or some indication of "default" is also valid and means "use default value instead of whatever might have been in the preferences file", i.e., "clear the preference".)- Values used to open PortAudio devices are the
p_*
values unless they are -1 in which case default values are used.
Preferences are only written on request by writing the current p_*
values, which can include -1 for "no preference".
Preferences inside/outside the server Applications pass in parameters to open audio: device ids, channel counts, buffer size and latency. -1 works to get default values, but it is up to applications to manage preferences.
Therefore preferences are on the "server side" and not visible to the "arco side". To allow inspection of actual values selected, the arco side sends actual values back to the control service, the name of which is provided in /arco/open.
On the "arco side", we have only defaults and whatever values are passed into the open operation.
Synchronization and Threads
Audio runs off a callback, so we should be careful with threads. All communication is through O2, with preferences handled on the main thread side. Actual values are sent from the audioio module via O2 so either a shared-memory server or a remote application can work the same way.
Unit generators are subclasses of Ugen (ugen.h). A Ugen has a rate
(either 'A' for audio or 'B' for block rate) and a Vec of outputs. To
make it easier to resolve types and perhaps to make code readable, the
output rate is determined by the choice of unit generator, e.g., use
mult
for an audio-rate multiplier and multb
for a block-rate
multiplier.
Inputs are more variable. The Ugen class does not specify any inputs, but any subclass with inputs adds instance variables for: (pointer to) input unit generator, stride for that unit generator (described below), and pointer to input samples.
Ugen objects have a method run
that is called on each input before
accessing input. The causes each input unit generator to compute its
output and may result in run
calls to inputs of the inputs, etc.
The run
method is inherited from Ugen
and checks to see if output
is already up-to-date, which happens if the same output fans out to
multiple inputs -- only the first input calling run
computes
anything, and other inputs reuse the output samples. If the output
samples are not up-to-date, run
calls real_run
, which each unit
generator overrides with its own DSP algorithm.
The trick is that there can be many different sorts of input
signals. With constant-, block-, and audio-rate inputs, and single-
vs. multi-channel signals, there are potentially 6 different
types. With N inputs, there are potentially 6^N variants of
real_run
, which quickly gets out of hand, even with automatic code
generation.
To solve this problem, we use two main strategies. First, code to
iterate over multiple channels of input and output signals is
implemented in a single real_run
method, but the processing of
different types of inputs is specialized by making an indirect method
call through run_channel
, which points to a method. E.g., in
mult
, there are run_channel
methods chan_aa_a
, which multiplies
two audio-rate channels (each a vector of 32 floats) to produce an
audio-rate channel, and chan_ab_a
, which mutiplies an audio-rate
channel by a block-rate channel (a single float). The variable
run_channel
is set to the proper method whenever inputs change so
that the correct specialized DSP computation is run.
The second strategy reduces the logic to iterate over input channels
to a single add instruction for each input before calling
run_channel
, so the overhead is the number of output channels x the
number of input signals. This is very small compared to the total
amount of data accessed and computed.
Each run_channel
method expects input to start at the address in
per-input sample pointers. The pointers are not changed by
run_channel
, but when it returns, sample pointers are incremented by
per-input stride quantities. For single-channel inputs, we want to
reuse the same input for each channel, so stride is 0. For
multi-channel audio inputs, we want to advance to the next channel of
the input. Input and output are consecutive in memory, so the stride
is the block length (BL = 32). For multi-channel block-rate inputs,
there is only one sample per block, so the stride is 1. Finally,
constant inputs (values that can be updated by messages) are treated
the same as block-rate signals. Constants can be single- or
multi-channel with strides of 0 or 1, respectively.
With these strategies, the combinatorics reduces from 6^N versions of the inner computation loop to 2^N versions, because we need different code for audio-rate and block-rate input. It is possible to restrict this further by insisting that some inputs must be audio-rate, e.g. it does not make much sense to apply an audio low-pass filter to a block rate signal.
FAUST is used mainly to create the inner loops for run_channel
methods. A code generator, written in Python, extracts the inner loops
from FAUST code, does some rewriting, and generates code for Arco. To
generate all the variants of run_channel
, FAUST is actually called
once for each variant, with source code generated from specification
file that includes both FAUST code and some annotations about which
variant are required. (link to details should go here.)
Every Ugen has:
rate
:'a'
(audio),'b'
(block), or'c'
(rate)chans
: number of output channelsblock
: block count, intially -1. Updated to current block number byrun(block)
output
: a vector of Samples.- A-rate output has one audio block per channel, flattened to an array of length
chans * BLOCK_LEN
. - B-rate output has
chans
samples.
- A-rate output has one audio block per channel, flattened to an array of length
states
: a vector ofchans
structures, where each structure contains all the state for one channel of the unit generatorinputs
: for each input, there is a source pointer of typeUgen *
and a pointer to the Ugen's output vector and a stride.
Computation basically involves a loop to compute each channel while updating input and output pointers.
Inputs must have either 1 channel or the same number of channels as the Ugen. If 1 channel, the same channel is used for each channel of the Ugen.
- audio rate input
- 1 channel: param_stride = 0
- n channel: param_stride = BL // next channel is where previous one ends
- block rate input
- 1 channel: param_stride = 0
- n channel: param_stride = 1 // 1 sample per channel
- constant rate input
- 1 channel: param_stride = 0
- n channel: param_stride = 1 // 1 sample per channel
Arco allows unit generators to be allocated on-the-fly so that signal processing can be reconfigured. An entire graph of unit generators can be instantiated to make a sound and freed when the sound ends. Arco uses reference counting to free unused unit generators.
Clients (applications) name unit generators with small
integers. Normally, the integer will index arrays on both the client
and Arco sides. The Arco side has an array of pointers to unit
generators, the ugen_table
. Unit generators are referenced by this
table as well as by any unit generator that they are connected
to. Reference counts are decremented when a unit generator is
disconnected from an input. The reference from the ugen_table
also
counts as one reference. You can free the integer index, which clears
the pointer from the ugen_table
to the unit generator and decrements
the reference count. When the last reference is removed, the reference
count becomes zero and the unit generator is deleted.
In the simplest case, a tree of unit generators is created by the
client, which frees all but its reference ID to the output (Ugen at
the root of the tree). To produce audio output, the /arco/output
message inserts the output ugen into the output_set
, which is a list
of ID's of unit generators whose outputs should be mixed to form the
audio output.
Eventually, the client can remove the ugen from the output_set
using
/arco/mute
and free the ID using the message address
/arco/free
. If the object is not connected to any inputs, it will be
freed. The reference counts will propagate up the tree, freeing all
the unit generators in the tree.
You can also use /arco/run
to insert a ugen ID into run_set
, which
is a list of unit generators to "run" before computing each block of
audio output. Typically, unit generators in the run_set
analyze
input audio (Arco's Vu meters are an example). Note that if a member
of the output_set
depends directly or indirectly on another ugen,
that ugen will be run automatically (the tree of unit generators is
traversed in a depth-first manager, running all input-providers before
computing output). Therefore, anything that an output depends on does
not need to be added to the run_set
.
On the client side, the integer "handles" or ID's for unit generators
can be managed in any way that is convenient. For small and mostly
static configurations, every unit generator ID can be stored in a
different variable for easy reference. The programmer must explicitly
free the ID by sending a message to /arco/free
to delete the
object.
In Serpent, I have created shadow classes and objects for unit generators, so the entire unit generator graph in Arco is mirrored in the client side control program. These objects encapsulate and manage reference counts so that the client code is written in terms of methods on the mirror objects. The client works as this level of abstraction and there are no direct messages to Arco. Languages with garbage collectors that call "cleanup code" when objects are freed could use this to free Arco IDs.
In this more elaborate scheme, the ID on the client side is actually
an object, not a simple integer. When all references to the object
(and thus all ability to referene the acutal Arco ugen) are deleted
and the object is garbage colleted, we can /arco/free
the ID, which
decrements the reference count and possibly deletes the ugen. (It could
still have other references within Arco and survive longer.)
Arco does not count output_set
items as references, so reference
counts cna go to zero even while an object is being output. This
allows for implicit disconnection from output when an object no
longer has an id. Similarly, in Serpent, we do not keep a list of
playing objects, so object IDs are garbage collected and the
corresponding Arco object is deleted when there are no more
references to the shadow object in Serpent.
The output_set
and run_set
are lists of ugens and affect
reference counts. Every ugen also has flag bits IN_OUTPUT_SET
and IN_RUN_SET
indicating membership. A ugen can only be in
either set once.
Serpent could use the basic integer-as-reference interface as is, but this means that Serpent users have to manage reference counts if they want to dynamically create and destroy unit generators in Arco. Serpent has garbage collection, so why not tie GC to Unit Generators?
Unfortunately, Serpent does not have programmable actions when Serpent objects are freed. This would require objects about to be freed to execute an "about to be freed" method, which would essentially create a new reference to the object and possibly create additional references to the object, making it unsafe to actually free it. Rather than solving this problem, we can make "external" objects that run C++ code when they are freed. This does not require entering the Serpent interpreter and as long as we require that the C++ code does not create new references, the object can be freed immediately after running the code.
For Ugens, we have an external object called Ugen_id that holds a 32-bit Arco id and a 32-bit epoch number. The epoch is incremented whenever Arco is reset, clearing all unit generators. After a reset, all references are invalid, so incrementing the epoch number is a way of invalidating them.
How safe should this be? Since users can construct arbitrary O2
message and send to Arco, it's always possible to circumvent any Ugen
management mechanism, e.g. we can send a message like /arco/free 25
. Since this loophole is available, we can be relaxed about other
loopholes, e.g. we can allow Serpent programs to extract integer id's
from Ugen_id objects, but we want to automate things for users as much
as possible to avoid mistakes.
Currently, users normally send O2 messages from Serpent using
o2_send_cmd
with a type string to specify types in the message. We
will add a special type character code "U" for Ugen that expects a
Ugen_id object, checks the epoch number to make sure it is valid,
extracts the Ugen_id, and adds it to the message using o2_add_int32
.
Ugen_id objects are created by consulting an array of integers. The array is initialized as a linked list, where each array element contains the index of the next array element. To create a Ugen_id, pop an entry off the linked list to get an integer. Mark the array element at that index with its own index to mean allocated. Set the current epoch number in the Ugen_id.
Note that Serpent can have any number of references to a Ugen_id and
it can be freely copied (because assignment of an object assigns the
object address). When a Ugen_id is freed by GC, we will send an
/arco/free
message and return the id to the free list.
If we construct messages in the normal way, the message construction
primitives beginning with o2_msg_start
and ending with
o2_msg_finish
are not reentrant, which means we cannot send a
message from within the garbage collector, which can be invoked
between Serpent instructions. To solve this, we'll have the GC keep
another list of freed id's and provide a C++-level function that once
called runs without invoking GC and sends /arco/free
messages for
all freed ids. We might optimize /arco/free
by allowing it to carry
an arbitrary number of id's so we can free them in a batch.
Since Serpent will have to periodically call a function to check for
freed id's and send /arco/free
there will be some delay between
freeing a Ugen on the Serpent (client) side and actually releasing the
id and Ugen on the Arco (server) side. There might be cases where we
want to release a Ugen immediately. We can add a delete
method to
the Serpent Ugen class that makes the Ugen invalid. The method will
call a function that immediately sends /arco/free
and marks the ugen
array entry with -id. Later, GC will actually free the reference, and
the -id will indicate that the /arco/free
message has been sent and
the id can be moved to the free list (not the to-be-freed list). (If
the array entry is id, it means the id is allocated. We put the array
entry in the to-be-freed list which will give the array entry the
value to another index or -1 (end of list). We want to delay moving
the id to the free list for two reasons: (1) we need to keep the array
entry -1 to tell GC that the id is already free, and (2) if we
immediately put the id back on the free list, another object could be
created for the id, so a second object would then have the
same id number.
Functions available to Serpent are:
-
arco_ugen_new()
-- allocate an id and construct a new Ugen_id -
arco_ugen_new_id(id)
-- construct a new Ugen_id from known id -
arco_ugen_id(ugen_id)
-- extract the Arco id from the Ugen_id object. Returns -1 if the epoch is not current or ugen_id was freed usingarco_ugen_free
. -
arco_ugen_epoch(ugen_id)
-- extract the epoch number from the Ugen_id object. -
arco_ugen_free(ugen_id)
-- explicitly free the associated Ugen. Thearco_ugen_id
will become -1 for the remaining lifetime of this object, thus any references to the object will also lose their ability to reference the Ugen. -
arco_ugen_gc()
-- when ugen_id objects are garbage collected, their id's are collected in a list, but the associated Ugens in Arco are not immediately freed. Callingarco_ugen_gc
will free the associated Ugens in Arco and return their id's to the pool of unused id's on the Serpent side for reuse byarco_ugen_new
. If there are no id's to be freed and more than half of the available Ugen id's are allocated,arco_ugen_gc
runs an incremental step of garbage collection. Garbage collection is somewhat self-balancing in Serpent because more memory can be allocated. The more memory allocated, the more memory is freed after each mark-sweep cycle, so GC becomes more efficient when computation produces more garbage. But Ugens have a fixed set of id's and we do not want to run out. If we allocate lots of Ugens with relatively little computation (GC activity is roughly proportional to the amount of computation performed), we could run out of id's. Therefore, we want to spend more time on GC when we are running short of id's. Arco expects periodic calls toarco_ugen_gc
even when there is nothing in particular to compute. This will give Arco a chance to make extra increments in the GC computation when running out of id's. If id's run out, it is pretty catastrophic, and the solutions are to recompile with a largerUGEN_TABLE_SIZE
or to callarco_ugen_gc
more often, or make more monolithic unit generators so you do not allocate so many in the first place. -
arco_ugen_reset()
-- increment the epoch, invalidating all current Ugen_id objects. Returns the new epoch number.
A client-side library for Arco is written in Serpent. You can use Arco directly, but the approach with Serpent is to create classes and objects that "shadow" the Arco objects and make it easier to allocate, free, connect, and control a collection of Ugens in Arco.
In Serpent (using the arco.srp library), you create unit generators by
calling functions, e.g. sine
to instantiate a Sine
Ugen in the
Arco server. An important abstraction in this library is the
Instrument
class, which represents a collection of unit generators
that are instantiated and destroyed as a unit. An Instrument
has no
direct representation on the Arco server side. There, it is just a set
of independent unit generators. On the Serpent (client) side, the
Instrument
is a real object. You can subclass Instrument
to make
various kinds of instruments, inheriting the general behavior, which
includes a set of named components, named parameters that can be set,
a designated output unit generator that represents the output of the
Instrument
, and a free
operation that releases all the unit
generators in the Instrument
.
To define an Instrument
, the init
method of a subclass calls
instr_begin
which initializes a container for components of the
instrument. As each component is created, you call member
to say
that the created Ugen is a member of the instrument, e.g.,
var env = member(pwlb(dur * 0.2, 0.01, dur * 0.8), 'env')
This example creates a piece-wise linear envelope generator (pwlb
),
adds it as a member to the instrument, and gives it the name
'env'
. It also keeps a reference to the Pwlb
in the local variable
env
. Later, you can use instrument.get('env')
to retrieve the
pwlb
component, e.g. to start the envelope:
instrument.get('env').start()
Alternatively, you could also store it
as an instance variable and write a start
method for the instrument,
e.g.:
def start(): env.start() // new method in instrument
...
instrument.start() // called to start the instrument's envelope
You can also create named, setable parameters. The problem here is
that you might want your instrument to have parameters freq
or
amp
, but the instrument is not a real unit generator and has no
inherent parameters. You could write a method for each parameter, but
that is a little awkward. Consider implementing a settable frequency:
var mysine // this is in the class declaration
def init():
...
mysine = member(sine(220.0, 0.5), 'sine')
...
def set_freq(f): mysine.set('freq', 440.0)
Then, you can change frequency by calling: instrument.set_freq(440.0)
.
Alternatively, there is a special set
method in Instrument that
avoids defining a method for every parameter. To make this work, call
param
in init
. For example, consider this from the instrument
init
function:
def init():
...
var mysine = sine(220.0, 0.5)`, you can call:
param(mysine, 'freq', 'myfreq')
...
This creates a Sine Ugen in the instrument and says that 'myfreq'
will name the sine's 'freq'
parameter. Then, you can call
instrument.set('myfreq', 440.0)
.
A general problem is: once you launch a collection of Ugens to make a sound, how do you reclaim them when the sound event completes?
A good example is the Pwlb subclass of Ugen, which is a general-purpose piece-wise linear function generator with output at the block rate, and typically used as an amplitude envelope. We want to get a notice when an instance of Pwlb reaches the last breakpoint (typically with amplitude zero) in its list of breakpoints.
The mechanism is an O2 message to any designated service with an ID
number representing the end-of-envelope event. To request a message,
send /arco/pwlb/act id action_id
to the Pwlb instance (indicated by
the id
parameter). The action_id
is an arbitrary int32
integer
other than 0, which means "no action notice." When the pwlb envelope
ends, a message is sent: /actl/act action_id
, where actl
is the
control service (ctrlservice
) parameter passed in the /arco/open
message to initialize the Arco server.
Setting up end-of-sound actions is not trivial. Consider playing a sound from a file into a mixer input. When the sound (from a Recplay object) finishes, we want to remove the sound from the mixer, but that means (1) remember a name for the mixer input, (2) schedule an end-of-sound action that takes the name and mixer as parameters, (3) write the end-of-sound method to uninsert the named ugen from the mixer. Simply invoking a built-in action like MUTE or FINISH is not sufficient.
Alternatively, in Nyquist, the file playback sound would simply end. The "mixer" would consist of a gain multiplier and an adder. The gain multiplier would detect a terminated input, and because multiplication by zero is zero, the product sound would terminate. Then, addition by zero is the identity, to the adder would be replaced by the other adder input (add are always binary, so a mixer would have a chain or tree of adders).
We can have something similar in Arco. Termination could be a bit maintained by every Ugen. The run method, before calling real_run(), can check for termination. If terminated, it can just return out_samps without further evaluation. The caller can also check the terminate bit and invoke some action; e.g. Mix and Add should remove inputs that terminate, and filters should terminate their output when their input terminates, but possibly after a delay or detecting the output had decayed to near zero.
However, we need to make termination an option; e.g. even envelopes in Arco can be re-started, so automatically terminating everything that seems to stop is not workable. That also means that when a unit generator is created, the default should be that it runs forever and an option is for it to report termination.
What would a filter look like in this scheme? Instead of
void real_run() {
snd_samps = snd->run(current_block); // update input
cutoff_samps = cutoff->run(current_block); // update input
...
we could write
void real_run() {
snd_samps = snd->run(current_block); // update input
cutoff_samps = cutoff->run(current_block); // update input
if ((snd_samps->flags & cutoff_samps->flags & TERMINATED) &&
(flags & CAN_TERMINATE)) {
flags |= TERMINATED; // propagate terminated input to output
}
...
Note that termination will propagate without delay. What if we need to delay propagating termination until the output decays to zero? We do not have timed events within the audio thread, so we have to use a counter. Overhead is only incurred after an input terminates:
void real_run() {
snd_samps = snd->run(current_block); // update input
cutoff_samps = cutoff->run(current_block); // update input
if ((snd_samps->flags & cutoff_samps->flags & TERMINATED) &&
(flags & CAN_TERMINATE)) {
terminate();
}
...
}
// this is inherited from Ugen. tail_blocks is a new member
// variable of Ugens that is initialized to zero, but set by
// the method that sets CAN_TERMINATE.
void terminate() { // this Ugen will terminate after tail_blocks
if (!(flags & TERMINATING)) {
terminate_count = tail_blocks;
flags |= TERMINATING; // start the count
}
if (terminate_count-- == 0) {
flags |= TERMINATED;
on_terminate(); // a virtual method called at most once
}
}
In Serpent, this mechanism is used to implement end-of-sound
actions. The most important is muting an instrument when an envelope
ends. To do this for an envelope env
, we call env.atend(MUTE, this)
, where atend
should be read "at end," MUTE is the action to
take, and this
is an optional parameter, which in this case refers
to the Instrument or Ugen we want to mute. (MUTE is the only
implemented action at this point.)
atend
calls create_action
which makes a unique new action_id
and
sends /arco/pwlb/act id action_id
. It also adds an action
description to a dictionary mapping action ids to action descriptions.
When the envelope ends, the handler for /actl/act action_id
retrieves the action from the dictionary and calls it, invoking
instrument.mute()
, where instrument
is the argument that was the
Instrument (this
) passed to atend
. The mute
method removes the
instrument from the Arco output collection and unreferences the
instrument, which normally frees the instrument and all its component
Ugens.
The MUTE
action is only good for removing an Instrument or Ugen from
the output list in Arco, but the create_action
method can specify
any object, method and parameters to invoke, and you can even tie
multiple method invocations to a single end-of-envelope action.
The FINISH
action is a more general action that invokes the finish
method of the instrument, where "the instrument" means the this
parameter passed to atend
, e.g. x
in env.atend(FINISH, x)
. In
this case, finish
is passed a status parameter, so it should
probably be declared def finish(optional status)
.
The mixer ins
method inserts a unit generator signal and a separate
gain into the mix, and there is an optional atend
parameter which
can be one of:
-
SIGNAL
requires the unit generator to be an Instrument with a parameter namedenvelope
that can send an action when it ends. When the envelope ends, the unit generator and the gain control are removed from the mix. -
GAIN
requires that the gain be aPwllb
envelope. When the envelope ends, both the unit generator and gain control are removed from the mix.
These are notes on the pv
unit generator. For pitch shifting, we
need to stretch by ratio and then downsample by ratio. Should we
stretch first or resample first? One reason to stretch first is that
the pitch can be changed instantly by changing resampling, as opposed
to resampling first and then adding the latency and uncertainty of the
FFTs used for stretching.
In the CMUPV library, ratio
is the reciprocal of stretch, so ratio > 1
means the output sound will be shorter than the input sound.
For the Pv Unit Generator, ratio
is the stretch factor and the
pitch shift ratio, so ratio > 1
means we stretch the input to get a
longer intermediate CMUPV output. Then we resample this output to be
shorter, causing the pitch to be higher.
In this description we use the Pv UG definition of ratio
, and may
refer to the CMUPV library ratio as pvratio
.
Let the stretch ratio and pitch shift ratio be R, so resampling ratio is 1/R. To stretch, we need N samples in a buffer, where N is the FFT size. The FFT steps through the buffer by N/8 when the ratio is 1, so it steps by N/(8R) to stretch by R. We want the FFT to operate on contiguous samples. We could shift the buffer by N/(8R) after each FFT, or we could make the buffer longer and shift only when we need to. That seems to be a better plan and easier to implement, as it handles cases where R changes often. So let's make the buffer 2N in size, restricting R to >1/8. (Although for shifts of more than 3 octaves, ratio = 8, the reconstruction hop size can be arbitrarily small, leading to greater overlap of the analysis windows.)
To minimize latency, we will let the resampling drive the phase vocoder. There is an input buffer of size 2N. When the input would overflow the buffer, the buffer is shifted by N samples, which retains N samples for an FFT, and opens space for the next N samples. Therefore, an FFT can always be computed from a continguous span of the most recent N samples.
The phase vocoder algorithm produces H samples after each FFT, where H is the hop size. An output buffer contains the signal to be resampled. This buffer is of size H + M, where M is the number of samples of history needed for resampling. E.g. for 9 point sinc interpolation, M = 4. Output is generated by resampling the output buffer. When the resampling reaches the end of the output buffer, the output buffer is shifted by H and H samples are obtained from the phase vocoder to refill the buffer.
When new samples are needed for the output directory, we need to determine where to compute the next FFT. One natural approach would be to advance the analysis window by H/R to compensate for the resampling. However, if the resample rate is changed continuously and if the analysis window location is quantied to the nearest sample, there is the risk that the time stretching and resampling will not exactly match, causing either buffer overflow or buffer underflow. A simpler scheme is simply to compute an FFT of the last available N samples in the input buffer. This will automatically minimize the latency and avoid any rounding or quantization errors.
Does it matter that input arrives in blocks of 32 or that FFTs will always be aligned to 32-sample boundaries? I think since the FFTs are in the 1024 to 4096 size range that shifting by 32 samples will have no audible effect as long as the phase is adjusted accordingly.
This approach, where we take the most recent N samples as the analysis
window, is not standard for phase vocoders, so we'll need to interface
with the phase vocoder to cause it to compute an input hop size that moves
the FFT window to the end of the buffer. With CMUPV, the API says we can
set ratio before each request for an output block. We will use an output
block size of H so that each output block will require exactly one analysis
window, so we can adjust ratio before each analysis. With CMUPV, we want
to set ratio such that pv_get_input_count
returns the right number of
samples. E.g. suppose the last window ended at I and the last sample in
the buffer is at J. Then we want pv_get_input_count
to return J - I.
Reading through pv_get_input_count
code, we see
ana_hopsize = lroundf((pv->syn_hopsize) * (pv->ratio));
so roughly, ana_hopsize = H * ratio
and
int need = pv->blocksize - (int)(pv->frame_next - pv->out_next);
Since we are requesting block sizes equal to the hop size, need = H
.
Next, int frames = (need + pv->syn_hopsize - 1) / pv->syn_hopsize;
so frames = 1
as expected.
Next, pv_get_input_count
computes
pv->input_head += ana_hopsize;
and then
need = pv->fftsize + ana_hopsize * (frames - 1);
but frames == 1
,
so need == fftsize
(yes, we're expecting to need a single FFT frame).
Next, long have = (long) (pv->input_rear - pv->input_head);
where
input_rear
is the index of the end of the input data, which should be
the end of the previous FFT analysis window. Next, we have need -= have
.
Let prevwin
be the location of the previous window. Assuming we had
just enough samples for that window, pv->input_rear
is the end of the
bufffer = prevwin + fftsize
, pv->input_head = prevwin + ana_hopsize
,
have = pv->input_rear - pv->input_head
= (prevwin + fftsize) - (prevwin + ana_hopsize)
= fftsize - ana_hopsize
, so after need -= have
we get
need = fftsize - (fftsize - ana_hopsize) = ana_hopsize
. So CMUPV
is going to ask for ana_hopsize
(based on ratio) every time we ask
for more output. So this is actually very simple: Initially, we zero
the input buffer, consider the buffer to contain N samples, expect the
first request for samples to ask for a full frame of N samples, which
we deliver from sample 0 to N-1.
We remember prev_input_end
as the location of the end of the
previous input to the phase vocoder (passed in by pv_put_input()
).
We want the next input to consume all the samples from
prev_input_end
to the end of the input buffer, which we can call
available
. Therefore, we set ratio such that
available == lroundf(H * ratio)
, or ratio = available / H
, or
given that CMUPV uses the reciprocal of this, pvratio = H / available
.
There is a limit to ana_hopsize
, so ratio
cannot be too small.
Setting ratio
too low can cause the input buffer to overflow.
The maximum hopsize is N / 3 and the "normal" hopsize for ratio = 1
is N / 4, so the smallest ratio is 3 / 4. However 4 is just N / H, so
in general, the smallest ratio is 3 * H / N. E.g. if we set H to N / 8,
the smallest ratio is 3 / 8, or more than 1 octave down, and H = N / 16,
e.g., N = 2048, H = 128, allows more than 2 octaves down. The unit
generator should limit ratio to be > 3 * H / N to avoid unworkable
parameters.
Following Analog Devices dsp_book_Ch16.pdf, we will use a windowed sinc function and a Blackman window:
w[i] = 0.42 - 0.5 * cos(2*PI*i/M) + 0.08 * cos(4 * PI * i/M)
where M is the number of points - 1. From this, we'll construct a windowed sinc function. When we up-sample, the sinc function is a reconstruction filter to interpolate between the input samples, so the zeros are spaced according to the input. The sinc function has a span of 9 input samples.
When we down-sample, the sinc function is a low-pass filter with zeros spaced according to the output. This means we will sum over about 8/R samples, e.g. if we downsample by one octave, we have to sum over 16 points. We could use a constant number, but that would require a different kernel for each ratio R, and that would be very expensive to construct.
There is some code to run tests on resampling in arco/cmupv/src. Some results and comments in arco/cmupv/src/testdata.txt say:
Notes: Large windows work best, so results with a small window of, say, 8 may not be great. With a window size of 8, transposing down seems to work better, and transposing down really sums 8 points (4 on each side), whereas when you tranpose up, you need to lowpass to reduce aliasing, and for that the window size is greater than 8; e.g., if you transpose up an octave, the windowed sinc function has to be twice as wide, so it spans 16 samples of input.
For 8 (which we should at least try because it is much faster than a really accurate interpolation), an oversampling factor of 32 seems reasonable. The windowed sinc function takes 8 * 32 * 4 * 2 = 2048 bytes, so not a problem. Unfortunately, substantial gains in performance require larger windows, not better interpolation.
It seems like 8-sample windows do not do much for anti-aliasing, i.e., just not a sharp cutoff at the Nyquist frequency. Getting a faster cutoff requires a long filter, so maybe it is not suitable for real-time, and the best approach is simply to transpose down, not up.
Execution time with a window of 8 is about 2000 times real time.