Preface
In earlier articles, I gave an overview of what 3A does and where it is used, and, taking the WebRTC open-source framework as an example, explained how to enable the 3A options in WebRTC and walked through the code that feeds the near-end and far-end signals into the 3A module instances.
The SplittingFilter in WebRTC uses a filter bank to split the signal into subbands, preparing the ground for the echo cancellation that follows. This article first covers the overall flow of the SplittingFilter code, including the creation, analysis (splitting), and synthesis (merging) paths, and then goes further into how the algorithm is implemented and its details.
| Copyright notice: 山河君. Reproduction without the author's permission is prohibited.
I. The band-splitting process
1. Creating the SplittingFilter
Let's start with when the SplittingFilter gets created. If you know WebRTC reasonably well, you will know that audio data is stored in an AudioBuffer, and the SplittingFilter is created along with the AudioBuffer.
AudioBuffer::AudioBuffer(size_t input_rate,
                         size_t input_num_channels,
                         size_t buffer_rate,
                         size_t buffer_num_channels,
                         size_t output_rate,
                         size_t output_num_channels)
    : input_num_frames_(static_cast<int>(input_rate) / 100),
      ......
      num_bands_(NumBandsFromFramesPerChannel(buffer_num_frames_)), // decide how many bands are needed
{
  .....
  if (num_bands_ > 1) { // only create the SplittingFilter when more than one band is needed
    split_data_.reset(new ChannelBuffer<float>(
        buffer_num_frames_, buffer_num_channels_, num_bands_));
    splitting_filter_.reset(new SplittingFilter(
        buffer_num_channels_, num_bands_, buffer_num_frames_));
  }
}
As mentioned in the earlier articles, for speech signals the useful voice content lies within 0~8 kHz, so:
- a 32 kHz sample rate is split into 0~8 kHz and 8~16 kHz;
- a 48 kHz sample rate is split into 0~8 kHz, 8~16 kHz, and 16~24 kHz.
A 16 kHz sample rate therefore needs no further splitting. NumBandsFromFramesPerChannel(buffer_num_frames_) derives the number of bands from the per-channel frame count, i.e. effectively from the sample rate.
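For reference, here is a minimal sketch of what this helper boils down to; the constant name and exact form are assumptions based on the behavior described above, not a verbatim copy of the WebRTC source:

#include <cstddef>

// Sketch only: each band carries 10 ms of audio at 16 kHz, i.e. 160 samples per channel.
constexpr size_t kSplitBandSize = 160;

size_t NumBandsFromFramesPerChannel(size_t num_frames) {
  if (num_frames == kSplitBandSize * 2) {
    return 2;  // 320 frames (32 kHz input) -> two bands
  }
  if (num_frames == kSplitBandSize * 3) {
    return 3;  // 480 frames (48 kHz input) -> three bands
  }
  return 1;    // 160 frames (16 kHz or lower) -> no splitting
}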
2. Overall band-splitting flow
1) When the split happens
The AudioBuffer performs the band split at the appropriate point in the pipeline. For example, the near-end (capture) signal is split when any of the following conditions is met (e.g. the high-pass filter or noise suppression is enabled):
bool AudioProcessingImpl::SubmoduleStates::CaptureMultiBandProcessingActive(
bool ec_processing_active) const {
return high_pass_filter_enabled_ || mobile_echo_controller_enabled_ ||
noise_suppressor_enabled_ || adaptive_gain_controller_enabled_ ||
(echo_controller_enabled_ && ec_processing_active);
}
if (submodule_states_.CaptureMultiBandSubModulesActive() &&
SampleRateSupportsMultiBand(
capture_nonlocked_.capture_processing_format.sample_rate_hz())) {
capture_buffer->SplitIntoFrequencyBands();
}
The SplitIntoFrequencyBands interface is the entry point that triggers the split:
void AudioBuffer::SplitIntoFrequencyBands() {
splitting_filter_->Analysis(data_.get(), split_data_.get());
}
2) Splitting rules
Depending on the number of bands requested, SplittingFilter splits the signal either into two bands (0~8 kHz and 8~16 kHz) or into three bands (0~8 kHz, 8~16 kHz, and 16~24 kHz):
void SplittingFilter::Analysis(const ChannelBuffer<float>* data,
ChannelBuffer<float>* bands) {
.....
if (bands->num_bands() == 2) {
TwoBandsAnalysis(data, bands); // 0 ~ 8kHz, 8 ~ 16kHz
} else if (bands->num_bands() == 3) {
ThreeBandsAnalysis(data, bands); // 0 ~ 8kHz, 8 ~ 16kHz,16 ~ 24kHz
}
}
3) Core splitting code
For the two-band case, the core code is as follows:
void SplittingFilter::TwoBandsAnalysis(const ChannelBuffer<float>* data,
ChannelBuffer<float>* bands) {
RTC_DCHECK_EQ(two_bands_states_.size(), data->num_channels());
RTC_DCHECK_EQ(data->num_frames(), kTwoBandFilterSamplesPerFrame);
for (size_t i = 0; i < two_bands_states_.size(); ++i) {
std::array<std::array<int16_t, kSamplesPerBand>, 2> bands16;
std::array<int16_t, kTwoBandFilterSamplesPerFrame> full_band16;
FloatS16ToS16(data->channels(0)[i], full_band16.size(), full_band16.data());
WebRtcSpl_AnalysisQMF(full_band16.data(), data->num_frames(),
bands16[0].data(), bands16[1].data(),
two_bands_states_[i].analysis_state1,
two_bands_states_[i].analysis_state2);
S16ToFloatS16(bands16[0].data(), bands16[0].size(), bands->channels(0)[i]);
S16ToFloatS16(bands16[1].data(), bands16[1].size(), bands->channels(1)[i]);
}
}
Core processing steps:
- FloatS16ToS16: in the multi-channel case, take one channel's data at a time and convert it from FloatS16 to int16.
- WebRtcSpl_AnalysisQMF: performs the band split, writing the two band signals into bands16[0] and bands16[1]; two_bands_states_[i].analysis_state1/analysis_state2 store this channel's filter state for the two bands.
- S16ToFloatS16: converts the two band signals back to float.
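For context, here is a minimal sketch of what these FloatS16/int16 conversions amount to; this is an assumption based on the FloatS16 convention (floats already scaled to the int16 range), not the actual helpers from common_audio:

#include <algorithm>
#include <cmath>
#include <cstdint>

// FloatS16 stores samples as floats already scaled to [-32768, 32767],
// so converting to int16 is essentially saturation plus rounding.
int16_t FloatS16ToS16Sample(float v) {
  v = std::min(std::max(v, -32768.f), 32767.f);  // saturate to the int16 range
  return static_cast<int16_t>(std::lround(v));   // round to nearest
}

// Converting back needs no rescaling, only a cast.
float S16ToFloatS16Sample(int16_t v) {
  return static_cast<float>(v);
}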
3. Band synthesis (merging)
Since the signal can be split into bands, the bands can of course also be merged back into the original full-band audio. The process mirrors the analysis step:
void SplittingFilter::TwoBandsSynthesis(const ChannelBuffer<float>* bands,
ChannelBuffer<float>* data) {
RTC_DCHECK_LE(data->num_channels(), two_bands_states_.size());
RTC_DCHECK_EQ(data->num_frames(), kTwoBandFilterSamplesPerFrame);
for (size_t i = 0; i < data->num_channels(); ++i) {
std::array<std::array<int16_t, kSamplesPerBand>, 2> bands16;
std::array<int16_t, kTwoBandFilterSamplesPerFrame> full_band16;
FloatS16ToS16(bands->channels(0)[i], bands16[0].size(), bands16[0].data());
FloatS16ToS16(bands->channels(1)[i], bands16[1].size(), bands16[1].data());
WebRtcSpl_SynthesisQMF(bands16[0].data(), bands16[1].data(),
bands->num_frames_per_band(), full_band16.data(),
two_bands_states_[i].synthesis_state1,
two_bands_states_[i].synthesis_state2);
S16ToFloatS16(full_band16.data(), full_band16.size(), data->channels(0)[i]);
}
}
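On the AudioBuffer side there is a matching synthesis entry point that simply forwards to SplittingFilter::Synthesis; the one-liner below is written from memory, so treat it as a sketch rather than verbatim source:

void AudioBuffer::MergeFrequencyBands() {
  splitting_filter_->Synthesis(split_data_.get(), data_.get());
}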
II. Algorithm implementation
1. Underlying principles
The SplittingFilter source in WebRTC mainly builds on the following ideas:
- QMF (quadrature mirror filter): 语音信号处理二十九——QMF滤波器设计原理与实现
- Polyphase decomposition and the noble identities: 语音信号处理三十——高效多相抽取器
- All-pass QMF filters: webrtc之子带分割上——All-pass QMF滤波器
WebRTC implements the all-pass QMF with three cascaded first-order all-pass filters on each polyphase branch; the overall structure is:
Input: x[n]
        |
        ↓
Even  x[2n]   → A1(z) → A2(z) → A3(z) ──┐
                                        ├─→ half_band_low[n]
Odd   x[2n+1] → A1(z) → A2(z) → A3(z) ──┘
                                        └─→ half_band_high[n]
                                             [used for speech processing]

Synthesis:
  x̂[2n]   = low + high
  x̂[2n+1] = low - high
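To make this structure concrete, here is a plain floating-point sketch of the two-band analysis with three cascaded first-order all-pass sections per branch; it is an illustrative reimplementation with made-up names, not the WebRTC fixed-point code:

#include <cstddef>

// One first-order all-pass section: y[n] = x[n-1] + a * (x[n] - y[n-1]).
struct AllPass1 {
  float a;         // all-pass coefficient
  float x1 = 0.f;  // previous input  x[n-1]
  float y1 = 0.f;  // previous output y[n-1]
  float Process(float x) {
    float y = x1 + a * (x - y1);
    x1 = x;
    y1 = y;
    return y;
  }
};

// Split a full-band frame x (length n, n even) into low/high half-bands.
void TwoBandAnalysis(const float* x, size_t n, float* low, float* high,
                     AllPass1 odd_chain[3], AllPass1 even_chain[3]) {
  for (size_t i = 0; i < n / 2; ++i) {
    float even = x[2 * i];      // even samples x[2n]
    float odd = x[2 * i + 1];   // odd samples  x[2n+1]
    for (int s = 0; s < 3; ++s) even = even_chain[s].Process(even);
    for (int s = 0; s < 3; ++s) odd = odd_chain[s].Process(odd);
    low[i] = 0.5f * (odd + even);   // H0: half the sum of the branches
    high[i] = 0.5f * (odd - even);  // H1: half the difference
  }
}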
2. All-pass QMF source code
void WebRtcSpl_AnalysisQMF(const int16_t* in_data, size_t in_data_length,
int16_t* low_band, int16_t* high_band,
int32_t* filter_state1, int32_t* filter_state2)
{
size_t i;
int16_t k;
int32_t tmp;
int32_t half_in1[kMaxBandFrameLength];
int32_t half_in2[kMaxBandFrameLength];
int32_t filter1[kMaxBandFrameLength];
int32_t filter2[kMaxBandFrameLength];
const size_t band_length = in_data_length / 2;
RTC_DCHECK_EQ(0, in_data_length % 2);
RTC_DCHECK_LE(band_length, kMaxBandFrameLength);
// Split even and odd samples. Also shift them to Q10.
for (i = 0, k = 0; i < band_length; i++, k += 2)
{
half_in2[i] = ((int32_t)in_data[k]) * (1 << 10);
half_in1[i] = ((int32_t)in_data[k + 1]) * (1 << 10);
}
// All pass filter even and odd samples, independently.
WebRtcSpl_AllPassQMF(half_in1, band_length, filter1,
WebRtcSpl_kAllPassFilter1, filter_state1);
WebRtcSpl_AllPassQMF(half_in2, band_length, filter2,
WebRtcSpl_kAllPassFilter2, filter_state2);
// Take the sum and difference of filtered version of odd and even
// branches to get upper & lower band.
for (i = 0; i < band_length; i++)
{
tmp = (filter1[i] + filter2[i] + 1024) >> 11;
low_band[i] = WebRtcSpl_SatW32ToW16(tmp);
tmp = (filter1[i] - filter2[i] + 1024) >> 11;
high_band[i] = WebRtcSpl_SatW32ToW16(tmp);
}
}
1) Improving precision
Following the overall structure, the first step is to split the input into even and odd samples:
for (i = 0, k = 0; i < band_length; i++, k += 2)
{
half_in2[i] = ((int32_t)in_data[k]) * (1 << 10);
half_in1[i] = ((int32_t)in_data[k + 1]) * (1 << 10);
}
Note that the * (1 << 10) factor is there to improve numerical precision: WebRTC commonly works with Q10 fixed-point values (shifted left by 10 bits), so the multiplications and accumulations in the filters run with extra fractional bits and lose less to rounding.
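A tiny standalone illustration of the Q10 convention (just an example, not code from WebRTC):

#include <cstdint>
#include <cstdio>

int main() {
  int16_t sample = 123;
  // To Q10: multiply by 2^10, so the low 10 bits act as a fractional part.
  int32_t q10 = static_cast<int32_t>(sample) * (1 << 10);
  // ... the filtering operates on the Q10 value ...
  // Back to an integer sample: add half an LSB (512) and shift right by 10.
  int32_t back = (q10 + (1 << 9)) >> 10;
  std::printf("%d -> %d (Q10) -> %d\n", sample, q10, back);
  return 0;
}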
2) Passing through the cascaded all-pass filters
WebRtcSpl_AllPassQMF(half_in1, band_length, filter1,
WebRtcSpl_kAllPassFilter1, filter_state1);
WebRtcSpl_AllPassQMF(half_in2, band_length, filter2,
WebRtcSpl_kAllPassFilter2, filter_state2);
Where:
- half_in1: the odd-indexed samples x[2n+1]
- half_in2: the even-indexed samples x[2n]
- filter1, filter2: the filtered outputs of the two branches
- WebRtcSpl_kAllPassFilter1: the all-pass coefficients applied to half_in1
- WebRtcSpl_kAllPassFilter2: the all-pass coefficients applied to half_in2
- filter_state1, filter_state2: the saved filter states, which keep the filtering continuous across successive frames
The internal implementation is covered below.
3) The separated low band and high band
From the all-pass QMF filter design:

$$H_0(z)=\frac{1}{2}\left[A(z)+A(-z)\right] \qquad H_1(z)=\frac{1}{2}\left[A(z)-A(-z)\right]$$

the low-band and high-band signals are obtained after filtering by taking the sum and the difference:
for (i = 0; i < band_length; i++)
{
tmp = (filter1[i] + filter2[i] + 1024) >> 11;
low_band[i] = WebRtcSpl_SatW32ToW16(tmp);
tmp = (filter1[i] - filter2[i] + 1024) >> 11;
high_band[i] = WebRtcSpl_SatW32ToW16(tmp);
}
The right shift here is by 11 bits rather than 10: 10 bits undo the Q10 scaling, and the extra bit implements the division by 2 required by the all-pass QMF design; the +1024 term (half of 2^11) rounds the result to the nearest integer.
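Written out explicitly (my own restatement of the line above, with $f_1$ and $f_2$ standing for filter1[i] and filter2[i]):

$$\text{low\_band}[i]=\left\lfloor \frac{f_1[i]+f_2[i]+2^{10}}{2^{11}} \right\rfloor$$

i.e. the sum is scaled back from Q10 (divide by $2^{10}$), halved (the $\tfrac{1}{2}$ in $H_0(z)$ and $H_1(z)$), and rounded to the nearest integer by the $+2^{10}$ offset.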
3. The filtering algorithm
1) Filter coefficients
As used above, the Q16 coefficients of the two cascades of three all-pass filters that build the low and high subbands are:
// QMF filter coefficients in Q16.
static const uint16_t WebRtcSpl_kAllPassFilter1[3] = {6418, 36982, 57261};
static const uint16_t WebRtcSpl_kAllPassFilter2[3] = {21333, 49062, 63010};
| Item | Description |
|---|---|
| WebRtcSpl_kAllPassFilter1 | Q16 coefficients of the first cascade of three first-order all-pass filters (the branch fed with half_in1) |
| WebRtcSpl_kAllPassFilter2 | Q16 coefficients of the second cascade of three first-order all-pass filters (the branch fed with half_in2) |
| Format | Q16, i.e. each entry represents $\frac{value}{65536}$ |
| Origin | The classic all-pass QMF structure, used for two-way subband splitting of speech signals |
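As a quick sanity check, converting the Q16 values to floating point (a throwaway snippet, not part of WebRTC):

#include <cstdio>

int main() {
  const unsigned short k1[3] = {6418, 36982, 57261};
  const unsigned short k2[3] = {21333, 49062, 63010};
  for (int i = 0; i < 3; ++i) {
    // Q16 -> float: value / 2^16
    std::printf("a1_%d = %.4f   a2_%d = %.4f\n",
                i + 1, k1[i] / 65536.0, i + 1, k2[i] / 65536.0);
  }
  // Prints roughly 0.0979, 0.5643, 0.8737 and 0.3255, 0.7486, 0.9615.
  return 0;
}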
Plotted with MATLAB, the phase responses of $A(z)$ and $A(-z)$ are as shown in the figure below:
2) The cascaded all-pass filters
void WebRtcSpl_AllPassQMF(int32_t* in_data, size_t data_length,
int32_t* out_data, const uint16_t* filter_coefficients,
int32_t* filter_state)
{
// The procedure is to filter the input with three first order all pass filters
// (cascade operations).
//
// a_3 + q^-1 a_2 + q^-1 a_1 + q^-1
// y[n] = ----------- ----------- ----------- x[n]
// 1 + a_3q^-1 1 + a_2q^-1 1 + a_1q^-1
//
// The input vector |filter_coefficients| includes these three filter coefficients.
// The filter state contains the in_data state, in_data[-1], followed by
// the out_data state, out_data[-1]. This is repeated for each cascade.
// The first cascade filter will filter the |in_data| and store the output in
// |out_data|. The second will the take the |out_data| as input and make an
// intermediate storage in |in_data|, to save memory. The third, and final, cascade
// filter operation takes the |in_data| (which is the output from the previous cascade
// filter) and store the output in |out_data|.
// Note that the input vector values are changed during the process.
size_t k;
int32_t diff;
// First all-pass cascade; filter from in_data to out_data.
// Let y_i[n] indicate the output of cascade filter i (with filter coefficient a_i) at
// vector position n. Then the final output will be y[n] = y_3[n]
// First loop, use the states stored in memory.
// "diff" should be safe from wrap around since max values are 2^25
// diff = (x[0] - y_1[-1])
diff = WebRtcSpl_SubSatW32(in_data[0], filter_state[1]);
// y_1[0] = x[-1] + a_1 * (x[0] - y_1[-1])
out_data[0] = WEBRTC_SPL_SCALEDIFF32(filter_coefficients[0], diff, filter_state[0]);
// For the remaining loops, use previous values.
for (k = 1; k < data_length; k++)
{
// diff = (x[n] - y_1[n-1])
diff = WebRtcSpl_SubSatW32(in_data[k], out_data[k - 1]);
// y_1[n] = x[n-1] + a_1 * (x[n] - y_1[n-1])
out_data[k] = WEBRTC_SPL_SCALEDIFF32(filter_coefficients[0], diff, in_data[k - 1]);
}
// Update states.
filter_state[0] = in_data[data_length - 1]; // x[N-1], becomes x[-1] next time
filter_state[1] = out_data[data_length - 1]; // y_1[N-1], becomes y_1[-1] next time
// Second all-pass cascade; filter from out_data to in_data.
// diff = (y_1[0] - y_2[-1])
diff = WebRtcSpl_SubSatW32(out_data[0], filter_state[3]);
// y_2[0] = y_1[-1] + a_2 * (y_1[0] - y_2[-1])
in_data[0] = WEBRTC_SPL_SCALEDIFF32(filter_coefficients[1], diff, filter_state[2]);
for (k = 1; k < data_length; k++)
{
// diff = (y_1[n] - y_2[n-1])
diff = WebRtcSpl_SubSatW32(out_data[k], in_data[k - 1]);
// y_2[0] = y_1[-1] + a_2 * (y_1[0] - y_2[-1])
in_data[k] = WEBRTC_SPL_SCALEDIFF32(filter_coefficients[1], diff, out_data[k-1]);
}
filter_state[2] = out_data[data_length - 1]; // y_1[N-1], becomes y_1[-1] next time
filter_state[3] = in_data[data_length - 1]; // y_2[N-1], becomes y_2[-1] next time
// Third all-pass cascade; filter from in_data to out_data.
// diff = (y_2[0] - y[-1])
diff = WebRtcSpl_SubSatW32(in_data[0], filter_state[5]);
// y[0] = y_2[-1] + a_3 * (y_2[0] - y[-1])
out_data[0] = WEBRTC_SPL_SCALEDIFF32(filter_coefficients[2], diff, filter_state[4]);
for (k = 1; k < data_length; k++)
{
// diff = (y_2[n] - y[n-1])
diff = WebRtcSpl_SubSatW32(in_data[k], out_data[k - 1]);
// y[n] = y_2[n-1] + a_3 * (y_2[n] - y[n-1])
out_data[k] = WEBRTC_SPL_SCALEDIFF32(filter_coefficients[2], diff, in_data[k-1]);
}
filter_state[4] = in_data[data_length - 1]; // y_2[N-1], becomes y_2[-1] next time
filter_state[5] = out_data[data_length - 1]; // y[N-1], becomes y[-1] next time
}
The algorithm is essentially documented in the comments above; here is a short walk-through. There are three stages in total, each a first-order all-pass filter:
// a_3 + q^-1 a_2 + q^-1 a_1 + q^-1
// y[n] = ----------- ----------- ----------- x[n]
// 1 + a_3q^-1 1 + a_2q^-1 1 + a_1q^-1
//
From the transfer function of a first-order all-pass filter:

$$H(q)=\frac{a+q^{-1}}{1+aq^{-1}} \;\Longrightarrow\; (1+aq^{-1})\,y[n]=(a+q^{-1})\,x[n] \;\Longrightarrow\; y[n]+a\,y[n-1]=a\,x[n]+x[n-1] \;\Longrightarrow\; y[n]=x[n-1]+a\cdot\big(x[n]-y[n-1]\big)$$
This maps onto the code as follows:
- Each all-pass filter starts a frame from the state saved at the end of the previous frame, i.e. that filter's last input and output. For example:
  - filter_state[0]: $x[-1]$
  - filter_state[1]: $y[-1]$
- Compute $x[n]-y[n-1]$:
  diff = WebRtcSpl_SubSatW32(in_data[0], filter_state[1]);
- Compute $y[n]$ (see the sketch of the fixed-point multiply after this list):
  out_data[0] = WEBRTC_SPL_SCALEDIFF32(filter_coefficients[0], diff, filter_state[0]);
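For that last step, my reading of WEBRTC_SPL_SCALEDIFF32 is that, with a Q16 coefficient a, it evaluates prev + (a * diff) >> 16, splitting the multiply into 16-bit halves so everything stays within 32-bit arithmetic. The snippet below is a conceptual equivalent, not the macro's literal definition:

#include <cstdint>

// Conceptual equivalent of WEBRTC_SPL_SCALEDIFF32(a, diff, prev):
// y[n] = x[n-1] + a * (x[n] - y[n-1]), with the coefficient `a` in Q16.
int32_t ScaleDiff32(uint16_t a, int32_t diff, int32_t prev) {
  return prev + static_cast<int32_t>((static_cast<int64_t>(a) * diff) >> 16);
}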
Summary
This post described what the SplittingFilter does and why it is worth having, and walked through the band-splitting flow in depth by reading the WebRTC source. It is best read together with the previous article on the all-pass QMF principle in order to understand why the design looks the way it does.
If this helped you, please give it a like!