Codec2

Codec2 is a low-bitrate speech audio codec (speech coding) that is patent free and open source.^[1] Codec2 compresses speech using sinusoidal coding, a method specialized for human speech. Modes with bit rates from 3200 down to 700 bit/s have been implemented. Codec2 was designed for amateur radio and other high-compression voice applications.

Overview

The codec was developed by David Rowe (amateur radio call sign VK5DGR), with the support and cooperation of other researchers (e.g., Jean-Marc Valin from Speex).^[2] Codec2 uses sinusoidal coding to model speech. In sinusoidal coding, spoken audio is recreated by modelling speech as a sum of harmonically related sine waves with independent amplitudes. The spectral envelope that shapes these amplitudes is represented by line spectral pairs (LSPs). The fundamental frequency of the speaker's voice (pitch) and the amplitude (energy) of the harmonics are encoded and, together with the LSPs, exchanged across a channel in a digital format. The LSP coefficients represent the Linear Predictive Coding (LPC) model in the frequency domain, and lend themselves to a robust and efficient quantisation of the LPC parameters.^[3]
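The harmonic model described above can be illustrated with a minimal sketch. It is not the Codec2 synthesis routine: the function name, the pitch and the amplitude and phase values are hypothetical, chosen only to show a sum of sine waves at integer multiples of a fundamental, sampled at 8 kHz.

```python
import math

def synthesize_frame(f0_hz, amplitudes, phases, n_samples, sample_rate=8000):
    """Sum harmonically related sinusoids; harmonic m sits at m * f0_hz."""
    w0 = 2.0 * math.pi * f0_hz / sample_rate  # fundamental in radians per sample
    frame = []
    for n in range(n_samples):
        sample = 0.0
        for m, (amp, phase) in enumerate(zip(amplitudes, phases), start=1):
            sample += amp * math.cos(m * w0 * n + phase)
        frame.append(sample)
    return frame

# Illustrative values only: a 200 Hz pitch with three harmonics,
# rendered for 80 samples (10 ms at 8 kHz).
frame = synthesize_frame(200.0, [1.0, 0.5, 0.25], [0.0, 0.0, 0.0], n_samples=80)
```

With a 200 Hz fundamental the waveform repeats every 40 samples, so only the pitch and the per-harmonic amplitudes are needed to describe the frame.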

Codec2 provides 3200, 2400, 1600, 1400, 1300, 1200, and 700 bit/s codec modes. It outperforms most other low-bitrate speech codecs. For example, it uses half the bandwidth of Advanced Multi-Band Excitation to encode speech with similar quality. The encoder takes 16-bit PCM sampled audio and outputs packed bytes; the decoder takes packed bytes and outputs PCM sampled audio. The audio sample rate is fixed at 8 kHz. Internally, the codec algorithms operate on 10 ms PCM frames, with each of these audio segments declared voiced (vowel) or unvoiced (consonant).

The output bytes hold a set of bit-fields that have been packed together. These bit-fields are optionally Gray coded before being grouped together. Gray coding may be useful when the bits are sent raw over a channel, but normally an application will simply unpack the bit-fields. The bit-fields make up the various parameters that are stored or exchanged (pitch, energy, voicing booleans, LSPs, etc.).
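As a rough illustration of Gray coding and bit-field packing, the sketch below uses the standard binary-reflected Gray code and packs fields most-significant bit first. The field widths, the bit order and the helper names are assumptions made for this example, not the layout used by any Codec2 mode.

```python
def gray_encode(value):
    """Binary-reflected Gray code: adjacent integers differ in one bit."""
    return value ^ (value >> 1)

def gray_decode(code):
    """Invert gray_encode by XOR-ing together all right shifts of the code."""
    value = 0
    while code:
        value ^= code
        code >>= 1
    return value

def pack_fields(fields):
    """Pack (value, width) pairs MSB first; pad the last byte with zeros."""
    bits = []
    for value, width in fields:
        for shift in range(width - 1, -1, -1):
            bits.append((value >> shift) & 1)
    while len(bits) % 8:
        bits.append(0)
    packed = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        packed.append(byte)
    return bytes(packed)

# Hypothetical layout: two 1-bit voicing flags and a 7-bit Gray-coded index.
packed = pack_fields([(1, 1), (0, 1), (gray_encode(5), 7)])
```

Because neighbouring Gray code words differ in a single bit, a one-bit channel error in such a field is more likely to decode to a nearby parameter value than it would be with plain binary.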

For example, mode 3200 converts 20 ms of audio into 64 bits. So 64 bits are output every 20 ms (50 times a second), for a minimum data rate of 3200 bit/s. These 64 bits are passed as 8 bytes to the application, which either unpacks the bit-fields or sends the bytes over a data channel.

Another example is mode 1300, which takes 40 ms of audio and outputs 52 bits every 40 ms (25 times a second), for a minimum rate of 1300 bit/s. These 52 bits are passed as 7 bytes to the application or data channel.
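The arithmetic behind these two examples can be checked with a short sketch. The bits per frame and frame durations are the figures quoted above; the function name and dictionary keys are only illustrative.

```python
import math

SAMPLE_RATE_HZ = 8000  # fixed audio sample rate stated in the text

def frame_stats(bits_per_frame, frame_ms):
    """Derive rates and sizes from the bits per frame and frame duration."""
    frames_per_second = 1000 / frame_ms
    bytes_per_frame = math.ceil(bits_per_frame / 8)
    return {
        "frames_per_second": frames_per_second,
        "bit_rate": bits_per_frame * frames_per_second,
        "bytes_per_frame": bytes_per_frame,
        "padding_bits": bytes_per_frame * 8 - bits_per_frame,
        "samples_per_frame": SAMPLE_RATE_HZ * frame_ms // 1000,
    }

mode_3200 = frame_stats(bits_per_frame=64, frame_ms=20)
mode_1300 = frame_stats(bits_per_frame=52, frame_ms=40)
```

For mode 1300 the 52 bits do not fill a whole number of bytes, so the seventh byte carries 4 unused bits.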

The codec software is open source and is freely available in a Subversion (SVN) repository.^[4] The source code is released under LGPL Version 2.1.^[5] It has been tested on Linux and MS Windows.

The codec has been presented at various conferences and has received the 2012 ARRL Technical Innovation Award,^[6] and the Linux Australia Conference's Best Presentation Award.^[7]

Non-Coherent PSK

Rowe has also created a frequency-division multiplex (FDM) modem which carries digital voice (DV) in only 1.3 kHz of radio bandwidth.^[8] The codec and FDM modem are used every day on amateur radio shortwave bands, using both the SM1000 hardware implementation and the FreeDV application.

This modem operates at 50 Baud with a bit rate of 1600 bit/s. The data is sent using sixteen QPSK FDM carriers (2 bits each), or 32 bits 50 times a second. 64 bits are needed to make a vocoder frame, so the effective frame rate is 25 Hz. The 64 bits contain 52 bits of vocoder data and 12 bits of forward error correction (Golay). Thus an effective 1300 bit/s is used for the vocoder. A separate BPSK carrier is sent in the middle of the spectrum (1500 Hz) for synchronization.
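The rate budget in this paragraph can be reproduced with a few lines of arithmetic. The constants are the figures given above; the variable names are only illustrative.

```python
CARRIERS = 16          # QPSK data carriers
BITS_PER_SYMBOL = 2    # QPSK carries 2 bits per symbol
SYMBOL_RATE = 50       # Baud

FRAME_BITS = 64        # bits needed for one vocoder frame on the channel
VOCODER_BITS = 52      # speech payload per frame
FEC_BITS = 12          # Golay forward error correction per frame

bits_per_symbol_period = CARRIERS * BITS_PER_SYMBOL        # 32 bits
raw_bit_rate = bits_per_symbol_period * SYMBOL_RATE        # channel bit/s
frames_per_second = raw_bit_rate / FRAME_BITS              # vocoder frames/s
vocoder_bit_rate = VOCODER_BITS * frames_per_second        # speech bit/s
fec_bit_rate = FEC_BITS * frames_per_second                # protection bit/s
```

The speech payload and the error-correction overhead together account for the full 1600 bit/s carried by the sixteen data carriers.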

The ITU emission designation is J2E for phone payload, and J2D for data payload.

Coherent PSK

A new FDM modem waveform was developed for the 700 bit/s vocoder. This modem operates with a symbol rate of 75 Baud, using coherent quadrature phase-shift keying (QPSK) with seven sub-carriers. A duplicate set of sub-carriers is used as a diversity channel, which combats the effects of fading in shortwave propagation. The modem still performs well with a ±40 Hz tuning error.

The FDM modem sends and receives a row of sub-carriers 75 times a second. It takes six of these rows to make up a modem frame: first two pilot reference-phase rows (28 bits), then two speech vocoder rows (28 bits), and finally two more rows for the second speech vocoder frame (28 bits). The process then repeats for as long as the transmitter Push-To-Talk (PTT) is keyed.

Thus, a modem frame is 84 bits in total: 56 bits are used for speech, and 28 bits are used for the reference-phase pilots. These pilots are what make this a coherent modem; they are used to correct the phase of the received data symbols. The raw data rate is 1050 bit/s (75 Baud × 14 bits). The effective data rate is 700 bit/s (75 Baud / 6 = 12.5 modem frames per second, × 56 bits). Each row of 14 bits is sent as seven QPSK carriers (2 bits per carrier).

The modem timings are also relevant, in that each speech vocoder frame outputs 28-bits every 40 ms. Since the modem has an 80 ms modem frame, it can transport two speech vocoder frames.
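The frame layout and rates described above can be tallied in a short sketch. The numbers are taken from the text; the row labels and variable names are only illustrative.

```python
CARRIERS = 7           # QPSK sub-carriers per row (diversity copy not counted)
BITS_PER_CARRIER = 2   # QPSK carries 2 bits per symbol
SYMBOL_RATE = 75       # rows per second (Baud)
VOCODER_FRAME_MS = 40  # each speech vocoder frame covers 40 ms

# Row order within one modem frame, as described in the text.
FRAME_ROWS = ["pilot", "pilot", "speech", "speech", "speech", "speech"]

row_bits = CARRIERS * BITS_PER_CARRIER                 # 14 bits per row
frame_bits = row_bits * len(FRAME_ROWS)                # whole modem frame
pilot_bits = row_bits * FRAME_ROWS.count("pilot")      # reference-phase pilots
speech_bits = row_bits * FRAME_ROWS.count("speech")    # vocoder payload

raw_bit_rate = row_bits * SYMBOL_RATE                  # bit/s on the channel
frames_per_second = SYMBOL_RATE / len(FRAME_ROWS)      # modem frames per second
speech_bit_rate = speech_bits * frames_per_second      # effective speech bit/s
frame_ms = 1000 / frames_per_second                    # modem frame duration
vocoder_frames_per_modem_frame = frame_ms / VOCODER_FRAME_MS
```

One third of the rows carry pilots, which is why the effective speech rate is two thirds of the raw channel rate.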

There are 100 complex IQ (in-phase and quadrature-phase) audio samples for each row, giving 600 samples for the modem frame. At 12.5 modem frames per second, this corresponds to a sample rate of 7500 Hz (100 × 6 × 12.5). Using a rate conversion filter, the application is provided with an 8 kHz interface, which is much more compatible with sound cards. At the 8 kHz rate there are 640 complex audio samples per modem frame. This rate conversion would not be necessary in firmware.
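The sample counts above follow directly from the frame timing. The sketch below uses exact fractions to avoid rounding; the variable names are illustrative, and the 16/15 resampling ratio is simply the quotient of the two rates given in the text.

```python
from fractions import Fraction

ROW_SAMPLES = 100                       # complex samples per row
ROWS_PER_FRAME = 6                      # rows in one modem frame
FRAMES_PER_SECOND = Fraction(75, 6)     # 12.5 modem frames per second
INTERFACE_RATE_HZ = 8000                # rate presented to the application

frame_samples = ROW_SAMPLES * ROWS_PER_FRAME                   # at modem rate
modem_rate_hz = frame_samples * FRAMES_PER_SECOND              # internal rate
interface_frame_samples = INTERFACE_RATE_HZ / FRAMES_PER_SECOND
resample_ratio = Fraction(INTERFACE_RATE_HZ) / modem_rate_hz   # output / input
```

The rate conversion filter therefore has to produce 16 output samples for every 15 input samples.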

The FDM modem operates with a center frequency of 1500 Hz. The initial FDM sub-carrier frequencies are set using a spreading function, which widens the spacing slightly with each sub-carrier further to the left: from about 105 Hz apart on the right to about 109 Hz apart on the left. This design, along with spectrum clipping, improves the peak-to-average power ratio (PAPR). The measured crest factor is about 8.3 dB with clipping, and about 10.3 dB without clipping.

The bandwidth consumed by the FDM modem waveform depends on whether the diversity channel is enabled, at about 700 Hz per group of seven sub-carriers. Diversity is normally used on shortwave, and is optional on VHF and above.
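To show what the PAPR figure measures, the following sketch computes it for a toy multi-carrier signal and for a hard-clipped copy. The signal (seven equal, evenly spaced tones with aligned phases), the clipping rule and the threshold are all assumptions made for illustration; the sketch does not model the modem's spreading function or its clipper, and is not expected to reproduce the measured values.

```python
import cmath
import math

def papr_db(samples):
    """Peak-to-average power ratio of complex samples, in decibels."""
    powers = [abs(s) ** 2 for s in samples]
    mean_power = sum(powers) / len(powers)
    return 10.0 * math.log10(max(powers) / mean_power)

def clip(samples, limit):
    """Hard-limit the magnitude of each sample while keeping its phase."""
    return [s if abs(s) <= limit else s * (limit / abs(s)) for s in samples]

# Toy signal: seven unit-amplitude tones on adjacent bins of a 64-sample
# period, all starting in phase (a worst case for the peak).
N = 64
signal = [
    sum(cmath.exp(2j * math.pi * k * n / N) for k in range(1, 8))
    for n in range(N)
]
clipped = clip(signal, 5.0)  # hypothetical clipping threshold

papr_unclipped = papr_db(signal)
papr_clipped = papr_db(clipped)
```

Clipping lowers the peak more than it lowers the average power, so the ratio falls; the cost, not modelled here, is distortion that spreads outside the wanted bandwidth.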

The ITU emission designation is J2E for phone payload, and J2D for data payload.

Adoption

Codec2 is currently used in several radios and software-defined radio systems:

FreeDV^[9]
FlexRadio 6000 series^[10]
SM1000^[11]



This article is issued from Wikipedia - version of the 8/11/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.