ftz.Lyberta.net

Modern C++ goodness

ftz Unicode

Light-weight Unicode library

A collection of useful utilities for handling Unicode data. There are two interfaces. The simple interface can be used if you work with the following types:

  • std::string
  • std::u16string
  • std::u32string
  • std::string_view
  • std::u16string_view
  • std::u32string_view

The advanced interface is for any other type that holds code units.

Check - quickly check whether a string is well-formed.

Simple

std::string string{ ... };
ftz::Unicode::IsValidCodePointSequence(string); // Will throw std::domain_error if string is not valid.

Advanced

std::vector<unsigned short> buffer{ ... }; // Assuming vector contains UTF-16 code units.
// Will throw std::domain_error if buffer contains invalid code points.
ftz::Unicode::Check<ftz::Unicode::UTF16>::IsValidCodePointSequence(buffer);
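
To make "well-formed" concrete, here is a minimal standalone sketch of a UTF-8 check in standard C++. It is not the library's implementation: the name IsWellFormedUtf8 is hypothetical, and it returns bool instead of throwing std::domain_error as the calls above do. It rejects truncated sequences, stray continuation units, overlong encodings, surrogates and values above U+10FFFF.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper, not part of ftz Unicode.
// Returns true if text is a well-formed UTF-8 code unit sequence.
bool IsWellFormedUtf8(std::string_view text)
{
	std::size_t i = 0;
	while (i < text.size())
	{
		const auto lead = static_cast<std::uint8_t>(text[i]);
		std::size_t length = 0;      // Number of code units in this code point.
		std::uint32_t codepoint = 0; // Decoded value so far.
		std::uint32_t minimum = 0;   // Smallest value allowed for this length.
		if (lead < 0x80)
		{
			length = 1; codepoint = lead; minimum = 0;
		}
		else if ((lead & 0xE0) == 0xC0)
		{
			length = 2; codepoint = lead & 0x1F; minimum = 0x80;
		}
		else if ((lead & 0xF0) == 0xE0)
		{
			length = 3; codepoint = lead & 0x0F; minimum = 0x800;
		}
		else if ((lead & 0xF8) == 0xF0)
		{
			length = 4; codepoint = lead & 0x07; minimum = 0x10000;
		}
		else
		{
			return false; // Stray continuation unit or invalid lead unit.
		}
		if (text.size() - i < length)
		{
			return false; // Sequence is truncated.
		}
		for (std::size_t k = 1; k < length; ++k)
		{
			const auto unit = static_cast<std::uint8_t>(text[i + k]);
			if ((unit & 0xC0) != 0x80)
			{
				return false; // Expected a continuation unit (10xxxxxx).
			}
			codepoint = (codepoint << 6) | (unit & 0x3F);
		}
		if (codepoint < minimum)
		{
			return false; // Overlong encoding.
		}
		if (codepoint >= 0xD800 && codepoint <= 0xDFFF)
		{
			return false; // Surrogates are not valid in UTF-8.
		}
		if (codepoint > 0x10FFFF)
		{
			return false; // Beyond the Unicode code space.
		}
		i += length;
	}
	return true;
}
```

For example, IsWellFormedUtf8("\xC0\x80") is false because C0 80 is an overlong encoding of U+0000.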

Convert - straightforward conversion functions.

Simple

std::u16string utf16string{ ... };
std::string utf8string = ftz::Unicode::ToUTF8(utf16string);

Advanced

using UTF16Buffer = ...; // Custom buffer that contains UTF-16 code units.
using UTF8Buffer = ...; // Custom buffer that contains UTF-8 code units.
UTF16Buffer utf16buffer{ ... };
UTF8Buffer utf8buffer = ftz::Unicode::Convert<ftz::Unicode::UTF16,
	ftz::Unicode::UTF8>::template CodeUnits<UTF8Buffer>(utf16buffer);
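
For readers curious about what a UTF-16 to UTF-8 conversion involves, the following is a rough standalone sketch in standard C++. It does not show how ftz Unicode implements conversion. The function name Utf16ToUtf8Sketch is hypothetical, and throwing std::domain_error on unpaired surrogates is an assumption chosen to mirror the Check functions.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical helper, not part of ftz Unicode.
// Decodes UTF-16 code units into code points and re-encodes them as UTF-8.
std::string Utf16ToUtf8Sketch(std::u16string_view input)
{
	std::string output;
	for (std::size_t i = 0; i < input.size(); ++i)
	{
		char32_t codepoint = input[i];
		if (codepoint >= 0xD800 && codepoint <= 0xDBFF)
		{
			// High surrogate: must be followed by a low surrogate.
			if (i + 1 >= input.size() ||
				input[i + 1] < 0xDC00 || input[i + 1] > 0xDFFF)
			{
				throw std::domain_error{"Unpaired high surrogate."};
			}
			codepoint = 0x10000 + ((codepoint - 0xD800) << 10) +
				(input[i + 1] - 0xDC00);
			++i; // The low surrogate has been consumed.
		}
		else if (codepoint >= 0xDC00 && codepoint <= 0xDFFF)
		{
			throw std::domain_error{"Unpaired low surrogate."};
		}

		// Encode the code point as 1 to 4 UTF-8 code units.
		if (codepoint < 0x80)
		{
			output.push_back(static_cast<char>(codepoint));
		}
		else if (codepoint < 0x800)
		{
			output.push_back(static_cast<char>(0xC0 | (codepoint >> 6)));
			output.push_back(static_cast<char>(0x80 | (codepoint & 0x3F)));
		}
		else if (codepoint < 0x10000)
		{
			output.push_back(static_cast<char>(0xE0 | (codepoint >> 12)));
			output.push_back(static_cast<char>(0x80 | ((codepoint >> 6) & 0x3F)));
			output.push_back(static_cast<char>(0x80 | (codepoint & 0x3F)));
		}
		else
		{
			output.push_back(static_cast<char>(0xF0 | (codepoint >> 18)));
			output.push_back(static_cast<char>(0x80 | ((codepoint >> 12) & 0x3F)));
			output.push_back(static_cast<char>(0x80 | ((codepoint >> 6) & 0x3F)));
			output.push_back(static_cast<char>(0x80 | (codepoint & 0x3F)));
		}
	}
	return output;
}
```

The sketch is fixed to std::u16string_view and std::string for brevity, whereas the advanced interface above lets you choose the buffer types.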

CodePointSequence - treat your strings as sequences of code points instead of code units.

Simple

using namespace std::string_literals; // Needed for the s suffix.
// Sequence will hold UTF-8 code units.
ftz::Unicode::CodePointSequence sequence{u8"тест"s};
for (const auto& codepoint : sequence)
{
	std::cout << codepoint << ' ';
}

Advanced

std::vector<unsigned short> buffer{ ... }; // Assuming vector contains UTF-16 code units.
ftz::Unicode::CodePointSequence<std::vector<unsigned short>,
	ftz::Unicode::UTF16> sequence{std::move(buffer)};
for (const auto& codepoint : sequence)
{
	std::cout << codepoint << ' ';
}
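
The difference between code units and code points can be illustrated with a small standalone sketch. It is not part of ftz Unicode, the name CountCodePointsUtf8 is hypothetical, and it assumes the input is already well-formed UTF-8. It counts code points by skipping continuation units.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, not part of ftz Unicode.
// Counts code points in UTF-8 text that is assumed to be well-formed.
std::size_t CountCodePointsUtf8(std::string_view text)
{
	std::size_t count = 0;
	for (const char unit : text)
	{
		// Continuation units look like 10xxxxxx.
		// Every other unit starts a new code point.
		if ((static_cast<unsigned char>(unit) & 0xC0) != 0x80)
		{
			++count;
		}
	}
	return count;
}
```

The UTF-8 encoding of "тест" takes 8 code units but holds 4 code points, so iterating over a code point sequence visits 4 elements rather than 8.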

CodePointSequenceView - constant non-owning view into a code point sequence.

Simple

auto view = ftz::Unicode::MakeCodePointSequenceView(u8"тест");
for (const auto& codepoint : view)
{
	std::cout << codepoint << ' ';
}

Advanced

std::vector<unsigned short> buffer{ ... }; // Assuming vector contains UTF-16 code units.
// Assuming array_view is a constant non-owning view into array.
array_view<unsigned short> view{buffer.data(), buffer.size()};
ftz::Unicode::CodePointSequenceView<array_view<unsigned short>,
	ftz::Unicode::UTF16> sequence{view};
for (const auto& codepoint : sequence)
{
	std::cout << codepoint << ' ';
}

Key features

  • Cross platform - written in ISO C++17 + Concepts TS (+ #pragma once).
  • Header only - no need to build and worry about compiler settings.
  • Not hardcoded to std::basic_string - you can use your own containers of code units.
  • Free software - released under the terms of GNU GPLv3 or any later version.

Dependencies

Compiler support

  • G++ 8 or newer with libstdc++

How to get

You need to use Conan to install and use this library. To install Conan on APT-based distros you would typically do:

# apt install python-pip
$ pip install conan

Then add the official ftz repository:

$ conan remote add ftz https://conan.ftz.lyberta.net

Then you need to follow a Conan tutorial to declare that your project depends on this library, for example, using conanfile.txt:

[requires]
ftzUnicode/Latest@Lyberta/Latest

Or using conanfile.py:

build_requires = "ftzUnicode/Latest@Lyberta/Latest"

The rest depends on your build system. In CMake you would do:

# Adding Conan dependencies.
include(${CMAKE_BINARY_DIR}/conanbuildinfo.cmake)
conan_basic_setup(TARGETS)

# Adding example executable target.
add_executable(Example ...)

# Specifying libraries to link executable to.
target_link_libraries(Example PRIVATE CONAN_PKG::ftzUnicode)

After that in your source code you include files like this:

#include <ftz/Unicode/SomeHeader.h>

To build your code, remember to run conan install before calling conan build or cmake so that your dependencies are set up correctly.

Source code

API reference

Manually packaging the library

Packaging dependencies:

  • Git
  • Conan

To install dependencies on Debian Testing you would invoke:

# apt install git python-pip
$ pip install conan

Clone the repository and switch into it:

$ git clone https://gitlab.com/ftz/unicode.git
$ cd unicode

Build the Conan package with your specified user and channel:

$ conan create . User/Channel