Type-Safe Printf for C

This uses macro magic, compound literals, and _Generic to take
printf() to the next level: type-safe printing, printing into compound
literal char arrays, easy UTF-8, -16, and -32, with good error
handling.

The goal is to be safe by removing the need for function varargs.

The usual C printf formatting syntax is used, with some restrictions
and quite a few extensions.

This lets you mix UTF-8, -16, -32 strings seamlessly in input and
output strings, without manual string format conversions, and without
using different format specifiers or print function names.

This liberates you from thinking about %u vs. %lu vs. %llu
vs. %zu, even in portable code with different integer types: the
compiler chooses the right function to call for your parameter, and
they all print fine with %u. (Even strings will print fine with
%u.)

'Type-safe' in this context does not mean that you get more compile
errors, but that the format specifier does not need to specify the
argument type; it only defines the print format. In fact, format
strings with this library get less compile-time checking (namely
none) than modern compilers provide for standard printf. This approach
is nevertheless safer: with this library, you simply cannot pass a
wrongly sized parameter and crash.

Compatibility

This library requires at least a C11 compiler (for _Generic,
char16_t, char32_t), and it uses a few gcc extensions that are
also understood by Clang and some other compilers (({...}),
,##__VA_ARGS__, __typeof__, __attribute__).

Synopsis

In the following, Char may be char, char16_t, or char32_t:

#include

void
va_fprintf(FILE *f, Char const *format, ...);

void
va_printf(Char const *format, ...);

va_stream_file_t
VA_STREAM_FILE(FILE *f);


#include

Char *va_snprintf(Char *s, size_t n, Char const *format, ...);

Char *va_szprintf(Char s[], Char const *format, ...);

char *va_nprintf(size_t n, Char const *format, ...);

char16_t *va_unprintf(size_t n, Char const *format, ...);

char32_t *va_Unprintf(size_t n, Char const *format, ...);

va_stream_charp_t
VA_STREAM_CHARP(Char const *s, size_t n);


#include

char *va_mprintf(void *(*alloc)(void *, size_t, size_t), Char const *, ...);

char16_t *va_umprintf(void *(*alloc)(void *, size_t, size_t), Char const *, ...);

char32_t *va_Umprintf(void *(*alloc)(void *, size_t, size_t), Char const *, ...);

va_stream_vec_t
VA_STREAM_VEC(void *(*alloc)(void *, size_t, size_t));

va_stream_vec16_t
VA_STREAM_VEC16(void *(*alloc)(void *, size_t, size_t));

va_stream_vec32_t
VA_STREAM_VEC32(void *(*alloc)(void *, size_t, size_t));


#include

size_t
va_lprintf(Char const *format, ...);

va_stream_len_t
VA_STREAM_LEN();


#include

va_stream_...t *va_xprintf(va_stream_...t *s, Char const *format, ...);

void
va_iprintf(va_stream_...t *s, Char const *format, ...);

void
va_pprintf(va_stream_vtab_t *v, Char const *format, ...);


#include

typedef struct { ... } va_stream_t;

typedef struct { ... } va_stream_vtab_t;

typedef struct { unsigned code; } va_error_t;
#define VA_E_OK
#define VA_E_NULL
#define VA_E_DECODE
#define VA_E_ENCODE
#define VA_E_TRUNC

va_stream_t
VA_STREAM(va_stream_vtab_t const *vtab);

Description

This library provides a type-safe printing mechanism to print
any kind of string of base type char, char16_t, or char32_t,
or any integer or pointer into a new string, an array, or a file.

The library also provides functions for user-defined output streams
that can print into any other kind of stream.

The arguments to the formatted print are passed into a _Generic()
macro instead of '...', and the resulting function call is thus
type-safe based on the actual argument type, and cannot crash due to a
wrong format specifier.
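
The dispatch idea can be illustrated with a minimal sketch. This is
not the library's implementation: FMT_ONE, fmt_signed, fmt_unsigned,
and fmt_str are hypothetical names, and the sketch only formats a
single value into a buffer to show how _Generic picks a function from
the argument type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-type formatters (illustrative names, not library API). */
static int fmt_signed(char *buf, size_t n, long long x)
{
    assert(buf != NULL);
    return snprintf(buf, n, "%lld", x);
}

static int fmt_unsigned(char *buf, size_t n, unsigned long long x)
{
    assert(buf != NULL);
    return snprintf(buf, n, "%llu", x);
}

static int fmt_str(char *buf, size_t n, char const *x)
{
    assert(buf != NULL);
    return snprintf(buf, n, "%s", x != NULL ? x : "");
}

/* _Generic selects the formatter from the static type of x, so the
 * caller never has to state the type in a format string. */
#define FMT_ONE(buf, n, x) _Generic((x),          \
    int: fmt_signed,                              \
    long: fmt_signed,                             \
    long long: fmt_signed,                        \
    unsigned: fmt_unsigned,                       \
    unsigned long: fmt_unsigned,                  \
    unsigned long long: fmt_unsigned,             \
    char *: fmt_str,                              \
    char const *: fmt_str)((buf), (n), (x))
```

With this sketch, FMT_ONE(buf, sizeof buf, 42), FMT_ONE(buf, sizeof
buf, 42UL), and FMT_ONE(buf, sizeof buf, "hi") each call a different
function without the caller naming a type; a type that is not listed
fails at compile time instead of crashing at run time.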

The format specifiers in this printing mechanism serve to define which
output format should be used, as they are not needed for type
information. The format specifier "%v" can be used as a generic
'default' output format.

Format Specifiers

Like in C, a format specifier begins with '%' followed by:

  • a list of flag characters
  • a width specifier
  • a precision specifier
  • a list of integer mask and quotation specifiers
  • a conversion letter

The following flags are recognised:

  • # prints in alternative form. For numeric formats, a prefix that
    designates the base is put in front of the value, except for 0:
    • for o and base 8, 0 is prefixed,
    • for b and base 2, 0b is prefixed,
    • for B and base 2, 0B is prefixed,
    • for x and base 16, 0x is prefixed,
    • for X and base 16, 0X is prefixed,
    • for e and base 32, 0e is prefixed,
    • for E and base 32, 0E is prefixed.

For quoted strings, this inhibits printing of delimiting quotes.

  • 0 pads numerics with zero 0 on the left rather than
    with a space character. If a precision is given, this is
    ignored.

    For C and JSON quotation, this selects to quote non-US-ASCII
    characters using \u and \U instead of printing them in
    output encoding.

  • - selects left flush instead of the default right flush.

  • ' ' (a space character, U+0020) selects that a space is printed
    in front of positive signed integers.

  • + selects that a + is printed in front of positive signed
    integers.

  • = specifies that the last value is printed again using this
    new format specifier. This is a meager replacement for the $
    position specifiers, which are not implemented in this library.

A width is either a decimal integer, or a *. The * selects
that the width is taken from the next function parameter. If fewer
code points result from the conversion, the output is padded with
white space up to the width. A negative width is interpreted as
a - flag followed by a positive width.

A precision is specified by a . (period) followed by either a
decimal integer or a *. The * selects that the precision is taken
from the next function parameter. If the precision is just .,
it is interpreted as zero. The precision defines the minimum number
of digits in numeric conversions. For strings, this is the maximum
number of raw code units read from the input string (not the number
of converted code points, but the low-level number of elements
in the string, so that non-NUL terminated arrays can be printed
with their size passed as precision, even with multi-byte/multi-word
encodings stored inside). The input decoder will not read incomplete
encodings at the end of limited strings, but will stop before them. If
a pointer to a string pointer is passed, then the
pointer will be updated so that it points to the next character, i.e.,
the one after the last one that was read.
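
The 'stop before an incomplete encoding' rule can be sketched for
UTF-8 as follows. This is a simplified illustration, not the library's
decoder: utf8_seq_len and utf8_readable_prefix are hypothetical names,
and malformed input is handled only by treating a bad lead byte as a
single unit.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of a UTF-8 sequence, judged by its lead byte only.
 * Invalid lead bytes count as a single unit (a simplification). */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if (lead >= 0xC0 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF7) return 4;
    return 1;
}

/* Returns how many of the first `limit` bytes of s can be consumed
 * without cutting a multi-byte sequence in half. Stops at NUL. */
static size_t utf8_readable_prefix(char const *s, size_t limit)
{
    assert(s != NULL);
    size_t i = 0;
    while (i < limit && s[i] != 0) {
        size_t len = utf8_seq_len((unsigned char)s[i]);
        if (len > limit - i) break;   /* incomplete at the limit: stop before it */
        size_t k = 1;
        while (k < len && ((unsigned char)s[i + k] & 0xC0) == 0x80) k++;
        i += (k == len) ? len : 1;    /* malformed: take the lead byte alone */
    }
    return i;
}
```

For the three bytes of "a" followed by U+00E9 (0x61 0xC3 0xA9), a
limit of 2 yields 1 in this sketch, because the two-byte sequence does
not fit completely, while a limit of 3 yields 3.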

The following integer mask and quotation specifiers are recognised:

  • h applies the mask 0xffff to an integer, then zero extends unsigned
    values, or sign extends signed values.
    E.g., va_printf("%#hx",0xabcdef) prints -0x3211.

  • hh applies the mask 0xff to an integer, then zero extends unsigned
    values, or sign extends signed values.
    E.g., va_printf("%hhX",0xabcdU) prints CD.

  • z reinterprets a signed integer as unsigned (mnemonic: zero
    extension). z is implicit in formats u (and U).
    E.g., va_printf("%hhu", -1) prints 255.

  • q selects C quotation for strings and char format. There is a
    separate section below to explain this.

  • Q selects JSON quotation for strings and char format. There is a
    separate section below to explain this.

  • k selects Bourne or Korn shell quotation. There is a
    separate section below to explain this.
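
The arithmetic behind the three examples above can be checked with a
small sketch in plain C. The function names are hypothetical and this
is not the library's code; it only reproduces the masking and the sign
or zero extension on 32-bit values.

```c
#include <assert.h>
#include <stdint.h>

/* h on a signed value: keep the low 16 bits, then sign extend. */
static int32_t mask_h_signed(int32_t x)
{
    uint32_t m = (uint32_t)x & 0xffffu;
    int32_t r = (m & 0x8000u) ? (int32_t)m - 0x10000 : (int32_t)m;
    assert(r >= -0x8000 && r <= 0x7fff);
    return r;
}

/* hh on an unsigned value: keep the low 8 bits, then zero extend. */
static uint32_t mask_hh_unsigned(uint32_t x)
{
    return x & 0xffu;
}

/* z combined with hh: reinterpret the signed value as unsigned,
 * then mask. */
static uint32_t mask_hh_z(int32_t x)
{
    return mask_hh_unsigned((uint32_t)x);
}
```

Here mask_h_signed(0xabcdef) gives -0x3211, mask_hh_unsigned(0xabcdU)
gives 0xCD, and mask_hh_z(-1) gives 255, matching the printed values
quoted in the list.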

Note that most of the usual length specifiers (l, ll, etc.) known
from C make no sense and are not recognised (nor ignored), because
type casting control in varargs is not needed here due to the
type-safety.

The following conversion letters are recognised:

  • v prints anything in default notation (mnemonic: 'value').
    There are many other unassigned letters that print in default
    notation. v is not used by standard C printf and is unlikely
    to be assigned any special notation.

  • o selects octal integer notation for numeric printing (including
    pointers).

  • d or i selects decimal integer notation for numeric
    printing (including pointers).

  • u is equivalent to zd, i.e., prints a signed integer as
    unsigned in decimal notation.

  • x or X selects hexadecimal integer notation for numeric
    printing (including pointers). x uses lower case digits,
    X upper case. Note that this also prints signed numbers with
    a - if appropriate: va_printf("%#x", -5) prints -0x5.
    There is the z flag to print signed integers as unsigned.

  • b or B selects binary integer notation for numeric
    printing (including pointers). b uses lower case prefix,
    B uses upper case. The difference is only visible
    with the # flag.

  • e or E selects Base32 notation using the digits
    'a'..'z', '2'..'7'. e uses lower case digits and prefix,
    E uses upper case.

  • p prints like #x, and for any pointer, including strings,
    prints the pointer value instead of the contents. Note that
    it also prints signed numbers: va_printf("%p",-5) prints -0x5.

  • P is just like p, but with upper case hexadecimal digits.

  • c prints integers (but not pointers) as characters, like a
    one-element string. Note that the NUL character is not printed,
    but behaves like an empty string. For string quotation where
    hexadecimals are printed, this uses lower case characters.

  • C is just like c, but in string quotation when hexadecimals
    are printed, uses upper case characters.

  • t prints the argument type before adjustments, in C
    syntax: int8_t..int64_t, uint8_t..uint64_t, char*,
    char16_t*, char32_t*, void*. Note that va_error_t*
    arguments never print, and never consume a % format, but
    always just return the stream error.

  • any letter not mentioned above or any combination of letter and
    type not mentioned above prints in default notation. If the letter
    is upper case, it uses upper case letters where appropriate.

Function parameters behind the last format specifier in the format
string are printed in default notation after everything that is
printed in the format string.

Function Parameters

The following function parameter types are recognised:

  • int, unsigned, char, signed char, unsigned char, short,
    unsigned short, long, unsigned long, long long,
    unsigned long long: these are integers and are printed
    in unsigned or signed decimal integer notation by default.

    This means that char, char16_t, and char32_t all print in
    numeric format by default, not in character format, as they are not
    distinct types. To interpret them as a 1-element Unicode
    codepoint string, the c format should be used.

    Also note that character constants like 'a' have type int in C
    and print numerically by default.

  • char *, char const *: 8-bit character strings or
    arrays. They print as is by default.

    The default string encoding is UTF-8. It can be reset to
    a user encoding by #defining va_char_p_decode. Also see
    the section on encoding below.

    Unquoted, NULL prints empty and sets the VA_E_NULL error.
    Also see the section on quotation below.

  • char16_t *, char16_t const *: 16-bit character strings or
    arrays. The default encoding is UTF-16, which can be
    switched using va_char16_p_decode.

    Unquoted, NULL prints empty and sets the VA_E_NULL error.

  • char32_t *, char32_t const *: 32-bit character strings or
    arrays. The default encoding is UTF-32, which can be
    switched using va_char32_p_decode.

    Unquoted, NULL prints empty and sets the VA_E_NULL error.

  • Char **, Char const **: pointers to pointers to
    characters, i.e., pointers to strings, will print the string
    and then update the pointer to point to the code
    unit just behind the last one that was read from the
    string. With no precision given in the format, they will
    point to the terminating NUL character. When these
    parameters are printed multiple times using the = flag,
    the string will be reset each time and the updated value
    will correspond to the end position during the last print
    of the string.

  • va_error_t*: this retrieves the error code from the
    stream and writes it into the passed struct. This can
    be used to check for encoding or decoding errors, out
    of memory conditions, or hitting the end of the output
    array. NULL must not be passed as a pointer.

  • va_read_iter_t*: this is an internal type to read from
    strings. There are quite a few constraints on how to
    define a proper va_read_iter_t, which are not all
    documented here.

  • anything else: the library tries to convert it to a pointer and
    prints it in hexadecimal encoding by default, i.e., in %x
    format.

Unicode

Internally, this library uses 32-bit codepoints with 24-bit payload
and 8-bit tags for processing strings, and by default, the payload
representation is Unicode. The library tries not to interpret the
payload data unless necessary, so that other encodings could in
principle be used and passed through the library.

The only place the core library uses Unicode interpretation is when
quoting C or JSON strings for codepoints >0x80 (e.g., when formatting
with %0qs), and if a decoding error is encountered or if the value
is not valid Unicode, then it uses \ufffd to mark this, because the
quotation using \u or \U would otherwise be a lie.

The internal representation allows any value inside 24 bits to be used
for codepoints. 0 is interpreted as 'end of string' and is never
printed into the output stream.

UTF-8, -16, and -32 encoders and decoders check that the Unicode
constraints are met, like excluding anything above 0x10FFFF and high
and low UTF-16 surrogates, and detecting decoding errors according to
the Unicode recommendations and best practices. The encoder/decoder
pairs usually try to pass through faulty sequences as is, if possible,
e.g., reading ISO-8859-1 data from a UTF-8 %s and printing it into
a UTF-8 output stream preserves the original ISO-8859-1 byte
sequence, even though the intermediate steps do raise 'illegal sequence'
errors.

Integers print without Unicode checks, i.e., if an integer is printed
as a character using %c, then the lower 24 bits are passed down to
the output stream encoder as is. If an integer larger than 0xffffff is
printed with %c, this results in a decoding error, and
only the lower 24 bits are used.
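
A 32-bit value with a 24-bit payload and an 8-bit tag could be laid
out as in the following sketch. The layout is an assumption for
illustration only: the text above does not say which bits hold the
tag, and the tag values and all cp_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CP_PAYLOAD_MASK 0x00ffffffu  /* 24-bit payload */
#define CP_TAG_SHIFT    24           /* assumption: tag sits in the top 8 bits */
#define CP_TAG_NONE     0x00u        /* hypothetical tag values */
#define CP_TAG_EDECODE  0x01u

/* Combines a payload and a tag into one 32-bit value. */
static uint32_t cp_make(uint32_t payload, uint32_t tag)
{
    assert(payload <= CP_PAYLOAD_MASK && tag <= 0xffu);
    return (tag << CP_TAG_SHIFT) | payload;
}

static uint32_t cp_payload(uint32_t cp) { return cp & CP_PAYLOAD_MASK; }
static uint32_t cp_tag(uint32_t cp)     { return cp >> CP_TAG_SHIFT; }

/* %c-style conversion of an integer: values above 0xffffff keep only
 * their low 24 bits and are marked as a decoding error. */
static uint32_t cp_from_int(uint32_t v)
{
    uint32_t tag = (v > CP_PAYLOAD_MASK) ? CP_TAG_EDECODE : CP_TAG_NONE;
    return cp_make(v & CP_PAYLOAD_MASK, tag);
}
```

In this sketch, cp_from_int(0x201c) keeps the payload 0x201c with no
tag set, while cp_from_int(0x1000041) keeps only 0x41 and carries the
decoding error tag.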

Encodings

The library supports different string encodings for the format string,
for input strings, and for output streams. The defaults are UTF-8,
UTF-16, or UTF-32. This can be switched by setting the following
#defines before including headers of this library, i.e., it cannot be
switched dynamically out of the box, because this would mean that all
the encoding modules would always be linked. Dynamic switching can
be added by defining a new encoding that internally switches dynamically.

The following #defines switch function names:

Format String Encoding

The default is UTF-8, -16, or -32 encoding, and it can be changed
by #defining the following before the #include:

#define va_char_p_format utf8
#define va_char16_p_format utf16
#define va_char32_p_format utf32

These macros are appended to an identifier to get the appropriate
reader for the format string as follows:

va_char_p_read_vtab_ ## va_char_p_format
va_char16_p_read_vtab_ ## va_char16_p_format
va_char32_p_read_vtab_ ## va_char32_p_format

When using a different encoding than the default, it must also be ensured
that the corresponding vtab declarations are visible.
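
Selecting an identifier by appending a macro value relies on
preprocessor token pasting. The following sketch shows the general
technique with hypothetical names (demo_reader_utf8,
demo_reader_latin1, DEMO_DECODE); it does not reproduce the library's
internal macros.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical readers for two encodings (illustrative, not library API). */
static char const *demo_reader_utf8(void)   { return "utf8"; }
static char const *demo_reader_latin1(void) { return "latin1"; }

/* The user picks the suffix before the selecting macro is used. */
#define DEMO_DECODE latin1

/* Two macro levels are needed so that DEMO_DECODE is expanded
 * before ## pastes the tokens together. */
#define DEMO_CAT2(a, b) a ## b
#define DEMO_CAT(a, b)  DEMO_CAT2(a, b)
#define DEMO_READER     DEMO_CAT(demo_reader_, DEMO_DECODE)

static char const *demo_selected_encoding(void)
{
    char const *name = DEMO_READER();   /* expands to demo_reader_latin1() */
    assert(name != NULL);
    return name;
}
```

Because the choice is made by the preprocessor, only the selected
function is referenced, which fits the remark above that encodings
cannot be switched dynamically out of the box.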

String Value Encoding

The default for reading string values is UTF-8, -16, or -32 encoding,
for "...", u"...", and U"..." strings, respectively. The default can be
changed by defining one of the following macros before the #include:

#define va_char_p_decode utf8
#define va_char16_p_decode utf16
#define va_char32_p_decode utf32

These macros are appended to an identifier to get the appropriate
reader for the string value as follows:

va_xprintf_char_p_ ## va_char_p_decode
va_xprintf_char_pp_ ## va_char_p_decode
va_xprintf_char_const_pp_ ## va_char_p_decode
va_xprintf_char16_p_ ## va_char16_p_decode
va_xprintf_char16_pp_ ## va_char16_p_decode
va_xprintf_char16_const_pp_ ## va_char16_p_decode
va_xprintf_char32_p_ ## va_char32_p_decode
va_xprintf_char32_pp_ ## va_char32_p_decode
va_xprintf_char32_const_pp_ ## va_char32_p_decode

Note that for each parameter type, a different printer function is used,
so for a different encoding, three functions need to be provided. A
typical such function implementation looks as follows:

va_stream_t *va_xprintf_char_p_utf8(
    va_stream_t *s,
    char const *x)
{
    va_read_iter_t iter = VA_READ_ITER(&va_char_p_read_vtab_utf8, x);
    return va_xprintf_iter(s, &iter);
}

Output Stream Encoding

For encoding strings into character arrays, the default encoding is
UTF-8, UTF-16, or UTF-32, depending on the string type. To override
the default, the following #defines can be set
before the #include.

#define va_char_p_encode utf8
#define va_char16_p_encode utf16
#define va_char32_p_encode utf32

These are suffixed to get the vtab object for writing:

va_char_p_vtab_ ## va_char_p_encode
va_char16_p_vtab_ ## va_char16_p_encode
va_char32_p_vtab_ ## va_char32_p_encode

For dynamically allocated arrays, there are separate #definitions:

#define va_vec8_encode utf8
#define va_vec16_encode utf16
#define va_vec32_encode utf32

These are suffixed to get the vtab object for writing:

va_vec_vtab_ ## va_vec_encode
va_vec16_vtab_ ## va_vec16_encode
va_vec32_vtab_ ## va_vec32_encode

For FILE* output, the default encoding is UTF-8, UTF-16BE,
and UTF-32BE, depending on output character width. The
following #defines correspond to the encoding:

#define va_file8_encode utf8
#define va_file16_encode utf16be
#define va_file32_encode utf32be

These are suffixed to get the vtab object for writing:

va_file_vtab_ ## va_file_encode
va_file16_vtab_ ## va_file16_encode
va_file32_vtab_ ## va_file32_encode

Quotation

C/C++ quotation

  • q quotation option in format specifier
  • when printing integers, this is ignored
  • when printing pointers, this adds the # flag, i.e., the
    0x prefix is printed
  • when printing strings, this selects C format quoted output
  • NULL strings print as NULL, and do not set the
    VA_E_NULL error, in contrast to unquoted printing.
  • without #, prints quotation marks: single for the c and C
    conversions, otherwise double.
  • with z prints the string size indicator based on the input
    string: empty for char, u for char16_t, and U for
    char32_t (and also U for 64-bit ints).
  • quotation of unprintable characters
  • quotation of some characters in special notation:
    \t, \r, \n, \b, \f, \', \", \\.
  • 0 flag quotes all non-ASCII using \u or \U. Note
    that \x is not used, because it does not terminate, so
    quoting \x1 followed by 1 is more complicated.
  • with 0 flag, chars that are marked as decoding errors are
    quoted as \ufffd, the replacement character, to avoid
    printing encoding errors with \u quotation, which would
    make the resulting string more wrong than with only the
    encoding errors. Without 0 flag, encoding errors are
    passed through if the input encoding equals the output
    encoding, otherwise U+FFFD is encoded.
  • upper case formats use upper case letters in hexadecimals

Examples:

  • va_printf("%qs", "foo'bar") prints "foo\'bar".
  • va_printf("%qzs", u"foo'bar") prints u"foo\'bar".
  • va_printf("%qc", 10) prints '\n'.
  • va_printf("%qzc", 10) prints U'\n'.
  • va_printf("%#qc", 16) prints \020.
  • va_printf("%#0qc", 0x201c) prints \u201c.
  • va_printf("%#0qC", 0x201c) prints \u201C.
  • va_printf("%qa", (void*)18) prints 0x12 (on normal machines)
  • va_printf("%qa", 18) prints 18
  • va_printf("%0qa", u"\xd801") prints "\ufffd"
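
The quoting rules for plain 8-bit strings can be approximated by the
following sketch. It is a simplified stand-in, not the library's
implementation: c_escape_letter and c_quote are hypothetical names, it
handles only the special escapes listed above plus three-digit octal
for other control characters, and it passes every other byte through
unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Returns the letter of a special C escape for c, or 0 if there is none. */
static char c_escape_letter(unsigned char c)
{
    switch (c) {
    case '\t': return 't';
    case '\r': return 'r';
    case '\n': return 'n';
    case '\b': return 'b';
    case '\f': return 'f';
    case '\'': return '\'';
    case '"':  return '"';
    case '\\': return '\\';
    default:   return 0;
    }
}

/* Writes a C-quoted copy of `in` into out[0..n-1], NUL-terminated.
 * Returns the number of characters written (without the NUL),
 * or -1 if the output would not fit. */
static int c_quote(char const *in, char *out, size_t n)
{
    assert(in != NULL && out != NULL);
    size_t o = 0;
    if (n < 3) return -1;              /* room for two quotes and NUL */
    out[o++] = '"';
    for (; *in != 0; in++) {
        unsigned char c = (unsigned char)*in;
        char e = c_escape_letter(c);
        if (o + 6 > n) return -1;      /* worst case: 4 chars + quote + NUL */
        if (e != 0) {
            out[o++] = '\\';
            out[o++] = e;
        } else if (c < 0x20 || c == 0x7f) {
            /* unprintable: three-digit octal escape */
            o += (size_t)snprintf(out + o, 5, "\\%03o", (unsigned)c);
        } else {
            out[o++] = (char)c;
        }
    }
    out[o++] = '"';
    out[o] = 0;
    return (int)o;
}
```

Calling c_quote("foo'bar", buf, sizeof buf) fills buf with the ten
characters "foo\'bar" including the delimiting quotes, and a string
holding the single character 16 is written as "\020".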

Java/JSON quotation

  • Q quotation option in format specifier
  • Like C, but always uses \u or \U and never octal
  • NULL strings print as null, and do not set the
    VA_E_NULL error, in contrast to unquoted printing.
  • the z flag is ignored.

Examples:

  • va_printf("%Qs", "foo'bar") prints "foo'bar".
  • va_printf("%Qc", 10) prints '\n'.
  • va_printf("%#Qc", 16) prints \u0010.
  • va_printf("%#0Qc", 0x201c) prints \u201c.
  • va_printf("%#0QC", 0x201c) prints \u201C.
  • va_printf("%Qa", (void*)18) prints 0x12 (on normal machines)
  • va_printf("%Qa", 18) prints 18

Bourne Shell quotation

  • k quotation option in format specifier (mnemonic: Kor
