# Backspace doesn't remove utf-8 multibyte characters on console



## aeifn (Jan 24, 2020)

In FreeBSD 12.1, backspace in a terminal deletes only one byte of a multibyte character.
For example:

`# cat > /tmp/test
tы<backspace>t
# cat /tmp/test
t�t
# hexdump -C /tmp/test
74 d1 74 0a`

Here 'ы' is a Russian letter (bytes D1 8B in UTF-8, shown in hexadecimal). As you can see, backspace deleted only the second byte of the multibyte character.
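To make the byte arithmetic concrete, here is a minimal Python sketch that reproduces the bytes from the report. It only imitates a byte-wise erase on a byte string; it does not touch a real terminal, and the variable names are made up for illustration.

```python
# Imitate the session: type "tы", erase ONE byte, type "t", press Enter.
typed = "tы".encode("utf-8")      # 't' is 0x74, 'ы' is 0xD1 0x8B
after_erase = typed[:-1]          # a byte-wise erase drops only 0x8B
line = after_erase + b"t\n"       # then "t" and the newline are appended

# Same bytes as the hexdump in the report.
hex_dump = " ".join(f"{b:02x}" for b in line)
print(hex_dump)                   # 74 d1 74 0a

# The orphaned lead byte 0xD1 is no longer valid UTF-8, so a decoder
# substitutes the replacement character U+FFFD for it.
shown = line.decode("utf-8", errors="replace")
print(repr(shown))
```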


----------



## yuripv (Jan 25, 2020)

What is `locale` output?  What is your shell? Also, define "terminal" -- X11 terminal, ssh session, system console?


----------



## aeifn (Jan 25, 2020)

Thank you for the answer!
`# locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=`

This behavior is observed on the vt(4) console and also under xterm(1).

I know that on Linux this is solved by the iutf8 console extension, but I don't know of a solution for FreeBSD.


----------



## yuripv (Jan 25, 2020)

Interesting, I see the problem, yes, ssh'ing from Windows using PuTTY, and using xterm locally.  Don't have Linux installed anywhere, but can't reproduce in iTerm2 on my MBP.  Something to look into.


----------



## aeifn (Jan 25, 2020)

I found a freebsd-bugs bug report from June 2017 (with no answer) describing the same problem.


----------



## memreflect (Jan 26, 2020)

The IUTF8 terminal input flag is arguably a hack, but it does solve the problem at least...for UTF-8.  Unfortunately, that still leaves other multibyte encodings like GB18030, EUC-KR, and Shift JIS that suffer from the same trouble as well (try U+6F22 U+8A9E 漢語; it's made up of two 3-byte UTF-8 sequences--E6 BC A2 and E8 AA 9E).
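A small Python sketch of that point, using Python's built-in codecs as stand-ins for the terminal's encoding (an assumption made purely for illustration; the helper name `survives_byte_erase` is hypothetical). It shows that 漢語 is multibyte in each of these encodings, and that dropping a single trailing byte leaves an undecodable sequence in every one of them.

```python
TEXT = "漢語"
CODECS = ("utf-8", "gb18030", "euc_kr", "shift_jis")

def survives_byte_erase(text: str, codec: str) -> bool:
    """Return True if text still decodes after its last BYTE is removed."""
    truncated = text.encode(codec)[:-1]
    try:
        truncated.decode(codec)
        return True
    except UnicodeDecodeError:
        return False

# Encoded length in bytes for each codec.
lengths = {codec: len(TEXT.encode(codec)) for codec in CODECS}
# Whether a one-byte erase leaves decodable text.
results = {codec: survives_byte_erase(TEXT, codec) for codec in CODECS}

for codec in CODECS:
    print(codec, TEXT.encode(codec).hex(), lengths[codec], results[codec])
```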

It's a chicken-and-egg problem really: the terminal is the one buffering the data before handing it off to read(2) (called by functions like getwchar(3) and fgets(3)), yet the terminal can't delete the entire character sequence because it can't possibly know what you're doing with it.  Things like the IUTF8 hack are perhaps a step in the right direction, but multibyte encodings are simply not an easy thing to deal with for terminals, especially when you consider the many control sequences, termios(4) support, and such to contend with.
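For UTF-8 specifically, the general idea behind a character-aware erase can be sketched as follows: step back over continuation bytes (bit pattern `10xxxxxx`), then drop the lead byte. This is only an illustration in Python, not code from any kernel or terminal; the function name `erase_last_char_utf8` is hypothetical, and the sketch assumes the buffer holds valid UTF-8.

```python
def erase_last_char_utf8(buf: bytes) -> bytes:
    """Drop the last UTF-8 character from buf (assumes buf is valid UTF-8)."""
    end = len(buf)
    # Step back over continuation bytes, which all look like 10xxxxxx.
    while end > 0 and (buf[end - 1] & 0xC0) == 0x80:
        end -= 1
    # Drop the lead byte (or the single ASCII byte), if there is one.
    if end > 0:
        end -= 1
    return buf[:end]

# The case from the original report: erasing 'ы' removes both bytes.
print(erase_last_char_utf8("tы".encode("utf-8")))

# A 3-byte character is removed as a unit as well.
print(erase_last_char_utf8("漢語".encode("utf-8")).decode("utf-8"))
```

The backward scan works because UTF-8 continuation bytes can never be mistaken for lead bytes. Shift JIS, for example, allows trail bytes that overlap the ASCII range, so the same trick does not carry over to it.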

It's not that it can't be done; it's just a daunting amount of work that very few, if any, people have put in.  xterm currently uses luit(1) for transforming characters from UTF-8 to the native encoding and vice-versa, but it still isn't a perfect solution.


----------



## aeifn (Jan 27, 2020)

I found a practical workaround using rlwrap(1).
Now I can use:
`rlwrap cat > test`

I also found a more fundamental discussion about iutf8 in the FreeBSD terminal in the Debian kFreeBSD community.


----------

