Questions tagged [utf-8]

UTF-8 is a character encoding that describes each Unicode code point using a byte sequence of one to four bytes. It is backwards-compatible with ASCII while still supporting representation of all Unicode code points.

UTF-8 is a character encoding that can describe the set of Unicode code points in byte sequences of one to four bytes.

UTF-8 is the most widely used character encoding, and is recommended for use on the Internet. It is the standard character encoding on Linux and other recent Unix-like operating systems. It was designed to be backwards-compatible with ASCII while still supporting representation of all Unicode code points.
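
As a rough illustration of the one-to-four-byte range and the ASCII compatibility described above, the following sketch uses Python's built-in `str.encode`. The sample characters and variable names are arbitrary choices for illustration and do not come from the tag wiki.

```python
# Sample characters chosen to need 1, 2, 3 and 4 bytes respectively.
samples = ["A", "é", "€", "😀"]

# Map each character to the length of its UTF-8 byte sequence.
lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

# For text that is pure ASCII, the UTF-8 bytes and the ASCII bytes are identical.
ascii_text = "plain ASCII"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

for ch, n in lengths.items():
    print(f"U+{ord(ch):04X} takes {n} byte(s)")
print("ASCII text encodes identically:", same_bytes)
```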

The algorithm for encoding code points in UTF-8 is described in RFC 3629.
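
The encoding step can be sketched in a few lines. This is a minimal, hand-rolled illustration of the bit layout, not a replacement for a real codec. The function name `utf8_encode` is hypothetical, and in practice `str.encode("utf-8")` should be used instead.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 bytes (illustrative sketch)."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")

    if code_point < 0x80:
        # 1 byte: 0xxxxxxx (identical to ASCII)
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# Cross-check the sketch against Python's built-in encoder.
for ch in ["A", "é", "€", "😀"]:
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")
```

The leading byte signals the sequence length through its high bits, and every continuation byte starts with `10`, which is what lets a decoder resynchronize in the middle of a stream.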

22178 questions
1342
votes
13 answers

UTF-8 all the way through

I'm setting up a new server and want to support UTF-8 fully in my web application. I have tried this in the past on existing servers and always seem to end up having to fall back to ISO-8859-1. Where exactly do I need to set the encoding/charsets?…
mercutio
  • 22,151
  • 10
  • 36
  • 37
1299
votes
9 answers

What's the difference between utf8_general_ci and utf8_unicode_ci?

Between utf8_general_ci and utf8_unicode_ci, are there any differences in terms of performance?
KahWee Teng
  • 13,658
  • 3
  • 21
  • 21
1059
votes
22 answers

What's the difference between UTF-8 and UTF-8 with BOM?

What's different between UTF-8 and UTF-8 with BOM? Which is better?
simple
  • 10,723
  • 3
  • 17
  • 11
859
votes
12 answers

Saving UTF-8 texts with json.dumps as UTF-8, not as a \u escape sequence

Sample code (in a REPL): import json json_string = json.dumps("ברי צקלה") print(json_string) Output: "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4" The problem: it's not human readable. My (smart) users want to verify or even edit text files with…
Berry Tsakala
  • 15,313
  • 12
  • 57
  • 80
724
votes
18 answers

What is the difference between UTF-8 and Unicode?

I have heard conflicting opinions from people - according to the Wikipedia UTF-8 page. They are the same thing, aren't they? Can someone clarify?
sarsnake
  • 26,667
  • 58
  • 180
  • 286
644
votes
36 answers

Excel to CSV with UTF8 encoding

I have an Excel file that has some Spanish characters (tildes, etc.) that I need to convert to a CSV file to use as an import file. However, when I do Save As CSV it mangles the "special" Spanish characters that aren't ASCII characters. It also…
Jeff Treuting
  • 13,910
  • 8
  • 36
  • 47
641
votes
14 answers

UTF-8, UTF-16, and UTF-32

What are the differences between UTF-8, UTF-16, and UTF-32? I understand that they will all store Unicode, and that each uses a different number of bytes to represent a character. Is there an advantage to choosing one over the other?
user60456
613
votes
31 answers

Is it possible to force Excel to recognize UTF-8 CSV files automatically?

I'm developing a part of an application that's responsible for exporting some data into CSV files. The application always uses UTF-8 because of its multilingual nature at all levels. But opening such CSV files (containing e.g. diacritics, Cyrillic…
Lyubomyr Shaydariv
  • 20,327
  • 12
  • 64
  • 105
605
votes
21 answers

Best way to convert text files between character sets?

What is the fastest, easiest tool or method to convert text files between character sets? Specifically, I need to convert from UTF-8 to ISO-8859-15 and vice versa. Everything goes: one-liners in your favorite scripting language, command-line tools…
Antti Kissaniemi
  • 18,944
  • 13
  • 54
  • 47
594
votes
7 answers

Why does modern Perl avoid UTF-8 by default?

I wonder why most modern solutions built using Perl don't enable UTF-8 by default. I understand there are many legacy problems for core Perl scripts, where it may break things. But, from my point of view, in the 21st century, big new projects (or…
w.k
  • 8,218
  • 4
  • 32
  • 55
506
votes
5 answers

What is the difference between utf8mb4 and utf8 charsets in MySQL?

What is the difference between utf8mb4 and utf8 charsets in MySQL? I already know about ASCII, UTF-8, UTF-16 and UTF-32 encodings; but I'm curious to know what's the difference between the utf8mb4 group of encodings and other encoding types defined in MySQL…
Mojtaba Rezaeian
  • 8,268
  • 8
  • 31
  • 54
497
votes
8 answers

What is the difference between UTF-8 and ISO-8859-1?

What is the difference between UTF-8 and ISO-8859-1?
Jagadesh
  • 6,489
  • 8
  • 29
  • 30
483
votes
9 answers

What are Unicode, UTF-8, and UTF-16?

What's the basis for Unicode and why the need for UTF-8 or UTF-16? I have researched this on Google and searched here as well, but it's not clear to me. In VSS, when doing a file comparison, sometimes there is a message saying the two files have…
SoftwareGeek
  • 15,234
  • 19
  • 61
  • 78
460
votes
2 answers

Working with UTF-8 encoding in Python source

Consider: $ cat bla.py u = unicode('d…') s = u.encode('utf-8') print s $ python bla.py File "bla.py", line 1 SyntaxError: Non-ASCII character '\xe2' in file bla.py on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html…
Nullpoet
  • 10,949
  • 20
  • 48
  • 65
410
votes
14 answers

Unicode (UTF-8) reading and writing to files in Python

I'm having some brain failure in understanding reading and writing text to a file (Python 2.4). # The string, which has an a-acute in it. ss = u'Capit\xe1n' ss8 = ss.encode('utf8') repr(ss), repr(ss8) ("u'Capit\xe1n'", "'Capit\xc3\xa1n'") print…
Gregg Lind
  • 20,690
  • 15
  • 67
  • 81