Ticket #306 (closed data bug: invalid)

Opened 4 months ago

Last modified 3 months ago

Japanese names not exporting right in SQL

Reported by: meaganhanes+poke@… Owned by: eevee
Priority: minor Component: pokedex
Keywords: Cc:
Difficulty:

Description

Hello, first off, thanks so much for providing your database dumps! I've noticed that the encoding doesn't seem to be set to Unicode: all of the Japanese names, as well as the French accent in the word "pokémon", are being exported as gibberish.

Change History

Changed 4 months ago by eevee

Unicode will be the death of me.

Changed 4 months ago by eevee

Wait, no, hang on. Both my checkout and a download from Trac display everything correctly. How are you getting the file and what are you viewing it with?

Changed 3 months ago by eevee

  • status changed from new to closed
  • resolution set to invalid

Viewing pokedex.sql with a hex editor (well, actually xxd):

011b020: 312c 3232 362c 302c 302c 2742 756c 6261  1,226,0,0,'Bulba
011b030: 7361 7572 272c 2727 2c27 e383 95e3 82b7  saur','','......
011b040: e382 aee3 8380 e383 8d27 2c27 4675 7368  .........','Fush
011b050: 6967 6964 616e 6527 2c31 2c30 2c4e 554c  igidane',1,0,NUL
011b060: 4c2c 2727 2c37 2c36 392c 2767 7261 7373  L,'',7,69,'grass

This gives the kana as: 0xe38395 0xe382b7 0xe382ae 0xe38380 0xe3838d
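As a quick cross-check, the five byte triplets can be handed to Python's built-in UTF-8 decoder. This is only a sketch: the hex string is copied by hand from the xxd output above, and the variable names are made up for illustration. The last two lines show what the same bytes look like when misread as Latin-1, which is one common way to end up with gibberish; the ticket does not confirm that this is what the reporter's viewer was doing.

```python
# Bytes copied by hand from the xxd dump (the quoted field after 'Bulbasaur','').
raw = bytes.fromhex("e38395 e382b7 e382ae e38380 e3838d")

# Decoding as UTF-8 yields five kana, one per three-byte group.
name = raw.decode("utf-8")
print(name)
print([f"U+{ord(ch):04X}" for ch in name])

# Assumption for illustration only: a viewer that treats the file as Latin-1
# sees one character per byte, i.e. fifteen characters of mojibake.
misread = raw.decode("latin-1")
print(len(misread), repr(misread))
```

If the decoded name reads as the kana spelling of the romanized 'Fushigidane' next to it in the dump, the file itself is valid UTF-8 and the problem lies in how it is being viewed or imported.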

Kana fall within the three-byte UTF-8 range, from 0x800 to 0xffff.

1110xxxx 10yyyyyy 10zzzzzz -- UTF-8 encoding for these
11100011 10000011 10010101 -- first triplet of bytes in the above kana

Thus the original codepoint is 0011 000011 010101 == 00110000 11010101 or 0x30d5, which is the correct codepoint for フ.
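The bit-shuffling above can be written out as a minimal sketch. The helper name `decode_three_byte` is hypothetical, and it only handles the three-byte form 1110xxxx 10yyyyyy 10zzzzzz; it does not reject overlong encodings or surrogates the way a full UTF-8 decoder would.

```python
def decode_three_byte(b0, b1, b2):
    """Decode one three-byte UTF-8 sequence: 1110xxxx 10yyyyyy 10zzzzzz."""
    # Check the marker bits: 1110 on the lead byte, 10 on both continuation bytes.
    if b0 & 0xF0 != 0xE0 or b1 & 0xC0 != 0x80 or b2 & 0xC0 != 0x80:
        raise ValueError("not a three-byte UTF-8 sequence")
    # Strip the markers and concatenate the payload bits: xxxx yyyyyy zzzzzz.
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


# First triplet from the dump: e3 83 95.
cp = decode_three_byte(0xE3, 0x83, 0x95)
print(hex(cp), chr(cp))

# Cross-check against Python's own decoder.
print(chr(cp) == bytes([0xE3, 0x83, 0x95]).decode("utf-8"))
```

The same function can be applied to the other four triplets to confirm that each one is a well-formed three-byte sequence.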
