Migrating from SQL Server to MySQL using Pentaho: Unicode issue


I have a problem migrating data from SQL Server to MySQL. I have nvarchar columns in SQL Server, and I am exporting them to a Unicode text file. When I import a column into a UTF-8 table in MySQL, I get a duplicate value error: MySQL sees no difference between 'kaneko, shûsuke' and 'kaneko, shusuke'. I am trying to insert these values into a unique column.

What's wrong? Must I use a different charset in MySQL?

I tried converting the text file to UTF-8 before importing it into MySQL, but I am still getting the same error.

It seems the problem is in the MySQL table creation. First, run SHOW CREATE TABLE at the MySQL prompt and look at the table structure. Check whether you have used the right charset and collation. You can read more about this in the MySQL docs.
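As a rough illustration of that check, the statements below show the charset and collation of a table and of each of its columns. The table name `people` is hypothetical; substitute your own.

```sql
-- `people` is a placeholder table name, not taken from the question.
-- Prints the full CREATE TABLE statement, including DEFAULT CHARSET and COLLATE.
SHOW CREATE TABLE people;

-- Lists every column; the Collation field shows the collation each one uses.
SHOW FULL COLUMNS FROM people;
```

If the unique column reports a collation ending in `_ci`, comparisons on it are case-insensitive, which is the situation described next.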

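Many times the collation is indeed not only case-insensitive but also partly accent-insensitive: ñ = n (as Joni Salonen points out, this is incorrect!), á = a. That is why 'kaneko, shûsuke' and 'kaneko, shusuke' count as the same value in a unique column.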
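A small sketch of how to see this for yourself, assuming the table uses `utf8_general_ci` (the usual default for the `utf8` charset, but an assumption here) and that your client connection can represent û. `CONVERT ... USING utf8` is only there so the `COLLATE` clause is valid whatever the connection charset is.

```sql
-- Accent-insensitive collation: expected to return 1 (the strings compare equal).
SELECT CONVERT('kaneko, shûsuke' USING utf8) COLLATE utf8_general_ci
     = CONVERT('kaneko, shusuke' USING utf8) AS equal_general_ci;

-- Binary collation: expected to return 0 (the strings differ).
SELECT CONVERT('kaneko, shûsuke' USING utf8) COLLATE utf8_bin
     = CONVERT('kaneko, shusuke' USING utf8) AS equal_bin;
```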

So you can use a binary collation, but it has its own drawback. A binary collation compares strings the way strcmp() in C would: two strings differ if any of their characters differ (be it a case or a diacritic difference). The downside is that the sort order is not natural.

An example of an unnatural sort order (as the "binary" one is): A, B, a, b. A natural sort order in this case would be, e.g.: A, a, B, b (the small and capital variations of the same letter are sorted next to each other).
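The difference can be reproduced with a throwaway query. This is only a sketch: the derived table `letters` and its column `c` are made up for the illustration.

```sql
-- Binary order compares code points, so capitals come first: A, B, a, b.
SELECT c
FROM (SELECT 'a' AS c UNION ALL SELECT 'B'
      UNION ALL SELECT 'A' UNION ALL SELECT 'b') AS letters
ORDER BY CONVERT(c USING utf8) COLLATE utf8_bin;

-- Case-insensitive order groups both forms of a letter: the a's, then the b's.
-- 'A' and 'a' compare equal here, so their relative order is not guaranteed.
SELECT c
FROM (SELECT 'a' AS c UNION ALL SELECT 'B'
      UNION ALL SELECT 'A' UNION ALL SELECT 'b') AS letters
ORDER BY CONVERT(c USING utf8) COLLATE utf8_general_ci;
```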

The practical advantage of a binary collation is its speed, since string comparison is very simple and fast. In the general case, indexes on a binary column might not produce the expected results for sorting, but for exact matches they can be useful. Use a binary collation for the specific column only (possibly the best bet).
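One way this could look on an existing table, as a sketch only. The names `people` and `name`, and the `VARCHAR(100) NOT NULL` definition, are assumptions; repeat your column's real type and attributes in the `MODIFY` clause.

```sql
-- Placeholder names: change only the unique column to a binary collation.
ALTER TABLE people
  MODIFY name VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL;

-- Exact matches now distinguish case and accents.
SELECT * FROM people WHERE name = 'kaneko, shûsuke';

-- A natural sort can still be requested per query with an explicit collation.
SELECT name FROM people ORDER BY name COLLATE utf8_general_ci;
```

Keep in mind that an `ORDER BY` with an explicit collation different from the column's generally cannot use the column's index for sorting, so this is a trade-off rather than a free fix.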

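For example:

DROP TABLE IF EXISTS cc;
CREATE TABLE cc (
  c CHAR(100) PRIMARY KEY
) DEFAULT CHARACTER SET utf8 COLLATE utf8_bin;
-- With utf8_bin both rows are accepted, because û and u are different characters.
INSERT INTO cc VALUES ('kaneko, shûsuke');
INSERT INTO cc VALUES ('kaneko, shusuke');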
