Handle Turkish dotted and dotless i properly#3011
Handle Turkish dotted and dotless i properly#3011yutannihilation merged 9 commits intotidyverse:masterfrom
Conversation
hadley
left a comment
There was a problem hiding this comment.
Approach seems good. It would be nice to have a unit test, but I can't think of an obvious way to simulate it.
| # Use chartr() for safety since toupper() fails to convert i to I in Turkish locale | ||
| lower_ascii <- "abcdefghijklmnopqrstuvwxyz" | ||
| upper_ascii <- "ABCDEFGHIJKLMNOPQRSTUVWXYZ" | ||
| tolower_ascii <- function(x) chartr(upper_ascii, lower_ascii, x) |
There was a problem hiding this comment.
Can we make this to_lower_ascii() and to_upper_ascii()?
And I think we should wrap to_lower() and to_upper() to throw an error, to make sure that this error doesn't arise again in the future.
|
Thanks, I added some unit tests. Do you think it's overkill to install 5f1bb92 |
|
Ah, sorry, it seems I'm doing wrong about detecting the available locales... Let me fix.
|
|
Sorry my comment was supposed to reinforce not having a unit test. I don't think it gets us much here as it's likely to be fragile, and we've reduced the chance by overwriting |
| } | ||
|
|
||
| toupper <- function(x) { | ||
| stop('Please use `to_upper_ascii()`, which works fine in Turkish locale.', call. = FALSE) |
There was a problem hiding this comment.
"in Turkish locale" -> "in all locales"
|
Oh, I got it wrong, sorry. Agreed. |
|
Thanks for confirmation, @cystein! |
* Use chartr() instad of toupper() and tolower()
* Use chartr() instad of toupper() and tolower()
* Use chartr() instad of toupper() and tolower()
|
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/ |
Fix #3000
Turkish has two versions of i; dotted and dotless. Accordingly, the capital of
iisİ, notI, which make ggplot2 to fail to findPositionIdentityandStatIdentity. The same thing can happen on dotlessıandI(c.f. http://www.i18nguy.com/unicode/turkish-i18n.html)This seems a well-known problem, and ICU has a flag for this, but unfortunately, there's no way to use this via R's
toupper().After googling around, I found several options to fix this:
chartr()instead oftoupper()andtolower()(This PR)stringi::stri_trans_toupper()andstringi::stri_trans_tolower(), where we can specifylocale.For 2., stringi is cool, but adding it to the dependency is a bit heavy. For 3., it seems dangerous to mess the user's locale. So, from these, I think option 1 is rather reasonable, although not very cool.
chartr()is around 4x slower thantoupper(), but I hope this is acceptable considering that people in Turkish locale probably almost cannot use ggplot2... (I'm not familiar with this problem, so I might be wrong.)