Channel: How do I do a case-insensitive string comparison? - Stack Overflow

Answer by Veedrac for How do I do a case-insensitive string comparison?


Comparing strings in a case-insensitive way seems trivial, but it's not. I will be using Python 3, since Python 2's Unicode handling is underdeveloped here.

The first thing to note is that case-removing conversions in Unicode aren't trivial. There is text for which text.lower() != text.upper().lower(), such as "ß":

>>> "ß".lower()
'ß'
>>> "ß".upper().lower()
'ss'

But let's say you wanted to caselessly compare "BUSSE" and "Buße". Heck, you probably also want "BUSSE" and "BUẞE" to compare equal - that's the newer capital form. The recommended way is to use casefold:

str.casefold()

Return a casefolded copy of the string. Casefolded strings may be used for caseless matching.

Casefolding is similar to lowercasing but more aggressive because it is intended to remove all case distinctions in a string. [...]

Do not just use lower. If casefold is not available, doing .upper().lower() helps (but only somewhat).
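
As a rough illustration of that difference, the sketch below runs the three spellings mentioned above through lower(), upper().lower() and casefold(). The helper name all_equal and the escape-sequence spellings are only assumptions for this example, chosen so the exact code points are unambiguous.

```python
# Illustrative sketch: compare three spellings under different case conversions.
# \u00df is the small sharp s, \u1e9e is the capital sharp s.
words = ["BUSSE", "Bu\u00dfe", "BU\u1e9eE"]


def all_equal(convert):
    """Hypothetical helper: True if every word maps to the same string."""
    return len({convert(word) for word in words}) == 1


# lower() leaves the sharp s in place, so "BUSSE" stays different.
print(all_equal(str.lower))

# upper().lower() fixes "Buße" but not the capital sharp s.
print([word.upper().lower() for word in words])

# casefold() maps both sharp s forms to "ss".
print(all_equal(str.casefold))
```

Only the casefold() call brings all three strings to the same form, which is why upper().lower() is a partial workaround at best.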

Then you should consider accents. If your font renderer is good, you probably think "ê" == "ê" is true, but it isn't:

>>> "ê" == "ê"
False

This is because the accent on the latter is a combining character.

>>> import unicodedata
>>> [unicodedata.name(char) for char in "ê"]
['LATIN SMALL LETTER E WITH CIRCUMFLEX']
>>> [unicodedata.name(char) for char in "ê"]
['LATIN SMALL LETTER E', 'COMBINING CIRCUMFLEX ACCENT']

The simplest way to deal with this is unicodedata.normalize. You probably want to use NFKD normalization, but feel free to check the documentation. Then one does

>>> unicodedata.normalize("NFKD", "ê") == unicodedata.normalize("NFKD", "ê")
True
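
Since the two forms of "ê" look identical on screen, it can help to spell them with escapes. The sketch below does that and also hints at how the compatibility forms (NFKC, NFKD) differ from the canonical ones (NFC, NFD), using the "fi" ligature as an example. The helper same_under is a made-up name for this illustration.

```python
import unicodedata

precomposed = "\u00ea"  # LATIN SMALL LETTER E WITH CIRCUMFLEX
combining = "e\u0302"   # LATIN SMALL LETTER E + COMBINING CIRCUMFLEX ACCENT


def same_under(form, left, right):
    """Hypothetical helper: compare two strings after normalizing both."""
    return unicodedata.normalize(form, left) == unicodedata.normalize(form, right)


# All four normalization forms treat the two spellings of the accent alike.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, same_under(form, precomposed, combining))

# Only the compatibility forms also expand the ligature into plain "fi".
ligature = "\ufb01"  # LATIN SMALL LIGATURE FI
print(same_under("NFD", ligature, "fi"))
print(same_under("NFKD", ligature, "fi"))
```

Whether you want ligatures and similar compatibility characters to match their plain equivalents is the main thing to decide when choosing between NFD and NFKD.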

To finish up, here is all of this expressed as functions:

import unicodedata

def normalize_caseless(text):
    return unicodedata.normalize("NFKD", text.casefold())

def caseless_equal(left, right):
    return normalize_caseless(left) == normalize_caseless(right)
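
One caveat, offered as a hedged variation rather than a correction of the functions above: casefold() can return text that is no longer normalized, and canonically equivalent inputs can fold differently if they are not normalized first. The Unicode HOWTO in the Python documentation therefore normalizes both before and after folding. The sketch below applies that idea with NFKD; the names normalize_caseless_strict and caseless_equal_strict are invented for this example.

```python
import unicodedata


def normalize_caseless_strict(text):
    """Hypothetical variant: normalize, casefold, then normalize again."""
    # Normalize first so canonically equivalent inputs fold identically.
    decomposed = unicodedata.normalize("NFKD", text)
    # Normalize again because casefold() can return non-normalized text.
    return unicodedata.normalize("NFKD", decomposed.casefold())


def caseless_equal_strict(left, right):
    return normalize_caseless_strict(left) == normalize_caseless_strict(right)


# Case and sharp-s differences.
print(caseless_equal_strict("BUSSE", "Bu\u00dfe"))

# Precomposed capital E with circumflex vs. small e plus combining accent.
print(caseless_equal_strict("\u00ca", "e\u0302"))

# Same combining marks on a Greek alpha, written in a different order.
print(caseless_equal_strict("\u03b1\u0345\u0301", "\u03b1\u0301\u0345"))
```

The last pair is the kind of edge case where the order of operations matters; for most everyday text, the single-pass version behaves the same.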
