How do I do a case-insensitive string comparison? - Stack Overflow

Answer by Nathan Craike for How do I do a case-insensitive string comparison?


Using Python 2, calling .lower() on each string or Unicode object...

string1.lower() == string2.lower()

...will work most of the time, but indeed doesn't work in the situations @tchrist has described.
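A minimal sketch of the same one-liner wrapped in a helper, written for Python 3 so it runs on a current interpreter. The name `equal_ignore_case` is hypothetical. The second call shows one illustrative case where plain lowercasing fails even on Python 3 (German ß); it is not necessarily one of the cases @tchrist described.

```python
def equal_ignore_case(a: str, b: str) -> bool:
    """Case-insensitive comparison by lowercasing both sides (hypothetical helper)."""
    return a.lower() == b.lower()


print(equal_ignore_case("Hello", "hELLO"))     # True

# "ß".lower() stays "ß", while "STRASSE".lower() is "strasse",
# so lowercasing alone reports these as different.
print(equal_ignore_case("straße", "STRASSE"))  # False
```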

Assume we have a file called unicode.txt containing the two strings Σίσυφος and ΣΊΣΥΦΟΣ. With Python 2:
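To reproduce the sessions below, the file can be recreated from the exact bytes shown in the repr() output of the Python 2 session. This is a Python 3 sketch; writing `unicode.txt` to the current directory is an assumption.

```python
# UTF-8 bytes copied from the repr() output of the Python 2 session,
# i.e. "Σίσυφος\nΣΊΣΥΦΟΣ\n".
UTF8_BYTES = (
    b"\xce\xa3\xce\xaf\xcf\x83\xcf\x85\xcf\x86\xce\xbf\xcf\x82\n"
    b"\xce\xa3\xce\x8a\xce\xa3\xce\xa5\xce\xa6\xce\x9f\xce\xa3\n"
)

# Write raw bytes so the file content is exact, whatever the locale's default encoding.
with open("unicode.txt", "wb") as f:
    f.write(UTF8_BYTES)
```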

>>> utf8_bytes = open("unicode.txt", 'r').read()
>>> print repr(utf8_bytes)
'\xce\xa3\xce\xaf\xcf\x83\xcf\x85\xcf\x86\xce\xbf\xcf\x82\n\xce\xa3\xce\x8a\xce\xa3\xce\xa5\xce\xa6\xce\x9f\xce\xa3\n'
>>> u = utf8_bytes.decode('utf8')
>>> print u
Σίσυφος
ΣΊΣΥΦΟΣ
>>> first, second = u.splitlines()
>>> print first.lower()
σίσυφος
>>> print second.lower()
σίσυφοσ
>>> first.lower() == second.lower()
False
>>> first.upper() == second.upper()
True

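The capital letter Σ has two lowercase forms: ς, used at the end of a word, and σ, used everywhere else. Python 2's .lower() always maps Σ to σ, so it cannot match the word-final ς in the first string, and .lower() won't help compare these strings case-insensitively. (.upper() happens to work here, because both ς and σ uppercase to Σ.)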
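The sigma forms are separate code points, which is why a naive lowercase comparison can trip over them. This Python 3 snippet uses the standard `unicodedata` module to list them and to check that both lowercase forms uppercase to the same capital.

```python
import unicodedata

# Capital sigma, ordinary small sigma, and word-final small sigma.
for ch in "Σσς":
    print(f"{ch} U+{ord(ch):04X} {unicodedata.name(ch)}")

# Both lowercase forms uppercase to Σ, which is why comparing
# with .upper() succeeded in the Python 2 session.
print("σ".upper() == "ς".upper() == "Σ")  # True
```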

In Python 3.3 and later, however, str.lower() follows Unicode's final-sigma rule: a Σ at the end of a word lowercases to ς rather than σ. Calling lower() on both strings therefore works correctly here:

>>> s = open('unicode.txt', encoding='utf8').read()
>>> print(s)
Σίσυφος
ΣΊΣΥΦΟΣ
>>> first, second = s.splitlines()
>>> print(first.lower())
σίσυφος
>>> print(second.lower())
σίσυφος
>>> first.lower() == second.lower()
True
>>> first.upper() == second.upper()
True
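The final-sigma behavior can be checked without the file. A minimal Python 3.3+ sketch: a Σ that follows a cased letter and is not followed by one lowercases to ς; any other Σ lowercases to σ.

```python
word = "ΣΊΣΥΦΟΣ"

# Only the last Σ is word-final, so only it becomes ς.
print(word.lower())                       # σίσυφος
print(word.lower() == "Σίσυφος".lower())  # True

# A Σ with no cased letter before it (here, standing alone) becomes σ.
print("Σ".lower())                        # σ

# The rule is applied per word, so each word-final Σ becomes ς.
print("ΣΊΣΥΦΟΣ ΣΊΣΥΦΟΣ".lower())          # σίσυφος σίσυφος
```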

So if you care about edge cases like the Greek sigmas, use Python 3 (3.3 or later).
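For a comparison that does not depend on where a sigma sits in a word, Python 3.3 also added `str.casefold()`, which maps Σ, σ and ς all to σ (and folds German ß to "ss"). A minimal sketch; the helper name `caseless_equal` is hypothetical.

```python
def caseless_equal(a: str, b: str) -> bool:
    """Case-insensitive comparison using Unicode case folding (Python 3.3+)."""
    return a.casefold() == b.casefold()


# All three sigma forms fold to the same character.
print({ch.casefold() for ch in "Σσς"})      # {'σ'}

# lower() keeps ς and σ distinct; casefold() does not.
print("ς".lower() == "Σ".lower())           # False
print(caseless_equal("ς", "Σ"))             # True
print(caseless_equal("straße", "STRASSE"))  # True
```

Note that casefold() does not normalize accents: a precomposed ί and the same letter written as ι plus a combining accent still compare unequal. Applying `unicodedata.normalize` (for example with "NFKD") to both casefolded strings is a common extra step.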

(For reference, Python 2.7.3 and Python 3.3.0b1 are shown in the interpreter printouts above.)

