Why Do These Dtypes Compare Equal But Hash Different?

May 25, 2024

In [30]: import numpy as np

In [31]: d = np.dtype(np.float64)

In [32]: d
Out[32]: dtype('float64')

In [33]: d == np.float64
Out[33]: True

In [34]: hash(np.float64)
Out[34]: -92

Solution 1:

As tttthomasssss notes, np.float64 and d have different types (classes). They are different kinds of things:

In [435]: type(np.float64)
Out[435]: type

A type of type means that np.float64 is a class, and a class can be called like a function:

In [436]: np.float64(0)
Out[436]: 0.0

In [437]: type(_)
Out[437]: numpy.float64

creating a numeric object, which is exactly how calling a class behaves. But since numpy uses a lot of compiled code, and its ndarray uses its own __new__, I wouldn't be surprised if it straddles the line.

In [438]: np.float64.__hash__??
Type:       wrapper_descriptor
String Form:<slot wrapper '__hash__' of 'float' objects>
Docstring:  x.__hash__() <==> hash(x)

I first thought this would be what hash(np.float64) calls, but it is actually the hash for objects of that type, e.g. hash(np.float64(0)). hash(np.float64) itself uses the default identity-based hash that every class gets through its metaclass, type.
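A minimal sketch to check that reading, assuming a reasonably recent NumPy (exact hash values vary between builds, so only comparisons are made): the __hash__ found on np.float64 is the slot used for float64 values, while hashing the class object itself goes through type.

```python
import numpy as np

# np.float64 is a class; its metaclass is the built-in `type`.
assert type(np.float64) is type

# np.float64.__hash__ is the slot used for float64 *instances*:
# a float64 value hashes like the equal Python float.
x = np.float64(0.5)
assert np.float64.__hash__(x) == hash(x) == hash(0.5)

# Hashing the class object itself falls back to the default,
# identity-based hash shared by all objects.
assert hash(np.float64) == object.__hash__(np.float64)
print("instance hash:", hash(x), "class hash:", hash(np.float64))
```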

Moving on to the dtype:

In [439]: d=np.dtype(np.float64)

In [440]: type(d)
Out[440]: numpy.dtype

d is not a function or class:

In [441]: d(0)
...
TypeError: 'numpy.dtype' object is not callable

In [442]: d.__hash__??
Type:       method-wrapper
String Form:<method-wrapper '__hash__' of numpy.dtype object at 0xb60f8a60>
Docstring:  x.__hash__() <==> hash(x)

The bound method-wrapper shown here doesn't reveal which class supplies __hash__. In fact np.dtype defines its own __hash__, which hashes a dtype by its contents rather than by identity.
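Looking in the class dictionary is more reliable than inspecting the bound method-wrapper. A minimal sketch, assuming a reasonably recent NumPy (hash values vary by build, so only comparisons are printed):

```python
import numpy as np

# Slot methods that a C type defines itself appear in its own __dict__.
print("__hash__" in vars(np.dtype))          # np.dtype supplies its own hash
print(np.dtype.__hash__ is object.__hash__)  # False: not object's identity hash

# The hash is content-based: two separately built, equal structured
# dtypes hash the same even if they are distinct objects.
a = np.dtype([("f1", np.float64)])
b = np.dtype([("f1", np.float64)])
print(a == b, hash(a) == hash(b))            # True True
```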

To illustrate the difference between np.float64 and d further, look at the class inheritance chain (the method resolution order):

In [443]: np.float64.__mro__
Out[443]: 
(numpy.float64,
 numpy.floating,
 numpy.inexact,
 numpy.number,
 numpy.generic,
 float,
 object)

In [444]: d.__mro__
...
AttributeError: 'numpy.dtype' object has no attribute '__mro__'

In [445]: np.dtype.__mro__
Out[445]: (numpy.dtype, object)

So np.float64 doesn't define its own hash for instances; it inherits float's. d has no __mro__ because it is an instance, not a class.
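The same distinction can be checked in code. A short sketch; note that on newer NumPy releases type(d) is a float64-specific subclass of np.dtype rather than np.dtype itself, which is why the check looks for np.dtype anywhere in the MRO:

```python
import numpy as np

d = np.dtype(np.float64)

# np.float64 is itself a class (an instance of `type`); d is not a class.
print(isinstance(np.float64, type), isinstance(d, type))  # True False

# __mro__ is an attribute of classes, so it is found on type(d), not on d.
print(hasattr(d, "__mro__"))                              # False
print(np.dtype in type(d).__mro__)                        # True
print(float in np.float64.__mro__)                        # True
```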

numpy has enough compiled code, and a long history of its own, that you can't count on Python documentation always applying.

np.dtype and np.float64 evidently have __eq__ methods that let them be compared with each other, but the numpy developers did not make sure the __hash__ methods are consistent with them, most likely because there is rarely a need to use either as a dictionary key.

I've never seen code like:

In [453]: dd={np.float64:12,d:34}

In [454]: dd
Out[454]: {dtype('float64'): 34, numpy.float64: 12}

In [455]: dd[np.float64]
Out[455]: 12

In [456]: dd[d]
Out[456]: 34
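The practical consequence is that hash-based containers treat the two equal objects as distinct keys. A minimal sketch (the dictionary name lookup is just for illustration):

```python
import numpy as np

d = np.dtype(np.float64)

# The two objects compare equal (d.__eq__ handles both directions) ...
print(d == np.float64, np.float64 == d)   # True True
# ... but their hashes differ, so sets and dicts keep them apart.
print(hash(d) == hash(np.float64))        # False
print(len({d, np.float64}))               # 2, although the elements are equal

lookup = {np.float64: "scalar type"}
# The hash mismatch means __eq__ is never even consulted for this lookup.
print(lookup.get(d, "missing"))           # missing
```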

Solution 2:

They shouldn't behave this way, but __eq__ and __hash__ for numpy.dtype objects are broken at an essentially unfixable design level. I'll be pulling heavily from njsmith's comments on a dtype-related bug report for this answer.

np.float64 isn't actually a dtype. It's a type, in the ordinary sense of the Python type system. Specifically, if you retrieve a scalar from an array of float64 dtype, np.float64 is the type of the resulting scalar.
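A quick sketch of that relationship: the scalar you get out of a float64 array is an np.float64, while the array's dtype is a separate np.dtype object that merely records that scalar type in its .type attribute.

```python
import numpy as np

arr = np.zeros(3, dtype=np.float64)

# Indexing one element returns a scalar whose Python type is np.float64 ...
print(type(arr[0]) is np.float64)        # True
# ... while the array's dtype is an np.dtype instance that points back
# to that scalar type through .type.
print(isinstance(arr.dtype, np.dtype))   # True
print(arr.dtype.type is np.float64)      # True
```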

np.dtype(np.float64) is a dtype, an instance of numpy.dtype. dtypes are how NumPy records the structure of the contents of a NumPy array. They are particularly important for structured arrays, which can have very complex dtypes. While ordinary Python types could have filled much of the role of dtypes, creating new types on the fly for new structured arrays would be highly awkward, and it would probably have been impossible in the days before type-class unification.
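For example, a structured dtype is built on the fly as just another np.dtype instance, with no new Python class involved. A small sketch (the dtype name point and its fields are made up for illustration):

```python
import numpy as np

# A record layout with an int32 field and a float64 field.
point = np.dtype([("x", np.int32), ("y", np.float64)])

print(isinstance(point, np.dtype))   # True: an instance, not a new class
print(point.names)                   # ('x', 'y')
print(point.itemsize)                # 12: fields are packed (no padding) by default
print(point.fields["y"])             # (dtype('float64'), 4): field dtype and byte offset

arr = np.zeros(2, dtype=point)
print(arr["y"].dtype)                # float64
```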

numpy.dtype implements __eq__ basically like this:

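def __eq__(self, other):
    if isinstance(other, numpy.dtype):
        # regular_comparison stands in for the real field-by-field check
        return regular_comparison(self, other)
    # Anything else is first coerced to a dtype (TypeError if that fails)
    return self == numpy.dtype(other)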

which is pretty broken. Among other problems, it's not transitive, it raises TypeError when it should return NotImplemented, and its output is really bizarre at times because of how dtype coercion works:

>>> x = numpy.dtype(numpy.float64)
>>> x == None
True
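The non-transitivity is easy to see with string aliases. A minimal sketch: because d.__eq__ coerces its other operand with np.dtype(...), several unrelated objects all compare equal to d, yet not to each other.

```python
import numpy as np

d = np.dtype(np.float64)

# Each right-hand operand is coerced to dtype('float64') before comparing.
print(d == "float64", d == "f8", d == np.float64)   # True True True

# The coerced objects themselves are unrelated, so == is not transitive.
print("float64" == np.float64)                      # False
print("float64" == "f8")                            # False
```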

numpy.dtype.__hash__ isn't any better. It makes no attempt to be consistent with the __hash__ methods of all the other types numpy.dtype.__eq__ accepts (and with so many incompatible types to deal with, how could it?). Heck, it shouldn't even exist, because dtype objects are mutable! Not just mutable like modules or file objects, where it's okay because __eq__ and __hash__ work by identity. dtype objects are mutable in ways that will actually change their hash value:

>>> x = numpy.dtype([('f1', float)])
>>> hash(x)
-405377605
>>> x.names = ['f2']
>>> hash(x)
1908240630
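A sketch of why a mutable hash is a real problem, assuming a NumPy version that still allows assigning to dtype.names (the variable names are illustrative): once the hash changes, a set can no longer find an object it already holds.

```python
import numpy as np

x = np.dtype([("f1", np.float64)])
seen = {x}                      # store the dtype in a hash-based set
old_hash = hash(x)

# Renaming the fields mutates x in place and changes its hash ...
x.names = ["f2"]
print(hash(x) != old_hash)      # True
# ... so membership tests probe with the new hash and miss the stored entry.
print(x in seen)                # False
print(any(item is x for item in seen))  # True: the object is still in there
```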

When you try to compare d == np.float64, d.__eq__ builds a dtype out of np.float64 and finds that d == np.dtype(np.float64) is True. When you take their hashes, though, np.float64 uses the regular (identity-based) hash for type objects and d uses the hash for dtype objects. Normally, equal objects of different types should have equal hashes, but the dtype implementation doesn't care about that.

Unfortunately, it's impossible to fix the problems with dtype __eq__ and __hash__ without breaking APIs people are relying on. People are counting on things like x.dtype == 'float64' or x.dtype == np.float64, and fixing dtypes would break that.

Solution 3:

They are not the same thing: np.float64 is a type, while d is an instance of numpy.dtype, hence they hash to different values. However, all dtypes created the same way as d will hash to the same value because they are equal (which of course does not necessarily mean they occupy the same memory location).

Edit:

Given your code above you can try the following:

In [72]: type(d)
Out[72]: numpy.dtype

In [74]: type(np.float64)
Out[74]: type

which shows that the two are of different types, so nothing forces them to hash to the same value. That different numpy.dtype instances created the same way do hash to the same value can be shown with the following example:

In [77]: import copy
In [78]: dd = copy.deepcopy(d) # Try copying

In [79]: dd
Out[79]: dtype('float64')

In [80]: hash(dd)
Out[80]: -6584369718629170405

In [81]: hash(d) # original d
Out[81]: -6584369718629170405

In [82]: ddd = np.dtype(np.float64) # new instance
In [83]: hash(ddd)
Out[83]: -6584369718629170405

# If using CPython, id returns the address in memory (see: https://docs.python.org/3/library/functions.html#id)
In [84]: id(ddd)
Out[84]: 4376165768

In [85]: id(dd)
Out[85]: 4459249168

In [86]: id(d)
Out[86]: 4376165768

It's nice to see that ddd (the instance created the same way as d) and d itself are the same object in memory, while dd (the copied object) has a different address.

The equality checks evaluate as you would expect, given the hashes above:

In [87]: dd == np.float64
Out[87]: True

In [88]: d == np.float64
Out[88]: True

In [89]: ddd == np.float64
Out[89]: True

In [90]: d == dd
Out[90]: True

In [91]: d == ddd
Out[91]: True

In [92]: dd == ddd
Out[92]: True

Solution 4:

It's because you're hashing a type against a dtype object.

Although the values compare equal (as evidenced by d == np.float64), their types are different:

print(type(d))
print(type(np.float64))

Produces

<type 'numpy.dtype'>
<type 'type'>

According to the Python docs:

hash(object)
Return the hash value of the object (if it has one). Hash values are integers. They are used to quickly compare dictionary keys during a dictionary lookup. Numeric values that compare equal have the same hash value (even if they are of different types, as is the case for 1 and 1.0).

And since a dtype is not a numeric type, there is no guarantee that such an object will have the same hash as a type it compares equal to.

EDIT: From the Python 3.5 docs:

object.__hash__(self)
Called by built-in function hash() and for operations on members of hashed collections including set, frozenset, and dict. hash() should return an integer. The only required property is that objects which compare equal have the same hash value; it is advised to somehow mix together (e.g. using exclusive or) the hash values for the components of the object that also play a part in comparison of objects.

This appears to imply that hash(d) == hash(np.float64) should be True in your case.
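The rule quoted above can be turned into a direct check. The helper breaks_hash_contract below is hypothetical, written only for illustration:

```python
import numpy as np

def breaks_hash_contract(a, b):
    """Hypothetical helper: True if a == b but hash(a) != hash(b),
    i.e. the pair violates the rule quoted from the object.__hash__ docs."""
    return bool(a == b) and hash(a) != hash(b)

d = np.dtype(np.float64)

print(breaks_hash_contract(1, 1.0))              # False: equal numbers hash alike
print(breaks_hash_contract(d, np.float64))       # True: equal, but hashed differently
print(breaks_hash_contract(d, np.dtype("f8")))   # False: equal dtypes hash alike
```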

I did notice that there is a note right after that states:

hash() truncates the value returned from an object’s custom __hash__() method to the size of a Py_ssize_t. This is typically 8 bytes on 64-bit builds and 4 bytes on 32-bit builds.

However, I couldn't find any difference in the size of the values returned by the two hash calls; they appear to be the same (I checked with sys.getsizeof). Truncation is deterministic anyway, so it could not turn equal hash values into different ones.
