
Parsing C Structs In Python

I'm sure this is terribly wrong, and I'm having a couple of problems. I've written out an array of WIN32_FIND_DATAW structures to disk, one after another, and I'd like to consume

Solution 1:

As already mentioned in the comments, this is due to differences between Windows and Linux. The ctypes module maps its types onto the local platform's native C types, so field sizes and alignment can differ from what the Windows program wrote, hence the mismatch. The best solution is to use the struct module, which lets you handle the layout in a platform-independent manner. The following code shows how this can be done for a single record.

import struct

# Set up test data based on the incomplete sample, zero-padded to one full record
data = b"\x16\x00\x00\x00\xdc\x5a\x9f\xd2\x31\x04\xca\x01\xba\x81\x89\x1a\x81\xe2\xcd\x01\xba\x81\x89\x1a\x81\xe2\xcd\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x24\x00\x52\x00\x65\x00\x63\x00\x79\x00\x63\x00\x6c\x00\x65\x00\x2e\x00\x42\x00\x69\x00\x6e\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
data = data + b"\x00" * (592 - len(data))

# typedef struct _WIN32_FIND_DATA {
#   DWORD    dwFileAttributes;
#   FILETIME ftCreationTime;
#   FILETIME ftLastAccessTime;
#   FILETIME ftLastWriteTime;
#   DWORD    nFileSizeHigh;
#   DWORD    nFileSizeLow;
#   DWORD    dwReserved0;
#   DWORD    dwReserved1;
#   TCHAR    cFileName[MAX_PATH];
#   TCHAR    cAlternateFileName[14];
# } WIN32_FIND_DATA;

# "<" = little-endian with no padding; L = DWORD, Q = 8-byte FILETIME,
# 520s / 28s = raw UTF-16 buffers (260 and 14 two-byte characters)
fmt = "<L3Q4L520s28s"

attrs, creation, access, write, sizeHigh, sizeLow, reserved0, reserved1, name, alternateName = struct.unpack(fmt, data)
# Decode the fixed-size buffers and keep only the text before the first NUL
name = name.decode("utf-16-le").split("\x00", 1)[0]
alternateName = alternateName.decode("utf-16-le").split("\x00", 1)[0]
print(name)

NOTE: This assumes that MAX_PATH is 260 (which should be true, but you never know), so that cFileName takes 520 bytes at 2 bytes per UTF-16 character.
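One way to check that assumption is to build the format string from the constants and let struct.calcsize report the record size. This is only a sketch: MAX_PATH is hard-coded to 260 as in the note, and the names ALT_NAME_LEN, WCHAR_SIZE, RECORD_FORMAT and RECORD_SIZE are made up for illustration.

```python
import struct

MAX_PATH = 260       # assumed value, as stated in the note
ALT_NAME_LEN = 14    # cAlternateFileName length in characters
WCHAR_SIZE = 2       # bytes per UTF-16 code unit

# Same field order as the struct, with the buffer sizes derived from the constants
RECORD_FORMAT = "<L3Q4L%ds%ds" % (MAX_PATH * WCHAR_SIZE, ALT_NAME_LEN * WCHAR_SIZE)
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)

print(RECORD_FORMAT, RECORD_SIZE)
```

If the file length is not a multiple of RECORD_SIZE, the layout assumed here probably does not match what was written.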

To read all values from the file, read it in blocks of 592 bytes and decode each block as above.
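A minimal sketch of that loop, assuming the file contains nothing but back-to-back 592-byte records. The names RECORD, decode_name and read_find_data are hypothetical, and the demo at the end uses synthetic in-memory records rather than a real dump.

```python
import io
import struct

RECORD = struct.Struct("<L3Q4L520s28s")  # one 592-byte record


def decode_name(raw):
    """Decode a fixed-size UTF-16-LE buffer, keeping the text before the first NUL."""
    return raw.decode("utf-16-le", "replace").split("\x00", 1)[0]


def read_find_data(stream):
    """Yield (attrs, name, alternate_name) for each complete record in stream."""
    while True:
        block = stream.read(RECORD.size)
        if len(block) < RECORD.size:
            break  # end of file, or a truncated trailing record
        fields = RECORD.unpack(block)
        yield fields[0], decode_name(fields[8]), decode_name(fields[9])


# Demo with two synthetic records; the "s" fields are zero-padded by pack()
sample = RECORD.pack(0x16, 0, 0, 0, 0, 0, 0, 0, "$Recycle.Bin".encode("utf-16-le"), b"")
records = list(read_find_data(io.BytesIO(sample * 2)))
print(records)
```

For a real file, pass a stream opened in binary mode, for example open(path, "rb"), in place of the io.BytesIO object.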

Solution 2:

You should be using the struct module from the standard library (http://docs.python.org/2/library/struct.html), since you are parsing a binary file format. The ctypes module is meant for integrating shared libraries (DLLs) with a binary API into a Python app. I'm not saying that what you are trying to do is impossible, but using ctypes is more complicated than simply parsing C structs from a binary file.

Just remember that in C there is no such thing as a PWIN32_FIND_DATAW pointer. This is just a typedef that will resolve down to one of the raw C datatypes such as a 32-bit pointer, a 64-bit pointer, etc. The data in the file represents the raw base C datatypes.

In answer to the comment: avoid looking for shortcuts. You really do need a deep understanding of the bits being written to the file and how they are organized. For that you will likely need to do some hexdumps and check the actual data representation. According to MS (http://msdn.microsoft.com/en-ca/library/windows/desktop/aa365740(v=vs.85).aspx) this is not a very complex structure. If the structure in wintypes doesn't work for you, it is possible that you have found a bug. It is also possible that the on-disk structure is not identical to the in-RAM structure. An in-RAM data structure often includes padding to keep its fields aligned on, say, 4- or 8-byte boundaries. But programmers have been known NOT to dump the struct as is, and instead to pick it apart and write it to the file minus the padding. Since ctypes/wintypes is intended for making binary API calls to a DLL, its bias is to include padding in the data layout, and the file might not include it.
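The padding point can be seen with the struct module itself. This sketch uses a made-up two-field layout (one byte followed by a 32-bit integer), not WIN32_FIND_DATAW, and compares the size with native alignment against the size with no padding.

```python
import struct

# Hypothetical layout for illustration: struct { uint8_t flag; uint32_t value; }
packed_size = struct.calcsize("<BI")   # standard sizes, no padding between fields
native_size = struct.calcsize("@BI")   # this platform's native sizes and alignment

print("packed:", packed_size, "native:", native_size)
```

If the writer dumped the struct straight from memory, the native layout of the writing platform is the one to match; if it wrote the fields one by one, the packed layout applies.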
