compression - Python 2.7 pyLZMA works, Python 3.4 LZMA module does not


    import sys
    import os
    import zlib

    try:
        import pylzma as lzma
    except ImportError:
        import lzma

    from io import StringIO
    import struct

    #---------------------------------------------------------------------------

    def read_ui8(c):
        return struct.unpack('<B', c)[0]
    def read_ui16(c):
        return struct.unpack('<H', c)[0]
    def read_ui32(c):
        return struct.unpack('<I', c)[0]

    def parse(input):
        """Parses the header information from an SWF file."""
        if hasattr(input, 'read'):
            input.seek(0)
        else:
            input = open(input, 'rb')

        header = { }

        # Read the 3-byte signature field
        header['signature'] = signature = b''.join(struct.unpack('<3c', input.read(3))).decode()

        # Version
        header['version'] = read_ui8(input.read(1))

        # File size (stored as a 32-bit integer)
        header['size'] = read_ui32(input.read(4))

        # Payload

        if header['signature'] == 'FWS':
            print("The opened file doesn't appear to be compressed")
            buffer = input.read(header['size'])
        elif header['signature'] == 'CWS':
            print("The opened file appears to be compressed with zlib")
            buffer = zlib.decompress(input.read(header['size']))
        elif header['signature'] == 'ZWS':
            print("The opened file appears to be compressed with LZMA")
            # ZWS(LZMA)
            # | 4 bytes       | 4 bytes    | 4 bytes       | 5 bytes    | n bytes    | 6 bytes         |
            # | 'ZWS'+version | scriptLen  | compressedLen | LZMA props | LZMA data  | LZMA end marker |
            size = read_ui32(input.read(4))
            buffer = lzma.decompress(input.read())

        # Containing rectangle (struct RECT)

        # The number of bits used to store each of the RECT values is
        # stored in the first 5 bits of the first byte.

        nbits = read_ui8(buffer[0]) >> 3

        current_byte, buffer = read_ui8(buffer[0]), buffer[1:]
        bit_cursor = 5

        for item in 'xmin', 'xmax', 'ymin', 'ymax':
            value = 0
            for value_bit in range(nbits-1, -1, -1): # == reversed(range(nbits))
                if (current_byte << bit_cursor) & 0x80:
                    value |= 1 << value_bit
                # Advance the bit cursor to the next bit
                bit_cursor += 1

                if bit_cursor > 7:
                    # We've exhausted the current byte, consume the next one
                    # from the buffer.
                    current_byte, buffer = read_ui8(buffer[0]), buffer[1:]
                    bit_cursor = 0

            # Convert the value from twips to a pixel value
            header[item] = value / 20

        header['width'] = header['xmax'] - header['xmin']
        header['height'] = header['ymax'] - header['ymin']

        header['frames'] = read_ui16(buffer[0:2])
        header['fps'] = read_ui16(buffer[2:4])

        input.close()
        return header

    header = parse(sys.argv[1])

    print('SWF header')
    print('----------')
    print('Version:      %s' % header['version'])
    print('Signature:    %s' % header['signature'])
    print('Dimensions:   %s x %s' % (header['width'], header['height']))
    print('Bounding box: (%s, %s, %s, %s)' % (header['xmin'], header['xmax'], header['ymin'], header['ymax']))
    print('Frames:       %s' % header['frames'])
    print('FPS:          %s' % header['fps'])

I was under the impression that the built-in Python 3.4 lzma module works the same as the Python 2.7 pylzma module. The code I've provided runs on both 2.7 and 3.4, but when it is run on 3.4 (which doesn't have pylzma, so it falls back to the built-in lzma) I get the following error:

    _lzma.LZMAError: Input format not supported by decoder

Why does pylzma work when Python 3.4's lzma doesn't?

While I do not have an answer as to why the two modules work differently, I do have a solution.

I was unable to get the non-stream lzma.decompress to work, since I do not have enough knowledge of the LZMA/XZ/SWF specs, but I did get lzma.LZMADecompressor to work. For completeness, I believe SWF LZMA uses this header format (not 100% confirmed):

    Bytes  Length  Type  Endianness  Description
     0- 2  3       UI8   -           SWF signature: ZWS
     3     1       UI8   -           SWF version
     4- 7  4       UI32  LE          SWF FileLength, aka file size
     8-11  4       UI32  LE          SWF? compressed size (file size - 17)

    12     1       -     -           LZMA decoder properties
    13-16  4       UI32  LE          LZMA dictionary size
    17-    -       -     -           LZMA compressed data (including rest of SWF header)
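To make the layout above concrete, here is a minimal sketch that splits the first 17 bytes into named fields with `struct`. It assumes the table is correct (which, as said, is not 100% confirmed); `ZWS_HEADER` and `parse_zws_header` are names made up for this illustration.

```python
import struct

# Assumed layout, taken from the table above: signature, version,
# file length, compressed size, LZMA properties byte, LZMA dictionary size.
# '<' means little-endian with no padding, so the struct is 17 bytes long.
ZWS_HEADER = struct.Struct('<3sBIIBI')

def parse_zws_header(data):
    """Split the first 17 bytes of a ZWS file into named fields."""
    if len(data) < ZWS_HEADER.size:
        raise ValueError('need at least %d bytes' % ZWS_HEADER.size)
    (signature, version, file_length,
     compressed_size, props, dict_size) = ZWS_HEADER.unpack_from(data)
    if signature != b'ZWS':
        raise ValueError('not a ZWS file: %r' % signature)
    return {
        'signature': signature.decode('ascii'),
        'version': version,
        'file_length': file_length,
        'compressed_size': compressed_size,
        'lzma_properties': props,
        'lzma_dict_size': dict_size,
    }
```

Everything from byte 17 onwards would then be the LZMA compressed data.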

However, the LZMA file format spec says it should be:

    Bytes  Length  Type  Endianness  Description
     0     1       -     -           LZMA decoder properties
     1- 4  4       UI32  LE          LZMA dictionary size
     5-12  8       UI64  LE          LZMA uncompressed size
    13-    -       -     -           LZMA compressed data
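As a rough illustration of that 13-byte layout, the sketch below unpacks it and also splits the properties byte into its `lc`, `lp` and `pb` parts using the usual LZMA encoding `(pb * 5 + lp) * 9 + lc`. The function name and the returned dictionary keys are my own choice, not something taken from the spec.

```python
import struct

# All eight size bytes set to 0xFF mean "uncompressed size unknown".
UNKNOWN_SIZE = 2 ** 64 - 1

def parse_lzma_alone_header(data):
    """Unpack the 13-byte header of a legacy .lzma stream."""
    props, dict_size, uncompressed_size = struct.unpack_from('<BIQ', data)
    if props >= 9 * 5 * 5:
        raise ValueError('invalid LZMA properties byte: %d' % props)
    # The properties byte packs three values: (pb * 5 + lp) * 9 + lc
    lc = props % 9
    lp = (props // 9) % 5
    pb = props // 45
    return {
        'lc': lc,
        'lp': lp,
        'pb': pb,
        'dict_size': dict_size,
        # None stands for "unknown" in this sketch
        'uncompressed_size': (None if uncompressed_size == UNKNOWN_SIZE
                              else uncompressed_size),
    }
```

Comparing this with the SWF layout, the only field SWF leaves out is the 8-byte uncompressed size.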

I was never really able to get my head around what the uncompressed size should be (if it is even possible to define it for this format). pylzma seems not to care about this, while the Python 3.3 lzma module does. However, it seems that an explicit unknown size works, and it may be specified as a UI64 with value 2^64 - 1, e.g. 8*b'\xff' or 8*'\xff'. So, by shuffling the headers around a bit, instead of using:

    buffer = lzma.decompress(input.read())

try:

    d = lzma.LZMADecompressor(format=lzma.FORMAT_ALONE)
    buffer = d.decompress(input.read(5) + 8*b'\xff' + input.read())

Note: I had no local Python 3 interpreter available and only tested this online with a slightly modified read procedure, so it might not work out of the box.
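For reference, here is a self-contained sketch of the same header shuffling, working on the whole file as a bytes object instead of a file handle. It assumes the SWF layout I described above (LZMA properties at offset 12, compressed data from offset 17); the function names are just for illustration.

```python
import lzma

def zws_to_lzma_alone(data):
    """Rearrange a whole ZWS file (as bytes) into a legacy .lzma stream."""
    if data[:3] != b'ZWS':
        raise ValueError('not an LZMA-compressed SWF')
    # Bytes 12-16: properties byte plus dictionary size (assumed layout)
    props_and_dict_size = data[12:17]
    compressed = data[17:]
    # Eight 0xFF bytes declare the uncompressed size as unknown
    return props_and_dict_size + 8 * b'\xff' + compressed

def decompress_zws(data):
    """Return the decompressed SWF body (everything after the first 8 bytes)."""
    decompressor = lzma.LZMADecompressor(format=lzma.FORMAT_ALONE)
    return decompressor.decompress(zws_to_lzma_alone(data))
```

Something like `decompress_zws(open(path, 'rb').read())` should then give the buffer that the RECT parsing in the question expects.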

Edit: Confirmed to work in Python 3, but some things needed to be changed, like Marcus mentioned regarding unpack (easily solved by using buffer[0:1] instead of buffer[0]). It's not really necessary to read the whole file either; a small chunk, say 256 bytes, should be fine for reading the whole SWF header. The frames field is a bit quirky too, though I believe all you have to do is some bit shifting, i.e.:

    header['frames'] = read_ui16(buffer[0:2]) >> 8
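Regarding the point about only reading a small chunk: a sketch of how that could look is below. It relies on `LZMADecompressor.decompress` returning whatever it can decode from truncated input rather than raising, which is how I understand the module to behave. `decompress_prefix` and its `chunk_size` parameter are hypothetical names, and the input is assumed to be the part of the file starting at the LZMA properties byte.

```python
import lzma

def decompress_prefix(props_and_data, chunk_size=256):
    """Decompress only the start of an SWF LZMA payload.

    props_and_data is assumed to begin with the 5 bytes of LZMA
    properties and dictionary size, followed by the compressed data.
    """
    # Rebuild a .lzma header with the size marked as unknown
    header = props_and_data[:5] + 8 * b'\xff'
    # Feed only the first chunk_size bytes of compressed data
    chunk = props_and_data[5:5 + chunk_size]
    decompressor = lzma.LZMADecompressor(format=lzma.FORMAT_ALONE)
    return decompressor.decompress(header + chunk)
```

The result is only a prefix of the decompressed body, but the RECT plus the two UI16 fields after it need little more than twenty bytes, so that should be enough for the header.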

SWF file format spec

LZMA file format spec

