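We all know that an Android apk file is just a file in zip format. For work I often need to unzip apk files to get at the resources inside, but recently many apks fail to extract with all kinds of archive tools, even though they still install fine and their contents can still be inspected with the aapt2 tool. I used to batch-extract them with Python's zip module; having to install every one of them instead would be the death of me, so I dug into the zip extraction library in the Android 11 source to see what is special about it.
https://pkware.cachefly.net/webdocs/APPNOTE/APPNOTE-6.2.0.txt is the official specification; see it for the most detailed description of the format.
Roughly speaking, a zip file can be divided into three parts. The first part stores the file data. The second part is the central directory, which stores information about the files in the first part. The last part is the end of central directory record (EOCD), which both marks the end of the zip file and records where the central directory is. That is why a zip file is actually parsed from back to front.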
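I. End of central directory record:
    end of central dir signature                   4 bytes  (0x06054b50)  // 4-byte signature 0x06054b50, used to locate the EOCD
    number of this disk                            2 bytes  // number of the current disk
    number of the disk with the
    start of the central directory                 2 bytes  // number of the disk on which the central directory starts
    total number of entries in the
    central directory on this disk                 2 bytes  // number of central directory entries stored on this disk
    total number of entries in
    the central directory                          2 bytes  // total number of central directory entries
    size of the central directory                  4 bytes  // size of the central directory
    offset of start of central
    directory with respect to
    the starting disk number                       4 bytes  // offset of the start of the central directory, relative to the disk on which it starts
    .ZIP file comment length                       2 bytes  // comment length
    .ZIP file comment                              (variable size)  // comment content
The first step in extracting a zip file is to read the start offset and size of the central directory from the EOCD.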
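To make the field layout above concrete, here is a minimal sketch that reads the EOCD fields from a byte buffer at a known offset. The names EocdInfo, ReadLe16, ReadLe32 and ParseEocd are hypothetical helpers made up for this illustration; they are not part of libziparchive. The sketch also assumes a single-disk archive and ignores the two disk-number fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper types for illustration only (not libziparchive API).
struct EocdInfo {
  uint16_t num_records;      // total number of entries in the central directory
  uint32_t cd_size;          // size of the central directory in bytes
  uint32_t cd_start_offset;  // offset of the central directory from the start of the file
  uint16_t comment_length;   // length of the trailing archive comment
};

// All zip fields are little-endian; reading byte by byte avoids alignment issues.
inline uint16_t ReadLe16(const uint8_t* p) {
  return static_cast<uint16_t>(p[0] | (p[1] << 8));
}
inline uint32_t ReadLe32(const uint8_t* p) {
  return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

constexpr size_t kEocdFixedSize = 22;  // size of the EOCD without the comment
constexpr uint32_t kEocdSignature = 0x06054b50;

// Parses the EOCD that starts at buf[offset]. Returns false if the buffer is
// too short or the signature does not match.
bool ParseEocd(const std::vector<uint8_t>& buf, size_t offset, EocdInfo* out) {
  assert(out != nullptr);
  if (offset > buf.size() || buf.size() - offset < kEocdFixedSize) return false;
  const uint8_t* p = buf.data() + offset;
  if (ReadLe32(p) != kEocdSignature) return false;
  out->num_records = ReadLe16(p + 10);
  out->cd_size = ReadLe32(p + 12);
  out->cd_start_offset = ReadLe32(p + 16);
  out->comment_length = ReadLe16(p + 20);
  return true;
}
```

The byte offsets 10, 12, 16 and 20 follow directly from the field sizes in the table: 4 + 2 + 2 + 2 bytes precede the total entry count.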
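Central directory structure:
    [file header 1]
    .
    .
    .
    [file header n]
    [digital signature]

File header:
    central file header signature   4 bytes  (0x02014b50)  // magic number
    version made by                 2 bytes  // version of the tool that created the entry
    version needed to extract       2 bytes  // minimum version needed to extract
    general purpose bit flag        2 bytes  // general purpose flags; lowest bit 1 = encrypted, 0 = not encrypted
    compression method              2 bytes  // compression method
    last mod file time              2 bytes  // last modification time
    last mod file date              2 bytes  // last modification date
    crc-32                          4 bytes  // CRC-32 checksum
    compressed size                 4 bytes  // size after compression
    uncompressed size               4 bytes  // size before compression
    file name length                2 bytes  // file name length
    extra field length              2 bytes  // extra field length
    file comment length             2 bytes  // file comment length
    disk number start               2 bytes  // number of the disk on which the file starts
    internal file attributes        2 bytes  // internal file attributes
    external file attributes        4 bytes  // external file attributes
    relative offset of local header 4 bytes  // relative offset of the local file header
    file name (variable size)                // entry file name
    extra field (variable size)              // extra field
    file comment (variable size)             // file comment content

Digital signature:
    header signature                4 bytes  (0x05054b50)
    size of data                    2 bytes
    signature data (variable size)
The central directory is made up of file headers, one per file. From a file header we get the file name and the position and size of the file data, and can then go to the data part and extract the file. general purpose bit flag & 0x01 yields the lowest bit, which indicates whether the entry is encrypted. Setting it to 1 gives the simplest form of pseudo-encryption: nothing was encrypted and no password was set at packaging time, only the flag was changed. Android does not read this flag when installing, whereas many zip libraries and archive tools use it to decide whether a password is needed, and that is how the apk resists extraction.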
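As a rough illustration of undoing this pseudo-encryption, the sketch below walks the central directory of an archive held in memory and clears bit 0 of the general purpose bit flag, both in each central file header and in the local file header it points to. ClearPseudoEncryption and the other names are hypothetical, and the caller is assumed to already know the central directory offset and entry count from the EOCD. It only helps when the flag is fake; data that really is encrypted stays unreadable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kCentralSig = 0x02014b50;
constexpr uint32_t kLocalSig = 0x04034b50;
constexpr size_t kCentralFixedSize = 46;  // central file header without name/extra/comment
constexpr size_t kLocalFixedSize = 30;    // local file header without name/extra

inline uint16_t Le16(const uint8_t* p) { return static_cast<uint16_t>(p[0] | (p[1] << 8)); }
inline uint32_t Le32(const uint8_t* p) {
  return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Clears bit 0 of the little-endian 16-bit flag at p; returns true if it was set.
inline bool ClearEncryptedBit(uint8_t* p) {
  const bool was_set = (p[0] & 0x01) != 0;
  p[0] = static_cast<uint8_t>(p[0] & 0xFE);
  return was_set;
}

// Hypothetical helper: returns how many central directory entries had the
// encrypted bit set, or -1 if the central directory looks malformed.
int ClearPseudoEncryption(std::vector<uint8_t>* zip, size_t cd_offset, uint16_t num_entries) {
  assert(zip != nullptr);
  int cleared = 0;
  size_t pos = cd_offset;
  for (uint16_t i = 0; i < num_entries; ++i) {
    if (pos > zip->size() || zip->size() - pos < kCentralFixedSize) return -1;
    uint8_t* cdr = zip->data() + pos;
    if (Le32(cdr) != kCentralSig) return -1;
    if (ClearEncryptedBit(cdr + 8)) ++cleared;  // flag is at offset 8 in the central header
    // Also patch the matching local file header, if it is where the record says.
    const size_t local = Le32(cdr + 42);
    if (local <= zip->size() && zip->size() - local >= kLocalFixedSize &&
        Le32(zip->data() + local) == kLocalSig) {
      ClearEncryptedBit(zip->data() + local + 6);  // flag is at offset 6 in the local header
    }
    pos += kCentralFixedSize + Le16(cdr + 28) + Le16(cdr + 30) + Le16(cdr + 32);
  }
  return cleared;
}
```

After patching, the buffer can be written back to disk and opened with an ordinary zip tool.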
    [local file header 1]
    [file data 1]
    [data descriptor 1]
    .
    .
    .
    [local file header n]
    [file data n]
    [data descriptor n]

A. Local file header:
    local file header signature     4 bytes  (0x04034b50)  // signature
    version needed to extract       2 bytes  // minimum version needed to extract
    general purpose bit flag        2 bytes  // general purpose bit flag
    compression method              2 bytes  // compression method
    last mod file time              2 bytes  // last modification time
    last mod file date              2 bytes  // last modification date
    crc-32                          4 bytes  // CRC-32 checksum
    compressed size                 4 bytes  // size after compression
    uncompressed size               4 bytes  // size before compression
    file name length                2 bytes  // file name length
    extra field length              2 bytes  // extra field length
    file name (variable size)                // file name
    extra field (variable size)              // extra field

B. File data
    Immediately following the local header for a file is the compressed or stored data for the file. The series of [local file header][file data][data descriptor] repeats for each file in the .ZIP archive.

C. Data descriptor:  // usually absent; only present when bit 3 of the general purpose bit flag is set
    crc-32                          4 bytes
    compressed size                 4 bytes
    uncompressed size               4 bytes
The local file header is almost identical to the file header in the central directory. The file data follows immediately after the local file header and can be extracted using the data length and the compression method.
In frameworks, files can be extracted through frameworks/base/libs/androidfw/ZipUtils.cpp. A closer look at the code shows that this class merely wraps functions of the ziparchive library, and every call ends up in ziparchive. The source of that library is at system/core/libziparchive/
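The sketch below computes where the file data starts once the offset of a local file header is known. FileDataOffset is a hypothetical helper for this article, not a libziparchive function. It reads the name and extra field lengths from the local header itself, because the extra field of the local header is allowed to differ in length from the one in the central directory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kLocalHeaderSignature = 0x04034b50;
constexpr size_t kLocalHeaderFixedSize = 30;  // fixed part, before name and extra field

inline uint16_t Le16(const uint8_t* p) { return static_cast<uint16_t>(p[0] | (p[1] << 8)); }
inline uint32_t Le32(const uint8_t* p) {
  return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Hypothetical helper: given the offset of a local file header inside `zip`,
// stores the offset of the first byte of file data in *data_offset.
// Returns false if the header is truncated or its signature is wrong.
bool FileDataOffset(const std::vector<uint8_t>& zip, size_t local_header_offset,
                    size_t* data_offset) {
  assert(data_offset != nullptr);
  if (local_header_offset > zip.size() ||
      zip.size() - local_header_offset < kLocalHeaderFixedSize) {
    return false;
  }
  const uint8_t* lfh = zip.data() + local_header_offset;
  if (Le32(lfh) != kLocalHeaderSignature) return false;
  const size_t name_len = Le16(lfh + 26);   // file name length
  const size_t extra_len = Le16(lfh + 28);  // extra field length
  const size_t offset = local_header_offset + kLocalHeaderFixedSize + name_len + extra_len;
  if (offset > zip.size()) return false;
  *data_offset = offset;
  return true;
}
```

From that offset, `compressed size` bytes are either copied as-is (compression method 0, stored) or passed to an inflate routine (compression method 8, deflate).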
system/core/libziparchive/zip_archive.cc
int32_t OpenArchive(const char* fileName, ZipArchiveHandle* handle) {
  const int fd = open(fileName, O_RDONLY | O_BINARY, 0);
  ZipArchive* archive = new ZipArchive(fd, true);
  *handle = archive;
  if (fd < 0) {
    ALOGW("Unable to open '%s': %s", fileName, strerror(errno));
    return kIoError;
  }
  return OpenArchiveInternal(archive, fileName);
}
static int32_t OpenArchiveInternal(ZipArchive* archive, const char* debug_file_name) {
  int32_t result = -1;
  if ((result = MapCentralDirectory(debug_file_name, archive)) != 0) {  // parse the EOCD to get the position of the central directory and other information
    return result;
  }
  if ((result = ParseZipArchive(archive))) {  // parse the zip file
    return result;
  }
  return 0;
}
At this point the exciting central directory finally shows up. Let's look at how MapCentralDirectory obtains it.
/*
 * Find the zip Central Directory and memory-map it.
 *
 * On success, returns 0 after populating fields from the EOCD area:
 *   directory_offset
 *   directory_ptr
 *   num_entries
 */
static int32_t MapCentralDirectory(const char* debug_file_name, ZipArchive* archive) {
  // Some error-handling code removed here (file_length is obtained in the removed part)

  /*
   * Perform the traditional EOCD snipe hunt.
   *
   * We're searching for the End of Central Directory magic number,
   * which appears at the start of the EOCD block. It's followed by
   * 18 bytes of EOCD stuff and up to 64KB of archive comment. We
   * need to read the last part of the file into a buffer, dig through
   * it to find the magic number, parse some values out, and use those
   * to determine the extent of the CD.
   *
   * We start by pulling in the last part of the file.
   */
  off64_t read_amount = kMaxEOCDSearch;
  if (file_length < read_amount) {
    read_amount = file_length;
  }

  std::vector<uint8_t> scan_buffer(read_amount);
  int32_t result =
      MapCentralDirectory0(debug_file_name, archive, file_length, read_amount, scan_buffer.data());
  return result;
}
This function only does some error handling; the actual parsing is done by MapCentralDirectory0. The error handling refers to the familiar-looking EocdRecord, the struct that describes the EOCD.
static int32_t MapCentralDirectory0(const char* debug_file_name, ZipArchive* archive,
                                    off64_t file_length, off64_t read_amount,
                                    uint8_t* scan_buffer) {
  const off64_t search_start = file_length - read_amount;

  if (!archive->mapped_zip.ReadAtOffset(scan_buffer, read_amount, search_start)) {
    ALOGE("Zip: read %" PRId64 " from offset %" PRId64 " failed",
          static_cast<int64_t>(read_amount), static_cast<int64_t>(search_start));
    return kIoError;
  }

  /*
   * Scan backward for the EOCD magic. In an archive without a trailing
   * comment, we'll find it on the first try. (We may want to consider
   * doing an initial minimal read; if we don't find it, retry with a
   * second read as above.)
   */
  // Loop to find the EOCD
  int i = read_amount - sizeof(EocdRecord);
  for (; i >= 0; i--) {
    if (scan_buffer[i] == 0x50) {
      uint32_t* sig_addr = reinterpret_cast<uint32_t*>(&scan_buffer[i]);
      if (get_unaligned<uint32_t>(sig_addr) == EocdRecord::kSignature) {  // kSignature = 0x06054b50; the signature locates the EOCD
        ALOGV("+++ Found EOCD at buf+%d", i);
        break;
      }
    }
  }
  if (i < 0) {
    ALOGD("Zip: EOCD not found, %s is not zip", debug_file_name);
    return kInvalidFile;
  }

  const off64_t eocd_offset = search_start + i;
  const EocdRecord* eocd = reinterpret_cast<const EocdRecord*>(scan_buffer + i);  // view the bytes as an EocdRecord, which lays out the data according to the zip EOCD structure
  /*
   * Verify that there's no trailing space at the end of the central directory
   * and its comment.
   */
  const off64_t calculated_length = eocd_offset + sizeof(EocdRecord) + eocd->comment_length;
  if (calculated_length != file_length) {
    ALOGW("Zip: %" PRId64 " extraneous bytes at the end of the central directory",
          static_cast<int64_t>(file_length - calculated_length));
    return kInvalidFile;
  }

  /*
   * Grab the CD offset and size, and the number of entries in the
   * archive and verify that they look reasonable.
   */
  if (static_cast<off64_t>(eocd->cd_start_offset) + eocd->cd_size > eocd_offset) {
    ALOGW("Zip: bad offsets (dir %" PRIu32 ", size %" PRIu32 ", eocd %" PRId64 ")",
          eocd->cd_start_offset, eocd->cd_size, static_cast<int64_t>(eocd_offset));
#if defined(__ANDROID__)
    if (eocd->cd_start_offset + eocd->cd_size <= eocd_offset) {
      android_errorWriteLog(0x534e4554, "31251826");
    }
#endif
    return kInvalidOffset;
  }
  if (eocd->num_records == 0) {
    ALOGW("Zip: empty archive?");
    return kEmptyArchive;
  }
  // All sanity checks are done: the EOCD is valid and gives us the number of file headers in the central directory
  ALOGV("+++ num_entries=%" PRIu32 " dir_size=%" PRIu32 " dir_offset=%" PRIu32, eocd->num_records,
        eocd->cd_size, eocd->cd_start_offset);

  /*
   * It all looks good.  Create a mapping for the CD, and set the fields
   * in archive.
   */
  // InitializeCentralDirectory creates the related variables and stores them
  if (!archive->InitializeCentralDirectory(debug_file_name,
                                           static_cast<off64_t>(eocd->cd_start_offset),
                                           static_cast<size_t>(eocd->cd_size))) {
    ALOGE("Zip: failed to intialize central directory.\n");
    return kMmapFailed;
  }

  archive->num_entries = eocd->num_records;
  archive->directory_offset = eocd->cd_start_offset;

  return 0;
}
Back in OpenArchiveInternal: once MapCentralDirectory has obtained this information, ParseZipArchive is called to parse the archive.
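The backward scan can be reproduced outside Android in a few lines. The following is a simplified sketch of the same idea, not the libziparchive code: FindEocd is a hypothetical name, and `tail` is assumed to hold the last bytes of the file (at most 22 + 65535 bytes are ever needed, since the comment length is a 16-bit field). Like the Android code, it takes the first signature found when scanning from the end and then rejects the archive if the EOCD plus its comment does not end exactly at the end of the file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr size_t kEocdFixedSize = 22;  // EOCD size without the trailing comment

// Hypothetical helper: scans `tail` backward for the EOCD signature
// 0x06054b50 (bytes 50 4b 05 06 on disk). On success stores the index of the
// signature in *eocd_index and returns true.
bool FindEocd(const std::vector<uint8_t>& tail, size_t* eocd_index) {
  assert(eocd_index != nullptr);
  if (tail.size() < kEocdFixedSize) return false;
  // Start at the last position where a complete EOCD could still fit.
  for (size_t i = tail.size() - kEocdFixedSize + 1; i-- > 0;) {
    if (tail[i] == 0x50 && tail[i + 1] == 0x4b && tail[i + 2] == 0x05 && tail[i + 3] == 0x06) {
      const size_t comment_length =
          static_cast<size_t>(tail[i + 20]) | (static_cast<size_t>(tail[i + 21]) << 8);
      // No extra bytes are allowed after the EOCD and its comment.
      if (i + kEocdFixedSize + comment_length != tail.size()) return false;
      *eocd_index = i;
      return true;
    }
  }
  return false;
}
```

The trailing-bytes check is one reason a file accepted by a lenient archive tool may be rejected by Android, and the other way round.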
// The function is fairly long, so part of the error-handling code has been removed
static int32_t ParseZipArchive(ZipArchive* archive) {
  const uint8_t* const cd_ptr = archive->central_directory.GetBasePtr();
  const size_t cd_length = archive->central_directory.GetMapLength();
  const uint16_t num_entries = archive->num_entries;

  /*
   * Create hash table.  We have a minimum 75% load factor, possibly as
   * low as 50% after we round off to a power of 2.  There must be at
   * least one unused entry to avoid an infinite loop during creation.
   */
  archive->hash_table_size = RoundUpPower2(1 + (num_entries * 4) / 3);
  // Create the hash table
  archive->hash_table = reinterpret_cast<ZipStringOffset*>(
      calloc(archive->hash_table_size, sizeof(ZipStringOffset)));

  /*
   * Walk through the central directory, adding entries to the hash
   * table and verifying values.
   */
  const uint8_t* const cd_end = cd_ptr + cd_length;
  const uint8_t* ptr = cd_ptr;
  for (uint16_t i = 0; i < num_entries; i++) {  // loop over every CentralDirectoryRecord
    if (ptr > cd_end - sizeof(CentralDirectoryRecord)) {
      ALOGW("Zip: ran off the end (item #%" PRIu16 ", %zu bytes of central directory)", i,
            cd_length);
#if defined(__ANDROID__)
      android_errorWriteLog(0x534e4554, "36392138");
#endif
      return kInvalidFile;
    }

    const CentralDirectoryRecord* cdr = reinterpret_cast<const CentralDirectoryRecord*>(ptr);
    if (cdr->record_signature != CentralDirectoryRecord::kSignature) {  // kSignature = 0x02014b50; the signature is checked for every record
      ALOGW("Zip: missed a central dir sig (at %" PRIu16 ")", i);
      return kInvalidFile;
    }

    const off64_t local_header_offset = cdr->local_file_header_offset;
    const uint16_t file_name_length = cdr->file_name_length;
    const uint16_t extra_length = cdr->extra_field_length;
    const uint16_t comment_length = cdr->comment_length;
    const uint8_t* file_name = ptr + sizeof(CentralDirectoryRecord);

    // Add the CDE filename to the hash table.
    std::string_view entry_name{reinterpret_cast<const char*>(file_name), file_name_length};  // build entry_name from the file name
    const int add_result = AddToHash(archive->hash_table, archive->hash_table_size, entry_name,
                                     archive->central_directory.GetBasePtr());  // add to the hash table: the key is entry_name, the value locates the current CentralDirectoryRecord

    ptr += sizeof(CentralDirectoryRecord) + file_name_length + extra_length + comment_length;
  }

  ALOGV("+++ zip good scan %" PRIu16 " entries", num_entries);
  return 0;
}
Now the hash table of CentralDirectoryRecord entries is built. To extract a file, look up its CentralDirectoryRecord in the hash table, use the record to find the address and length of the corresponding data, and read that range.
That is the whole zip extraction flow; Android follows the standard procedure: find the EOCD and locate the central directory -> build a hash table of CentralDirectoryRecord entries from the central directory -> use the file offset, length and compression method in a CentralDirectoryRecord to fetch and decompress the data. If extraction fails in future because some other field was tampered with, it should be easy to track down.
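The hash table sizing expression in ParseZipArchive is easy to check by hand. The sketch below re-implements it for illustration only: the body of RoundUpPower2 is my assumption of a typical bit-twiddling implementation, not a copy of the Android source, and HashTableSize is a hypothetical wrapper name.

```cpp
#include <cassert>
#include <cstdint>

// Rounds val up to the next power of two (returns val if it already is one).
// Requires 0 < val <= 2^31 so that the result fits in 32 bits.
uint32_t RoundUpPower2(uint32_t val) {
  assert(val > 0 && val <= 0x80000000u);
  val--;
  // Smear the highest set bit into every lower bit position.
  val |= val >> 1;
  val |= val >> 2;
  val |= val >> 4;
  val |= val >> 8;
  val |= val >> 16;
  val++;
  return val;
}

// Same expression as in ParseZipArchive: 4/3 of the entry count plus one
// spare slot, rounded up to a power of two.
uint32_t HashTableSize(uint16_t num_entries) {
  return RoundUpPower2(1 + (static_cast<uint32_t>(num_entries) * 4) / 3);
}
```

For 12 entries this gives 1 + 16 = 17, rounded up to 32 slots, so the table is 37.5% full; for 6 entries it gives 9, rounded up to 16. The extra 1 guarantees at least one empty slot, which the comment in the source says is needed to avoid an infinite loop during creation.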
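The same walk over the central directory can be sketched with standard containers. This is a simplified stand-in for what ParseZipArchive does, not the Android implementation: it uses std::unordered_map instead of the custom hash table, and EntryInfo and BuildIndex are hypothetical names. The buffer is assumed to contain the central directory at cd_offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

constexpr uint32_t kCdrSignature = 0x02014b50;
constexpr size_t kCdrFixedSize = 46;  // central file header without name/extra/comment

inline uint16_t Le16(const uint8_t* p) { return static_cast<uint16_t>(p[0] | (p[1] << 8)); }
inline uint32_t Le32(const uint8_t* p) {
  return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Hypothetical summary of the fields needed to extract one entry.
struct EntryInfo {
  uint16_t method;               // 0 = stored, 8 = deflate
  uint32_t compressed_size;
  uint32_t uncompressed_size;
  uint32_t local_header_offset;  // where the local file header starts
};

// Walks num_entries central file headers starting at cd_offset and fills
// *index with name -> EntryInfo. Returns false on a truncated or bad record.
bool BuildIndex(const std::vector<uint8_t>& zip, size_t cd_offset, uint16_t num_entries,
                std::unordered_map<std::string, EntryInfo>* index) {
  assert(index != nullptr);
  size_t pos = cd_offset;
  for (uint16_t i = 0; i < num_entries; ++i) {
    if (pos > zip.size() || zip.size() - pos < kCdrFixedSize) return false;
    const uint8_t* cdr = zip.data() + pos;
    if (Le32(cdr) != kCdrSignature) return false;
    const size_t name_len = Le16(cdr + 28);
    const size_t record_size = kCdrFixedSize + name_len + Le16(cdr + 30) + Le16(cdr + 32);
    if (zip.size() - pos < record_size) return false;
    EntryInfo info;
    info.method = Le16(cdr + 10);
    info.compressed_size = Le32(cdr + 20);
    info.uncompressed_size = Le32(cdr + 24);
    info.local_header_offset = Le32(cdr + 42);
    const std::string name(reinterpret_cast<const char*>(cdr + kCdrFixedSize), name_len);
    (*index)[name] = info;
    pos += record_size;  // the next record follows name, extra field and comment
  }
  return true;
}
```

Looking up a name in the resulting map yields the local header offset, compressed size and compression method, which is everything needed to go and fetch the data.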