unDROP for InnoDB 工具能用于恢复损坏的MySQL数据库。在这篇文章中,我们会展示当MySQL数据库的文件损坏,甚至innodb_force_recovery=6也无法帮助时,如何恢复数据库。
InnoDB表空间的损坏可能是多种原因造成的。一个即将报废的硬盘可能写入垃圾数据,导致页校验错误。然后InnoDB报告给错误日志:
InnoDB: Database page corruption on disk or a failed InnoDB: file read of page 4.
MySQL差劲的启动脚本是众所周知的。一个简单的升级过程可能出现两个mysqld的进程写入同一表空间的情况。这也会导致损坏。有时,电源复位不仅损坏InnoDB文件,而且使文件系统对操作系统不可用。
InnoDB 对于页的操作非常严格。如果校验不匹配或头中一些字段包含非预期值,InnoDB明智地倾向于崩溃,以避免进一步的损坏。
首先会尝试以innodb_force_recovery选项启动MySQL。此选项的目的是让用户转储其数据。由于无法修复表空间,用户必须删除表空间,创建新的,并加载回数据。
innodb_force_recovery 接受从1到6的值。值越高,InnoDB就会禁用越多功能。
在本文章,我们假设MySQL 即使在innodb_force_recovery=6的情况下也不能启动。
恢复工具包直接对InnoDB文件进行操作,它可以从InnoDB页面读取记录。如果页的某一部分被破坏,它将跳过那块,并继续进一步读取页的记录。
所以,我们来损坏一些InnoDB文件并对表进行恢复。
损坏InnoDB
简单起见,我们将重写.ibd 文件用户数据的一部分。
事情情况中,损坏可能发生在索引PRIMARY的任何地方。
在表sakila.actor的PRIMARY索引的中间,我们将数据用128个字符“A”重写:
0000C058 00 00 00 02 00 32 01 00 02 00 1C 69 6E 66 69 6D 75 6D 00 05 .....2.....infimum.. 0000C06C 00 0B 00 00 73 75 70 72 65 6D 75 6D 07 08 00 00 10 00 29 00 ....supremum......). 0000C080 01 00 00 00 00 05 1E 9F 00 00 01 4D 01 10 50 45 4E 45 4C 4F ...........M..PENELO 0000C094 50 45 47 55 49 4E 45 53 53 43 F2 F5 A9 08 04 00 00 18 00 26 PEGUINESSC.........& 0000C0A8 00 02 00 00 00 00 05 1E 9F 00 00 01 4D 01 1A 4E 49 43 4B 57 ............M..NICKW 0000C0BC 41 48 4C 42 45 52 47 43 F2 F5 A9 05 02 00 00 20 00 21 00 03 AHLBERGC....... .!.. 0000C0D0 00 00 00 00 05 1E 9F 00 00 01 4D 01 24 45 44 43 48 41 53 45 ..........M.$EDCHASE 0000C0E4 43 F2 F5 A9 05 08 04 00 28 00 27 00 04 00 00 00 00 05 1E 9F C.......(.'......... 0000C0F8 00 00 01 4D 01 2E 4A 45 4E 4E 49 46 45 52 44 41 56 49 53 43 ...M..JENNIFERDAVISC 0000C10C F2 F5 A9 0C 06 00 00 30 00 2C 00 05 00 00 00 00 05 1E 9F 00 .......0.,.......... 0000C120 00 01 4D 01 38 4A 4F 48 4E 4E 59 4C 4F 4C 4C 4F 42 52 49 47 ..M.8JOHNNYLOLLOBRIG 0000C134 49 44 41 43 F2 F5 A9 09 05 00 00 38 00 28 00 06 00 00 00 00 IDAC.......8.(...... 0000C148 05 1E 9F 00 00 01 41 41 41 41 41 41 41 41 41 41 41 41 41 41 ......AAAAAAAAAAAAAA 0000C15C 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAAAAAA 0000C170 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAAAAAA 0000C184 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAAAAAA 0000C198 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAAAAAA 0000C1AC 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAAAAAA 0000C1C0 41 41 41 41 41 41 41 41 41 41 41 41 41 41 4E 4B 43 F2 F5 A9 AAAAAAAAAAAAAANKC... 0000C1D4 05 09 00 00 58 00 28 00 0A 00 00 00 00 05 1E 9F 00 00 01 4D ....X.(............M 0000C1E8 01 6A 43 48 52 49 53 54 49 41 4E 47 41 42 4C 45 43 F2 F5 A9 .jCHRISTIANGABLEC... 0000C1FC 04 04 00 00 60 00 22 00 0B 00 00 00 00 05 1E 9F 00 00 01 4D ....`."............M
损坏的 InnoDB 表使MySQL崩溃
MySQL如果读取损坏的数据页,mysql服务会崩溃:
mysql> SELECT COUNT(*) FROM sakila.actor +----------+ | COUNT(*) | +----------+ | 200 | +----------+ ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query
在崩溃之前,MySQL将究竟哪里出了问题写入错误日志,并转储错误页:
Version: '5.6.19-67.0' socket: '/var/lib/mysql/mysql.sock' port: 3306 Percona Server (GPL), Release 67.0, Revision 618 InnoDB: Database page corruption on disk or a failed InnoDB: file read of page 4. InnoDB: You may have to recover from a backup. 2014-07-14 20:18:44 7f060bfff700 InnoDB: Page dump in ascii and hex (16384 bytes): len 16384; hex 1bce9a5000000004ffffffffffffffff0000000026c3095945bf00000000000000000
恢复 InnoDB 表损坏
当你看到在InnoDB表空间中有损坏,要做的第一件事就是以innodb_force_recovery选项启动MySQL。尝试从一到六的所有值是有意义的。
我们假定即使innodb_force_recovery=6,MySQL也不启动,或者它启动了,但任何SELECT都使它崩溃。
恢复计划如下:
- 将损坏的 InnoDB 表空间拆分成页;以类型和index_id进行排序
- 从表的PRIMARY索引中获取记录
- DROP 损坏的表并创建一个新的
- 将记录载入回MySQL
我们需要分析两个表空间:ibdata1中和actor.ibd(因为选项innodb_file_per_table= ON)。 InnoDB字典存储在ibdata1中,我们需要它才能知道表sakila.actor的PRIMARY索引的index_id。
解析Mysql的系统表空间ibdata1
root@test:~/recovery/undrop-for-innodb# ./stream_parser -f /var/lib/mysql/ibdata1 Opening file: /var/lib/mysql/ibdata1 File information: ID of device containing file: 64768 inode number: 8028 protection: 100660 (regular file) number of hard links: 1 user ID of owner: 106 group ID of owner: 114 device ID (if special file): 0 blocksize for filesystem I/O: 4096 number of blocks allocated: 36864 time of last access: 1406832698 Thu Jul 31 14:51:38 2014 time of last modification: 1406833058 Thu Jul 31 14:57:38 2014 time of last status change: 1406833058 Thu Jul 31 14:57:38 2014 total size, in bytes: 18874368 (18.000 MiB) Size to process: 18874368 (18.000 MiB) All workers finished in 0 sec
然后是actor.ibd
root@test:~/recovery/undrop-for-innodb# ./stream_parser -f /var/lib/mysql/sakila/actor.ibd Opening file: /var/lib/mysql/sakila/actor.ibd File information: ID of device containing file: 64768 inode number: 8037 protection: 100660 (regular file) number of hard links: 1 user ID of owner: 106 group ID of owner: 114 device ID (if special file): 0 blocksize for filesystem I/O: 4096 number of blocks allocated: 224 time of last access: 1406832349 Thu Jul 31 14:45:49 2014 time of last modification: 1406832300 Thu Jul 31 14:45:00 2014 time of last status change: 1406832300 Thu Jul 31 14:45:00 2014 total size, in bytes: 114688 (112.000 kiB) Size to process: 114688 (112.000 kiB) All workers finished in 0 sec root@test:~/recovery/undrop-for-innodb#
恢复 InnoDB 字典
我们需要知道表sakila.actor的PRIMARY索引的index_id。参考关于InnoDB字典的更多信息。现在,我们只要获取sakila.actor的index_id:
root@test:~/recovery/undrop-for-innodb# ./c_parser -4f pages-ibdata1/FIL_PAGE_INDEX/0000000000000001.page -t dictionary/SYS_TABLES.sql |grep actor 000000000504 85000001320110 SYS_TABLES "sakila/actor" 13 4 1 0 0 "" 1 00000000050D 8E0000013B0110 SYS_TABLES "sakila/film\_actor" 20 3 1 0 0 "" 8 ... root@test:~/recovery/undrop-for-innodb# ./c_parser -4f pages-ibdata1/FIL_PAGE_INDEX/0000000000000003.page -t dictionary/SYS_INDEXES.sql |grep 13 000000000300 810000012D01D3 SYS_INDEXES 11 13 "REF\_IND" 1 0 0 304 000000000504 85000001320178 SYS_INDEXES 13 15 "PRIMARY" 1 3 1 3 000000000504 850000013201A6 SYS_INDEXES 13 16 "idx\_actor\_last\_name" 1 0 1 4 000000000505 860000013301CE SYS_INDEXES 14 17 "PRIMARY" 1 3 2 3 ...
因此,sakila.actor表的PRIMARY索引的index_id是15,转储数据中的第五列。
从表的PRIMARY索引恢复记录
c_parser 读取 InnoDB 页,将其与给定表结构进行匹配,并以制表符分隔值格式转储记录。
与InnoDB相反,当c_parser遇到损坏区域,它会跳过并继续读取页。
我们从 index_id 15读取记录,根据字典,它就是PRIMARY 索引。
root@test:~/recovery/undrop-for-innodb# ./c_parser -6f pages-actor.ibd/FIL_PAGE_INDEX/0000000000000015.page -t sakila/actor.sql > dumps/default/actor 2> dumps/default/actor_load.sql root@test:~/recovery/undrop-for-innodb# cat dumps/default/actor -- Page id: 3, Format: COMPACT, Records list: Invalid, Expected records: (0 200) 72656D756D07 08000010002900 actor 30064 "\0\0\0\0" "" "1972-09-20 23:07:44" 1050454E454C 4F50454755494E actor 19713 "ESSC" "" "2100-08-09 07:52:36" 00000000051E 9F0000014D011A actor 2 "NICK" "WAHLBERG" "2006-02-15 04:34:33" 00000000051E 9F0000014D0124 actor 3 "ED" "CHASE" "2006-02-15 04:34:33" 00000000051E 9F0000014D012E actor 4 "JENNIFER" "DAVIS" "2006-02-15 04:34:33" 00000000051E 9F0000014D0138 actor 5 "JOHNNY" "LOLLOBRIGIDA" "2006-02-15 04:34:33" 00000000051E 9F000001414141 actor 6 "AAAAA" "AAAAAAAAA" "2004-09-10 01:53:05" 00000000051E 9F0000014D016A actor 10 "CHRISTIAN" "GABLE" "2006-02-15 04:34:33" ...
我们找出了一些有效的记录,但肯定也有一些“垃圾”。注意在Nick Wahlberg之前恢复的记录。绝对应该有Penelope Guiness的,因为我们还没有在actor.ibd文件中重写该数据。
过滤器
我们可以通过对某些字段的可能值应用过滤器来提高恢复质量。在原始表中有200个记录,但前两个“垃圾”记录有一些奇怪的标识符(30064和19713)。我们知道actor标识符应在[1..300]的范围内。因此,我们要告诉解析器相匹配的条件。为此,我们在定义的actor表的actor.sql文件的注释中添加一个提示。该注释应当是特殊的格式事解析器能识别它们。 actor.sql部分文件的列表(注意注释后有一个逗号!):
CREATE TABLE `actor` ( `actor_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT /*!FILTER int_min_val: 1 int_max_val: 300 */, `first_name` varchar(45) NOT NULL, `last_name` varchar(45) NOT NULL, `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`actor_id`), KEY `idx_actor_last_name` (`last_name`) ) ENGINE=InnoDB AUTO_INCREMENT=201 DEFAULT CHARSET=utf8;
应用过滤器后,识别出的记录看上去好多了:
root@test:~/recovery/undrop-for-innodb# ./c_parser -6f pages-actor.ibd/FIL_PAGE_INDEX/0000000000000015.page -t sakila/actor.sql > dumps/default/actor 2> dumps/default/actor_load.sql root@test:~/recovery/undrop-for-innodb# head -10 dumps/default/actor -- Page id: 3, Format: COMPACT, Records list: Invalid, Expected records: (0 200) 00000000051E 9F0000014D0110 actor 1 "PENELOPE" "GUINESS" "2006-02-15 04:34:33" 00000000051E 9F0000014D011A actor 2 "NICK" "WAHLBERG" "2006-02-15 04:34:33" 00000000051E 9F0000014D0124 actor 3 "ED" "CHASE" "2006-02-15 04:34:33" 00000000051E 9F0000014D012E actor 4 "JENNIFER" "DAVIS" "2006-02-15 04:34:33" 00000000051E 9F0000014D0138 actor 5 "JOHNNY" "LOLLOBRIGIDA" "2006-02-15 04:34:33" 00000000051E 9F000001414141 actor 6 "AAAAA" "AAAAAAAAA" "2004-09-10 01:53:05" 00000000051E 9F0000014D016A actor 10 "CHRISTIAN" "GABLE" "2006-02-15 04:34:33" 00000000051E 9F0000014D0174 actor 11 "ZERO" "CAGE" "2006-02-15 04:34:33" 00000000051E 9F0000014D017E actor 12 "KARL" "BERRY" "2006-02-15 04:34:33"
你看,Penelope Guiness 的记录已经有啦。唯一的问题是 – 无效记录6-“AAAAA”-“AAAAAAAAA”。这是因为记录的actor_id 似乎为6,这与预期的相符。理想情况下,转储必须没有垃圾记录,所以你可能要尝试添加更多其他键的过滤器。
或者,我们可以稍后手动删除数据库中这个记录。
DROP 损坏的表并创建一个新表
一旦有了所有表的转储,我们就需要创建MySQL的新实例。
如果这是单一的表损坏,尝试innodb_force_recovery=6来删除表是有用的。
如果MySQL连启动都不行,尝试将损坏的actor.ibd移到其他地方。在DROP TABLEactor后的恢复模式中,MySQL会从字典中删除记录。如果actor.frm仍存在就删除它。
目的是清理并运行MySQL,准备好导入表转储。
一旦MySQL 准备创建一个空表actor时:
mysql> CREATE TABLE `actor` ( -> `actor_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT, -> `first_name` varchar(45) NOT NULL, -> `last_name` varchar(45) NOT NULL, -> `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, -> PRIMARY KEY (`actor_id`), -> KEY `idx_actor_last_name` (`last_name`) -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8 ; Query OK, 0 rows affected (0.01 sec) mysql>
将记录载入MySQL
接下来我们将从恢复的转储加载信息:
root@test:~/recovery/undrop-for-innodb# mysql --local-infile -uroot -p$mypass Welcome to the MySQL monitor. Commands end with ; or \g. ... mysql> use sakila; Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed mysql> source dumps/default/actor_load.sql Query OK, 0 rows affected (0.00 sec) Query OK, 199 rows affected, 1 warning (0.00 sec) Records: 198 Deleted: 1 Skipped: 0 Warnings: 1
最后,我们要知道我们由于损坏丢失了多少数据。
c_parser 提供预期和实际找回数据的计数
在每页的开头,它给出了预期记录的数字:
-- Page id: 3, Format: COMPACT, Records list: Invalid, Expected records: (0 200)
这表示预期有200 个记录,但记录列表被破坏(因此,记录列表:无效)。
在每页的最后,它给出实际找到记录的总计。
-- Page id: 3, Found records: 197, Lost records: YES, Leaf page: YES
Leave a Reply