使用开源组件mysql2pgsql进行mysql到GP的数据迁移:

以迁移表potluck为例

此处测试版本为mysql 5.7,mysql当前最新版本为8.0

A.全局数据迁移

mysql -u root -p 登录数据库

进入test1数据库 use test1,并找到要迁移的表信息:

MariaDB [test1]> DESCRIBE potluck;

±------------±------------±-----±----±--------±---------------+

| Field | Type | Null | Key | Default | Extra |

±------------±------------±-----±----±--------±---------------+

| id | int(11) | NO | PRI | NULL | auto_increment |

| name | varchar(20) | YES | | NULL | |

| food | varchar(30) | YES | | NULL | |

| confirmed | char(1) | YES | | NULL | |

| signup_date | date | YES | | NULL | |

±------------±------------±-----±----±--------±---------------+

MariaDB [test1]> select * from potluck;

±—±------±---------------±----------±------------+

| id | name | food | confirmed | signup_date |

±—±------±---------------±----------±------------+

| 1 | John | Casserole | Y | 2012-04-11 |

| 2 | Sandy | Key Lime Tarts | N | 2012-04-14 |

| 3 | Tom | BBQ | Y | 2012-04-18 |

| 4 | Tina | Salad | Y | 2012-04-10 |

±—±------±---------------±----------±------------+

4 rows in set (0.00 sec)

3.在mysql2pgsql的bin目录下找到并修改my.cfg文件并且配置响应mysql和greenplum的连接信息:

[gpadmin@vmcentos7 bin]$ cat my.cfg

–mysql源端的连接信息

[src.mysql]

host = “10.132.17.64”

port = “3306”

user = “root”

password = “changeme”

db = “test1”

encodingdir = “share”

encoding = “utf8”

binlogfile = “mysql-bin.000001”

binlogfile_offset = “4”

serverid = 1

[src.pgsql]

connect_string = “host=10.132.17.65 dbname=gpnrt port=5432 user=gpadmin password=gpadmin”

[local.pgsql]

connect_string = “dbname=test port=5888 user=test password=pgsql”

–Greenplum目标端的连接信息

[desc.pgsql]

connect_string = “host=10.132.17.65 dbname=gpnrt port=5432 user=gpadmin password=gpadmin”

target_schema = “public”

ignore_copy_error_count_each_table = “0”

[binlogloader]

loader_table_list = “loader_table_list.txt”

load_batch = 10

load_batch_gap = 10

4.在加载文件中添加需要迁移的数据表名称:

[gpadmin@vmcentos7 bin]$ cat loader_table_list.txt

potluck

5.首先生成需要加载表的DDL语句并加以修改

[gpadmin@vmcentos7 bin]$ ./mysql2pgsql -l loader_table_list.txt -d -n -j 1 -s public

ignore copy error count 0 each table

– Adding table: potluck

– Reference commands to create target tables (Please choose a distribution key and replace it with for each table):

CREATE TABLE public.potluck (id int4, name text, food text, confirmed text, signup_date timestamp) with (APPENDONLY=true, ORIENTATION=column, COMPRESSTYPE=zlib, COMPRESSLEVEL=1, BLOCKSIZE=1048576, OIDS=false) DISTRIBUTED BY ();

– Number of tables: 1

添加分布键语句并且在greenplum里执行DDL建表

CREATE TABLE public.potluck (id int4, name text, food text, confirmed text, signup_date timestamp) with (APPENDONLY=true, ORIENTATION=column, COMPRESSTYPE=zlib, COMPRESSLEVEL=1, BLOCKSIZE=1048576, OIDS=false) DISTRIBUTED BY (id);

6.执行数据加载程序

[gpadmin@vmcentos7 bin]$ ./mysql2pgsql -l loader_table_list.txt -j 1 -s public

ignore copy error count 0 each table

– Adding table: potluck

Starting data sync

Query to get source data for target table potluck: select * from test1.potluck

– Reference DDL to create the target table:

CREATE TABLE public.potluck (id int4, name text, food text, confirmed text, signup_date timestamp) with (APPENDONLY=true, ORIENTATION=column, COMPRESSTYPE=zlib, COMPRESSLEVEL=1, BLOCKSIZE=1048576, OIDS=false) DISTRIBUTED BY () PARTITION BY RANGE () (START (date ‘’) INCLUSIVE END (date ‘’) EXCLUSIVE EVERY (INTERVAL ‘<1 month>’ ));

thread 0 migrate task 0 table test1.potluck 4 rows complete, time cost 71.990 ms

Number of rows migrated: 4 (number of source tables’ rows: 4)

Data sync time cost 184.033 ms

7.数据加载完成,数据已经进入greenplum

gpadmin=# select * from potluck;

id | name | food | confirmed | signup_date

----±------±---------------±----------±--------------------

2 | Sandy | Key Lime Tarts | N | 2012-04-14 00:00:00

4 | Tina | Salad | Y | 2012-04-10 00:00:00

1 | John | Casserole | Y | 2012-04-11 00:00:00

3 | Tom | BBQ | Y | 2012-04-18 00:00:00

(4 rows)

B.增量数据迁移

1.增量数据迁移,需要使用到binlog_loader和binlog_miner两个免费组件。

首先需要在mysql里设置一下参数使用row格式的binlog以备解析:

binlog_format = ROW

增量同步的实现方式是:

在客户端主机(也可以部署在其他主机)上启动一个临时 PG 数据库,用于临时存放从 MySQL 拉去到的 binlog 数据。

binlog_miner 从源 MySQL 的一个 binlog 文件开始,拉取和解析 binlog 并存放到临时 PG 中。

binlog_loader 从临时 PG 中读取增量数据,并做适当的处理,最终批量写入到目标 PostgreSQL 或 Greenplum 中去。

这里特别需要配置的

my.cnf

[src.mysql]

host = “192.168.1.1”

port = “3301”

user = “test”

password = “123456”

db = “test”

encodingdir = “share”

encoding = “utf8”

binlogfile = “mysql-bin.000001” --binlog的名称

binlogfile_offset = “4” --binlog中的记录偏移指针,一般从4开始

临时数据库的信息:

my.cnf

[local.pgsql]

connect_string = “dbname=test port=5432 user=test password=pgsql”

要使用实时同步功能,目标表必须有主键,在greenplum中的建表语句如下:

CREATE TABLE public.potluck (id int4 PRIMARY KEY, name text, food text, confirmed text, signup_date timestamp) DISTRIBUTED BY (id);

启动 binlog 拉取进程

nohup ./binlog_miner 1>minner.log 2>&1 &

启动 binlog 写入进程

nohup ./binlog_loader 1>loader.log 2>&1 &

这样所有在mysql中对指定表的各类数据变化都会被同步到目标DB中,包括对应行的 insert update delete等。

5.这两个同步进程所有进行过的同步操作都会在Postgres中间数据中进行记录,及时拉取和写入进程由于异常出现了重启等问题,该程序仍然可以断点续传持续对mysql和GPDB之间的表进行同步操作。

C. 在GPDB中通过外部表映射直接访问mysql中的数据表

通过GPDB提供的外部表功能和gpfdist的transform功能来直接将mysql中的表映射为GPDB的外表具体步骤如下:

编写脚本通过mysql命令输出对应mysql中表的内容:

[gpadmin@vmcentos7 mysql]$ cat mysql2gp.sh

#/bin/bash

PARA=1TBNM=1

TBNM=1TBNM={PARA##*/}

cd (cd"(cd "(cd"(dirname "0")";pwd)rm−f/mysql/0")"; pwd)

rm -f /mysql/0")";pwd)rm−f/mysql/{TBNM}.csv

. ~/.bashrc

echo “select * from TBNMintooutfile′/mysql/{TBNM} into outfile '/mysql/TBNMintooutfile′/mysql/{TBNM}.csv’ fields terminated by ‘,’ lines terminated by ‘\r\n’” > ${TBNM}.sql

mysql -uroot -pchangeme test1 < TBNM.sqlcat/mysql/{TBNM}.sql

cat /mysql/TBNM.sqlcat/mysql/{TBNM}.csv

rm -f /mysql/${TBNM}.csv

编写gpfdist使用的tranform文件

[gpadmin@vmcentos7 mysql]$ cat mysql.yaml

VERTION: 1.0

TRANSFORMATIONS:

mysql:

TYPE: input

CONTENT: data

COMMAND: /bin/bash /mysql/mysql2gp.sh %filename%

4.在GPDB中创建响应外部表:

create external table ext_mysql_potluck(

like potluck

)location(

‘gpfdist://192.168.234.130:8080/potluck#transform=mysql’

)FORMAT ‘CSV’ (DELIMITER ‘,’) ENCODING ‘GBK’;

5.启动gpdfist

gpfdist -c mysql.yaml -p 8080 -l gpfdist.log &

6.直接通过外表访问mysql中的表。

gpadmin_1=# select * from ext_mysql_potluck;

id | name | food | confirmed | signup_date

----±------±---------------±----------±--------------------

1 | John | Casserole | Y | 2012-04-11 00:00:00

2 | Sandy | Key Lime Tarts | N | 2012-04-14 00:00:00

3 | Tom | BBQ | Y | 2012-04-18 00:00:00

4 | Tina | Salad | Y | 2012-04-10 00:00:00

5 | ryan | Casserole | N | 2012-04-11 00:00:00

6 | Alice | Casserole | N | 2012-04-11 00:00:00

7 | ryan | Casserole | N | 2012-04-11 00:00:00

8 | ryan | Casserole | N | 2012-04-11 00:00:00

9 | ryan | Casserole | N | 2012-04-11 00:00:00

10 | Alice | Casserole | Y | 2012-04-11 00:00:00

11 | ryan | Casserole | N | 2012-04-11 00:00:00

(11 rows)

Logo

魔乐社区(Modelers.cn) 是一个中立、公益的人工智能社区,提供人工智能工具、模型、数据的托管、展示与应用协同服务,为人工智能开发及爱好者搭建开放的学习交流平台。社区通过理事会方式运作,由全产业链共同建设、共同运营、共同享有,推动国产AI生态繁荣发展。

更多推荐