å°å¤é¨æ°æ®å¯¼å
¥(import)æ°æ®åºæ¯å¨æ°æ®åºåºç¨ä¸ä¸ä¸ªå¾å¸¸è§çéæ±ãå
¶å®è¿å°±æ¯å¨æ°æ®ç管çåæä½ä¸çETL (Extract, transform, load)çL (Load)é¨åï¼ä¹å°±æ¯è¯´ï¼å°ç¹å®ç»æ(structure)æè
æ ¼å¼(format)çæ°æ®å¯¼å
¥æ个ç®çå°(æ¯å¦æ°æ®åºï¼è¿éæ们讨论MySQL)ã
ETL Process
æ¬æè¦è®¨è®ºçå
容ï¼æ¯å¦ä½æ¹ä¾¿å°å°å¤ç§æ ¼å¼(JSON, Text, XML, CSV)çæ°æ®å¯¼å
¥MySQLä¹ä¸ã
æ¬æ大纲ï¼
å°Textæ件ï¼å
æ¬CSVæ件ï¼å¯¼å
¥MySQL
å°XMLæ件导å
¥MySQL
å°JSONæ件导å
¥MySQL
使ç¨MySQL workbenchçTable Data Export and Import Wizardè¿è¡JSONæCSVæ件ç导å
¥å¯¼åº
1. å°Textæ件ï¼å
æ¬CSVæ件ï¼å¯¼å
¥MySQL
è¿éæ们ç讨论æ¯åºäºä¸ä¸ªåå®ï¼Text fileåCSV fileæ¯æçæ¯è¾è§èçæ ¼å¼ç(properly formatted)ï¼æ¯å¦è¯´æ¯è¡çæ¯ä¸ªæ°æ®å(field)ä¹é´æ¯ç±ä¸ä¸ªå
±åçåé符ï¼æ¯å¦tab: \tï¼åéçã
é£ä¹é¦å
ï¼ä½ éè¦æ ¹æ®ä½ çæ°æ®çæ ¼å¼ï¼æåªäºåï¼ï¼æ¥è®¾è®¡å¥½æ°æ®åºç对åºç表 ï¼çSchemaï¼ã
举个ä¾åï¼è¦å¤ççTextæ件æè
CSVæ件æ¯ä»¥\tä½ä¸ºåé符çï¼æ¯è¡æid, name, balanceè¿ä¹ä¸ä¸ªæ°æ®åï¼é£ä¹é¦å
æ们éè¦å¨æ°æ®åºä¸å建è¿ä¸ªè¡¨ï¼
CREATE TABLE sometable(id INT, name VARCHAR(255), balance DECIMAL(8,4));
å建æå以åå°±å¯ä»¥å¯¼å
¥äºãæä½æ¹å¼å¾ç®åï¼
LOAD DATA LOCAL INFILE 'ä½ çæ件路å¾ï¼å¦~/file.csvï¼' INTO TABLE sometable FIELDS TERMINATED BY '\t' [ENCLOSED BY '"'(å¯é)] LINES TERMINATED BY '\n' (id, name, balance)
è¿éè¦æ³¨æçæ¯ï¼æ们éè¦å¼å¯local-infileè¿ä¸ªMySQLçé
ç½®åæ°ï¼æè½å¤æå导å
¥ã究å
¶åå ï¼ä»MySQLçManualä¸å¯ä»¥çå°è¿ä¹ä¸æ®µè¯ï¼
LOCAL works only if your server and your client both have been configured to permit it. For example, if mysqld was started with --local-infile=0, LOCAL does not work. See Section 6.1.6, âSecurity Issues with LOAD DATA LOCALâ.
è¿æ¯MySQLåºäºå®å
¨èèçé»è®¤é
ç½®ãå æ¤ï¼æ们éè¦å¨é
ç½®æ件my.cnfä¸ï¼ä»¥Debianåè¡ççLinux, å¦Ubuntu为ä¾ï¼ å³æ¯å¨/etc/my.cnfä¸ï¼ï¼ç¡®ä¿ï¼
local-infile=1
æææ¯å¨å½ä»¤è¡å¯å¨MySQLæ¶å ä¸--local-infileè¿ä¸é¡¹ï¼
mysql --local-infile -uroot -pyourpwd yourdbname
æ¤å¤ï¼æ们ä¹å¯ä»¥ä½¿ç¨MySQLçä¸ä¸ªå®æ¹å¯¼å
¥ç¨åºmysqlimport ï¼è¿ä¸ªç¨åºæ¬è´¨ä¸å°±æ¯ä¸ºLOAD DATA FILEæä¾äºä¸ä¸ªå½ä»¤è¡çinterfaceï¼å¾å®¹æç解ï¼æ们è¿éå°±ä¸å详述ã
2. å°XMLæ件导å
¥MySQL
è¿ä»¶äºçå®ææ¹å¼ï¼ä¸æ们çXMLçå½¢å¼æçå¾å¤§çå
³ç³»ã
举个ä¾å说ï¼å½ä½ çXMLæ°æ®æ件æçå¾é常è§èçæ ¼å¼ï¼æ¯å¦ï¼
<?xml version="1.0"?>
<row>
<field name="id">1</field>
<field name="name">Free</field>
<field name="balance">2333.3333</field>
</row>
<row>
<field name="id">2</field>
<field name="name">Niki</field>
<field name="balance">1289.2333</field>
</row>
æè
<row column1="value1" column2="value2" .../>
æ们就å¯ä»¥å¾æ¹ä¾¿ä½¿ç¨LOAD XMLæ¥å¯¼å
¥ï¼è¿éå¯ä»¥åè§MySQLçå®æ¹æå--LOAD XML Syntaxã
ç¶èæ们å¯è½æå¦å¤ä¸äºéæ±ï¼æ¯å¦è¯´ï¼æ们å¯è½ä¼æ³è¦å°XMLæ件çåæ å°å°ä¸åååçå(TABLE COLUMN)ä¹ä¸ãè¿éè¦æ³¨æï¼MySQL v5.0.7以åï¼MySQLçStored Procedureä¸ä¸è½åè¿è¡LOAD XML INFILE æè
LOAD DATA INFILEãæ以转æ¢çç¨åº(procedure)çç¼åæ¹å¼ä¸å¨æ¤ä¹åææä¸åãè¿éï¼æ们éè¦ä½¿ç¨Load_File()åExtractValue()è¿ä¸¤ä¸ªå½æ°ã
以ä¸æ¯ä¸ä¸ªç¤ºä¾XMLæ件åç¨åºï¼
æ件ï¼
<?xml version="1.0"?>
<some_list>
<someone id="1" fname="Rob" lname="Gravelle"/>
<someone id="2" fname="Al" lname="Bundy"/>
<someone id="3" fname="Little" lname="Richard"/>
</some_list>
ç¨åºï¼
DELIMITER $$
CREATE DEFINER=`root`@`localhost` PROCEDURE `import_some_xml`(path varchar(255), node varchar(255))
BEGIN
declare xml_content text;
declare v_row_index int unsigned default 0;
declare v_row_count int unsigned;
declare v_xpath_row varchar(255);
set xml_content = load_file(path);
-- calculate the number of row elements.
set v_row_count = extractValue(xml_content, concat('count(', node, ')'));
-- loop through all the row elements
while v_row_index < v_row_count do
set v_row_index = v_row_index + 1;
set v_xpath_row = concat(node, '[', v_row_index, ']/@*');
insert into applicants values (
extractValue(xml_content, concat(v_xpath_row, '[1]')),
extractValue(xml_content, concat(v_xpath_row, '[2]')),
extractValue(xml_content, concat(v_xpath_row, '[3]'))
);
end while;
END
å¨MySQLä¸ï¼ä½¿ç¨å®è¿è¡å¯¼å
¥ï¼
call import_some_xml('ä½ çXMLæ件路å¾', '/some_list/someone');
ç¨åºç¸å½çç´ç½ï¼åªè¦äºè§£ä¸ä¸MySQLçèæ¬ç¼åå³å¯ã
è¿éæä¸ä¸DELIMITER $$ãæ们ç¥éMySQLçå½ä»¤åé符é»è®¤ä¸ºåå·ï¼ç¶èèæ¬ä¸å¾æ¾ç¶æ¯æåå·çï¼ä½æ¯æ们并ä¸å¸æç«å³æ§è¡ï¼æ以æ们éè¦ä¸´æ¶æ´æ¹åé符ã
3. å°JSONæ件导å
¥MySQL
å¦ä½å°JSONæ件导å
¥MySQLä¸ï¼æ¯ä¸ä¸ªå¾æ趣çè¯é¢ãJSONæ¯ä¸ç§ç°å¨ç¸å½å¸¸ç¨çæ件ç»æï¼æ以ææ¡å®ç导å
¥å
·ææ¯è¾å¹¿æ³çæä¹ã
å¾å¤æ¶åï¼æ们å¤ççJSONæ°æ®æ¯ä»¥å¦ä¸å½¢å¼åºç°çï¼
{"name":"Julia","gender":"female"}
{"name":"Alice","gender":"female"}
{"name":"Bob","gender":"male"}
{"name":"Julian","gender":"male"}
è并ä¸æ¯è§æ´ç[{},{},{},{}]ï¼ä¸äºNoSQLæ°æ®åºçExportï¼ã
è¿æ ·çå½¢å¿å¯¹äºè½½å
¥æä¸ä¸ªå¥½å¤ï¼å 为æ¯ä¸è¡æ¯ä¸ä¸ªJSON Objectï¼æ以æ们便å¯ä»¥æè¡å¤çæ¤æ件ï¼èä¸éè¦å 为JSONçä¸¥æ ¼ç»æå°æ´ä¸ªæ件ï¼æ¯å¦ä¸ä¸ªè®¸å¤Gç.jsonæ件ï¼å
¨é¨è½½å
¥ã
æ¹å¼ä¸ 使ç¨common-schema
common-schemaæ¯ä¸ä¸ªåºç¨å¾å¹¿æ³çMySQLçæ¡æ¶ï¼å®æçå¾ä¸°å¯çåè½å详ç»çææ¡£ãæ们å¯ä»¥ä½¿ç¨å®çJSON解æçåè½ãï¼å®è¿å
·æJSON转æ¢æXMLççæ¹ä¾¿çåè½ï¼
å
·ä½è¯´æ¥ï¼å°common-schema导å
¥ä¹åï¼ä½¿ç¨å®çextract_json_valueå½æ°å³å¯ãæºç ä¸ï¼
create function extract_json_value(
json_text text charset utf8,
xpath text charset utf8
) returns text charset utf8
该å½æ°æ¥å两个åæ°ï¼ä¸ä¸ªæ¯json_textï¼è¡¨ç¤ºjsonæ件çå
容ï¼å¦ä¸ä¸ªæ¯xpathï¼è¡¨ç¤ºæ°æ®çç»æï¼è¿éå¯ä»¥ç±»æ¯XMLæ件çå¤çï¼ãå¾å¤è¯»è
åºè¯¥ç¥éï¼XPathæ¯ç¨æ¥å¯¹XMLä¸çå
ç´ è¿è¡å®ä½çï¼è¿éä¹å¯ä»¥ä½ä¸æ ·çç解ã
以æ¬æ®µå¼å§çå è¡JSON为ä¾ï¼è¿écommon-schemaç使ç¨å¦ä¸ä¾ï¼
select common_schema.extract_json_value(f.event_data,'/name') as name, common_schema.extract_json_value(f.event_data,'/gender') as gender, sum(f.event_count) as event_count from json_event_fact f group by name, gender;
å
³äºevent_dataï¼æ们éè¦å
ç解LOAD DATA INFILEæ¯ä¸ä¸ªeventï¼ä¸åçevent type对åºä¸åçevent dataãè¿é¨åç¥è¯å¯ä»¥åçEvent Data for Specific Event Types
å¦ææå
´è¶£ï¼å¯ä»¥åçå
¶æºç ãåçä¸ä¸ªåå°å¹¿æ³ä½¿ç¨ç项ç®çæºç ï¼å¯¹äºèªèº«æé¿æ¯å¾æççã
å½ç¶äºï¼æ们ä¹å¯ä»¥åä¹åå¤çXMLæ件导å
¥ä¸æ ·ï¼èªå·±ç¼åç¨åºãè¿é便ä¸åç»åºå®ä¾ç¨åºï¼æå
´è¶£ç读è
å¯ä»¥èªè¡ç¼åæè
è·ç¬è
交æµã
æ¹å¼äº 使ç¨mysqljsonimport
è¿æ¯Anders Karlssonçä¸ä¸ªå®æ度å¾é«çä½åãè¿ä¸ä»½ç¨åºç±Cåæãå®ä¾èµäºä¸ä¸ªJSON Parserï¼Janssonãä»ä»¬é½æçæ¯è¾å¥½çç»´æ¤åææ¡£ï¼æ以使ç¨ä¸ä½éªå¾å¥½ã
mysqljsonimportçä¸è½½å¨SourceForgeä¸ãå
·ä½ä½¿ç¨åç
§å
¶ææ¡£å³å¯ã
为äºæ¹ä¾¿ä¸çææºç å®è£
çæåï¼ç¬è
å¨è¿éæä¸ä¸å®è£
æµç¨å注æäºé¡¹ã
å®è£
å½ä»¤é¡ºåºå¦ä¸ï¼
$ wget
http://sourceforge.net/projects/mysqljson/files/myjsonimport_1.6/mysqljsonimport-1.6.tar.gz $ tar xvfz mysqljsonimport-1.6.tar.gz
$ cd mysqljsonimport-1.6
$ ./configure â-with-mysql=/xxx/mysql
$ make
$ make check
$ sudo make install
--with-mysqlè¿ä¸æ¥ä¸æ¯å¿
è¦çï¼åªè¦ä½ å®è£
çmysqlçè·¯å¾æ¯ç³»ç»çé»è®¤è·¯å¾ãå¾å
³é®çï¼èä¸å¾å®¹æ被ä¸çæçæå忽ç¥çæ¯ï¼è¿ä¸ä¸ªCç¨åºè¦æåç¼è¯åè¿è¡ï¼æ¯éè¦MySQLçC APIçï¼æ以éè¦å®è£
çä¾èµï¼é¤äºjanssonï¼è¿ælibmysqlclient-devã
janssonçå®è£
å°±æ¯ç®åçæºç å®è£
ï¼libmysqlclient-devåå¯ä»¥ä½¿ç¨å
管çå·¥å
·(æ¯å¦ubuntuä¸ä½¿ç¨apt-getå³å¯ï¼ç¼è¯åå®è£
åï¼å»ºè®®å
sudo apt-get update以é¿å
ä¸å¿
è¦ç麻ç¦)ã
导å
¥å½ä»¤ï¼
$ ./mysqljsonimport â-database test â-table tablename jsonfilename
è¿æä¸ä¸ªparserï¼ä½è
æ¯Kazuhoï¼æå
´è¶£ç读è
å¯ä»¥åçä¸ä¸ï¼ä»çç¸å
³åææ¯mysql_json - a MySQL UDF for parsing JSON ï¼github项ç®æ¯mysql_jsonã
4. 使ç¨MySQL workbench
Workbenchè¿ä¸ªå·¥å
·å¯¹äºè®¸å¤ä¸çæSQLè¯è¨æè
å½ä»¤è¡çæåè¿æ¯å¾æ¹ä¾¿åå好çãå©ç¨å®ï¼å¯ä»¥æ¹ä¾¿å°å¯¼å
¥å导åºCSVåJSONæ件ã
å
·ä½æä½å¾ä¾åè§MySQLå®æ¹æåå³å¯:Table Data Export and Import Wizardï¼è¿éä¸åèµè¿°ã
æï¼freenikï¼ç®ä¹¦ä½è
ï¼
åæé¾æ¥ï¼
http://www.jianshu.com/p/d330edb61fe2