Let's Try It

A blog for pushing myself to produce output. Even the trial and error gets posted here as-is.

Extracting blog information from XML retrieved via the Hatena API and saving it to a DB

At last, time to try saving to the DB.

Development environment

Hatena Fotolife AtomAPI - Hatena Developer Center

Deliverables

GitHub

Preparation

1. Create the DBs beforehand

http://ytyaru.hatenablog.com/entry/2017/06/30
http://ytyaru.hatenablog.com/entry/2017/07/01
http://ytyaru.hatenablog.com/entry/2017/07/02

2. Retrieve the XML and save it to a file beforehand

http://ytyaru.hatenablog.com/entry/2017/06/23

3. Install dataset
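Steps 1 and 2 leave two things on disk: an SQLite3 file and the saved XML. A minimal sketch of picking both up with only the standard library follows. The function names and the layout (one directory holding `*.xml` files) are hypothetical; opening the DB with `mode=rw` makes a mistyped path fail loudly instead of silently creating a new, empty database.

```python
import sqlite3
from pathlib import Path


def open_existing_db(db_path):
    """Open the SQLite3 file created in step 1, failing if it does not exist.

    mode=rw opens read-write but, unlike a plain sqlite3.connect(path),
    never creates an empty database file when the path is wrong.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=rw"
    return sqlite3.connect(uri, uri=True)


def load_saved_xml(xml_dir):
    """Read every *.xml file saved in step 2 (hypothetical layout), in file-name order."""
    paths = sorted(Path(xml_dir).glob("*.xml"))
    if not paths:
        raise FileNotFoundError(f"no XML files under {xml_dir}")
    return [p.read_text(encoding="utf-8") for p in paths]
```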

Overview

  • Extract data from the XML document with BeautifulSoup
  • Insert the data into an SQLite3 file with dataset
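This post does the extraction with BeautifulSoup. As a dependency-free stand-in (not the author's code), the same idea with the standard library's `xml.etree.ElementTree` might look like the sketch below. It assumes the "blog information" is per-entry data from an Atom feed; the chosen fields (`id`, `title`, the `rel="alternate"` link, `published`) are illustrative, and a real Hatena response carries more elements than this.

```python
import xml.etree.ElementTree as ET

# Atom namespace; Hatena's AtomPub responses are Atom documents.
NS = {"atom": "http://www.w3.org/2005/Atom"}


def extract_entries(xml_text):
    """Return one dict per <entry> in an Atom feed (str or bytes input)."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iterfind("atom:entry", NS):
        # The public page URL is the link with rel="alternate", if present.
        link = entry.find("atom:link[@rel='alternate']", NS)
        entries.append({
            "id": entry.findtext("atom:id", namespaces=NS),
            "title": entry.findtext("atom:title", namespaces=NS),
            "url": link.get("href") if link is not None else None,
            "published": entry.findtext("atom:published", namespaces=NS),
        })
    return entries
```

Missing elements come back as `None`, so one odd entry does not abort the whole import.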

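In essence, this just converts an XML file into an SQLite3 file.

Once the data is in SQLite it can be queried with SQL, and unneeded information such as the XML tags is stripped away. Being a single file also makes it easy to keep.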
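The post uses dataset for the insert step. A rough equivalent with the standard library's `sqlite3`, written as a stand-in rather than the author's method, is below. The `entries` table and its columns are hypothetical; each row is a dict with `id`, `title`, `url` and `published` keys.

```python
import sqlite3


def save_entries(conn, entries):
    """Insert entry dicts into a hypothetical `entries` table.

    INSERT OR REPLACE keyed on the Atom id means re-running the import
    over the same XML updates existing rows instead of duplicating them.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        " id TEXT PRIMARY KEY, title TEXT, url TEXT, published TEXT)"
    )
    # Named placeholders are filled from each dict's keys.
    conn.executemany(
        "INSERT OR REPLACE INTO entries (id, title, url, published)"
        " VALUES (:id, :title, :url, :published)",
        entries,
    )
    conn.commit()
```

After `save_entries(sqlite3.connect("blog.db"), rows)`, a query such as `SELECT title, url FROM entries ORDER BY published` is the "queried with SQL" part. One difference from dataset: dataset creates tables and columns on the fly from the dict keys, whereas this sketch declares the schema up front.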

Installing dataset

$ sudo pip3 install dataset
[sudo] password for mint: 
Downloading/unpacking dataset
  Downloading dataset-0.8.0-py2.py3-none-any.whl
Downloading/unpacking normality>=0.3.9 (from dataset)
  Downloading normality-0.4.0.tar.gz
  Running setup.py (path:/tmp/pip_build_root/normality/setup.py) egg_info for package normality
    
Downloading/unpacking PyYAML>=3.10 (from dataset)
  Downloading PyYAML-3.12.tar.gz (253kB): 253kB downloaded
  Running setup.py (path:/tmp/pip_build_root/PyYAML/setup.py) egg_info for package PyYAML
    
Downloading/unpacking six>=1.7.3 (from dataset)
  Downloading six-1.10.0-py2.py3-none-any.whl
Downloading/unpacking sqlalchemy>=0.9.1 (from dataset)
  Downloading SQLAlchemy-1.1.6.tar.gz (5.2MB): 5.2MB downloaded
  Running setup.py (path:/tmp/pip_build_root/sqlalchemy/setup.py) egg_info for package sqlalchemy
    
    warning: no files found matching '*.jpg' under directory 'doc'
    warning: no files found matching '*.mako' under directory 'doc'
    warning: no files found matching 'distribute_setup.py'
    warning: no files found matching 'sa2to3.py'
    warning: no files found matching 'ez_setup.py'
    no previously-included directories found matching 'doc/build/output'
Downloading/unpacking alembic>=0.6.2 (from dataset)
  Downloading alembic-0.9.1.tar.gz (999kB): 999kB downloaded
  Running setup.py (path:/tmp/pip_build_root/alembic/setup.py) egg_info for package alembic
    
    warning: no files found matching '*.jpg' under directory 'docs'
    warning: no files found matching '*.sty' under directory 'docs'
    warning: no files found matching '*.dat' under directory 'tests'
    no previously-included directories found matching 'docs/build/output'
Requirement already satisfied (use --upgrade to upgrade): chardet in /usr/lib/python3/dist-packages (from normality>=0.3.9->dataset)
Downloading/unpacking Mako (from alembic>=0.6.2->dataset)
  Downloading Mako-1.0.6.tar.gz (575kB): 575kB downloaded
  Running setup.py (path:/tmp/pip_build_root/Mako/setup.py) egg_info for package Mako
    
    warning: no files found matching '*.xml' under directory 'examples'
    warning: no files found matching '*.mako' under directory 'examples'
    warning: no files found matching 'distribute_setup.py'
    warning: no files found matching 'ez_setup.py'
    no previously-included directories found matching 'doc/build/output'
Downloading/unpacking python-editor>=0.3 (from alembic>=0.6.2->dataset)
  Downloading python-editor-1.0.3.tar.gz
  Running setup.py (path:/tmp/pip_build_root/python-editor/setup.py) egg_info for package python-editor
    
Downloading/unpacking MarkupSafe>=0.9.2 (from Mako->alembic>=0.6.2->dataset)
  Downloading MarkupSafe-0.23.tar.gz
  Running setup.py (path:/tmp/pip_build_root/MarkupSafe/setup.py) egg_info for package MarkupSafe
    
Installing collected packages: dataset, normality, PyYAML, six, sqlalchemy, alembic, Mako, python-editor, MarkupSafe
  Running setup.py install for normality
    
  Running setup.py install for PyYAML
    checking if libyaml is compilable
    i686-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.4m -c build/temp.linux-i686-3.4/check_libyaml.c -o build/temp.linux-i686-3.4/check_libyaml.o
    build/temp.linux-i686-3.4/check_libyaml.c:2:18: fatal error: yaml.h: No such file or directory
     #include <yaml.h>
                      ^
    compilation terminated.
    
    libyaml is not found or a compiler error: forcing --without-libyaml
    (if libyaml is installed correctly, you may need to
     specify the option --include-dirs or uncomment and
     modify the parameter include_dirs in setup.cfg)
    
  Found existing installation: six 1.5.2
    Not uninstalling six at /usr/lib/python3/dist-packages, owned by OS
  Running setup.py install for sqlalchemy
    building 'sqlalchemy.cprocessors' extension
    i686-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.4m -c lib/sqlalchemy/cextension/processors.c -o build/temp.linux-i686-3.4/lib/sqlalchemy/cextension/processors.o
    i686-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 build/temp.linux-i686-3.4/lib/sqlalchemy/cextension/processors.o -o build/lib.linux-i686-3.4/sqlalchemy/cprocessors.cpython-34m.so
    building 'sqlalchemy.cresultproxy' extension
    i686-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.4m -c lib/sqlalchemy/cextension/resultproxy.c -o build/temp.linux-i686-3.4/lib/sqlalchemy/cextension/resultproxy.o
    i686-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 build/temp.linux-i686-3.4/lib/sqlalchemy/cextension/resultproxy.o -o build/lib.linux-i686-3.4/sqlalchemy/cresultproxy.cpython-34m.so
    building 'sqlalchemy.cutils' extension
    i686-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.4m -c lib/sqlalchemy/cextension/utils.c -o build/temp.linux-i686-3.4/lib/sqlalchemy/cextension/utils.o
    i686-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 build/temp.linux-i686-3.4/lib/sqlalchemy/cextension/utils.o -o build/lib.linux-i686-3.4/sqlalchemy/cutils.cpython-34m.so
    
    warning: no files found matching '*.jpg' under directory 'doc'
    warning: no files found matching '*.mako' under directory 'doc'
    warning: no files found matching 'distribute_setup.py'
    warning: no files found matching 'sa2to3.py'
    warning: no files found matching 'ez_setup.py'
    no previously-included directories found matching 'doc/build/output'
  Running setup.py install for alembic
    
    warning: no files found matching '*.jpg' under directory 'docs'
    warning: no files found matching '*.sty' under directory 'docs'
    warning: no files found matching '*.dat' under directory 'tests'
    no previously-included directories found matching 'docs/build/output'
    Installing alembic script to /usr/local/bin
  Running setup.py install for Mako
    
    warning: no files found matching '*.xml' under directory 'examples'
    warning: no files found matching '*.mako' under directory 'examples'
    warning: no files found matching 'distribute_setup.py'
    warning: no files found matching 'ez_setup.py'
    no previously-included directories found matching 'doc/build/output'
    Installing mako-render script to /usr/local/bin
  Running setup.py install for python-editor
    
  Running setup.py install for MarkupSafe
    
    building 'markupsafe._speedups' extension
    i686-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.4m -c markupsafe/_speedups.c -o build/temp.linux-i686-3.4/markupsafe/_speedups.o
    i686-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 build/temp.linux-i686-3.4/markupsafe/_speedups.o -o build/lib.linux-i686-3.4/markupsafe/_speedups.cpython-34m.so
Successfully installed dataset normality PyYAML six sqlalchemy alembic Mako python-editor MarkupSafe
Cleaning up...

Open issues

  • Follow the pagination to retrieve every entry
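For the pagination issue, here is a sketch assuming the collection feed points to the following page with an Atom `<link rel="next" href="...">` element (standard Atom feed paging; check an actual response to confirm Hatena's feed does this). `fetch` is a hypothetical placeholder for whatever performs the authenticated GET, which keeps the loop testable offline.

```python
import xml.etree.ElementTree as ET

NS = {"atom": "http://www.w3.org/2005/Atom"}


def iter_all_entries(first_url, fetch):
    """Yield every <entry> element across a paginated Atom feed.

    fetch(url) must return the feed XML for that URL. Pages are followed
    through <link rel="next"> until none remains; the `seen` set guards
    against a feed that links back to a page already visited.
    """
    url, seen = first_url, set()
    while url is not None and url not in seen:
        seen.add(url)
        root = ET.fromstring(fetch(url))
        yield from root.iterfind("atom:entry", NS)
        nxt = root.find("atom:link[@rel='next']", NS)
        url = nxt.get("href") if nxt is not None else None
```

Because it is a generator, each page is fetched only when the previous page's entries have been consumed, so the entries can be written to the DB as they arrive.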

Thoughts

The open issue matters, but so does saving to the other DBs. Above all, I want to save the blog entry bodies, which are the real goal here.