Toucha's Python Study Diary

Started November 14, 2008


CSV File API

http://www.python.org/dev/peps/pep-0305/


PEP: 305

Title: CSV File API

Version: 68056

Last-Modified: 2008-12-30 04:48:55 +0100 (Tue, 30 Dec 2008)

Author: Kevin Altis <altis at semi-retired.com>, Dave Cole <djc at object-craft.com.au>, Andrew McNamara <andrewm at object-craft.com.au>, Skip Montanaro <skip at pobox.com>, Cliff Wells <LogiplexSoftware at earthlink.net>

Discussions-To: <csv at mail.mojam.com>

Status: Final

Type: Standards Track

Content-Type: text/x-rst

Created: 26-Jan-2003

Post-History: 31-Jan-2003, 13-Feb-2003


Abstract

The Comma Separated Values (CSV) file format is the most common import and export format for spreadsheets and databases. Although many CSV files are simple to parse, the format is not formally defined by a stable specification and is subtle enough that parsing lines of a CSV file with something like line.split(",") is eventually bound to fail. This PEP defines an API for reading and writing CSV files. It is accompanied by a corresponding module which implements the API.

To Do (Notes for the Interested and Ambitious)

Better motivation for the choice of passing a file object to the constructors. See http://manatee.mojam.com/pipermail/csv/2003-January/000179.html
Unicode. ugh.

Application Domain

This PEP is about doing one thing well: parsing tabular data which may use a variety of field separators, quoting characters, quote escape mechanisms and line endings. The authors intend the proposed module to solve this one parsing problem efficiently. The authors do not intend to address any of these related topics:

data interpretation (is a field containing the string "10" supposed to be a string, a float or an int? is it a number in base 10, base 16 or base 2? is a number in quotes a number or a string?)
locale-specific data representation (should the number 1.23 be written as "1.23" or "1,23" or "1 23"?) -- this may eventually be addressed.
fixed width tabular data - can already be parsed reliably.

Rationale

Often, CSV files are formatted simply enough that you can get by reading them line-by-line and splitting on the commas which delimit the fields. This is especially true if all the data being read is numeric. This approach may work for a while, then come back to bite you in the butt when somebody puts something unexpected in the data like a comma. As you dig into the problem you may eventually come to the conclusion that you can solve the problem using regular expressions. This will work for a while, then break mysteriously one day. The problem grows, so you dig deeper and eventually realize that you need a purpose-built parser for the format.
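A small sketch of the failure described above, written for Python 3 with today's standard-library csv module (the sample record is made up for illustration):

```python
import csv
import io

# One record whose second field contains a comma inside quotes.
line = 'Smith,"Portland, OR",42'

naive = line.split(",")                        # splits inside the quoted field
parsed = next(csv.reader(io.StringIO(line)))   # honours the quoting

print(naive)   # ['Smith', '"Portland', ' OR"', '42']
print(parsed)  # ['Smith', 'Portland, OR', '42']
```

The naive split yields four pieces and leaves the quote characters in the data, while the csv reader returns the three intended fields.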

CSV formats are not well-defined and different implementations have a number of subtle corner cases. It has been suggested that the "V" in the acronym stands for "Vague" instead of "Values". Different delimiters and quoting characters are just the start. Some programs generate whitespace after each delimiter which is not part of the following field. Others quote embedded quoting characters by doubling them, others by prefixing them with an escape character. The list of weird ways to do things can seem endless.

All this variability means it is difficult for programmers to reliably parse CSV files from many sources or generate CSV files designed to be fed to specific external programs without a thorough understanding of those sources and programs. This PEP and the software which accompanies it attempt to make the process less fragile.

Existing Modules

This problem has been tackled before. At least three modules currently available in the Python community enable programmers to read and write CSV files:

Object Craft's CSV module [2]
Cliff Wells' Python-DSV module [3]
Laurence Tratt's ASV module [4]

Each has a different API, making it somewhat difficult for programmers to switch between them. More of a problem may be that they interpret some of the CSV corner cases differently, so even after surmounting the differences between the different module APIs, the programmer has to also deal with semantic differences between the packages.

Module Interface

This PEP supports three basic APIs, one to read and parse CSV files, one to write them, and one to identify different CSV dialects to the readers and writers.

Reading CSV Files

CSV readers are created with the reader factory function:

obj = reader(iterable [, dialect='excel']
             [optional keyword args])

A reader object is an iterator which takes an iterable object returning lines as the sole required parameter. If it supports a binary mode (file objects do), the iterable argument to the reader function must have been opened in binary mode. This gives the reader object full control over the interpretation of the file's contents. The optional dialect parameter is discussed below. The reader function also accepts several optional keyword arguments which define specific format settings for the parser (see the section "Formatting Parameters"). Readers are typically used as follows:

csvreader = csv.reader(file("some.csv"))
for row in csvreader:
    process(row)

Each row returned by a reader object is a list of strings or Unicode objects.
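For comparison, here is a minimal sketch of the same reading loop in Python 3. It assumes the current standard-library csv module, where the file is opened in text mode with newline="" rather than in binary mode as the PEP describes; an io.StringIO with made-up contents stands in for the file so the example is self-contained.

```python
import csv
import io

# Stand-in for a file opened with open("some.csv", newline="").
source = io.StringIO('name,qty\r\napple,10\r\n"lemon, sliced",30\r\n')

rows = [row for row in csv.reader(source)]
for row in rows:
    print(row)   # each row is a list of strings
```

Every field comes back as a string, including the numbers; interpreting them is left to the application.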

When both a dialect parameter and individual formatting parameters are passed to the constructor, first the dialect is queried for formatting parameters, then individual formatting parameters are examined.

Writing CSV Files

Creating writers is similar:

obj = writer(fileobj [, dialect='excel'],
             [optional keyword args])

A writer object is a wrapper around a file-like object opened for writing in binary mode (if such a distinction is made). It accepts the same optional keyword parameters as the reader constructor.

Writers are typically used as follows:

csvwriter = csv.writer(file("some.csv", "w"))
for row in someiterable:
    csvwriter.writerow(row)

To generate a set of field names as the first row of the CSV file, the programmer must explicitly write it, e.g.:

csvwriter = csv.writer(file("some.csv", "w"))
csvwriter.writerow(names)
for row in someiterable:
    csvwriter.writerow(row)

or arrange for it to be the first row in the iterable being written.
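A runnable sketch of writing a header row first, again assuming Python 3 and the current csv module. The field names and rows are invented sample data, and io.StringIO stands in for a file opened with open("some.csv", "w", newline="").

```python
import csv
import io

names = ["fruit", "qty"]                      # sample header
someiterable = [("apple", 10), ("orange", 20)]  # sample rows

buffer = io.StringIO()
csvwriter = csv.writer(buffer)
csvwriter.writerow(names)           # header row is written explicitly
csvwriter.writerows(someiterable)   # equivalent to one writerow() per item

text = buffer.getvalue()
print(repr(text))
```

Note that the default "excel" dialect ends every row with '\r\n', whatever the platform.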

Managing Different Dialects

Because CSV is a somewhat ill-defined format, there are plenty of ways one CSV file can differ from another, yet contain exactly the same data. Many tools which can import or export tabular data allow the user to indicate the field delimiter, quote character, line terminator, and other characteristics of the file. These can be fairly easily determined, but are still mildly annoying to figure out, and make for fairly long function calls when specified individually.

To try and minimize the difficulty of figuring out and specifying a bunch of formatting parameters, reader and writer objects support a dialect argument which is just a convenient handle on a group of these lower level parameters. When a dialect is given as a string it identifies one of the dialects known to the module via its registration functions, otherwise it must be an instance of the Dialect class as described below.

Dialects will generally be named after applications or organizations which define specific sets of format constraints. Two dialects are defined in the module as of this writing, "excel", which describes the default format constraints for CSV file export by Excel 97 and Excel 2000, and "excel-tab", which is the same as "excel" but specifies an ASCII TAB character as the field delimiter.

Dialects are implemented as attribute only classes to enable users to construct variant dialects by subclassing. The "excel" dialect is a subclass of Dialect and is defined as follows:

class Dialect:
    # placeholders
    delimiter = None
    quotechar = None
    escapechar = None
    doublequote = None
    skipinitialspace = None
    lineterminator = None
    quoting = None

class excel(Dialect):
    delimiter = ','
    quotechar = '"'
    doublequote = True
    skipinitialspace = False
    lineterminator = '\r\n'
    quoting = QUOTE_MINIMAL

The "excel-tab" dialect is defined as:

class exceltsv(excel):
    delimiter = '\t'

(For a description of the individual formatting parameters see the section "Formatting Parameters".)

To enable string references to specific dialects, the module defines several functions:

dialect = get_dialect(name)
names = list_dialects()
register_dialect(name, dialect)
unregister_dialect(name)

get_dialect() returns the dialect instance associated with the given name. list_dialects() returns a list of all registered dialect names. register_dialect() associates a string name with a dialect class. unregister_dialect() deletes a name/dialect association.
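The registration functions can be tried out as follows. This is a sketch against the csv module as shipped with Python 3; the dialect class colon_sep and the name "colon-sep" are hypothetical and exist only for this example.

```python
import csv
import io

class colon_sep(csv.excel):   # hypothetical variant made by subclassing "excel"
    delimiter = ":"

csv.register_dialect("colon-sep", colon_sep)
registered = "colon-sep" in csv.list_dialects()
dialect = csv.get_dialect("colon-sep")

# The registered name can now be passed wherever a dialect is expected.
row = next(csv.reader(io.StringIO("a:b:c\r\n"), dialect="colon-sep"))

csv.unregister_dialect("colon-sep")
still_registered = "colon-sep" in csv.list_dialects()
```

Subclassing keeps every other setting of the parent dialect, so only the delimiter needs to be stated.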

Formatting Parameters

In addition to the dialect argument, both the reader and writer constructors take several specific formatting parameters, specified as keyword parameters. The formatting parameters understood are:

quotechar specifies a one-character string to use as the quoting character. It defaults to '"'. Setting this to None has the same effect as setting quoting to csv.QUOTE_NONE.
delimiter specifies a one-character string to use as the field separator. It defaults to ','.
escapechar specifies a one-character string used to escape the delimiter when quotechar is set to None.
skipinitialspace specifies how to interpret whitespace which immediately follows a delimiter. It defaults to False, which means that whitespace immediately following a delimiter is part of the following field.
lineterminator specifies the character sequence which should terminate rows.
quoting controls when quotes should be generated by the writer. It can take on any of the following module constants:
- csv.QUOTE_MINIMAL means only when required, for example, when a field contains either the quotechar or the delimiter
- csv.QUOTE_ALL means that quotes are always placed around fields.
- csv.QUOTE_NONNUMERIC means that quotes are always placed around nonnumeric fields.
- csv.QUOTE_NONE means that quotes are never placed around fields.
doublequote controls the handling of quotes inside fields. When True two consecutive quotes are interpreted as one during read, and when writing, each quote is written as two quotes.
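To see what the quoting constants do in practice, the sketch below writes the same row under each setting. It assumes Python 3's csv module; the helper render() and the sample rows are made up for the illustration.

```python
import csv
import io

def render(row, **fmt):
    """Hypothetical helper: write one row with the given formatting parameters."""
    buffer = io.StringIO()
    csv.writer(buffer, **fmt).writerow(row)
    return buffer.getvalue()

row = ["abc", 'say "hi"', 1.5]

minimal = render(row, quoting=csv.QUOTE_MINIMAL)        # quote only where needed
everything = render(row, quoting=csv.QUOTE_ALL)         # quote every field
nonnumeric = render(row, quoting=csv.QUOTE_NONNUMERIC)  # leave numbers bare

# With QUOTE_NONE an escapechar is used to protect an embedded delimiter.
escaped = render(["a,b", "c"], quoting=csv.QUOTE_NONE, escapechar="\\")
```

In all three quoted variants the embedded quote is doubled, because doublequote defaults to True in the "excel" dialect.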

When processing a dialect setting and one or more of the other optional parameters, the dialect parameter is processed before the individual formatting parameters. This makes it easy to choose a dialect, then override one or more of the settings without defining a new dialect class. For example, if a CSV file was generated by Excel 2000 using single quotes as the quote character and a colon as the delimiter, you could create a reader like:

csvreader = csv.reader(file("some.csv"), dialect="excel",
                       quotechar="'", delimiter=':')

Other details of how Excel generates CSV files would be handled automatically because of the reference to the "excel" dialect.
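The same override can be checked without a file on disk. This is a Python 3 sketch in which io.StringIO replaces file("some.csv"); the input line is invented for the example.

```python
import csv
import io

# Colon-delimited data using single quotes as the quote character.
source = io.StringIO("'a:b':c:'it''s'\r\n")

csvreader = csv.reader(source, dialect="excel",
                       quotechar="'", delimiter=":")
row = next(csvreader)
print(row)   # ['a:b', 'c', "it's"]
```

The doubled single quote collapses to one because doublequote=True is inherited from the "excel" dialect, while quotechar and delimiter come from the keyword arguments.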

Reader Objects

Reader objects are iterables whose next() method returns a sequence of strings, one string per field in the row.

Writer Objects

Writer objects have two methods, writerow() and writerows(). The former accepts an iterable (typically a list) of fields which are to be written to the output. The latter accepts a list of iterables and calls writerow() for each.

Implementation

There is a sample implementation available. [1] The goal is for it to efficiently implement the API described in the PEP. It is heavily based on the Object Craft csv module. [2]

Testing

The sample implementation [1] includes a set of test cases.

Issues

Should a parameter control how consecutive delimiters are interpreted? Our thought is "no". Consecutive delimiters should always denote an empty field.
What about Unicode? Is it sufficient to pass a file object gotten from codecs.open()? For example:
```
csvreader = csv.reader(codecs.open("some.csv", "r", "cp1252"))

csvwriter = csv.writer(codecs.open("some.csv", "w", "utf-8"))
```
In the first example, text would be assumed to be encoded as cp1252. Should the system be aggressive in converting to Unicode or should Unicode strings only be returned if necessary?

In the second example, the file will take care of automatically encoding Unicode strings as utf-8 before writing to disk.

Note: As of this writing, the csv module doesn't handle Unicode data.
What about alternate escape conventions? If the dialect in use includes an escapechar parameter which is not None and the quoting parameter is set to QUOTE_NONE, delimiters appearing within fields will be prefixed by the escape character when writing and are expected to be prefixed by the escape character when reading.
Should there be a "fully quoted" mode for writing? What about "fully quoted except for numeric values"? Both are implemented (QUOTE_ALL and QUOTE_NONNUMERIC, respectively).
What about end-of-line? If I generate a CSV file on a Unix system, will Excel properly recognize the LF-only line terminators? Files must be opened for reading or writing as appropriate using binary mode. Specify the lineterminator sequence as '\r\n'. The resulting file will be written correctly.
What about an option to generate dicts from the reader and accept dicts by the writer? See the DictReader and DictWriter classes in csv.py.
Are quote character and delimiters limited to single characters? For the time being, yes.
How should rows of different lengths be handled? Interpretation of the data is the application's job. There is no such thing as a "short row" or a "long row" at this level.
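The DictReader and DictWriter classes mentioned in the issues above can be sketched like this. The example assumes Python 3's csv module (writeheader() is a later addition and is not described in the PEP), and the field names and records are sample data.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["fruit", "qty"])
writer.writeheader()                              # writes "fruit,qty"
writer.writerow({"fruit": "apple", "qty": 10})
writer.writerow({"fruit": "lemon", "qty": 30})

buffer.seek(0)                                    # rewind and read it back
records = [dict(r) for r in csv.DictReader(buffer)]
print(records)
```

DictReader takes the field names from the first row when none are given, and values come back as strings.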

References

[1] csv module, Python Sandbox (http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/nondist/sandbox/csv/)

[2] csv module, Object Craft (http://www.object-craft.com.au/projects/csv)

[3] Python-DSV module, Wells (http://sourceforge.net/projects/python-dsv/)

[4] ASV module, Tratt (http://tratt.net/laurie/python/asv/)

There are many references to other CSV-related projects on the Web. A few are included here.

Copyright

This document has been placed in the public domain.

【2009/01/30 21:43 】

CATEGORY[Python]

The zip function

Python recipe

To iterate over several sequences together, use the zip function.

>>> a=['apple','orange','lemon']
>>> b=[10,20,30]
>>> for fruit,qty in zip(a,b):
...     print "There are",qty," ",fruit
...
There are 10   apple
There are 20   orange
There are 30   lemon
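The session above uses the Python 2 print statement. A rough Python 3 equivalent might look like the sketch below; the variable names are my own, and the last line shows that zip stops at the shortest input.

```python
fruits = ["apple", "orange", "lemon"]
quantities = [10, 20, 30]

lines = [f"There are {qty} {fruit}" for fruit, qty in zip(fruits, quantities)]
for line in lines:
    print(line)

# zip stops at the shortest input, so the extra "kiwi" is silently dropped.
pairs = list(zip(fruits + ["kiwi"], quantities))
```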

【2009/02/08 13:36 】

CATEGORY[Python: Got it!]

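Multi-threading

http://www.python.jp/doc/release/tut/node13.html#SECTION0013400000000000000000

11.4 Multi-threading

Threading is a technique for decoupling tasks which are not sequentially dependent. Threads can be used to improve the responsiveness of applications that accept user input while other tasks run in the background. A related use case is running I/O in parallel with computations in another thread.

The following code shows how the high-level threading module can run tasks in the background while the main program continues to run: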

    import threading, zipfile

    class AsyncZip(threading.Thread):
        def __init__(self, infile, outfile):
            threading.Thread.__init__(self)        
            self.infile = infile
            self.outfile = outfile
        def run(self):
            f = zipfile.ZipFile(self.outfile, 'w', zipfile.ZIP_DEFLATED)
            f.write(self.infile)
            f.close()
            print 'Finished background zip of: ', self.infile

    background = AsyncZip('mydata.txt', 'myarchive.zip')
    background.start()
    print 'The main program continues to run in foreground.'
    
    background.join()    # Wait for the background task to finish
    print 'Main program waited until background was done.'

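The principal challenge of multi-threaded applications is coordinating threads that share data or other resources. To that end, the threading module provides a number of synchronization primitives including locks, events, condition variables, and semaphores.

While those tools are powerful, minor design errors can result in problems that are difficult to reproduce. So, the preferred approach to task coordination is to concentrate all access to a resource in a single thread and then use the Queue module to feed that thread with requests from other threads. Applications using Queue objects for inter-thread communication and coordination are easier to design, more readable, and more reliable.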
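The queue-based approach recommended above could be sketched as follows. This is a Python 3 illustration (the module is named queue there, Queue in Python 2); the worker, the doubling "work", and the None sentinel are my own assumptions, not part of the tutorial.

```python
import queue
import threading

requests = queue.Queue()
results = []   # the shared resource: only the worker thread appends to it

def worker():
    """Single thread that owns `results` and serves requests from the queue."""
    while True:
        item = requests.get()
        if item is None:          # sentinel meaning "no more requests"
            break
        results.append(item * 2)

def producer(start):
    """Other threads never touch `results`; they only enqueue requests."""
    for n in range(start, start + 3):
        requests.put(n)

consumer = threading.Thread(target=worker)
consumer.start()

producers = [threading.Thread(target=producer, args=(s,)) for s in (0, 10)]
for t in producers:
    t.start()
for t in producers:
    t.join()

requests.put(None)   # all producers are done, so tell the worker to stop
consumer.join()
```

No lock is needed here because the queue does the synchronization and only one thread ever modifies the list. The order of the results depends on thread scheduling.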

11.5 Logging

The logging module offers a full-featured and flexible logging system. At its simplest, log messages are sent to a file or to sys.stderr:

    import logging
    logging.debug('Debugging information')
    logging.info('Informational message')
    logging.warning('Warning:config file %s not found', 'server.conf')
    logging.error('Error occurred')
    logging.critical('Critical error -- shutting down')

This produces the following output:

    WARNING:root:Warning:config file server.conf not found
    ERROR:root:Error occurred
    CRITICAL:root:Critical error -- shutting down

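By default, informational and debugging messages are suppressed and the output is sent to standard error. Other output options include routing messages through email, datagrams, sockets, or to an HTTP server. New filters can select different routing based on message priority: DEBUG, INFO, WARNING, ERROR, and CRITICAL.

The logging system can be configured directly from Python or can be loaded from a user-editable configuration file for customized logging without altering the application.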
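As one way of configuring logging directly from Python, the sketch below lowers the threshold to INFO and sends output to an in-memory stream instead of standard error. It targets Python 3; the logger name "diary.example" and the choice of io.StringIO as destination are assumptions made for the example.

```python
import io
import logging

stream = io.StringIO()   # stand-in destination; a file or sys.stderr also works
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

log = logging.getLogger("diary.example")   # hypothetical logger name
log.setLevel(logging.INFO)                 # let INFO through, still drop DEBUG
log.propagate = False                      # keep messages away from the root logger
log.addHandler(handler)

log.debug("Debugging information")
log.info("Informational message")
log.warning("Warning:config file %s not found", "server.conf")

output = stream.getvalue()
print(output)
```

With the level set to INFO, the informational message now appears while the debug message is still suppressed.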

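11.6 Weak References

Python does automatic memory management (reference counting for most objects and garbage collection to eliminate cycles). The memory is freed shortly after the last reference to an object has been eliminated.

This approach works fine for most applications, but occasionally there is a need to track objects only as long as they are being used by something else. Unfortunately, just tracking them creates a reference that makes them permanent. The weakref module provides tools for tracking objects without creating a reference. When the object is no longer needed, it is automatically removed from a weakref table and a callback is triggered for weakref objects. Typical applications include caching objects that are expensive to create: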

    >>> import weakref, gc
    >>> class A:
    ...     def __init__(self, value):
    ...             self.value = value
    ...     def __repr__(self):
    ...             return str(self.value)
    ...
    >>> a = A(10)                   # create a reference
    >>> d = weakref.WeakValueDictionary()
    >>> d['primary'] = a            # does not create a reference
    >>> d['primary']                # fetch the object if it is still alive
    10
    >>> del a                       # remove the one reference
    >>> gc.collect()                # run garbage collection right away
    0
    >>> d['primary']                # entry was automatically removed
    Traceback (most recent call last):
      File "<pyshell#108>", line 1, in -toplevel-
        d['primary']                # entry was automatically removed
      File "C:/PY24/lib/weakref.py", line 46, in __getitem__
        o = self.data[key]()
    KeyError: 'primary'

【2009/04/18 08:25 】

CATEGORY[Python quotes]

Japanese can't be used in .py file names on Linux

Is it true that Japanese can't be used in .py file names on Linux??


【2009/05/10 10:11 】

CATEGORY[Python: Don't get it?]

Style sheet for printing

In the style sheet:

#wrapper {
    margin-left: auto;
    margin-right: auto;
    width:720px;
    background-color: #FFF;
}

---------------------------------------------------------------------------------------
In the HTML:

<html>
<body>
<div id="wrapper">
  ・・・
  ・・・
</div>
</body>
</html>
---------------------------------------------------------------------------------------

【2009/05/31 13:11 】

CATEGORY[Style sheets]

dmz

dmz file

【2009/09/20 18:13 】

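CATEGORY[Python: Don't get it?]

Creating Excel files with Python and Django (quoted)

Creating Excel files with Python and Django
Chris McAvoy

Article from internet.com (overseas): http://mediajam.info/topic/518331

Introduction

When you need to let clients manipulate data, the easiest approach is to provide an Excel spreadsheet. Spreadsheets are easy to create with Python, and letting users download them from the web is just as easy with the Django web framework. This article briefly touches on the evolutionary history of the spreadsheet and then explains how to do both.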

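The evolutionary history of the spreadsheet

Sharks have barely evolved in hundreds of years. The main reason is that they survive perfectly well in the wild. They catch their prey so effectively that they have never needed to adapt or change. The shark is a perfect, fearsome fish hunter with nothing left to improve.

Like the shark, the spreadsheet has barely evolved in decades. As with the shark, it handles data so well that it has had almost no need to evolve. A spreadsheet can hold large amounts of data, displays it in an easy-to-read tabular form, and lets non-programmers sort and manipulate the numbers to analyze the data. The spreadsheet is a perfect, fearsome data hunter.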

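Displaying data in web applications

Because spreadsheets are that good, it is sometimes wiser to simply acknowledge their strengths than to waste time trying to improve on them. Most of what a web application does is "take data from a database and put it on the web". In an iterative development environment, the first few cycles are spent on "getting data into the database". Once that stage is over, the client who commissioned the work will start asking to see the data displayed in the web application. This is where the trouble begins.

Even when they ask for the data to be displayed, very few clients understand exactly how they actually want to work with it. They may believe they understand at that moment, but when you show them exactly what they asked for, the usual reaction is "that's somehow not what I had in mind".

The client is not being difficult. They have probably never seen all of their data at once before, so this is only natural. Playing with the information in a new form simply suggests patterns they had never considered.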

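A web application that serves Excel files

Because of this experience, I show data to clients in Excel whenever I can. I make the data downloadable from the web application. Once clients have manipulated it freely in Excel, I can draw out good ideas from them about how they want to use the data and how it should ultimately be displayed. This saves time and also improves the quality of the product.

A "Download data in Excel format" button also helps a great deal with data portability. Even when the application nears completion and Excel is no longer needed to display the data, keep the export-to-Excel feature so that users can work with the data offline. It is an undeniably attractive feature that solves a variety of difficult problems.

Now that you appreciate how good spreadsheets are, the rest of this article explains how to create them and then how to make them downloadable from a web application on demand.

This example uses Django as the web application framework, but the basic idea is the same whichever toolkit you use.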

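Creating Excel spreadsheets with Python

The Excel format is (needless to say) proprietary, although a real effort does seem to be under way to open it up. It is easy to open a comma-separated file in Excel and work with it as an Excel spreadsheet. It is also very easy to create a comma-separated file with Python. Python and Excel make an excellent pairing, with the csv file as their common language.

Python's standard library ships with an excellent csv writer class. The following short example shows how to use it.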

import csv

w = csv.writer(open('output.csv','w'))
for i in range(10):
    w.writerow(range(10))

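Editor's note

On Windows, running the code above has been observed to produce duplicated line-ending characters.
In that case, rewriting the line that creates the writer as follows avoids the problem:
    w = csv.writer(open('output.csv','w'),lineterminator="\n")

The contents of output.csv look like this (the same row is repeated on ten lines):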

0,1,2,3,4,5,6,7,8,9

When you open this file in Excel, a dialog asks which delimiter to use. Select the comma and you are done.
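The editor's note above is about Python 2 on Windows. In Python 3 the usual way to avoid doubled line endings is to open the file with newline="", as the csv module documentation recommends. The sketch below assumes Python 3 and writes into a temporary directory (my own choice, to keep the example from touching the current folder).

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "output.csv")

# newline="" stops the text layer from translating the writer's '\r\n'.
with open(path, "w", newline="") as f:
    w = csv.writer(f)
    for i in range(10):
        w.writerow(range(10))

with open(path, "rb") as f:
    raw = f.read()   # inspect the exact bytes that were written
```

Each of the ten rows ends with a single '\r\n', on Windows and elsewhere.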

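Even so, end users sometimes dislike this extra step when opening a csv file. In that case, use the excellent library written by Roman Kiseliov for creating binary Excel files from Python. The library is called pyExcelerator and is available from its Sourceforge project page. It requires neither Windows nor Excel, which makes it easier to run.

As in the earlier csv example, let's create a table of 10 rows by 10 columns with pyExcelerator.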

from pyExcelerator import *

wb = Workbook()
ws0 = wb.add_sheet('0')

for x in range(10):
    for y in range(10):
        # writing to a specific x,y
        ws0.write(x,y,"this is cell %s, %s" % (x,y))

wb.save('output.xls')

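The script resembles the previous one, but there are some important differences. With pyExcelerator you must explicitly create a Workbook object and add a Worksheet object to it. When writing data to the Worksheet, you specify the target position as x,y. This approach is very flexible, but it means you have to picture the table in your head as you write.

pyExcelerator takes a little more effort, but it is far more capable. Besides creating simple tabular data, it covers most commonly used Excel features, such as cell formatting and inserting Excel formulas. The csv writer class can only create simple tables.

Downloading Excel files from Django

Now that you know how to create Excel-compatible files with Python, let's look at how to make such a file downloadable with Django. The procedure is simple, and the same steps can generate and serve almost any kind of binary file. If you can create the file, letting the user download it is easy.

The key is the HTTP header called "content-type". When a browser requests a file on a server, the transaction looks like this.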

GET /wp-content/uploads/2007/10/cropped-dsc_0020.jpg HTTP/1.1
Host: weblog.lonelylion.com
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.11) Gecko/20071204 Ubuntu/7.10 (gutsy) Firefox/2.0.0.11
Accept: image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://lonelylion.com/

HTTP/1.x 200 OK
Date: Sat, 02 Feb 2008 17:53:58 GMT
Server: Apache/1.3.37 (Unix) mod_throttle/3.1.2 DAV/1.0.3 mod_fastcgi/2.4.2 mod_gzip/1.3.26.1a PHP/4.4.8 mod_ssl/2.8.22 OpenSSL/0.9.7e
Last-Modified: Thu, 01 Nov 2007 03:22:12 GMT
Etag: "798f213-5c88-47294664"
Accept-Ranges: bytes
Content-Length: 23688
Keep-Alive: timeout=2, max=97
Connection: Keep-Alive
Content-Type: image/jpeg

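Note the last line, Content-Type: image/jpeg. This line tells the browser what the data that follows is. Based on it, the browser decides how to display the data, or which external application to open to display it.

The example above requests a JPEG image, so Content-Type is set to "image/jpeg". If you change this header to "application/ms-excel", the file opens in Excel. Changing the header in Django is very easy. Below is an example Django view that sends a csv file to the browser and tells the browser to open it in Excel.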

import csv
from StringIO import StringIO
from django.http import HttpResponse

def show_excel(request):
    # use a StringIO buffer rather than opening a file
    output = StringIO()
    w = csv.writer(output)
    for i in range(10):
        w.writerow(range(10))
    # rewind the virtual file
    output.seek(0)
    # content_type sets the Content-Type header of the response
    return HttpResponse(output.read(),
                        content_type='application/ms-excel')

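This makes a few changes to the first csv example. First, instead of opening a real file, it uses a file-like StringIO object. It also wraps the response in Django's HttpResponse object (HttpResponse is the standard return type of a Django view). Finally, it passes content_type to HttpResponse to set the Content-Type header to "application/ms-excel".

With this pattern, you can serve almost any kind of binary data to a web browser. Libraries exist for generating PDFs, images, audio, video, and all sorts of other data, so as long as you know the right Content-Type header value, you can generate and serve binary data.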
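The pattern does not depend on Django itself. As a framework-neutral sketch in Python 3, the hypothetical helper below builds the response body in memory and pairs it with the Content-Type header; how the pair is handed to the web framework is left open, and the function name is my own.

```python
import csv
import io

def build_csv_download(rows):
    """Hypothetical helper: return (body, headers) for a CSV download."""
    output = io.StringIO()                 # in-memory file, nothing on disk
    csv.writer(output).writerows(rows)
    headers = {"Content-Type": "application/ms-excel"}
    return output.getvalue(), headers

body, headers = build_csv_download([range(3), range(3)])
print(headers["Content-Type"])
print(body)
```

In a Django view, the body and the content type would go into HttpResponse as in the example above.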

Now let's use the same technique to output an Excel file generated with pyExcelerator.

from pyExcelerator import *
from django.http import HttpResponse

def show_excel(request):
    wb = Workbook()
    ws0 = wb.add_sheet('0')
    for x in range(10):
        for y in range(10):
            # writing to a specific x,y
            ws0.write(x,y,"this is cell %s, %s" % (x,y))
    # save to disk, then read the bytes back for the response
    wb.save('output.xls')
    return HttpResponse(open('output.xls','rb').read(),
                        content_type='application/ms-excel')

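This, too, resembles the earlier example that created an Excel file with pyExcelerator. The code has one weakness, however: it creates a temporary file, writes the data to it, and then reopens the file and reads the data back. This can cause problems under heavy traffic. Also, if one user accesses the file while another is reading it, the file may be corrupted. Adding a timestamp to the file name is one solution, but the ideal is to pass a file-like object to the Workbook's save method and store the data there. pyExcelerator does not currently support this, but I expect a patch will be provided.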
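One alternative to a timestamped file name is to let the tempfile module pick a unique path for each request. The sketch below is my own suggestion rather than anything from the article: save_to_bytes() is a hypothetical helper that accepts any "save to this path" callable, and fake_save() merely stands in for something like wb.save so the example runs without pyExcelerator.

```python
import os
import tempfile

def save_to_bytes(save, suffix=".xls"):
    """Call save(path) with a unique temporary path and return the file's bytes."""
    fd, path = tempfile.mkstemp(suffix=suffix)   # unique name per call
    os.close(fd)                                 # the saver reopens it by path
    try:
        save(path)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)                          # never leave the file behind

def fake_save(path):
    """Stand-in for a library call that can only write to a named file."""
    with open(path, "wb") as f:
        f.write(b"spreadsheet bytes")

data = save_to_bytes(fake_save)
```

Because every call gets its own file, two simultaneous requests cannot overwrite each other's output.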

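In most cases I send a csv file to the browser as "Content-Type: application/ms-excel". Using StringIO keeps the implementation simple, and since creating a downloadable spreadsheet usually takes less than five minutes, most end users never notice the difference. This satisfies the user's request and lets them start playing with the data right away. By working with the data, users form a clearer vision of the HTML display they will actually use and start making more precise requests. Everything about it is good.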

【2009/10/08 10:03 】

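CATEGORY[Python quotes]

XAMPP installation diary, starting today

Starting XAMPP today
Download
http://www.apachefriends.org/jp/xampp-windows.html

Install using the installer
Extract XAMPP to a top-level folder, for example H:\xampp
The FileZilla FTP server does not start, because it requires absolute paths

Uninstall took about 4 minutes
Install
  No drive letter
  Connect to localhost
  Security settings
All of the above took 25 minutes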

【2009/12/19 10:39 】

CATEGORY[XAMPP]

Installing XAMPP on Linux (Ubuntu)

To uninstall XAMPP:

sudo rm -rf /opt/lampp

Change to the directory that contains the downloaded file