Changeset 150

Timestamp: 04/26/08 18:11:04
Author: rgrp
Message:

[xl]: massive overhaul to integrate into Pylons 0.9.6. All tests pass again except for some annotater-related ones, which have been commented out.

  • Remove old etc/shakespeare.conf stuff and change to use paste/pylons config setup (as much as possible)
  • Templates:
    • Move templates from template/* to templates/*
    • Update layout template to use ${page_title()} instead of ${page_title} etc. (needed in Genshi >= 0.4).
    • Change to use pylons 'c' object
  • WUI: Convert to using pylons controller and pylons tests (using paste.fixture and not twill)
  • Model: move model files (dm.py) to model/dm.py and test in tests
  • Make all tests nosetests compatible by adding in @classmethod on setup_class/teardown_class methods.
  • WARNING: Routes 1.7.1 (installed with Pylons 0.9.6) has a bug in how url_for processes kwargs whose value is a list. This bites us in some of the test_view tests, where multiple text names are passed in. It can be solved by upgrading to Routes > 1.7.1. Alternatively, we could go back to hard-coding the routes.
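
As an illustration of the nosetests change listed above, here is a minimal sketch of a test class whose class-level fixtures are declared with @classmethod. The class name TestScratchDir and its scratch attribute are hypothetical and not part of this changeset; only the decorator pattern mirrors what was added to the real tests.

```python
import os
import shutil
import tempfile


class TestScratchDir(object):
    """Hypothetical example: class-level fixtures declared as classmethods."""

    @classmethod
    def setup_class(cls):
        # Runs once before any test method in the class.
        cls.scratch = tempfile.mkdtemp()

    @classmethod
    def teardown_class(cls):
        # Runs once after the last test method in the class.
        shutil.rmtree(cls.scratch)

    def test_scratch_exists(self):
        # Class attributes set in setup_class are visible on the instance.
        assert os.path.isdir(self.scratch)
```

With the explicit decorator, a runner can call the fixtures directly on the class without having to bind them itself, which is presumably why the change makes the same test files usable under both py.test and nosetests.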
Files:

  • trunk/README.txt

    Revision 148 Revision 150
    1Introduction 1Introduction 
    2************ 2************ 
    3 3 
    4The Open Shakespeare package provides a full open set of shakespeare's works 4The Open Shakespeare package provides a full open set of shakespeare's works 
    5(often in multiple versions) along with ancillary material, a variety of tools 5(often in multiple versions) along with ancillary material, a variety of tools 
    6and a python API. 6and a python API. 
    7 7 
    8Specifically in addition to the works themselves (often in multiple versions) 8Specifically in addition to the works themselves (often in multiple versions) 
    9there is an introduction, a chronology, explanatory notes, a concordance and 9there is an introduction, a chronology, explanatory notes, a concordance and 
    10search facilities. 10search facilities. 
    11 11 
    12All material is open source/open knowledge so that anyone can use, redistribute 12All material is open source/open knowledge so that anyone can use, redistribute 
    13and reuse these materials freely. For exact details of the license under which 13and reuse these materials freely. For exact details of the license under which 
    14this package is made available please see COPYING.txt. 14this package is made available please see COPYING.txt. 
    15 15 
    16Open Shakespeare has been developed under the aegis of the Open Knowledge 16Open Shakespeare has been developed under the aegis of the Open Knowledge 
    17Foundation (http://www.okfn.org/). 17Foundation (http://www.okfn.org/). 
    18 18 
    19Contact the Project 19Contact the Project 
    20******************* 20******************* 
    21 21 
    22Please mail info@okfn.org or join the okfn-discuss mailing list: 22Please mail info@okfn.org or join the okfn-discuss mailing list: 
    23 23 
    24  http://lists.okfn.org/listinfo/okfn-discuss 24  http://lists.okfn.org/listinfo/okfn-discuss 
    25 25 
    26 26 
    27Installation and Setup 27Installation and Setup 
    28********************** 28********************** 
    29 29 
    301. Install the code 301. Install the code 
    31=================== 31=================== 
    32 32 
    331.1: (EITHER) Install using setup.py (preferred) 331.1: (EITHER) Install using setup.py (preferred) 
    34------------------------------------------------ 34------------------------------------------------ 
    35 35 
    36Install ``shakespeare`` using easy_install:: 36Install ``shakespeare`` using easy_install:: 
    37 37 
    38    easy_install shakespeare 38    easy_install shakespeare 
    39 39 
    40NB: If you don't have easy_install you can get from here: 40NB: If you don't have easy_install you can get from here: 
    41 41 
    42<http://peak.telecommunity.com/DevCenter/EasyInstall#installation-instructions> 42<http://peak.telecommunity.com/DevCenter/EasyInstall#installation-instructions> 
    43 43 
    44Make a config file as follows:: 44Make a config file as follows:: 
    45 45 
    46    paster make-config shakespeare config.ini 46    paster make-config shakespeare config.ini 
    47 47 
    48Tweak the config file as appropriate and then setup the application:: 48Tweak the config file as appropriate and then setup the application:: 
    49 49 
    50    paster setup-app config.ini 50    paster setup-app config.ini 
    51 51 
    521.2 (OR) Get the code straight from subversion 521.2 (OR) Get the code straight from subversion 
    53------------------------------------------------ 53------------------------------------------------ 
    54 54 
    551. Check out the subversion trunk:: 551. Check out the subversion trunk:: 
    56 56 
    57    svn co https://knowledgeforge.net/shakespeare/svn/trunk 57    svn co https://knowledgeforge.net/shakespeare/svn/trunk 
    58 58 
    592. Do:: 592. Do:: 
    60 60 
    61    sudo python setup.py develop 61    sudo python setup.py develop 
    62 62 
    63 63 
    642. Cache Directory 642. Cache Directory 
    65================== 65================== 
    66 66 
    67Create a cache directory where texts and other material can be stored 67Create a cache directory where texts and other material can be stored 
    68 68 
    69This directory needs to be semi-permanent so do *not* put under a location such 69This directory needs to be semi-permanent so do *not* put under a location such 
    70as /tmp.  70as /tmp.  
    71 71 
    72 72 
    733. Create a configuration file   
    74==============================   
    75   
    761. copy the template at etc/shakespeare.conf.new to a suitable new location   
    77   (suggestion: etc/shakespeare.conf)   
    78   
    792. edit to reflect your setup (see comments in file)   
    80   
    813. make sure the config file can be found:   
    82  1. EITHER: it must be located at etc/shakespeare.conf relative to the   
    83       directory from which you run scripts   
    84     
    85  2. OR: set the SHAKESPEARECONF environment variable to contain the path to   
    86       the configuration file   
    87   
    88 73 
    895. Initialize the system 745. Initialize the system 
    90======================== 75======================== 
    91 76 
    92Run: $ bin/shakespeare-admin init 77Run: $ bin/shakespeare-admin init 
    93 78 
    94This may take some time to run so be patient 79This may take some time to run so be patient 
    95 80 
    96TIP: using sqlite building the concordance really **does** seem to run forever 81TIP: using sqlite building the concordance really **does** seem to run forever 
    97so recommend using postgresql or mysql if you are going to build the 82so recommend using postgresql or mysql if you are going to build the 
    98concordance.  83concordance.  
    99 84 
    100 85 
    101Getting Started 86Getting Started 
    102*************** 87*************** 
    103 88 
    104As a user: 89As a user: 
    105========== 90========== 
    106 91 
    107Start up the web interface by running the webserver: 92Start up the web interface by running the webserver: 
    108 93 
    109  $ bin/shakespeare-admin runserver 94  $ bin/shakespeare-admin runserver 
    110 95 
    111Then visit http://localhost:8080/ using your favourite web browser. 96Then visit http://localhost:8080/ using your favourite web browser. 
    112 97 
    113As a developer: 98As a developer: 
    114=============== 99=============== 
    115 100 
      1010. Copy development.ini.tmpl to development.ini and edit to your taste. 
      102 
    1161. Check out the administrative commands: $ bin/shakespeare-admin help. 1031. Check out the administrative commands: $ bin/shakespeare-admin help. 
    117 104 
    1182. Run the tests: $ py.test 1052. Run the tests using either py.test or nosetests:: 
    119      
    120Note that:   
    121      
    122  * The tests use [py.test] so you will need to have installed this   
    123 106 
    124  * To run the website tests (site_test etc) you will need to install [twill] 107    $ nosetests shakespeare 
    125    and have the webserver running   
    126 108 
    127[py.test]: http://codespeak.net/py/current/doc/getting-started.html   
    128[twill]: http://twill.idyll.org/   
    129   
  • trunk/development.ini.tmpl

    Revision 148 Revision 150
    1# 1# 
    2# shakespeare - Pylons development environment configuration 2# shakespeare - Pylons development environment configuration 
    3# 3# 
    4# The %(here)s variable will be replaced with the parent directory of this file 4# The %(here)s variable will be replaced with the parent directory of this file 
    5# 5# 
    6[DEFAULT] 6[DEFAULT] 
    7debug = true 7debug = true 
    8# Uncomment and replace with the address which should receive any error reports 8# Uncomment and replace with the address which should receive any error reports 
    9#email_to = you@yourdomain.com 9#email_to = you@yourdomain.com 
    10smtp_server = localhost 10smtp_server = localhost 
    11error_email_from = paste@localhost 11error_email_from = paste@localhost 
      12 
      13# directory where we can store all local copies of texts 
      14# at present should be different from the app's cache_dir 
      15cachedir = %(here)s/cache 
      16 
    12 17 
    13[server:main] 18[server:main] 
    14use = egg:Paste#http 19use = egg:Paste#http 
    15host = 0.0.0.0 20host = 0.0.0.0 
    16port = 5000 21port = 5000 
    17 22 
    18[app:main] 23[app:main] 
    19use = egg:shakespeare 24use = egg:shakespeare 
    20full_stack = true 25full_stack = true 
    21cache_dir = %(here)s/data 26cache_dir = %(here)s/data 
    22beaker.session.key = shakespeare 27beaker.session.key = shakespeare 
    23beaker.session.secret = somesecret 28beaker.session.secret = somesecret 
    24 29 
    25# If you'd like to fine-tune the individual locations of the cache data dirs 30# If you'd like to fine-tune the individual locations of the cache data dirs 
    26# for the Cache data, or the Session saves, un-comment the desired settings 31# for the Cache data, or the Session saves, un-comment the desired settings 
    27# here: 32# here: 
    28#beaker.cache.data_dir = %(here)s/data/cache 33#beaker.cache.data_dir = %(here)s/data/cache 
    29#beaker.session.data_dir = %(here)s/data/sessions 34#beaker.session.data_dir = %(here)s/data/sessions 
    30 35 
    31# WARNING: *THE LINE BELOW MUST BE UNCOMMENTED ON A PRODUCTION ENVIRONMENT* 36# WARNING: *THE LINE BELOW MUST BE UNCOMMENTED ON A PRODUCTION ENVIRONMENT* 
    32# Debug mode will enable the interactive debugging tool, allowing ANYONE to 37# Debug mode will enable the interactive debugging tool, allowing ANYONE to 
    33# execute malicious code after an exception is raised. 38# execute malicious code after an exception is raised. 
    34#set debug = false 39#set debug = false 
      40 
      41# using sqlite in memory leads to thread issues when using db ... 
      42# sqlobject.dburi = sqlite:///:memory: 
      43sqlobject.dburi = postgres://<username>:<password>@localhost/<your-dbname> 
    35 44 
    36 45 
    37# Logging configuration 46# Logging configuration 
    38[loggers] 47[loggers] 
    39keys = root, shakespeare 48keys = root, shakespeare 
    40 49 
    41[handlers] 50[handlers] 
    42keys = console 51keys = console 
    43 52 
    44[formatters] 53[formatters] 
    45keys = generic 54keys = generic 
    46 55 
    47[logger_root] 56[logger_root] 
    48level = INFO 57level = INFO 
    49handlers = console 58handlers = console 
    50 59 
    51[logger_shakespeare] 60[logger_shakespeare] 
    52level = DEBUG 61level = DEBUG 
    53handlers = 62handlers = 
    54qualname = shakespeare 63qualname = shakespeare 
    55 64 
    56[handler_console] 65[handler_console] 
    57class = StreamHandler 66class = StreamHandler 
    58args = (sys.stderr,) 67args = (sys.stderr,) 
    59level = NOTSET 68level = NOTSET 
    60formatter = generic 69formatter = generic 
    61 70 
    62[formatter_generic] 71[formatter_generic] 
    63format = %(asctime)s,%(msecs)03d %(levelname)-5.5s [%(name)s] %(message)s 72format = %(asctime)s,%(msecs)03d %(levelname)-5.5s [%(name)s] %(message)s 
    64datefmt = %H:%M:%S 73datefmt = %H:%M:%S 
      74 
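
The new cachedir setting sits in [DEFAULT] and is kept separate from the app's cache_dir. The sketch below shows how the two values resolve once %(here)s is expanded. It uses the Python 3 standard-library configparser as a stand-in for paste.deploy, which is an assumption made for illustration; the INI_TEXT fragment, the load_dirs helper, and the /srv/shakespeare path are likewise hypothetical.

```python
import configparser

# Trimmed-down fragment modelled on development.ini.tmpl (illustrative only).
INI_TEXT = """
[DEFAULT]
cachedir = %(here)s/cache

[app:main]
cache_dir = %(here)s/data
"""


def load_dirs(here):
    """Return (cachedir, cache_dir) as seen from the [app:main] section."""
    # 'here' is supplied as a default so that %(here)s can be interpolated.
    parser = configparser.ConfigParser(defaults={'here': here})
    parser.read_string(INI_TEXT)
    section = parser['app:main']
    # cachedir is inherited from [DEFAULT]; cache_dir is local to the section.
    return section['cachedir'], section['cache_dir']


text_cache, app_cache = load_dirs('/srv/shakespeare')
```

Under these assumptions the two settings end up in sibling directories (cache and data) beneath the directory holding the ini file, which matches the comment that they should differ.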
  • trunk/shakespeare.egg-info/paste_deploy_config.ini_tmpl

    Revision 148 Revision 150
    1# 1# 
    2# shakespeare - Pylons configuration 2# shakespeare - Pylons configuration 
    3# 3# 
    4# The %(here)s variable will be replaced with the parent directory of this file 4# The %(here)s variable will be replaced with the parent directory of this file 
    5# 5# 
    6[DEFAULT] 6[DEFAULT] 
    7debug = true 7debug = true 
    8email_to = you@yourdomain.com 8email_to = you@yourdomain.com 
    9smtp_server = localhost 9smtp_server = localhost 
    10error_email_from = paste@localhost 10error_email_from = paste@localhost 
      11 
      12# directory where we can store all local copies of texts 
      13# at present should be different from the app's cache_dir 
      14cachedir = ./cache 
    11 15 
    12[server:main] 16[server:main] 
    13use = egg:Paste#http 17use = egg:Paste#http 
    14host = 0.0.0.0 18host = 0.0.0.0 
    15port = 5000 19port = 5000 
    16 20 
    17[app:main] 21[app:main] 
    18use = egg:shakespeare 22use = egg:shakespeare 
    19full_stack = true 23full_stack = true 
    20cache_dir = %(here)s/data 24cache_dir = %(here)s/data 
    21beaker.session.key = shakespeare 25beaker.session.key = shakespeare 
    22beaker.session.secret = ${app_instance_secret} 26beaker.session.secret = ${app_instance_secret} 
    23app_instance_uuid = ${app_instance_uuid} 27app_instance_uuid = ${app_instance_uuid} 
    24 28 
    25# If you'd like to fine-tune the individual locations of the cache data dirs 29# If you'd like to fine-tune the individual locations of the cache data dirs 
    26# for the Cache data, or the Session saves, un-comment the desired settings 30# for the Cache data, or the Session saves, un-comment the desired settings 
    27# here: 31# here: 
    28#beaker.cache.data_dir = %(here)s/data/cache 32#beaker.cache.data_dir = %(here)s/data/cache 
    29#beaker.session.data_dir = %(here)s/data/sessions 33#beaker.session.data_dir = %(here)s/data/sessions 
    30 34 
    31# WARNING: *THE LINE BELOW MUST BE UNCOMMENTED ON A PRODUCTION ENVIRONMENT* 35# WARNING: *THE LINE BELOW MUST BE UNCOMMENTED ON A PRODUCTION ENVIRONMENT* 
    32# Debug mode will enable the interactive debugging tool, allowing ANYONE to 36# Debug mode will enable the interactive debugging tool, allowing ANYONE to 
    33# execute malicious code after an exception is raised. 37# execute malicious code after an exception is raised. 
    34set debug = false 38set debug = false 
    35 39 
      40# using sqlite in memory leads to thread issues when using db ... 
      41# sqlobject.dburi = sqlite:///:memory: 
      42sqlobject.dburi = postgres://<username>:<password>@localhost/<your-dbname> 
    36 43 
    37# Logging configuration 44# Logging configuration 
    38[loggers] 45[loggers] 
    39keys = root 46keys = root 
    40 47 
    41[handlers] 48[handlers] 
    42keys = console 49keys = console 
    43 50 
    44[formatters] 51[formatters] 
    45keys = generic 52keys = generic 
    46 53 
    47[logger_root] 54[logger_root] 
    48level = INFO 55level = INFO 
    49handlers = console 56handlers = console 
    50 57 
    51[handler_console] 58[handler_console] 
    52class = StreamHandler 59class = StreamHandler 
    53args = (sys.stderr,) 60args = (sys.stderr,) 
    54level = NOTSET 61level = NOTSET 
    55formatter = generic 62formatter = generic 
    56 63 
    57[formatter_generic] 64[formatter_generic] 
    58format = %(asctime)s %(levelname)-5.5s [%(name)s] %(message)s 65format = %(asctime)s %(levelname)-5.5s [%(name)s] %(message)s 
      66 
      67 
      68[misc] 
      69# directory where we can store all local copies of texts 
      70cachedir = ./cache 
      71 
      72[db] 
      73# sqlobject database uri. see sqlobject documentation for details 
      74# uri = postgres://user:pass@host/dbname 
      75uri = sqlite:/:memory: 
      76 
      77[web] 
      78# directory where the templates used by web front end are kept 
      79template_dir = ./src/shakespeare/template 
      80 
      81[annotater] 
      82# url at which marginalia files (css/js etc) should be mounted 
      83marginalia_prefix = /marginalia 
  • trunk/shakespeare/__init__.py

    Revision 148 Revision 150
    1__version__ = '0.5dev' 1__version__ = '0.5dev' 
    2__application_name__ = 'shakespeare' 2__application_name__ = 'shakespeare' 
    3 3 
    4def conf(): 4def conf(): 
    5    import os 5    import os 
    6    defaultPath = os.path.abspath('./etc/%s.conf' % __application_name__) 6    defaultPath = os.path.abspath('./development.ini') 
    7    envVarName = __application_name__.upper() + 'CONF' 7    envVarName = __application_name__.upper() + 'CONF' 
    8    confPath = os.environ.get(envVarName, defaultPath) 8    confPath = os.environ.get(envVarName, defaultPath) 
    9    if not os.path.exists(confPath): 9    if not os.path.exists(confPath): 
    10        raise ValueError('No Configuration file exists at: %s' % confPath) 10        raise ValueError('No Configuration file exists at: %s' % confPath) 
    11    import ConfigParser  11  
    12    conf = ConfigParser.SafeConfigParser()  12     # register the config 
    13    conf.read(confPath)  13     import paste.deploy 
       14     import shakespeare.config.environment 
       15     pasteconf = paste.deploy.appconfig('config:' + confPath) 
       16  
       17     shakespeare.config.environment.load_environment(pasteconf.global_conf, 
       18         pasteconf.local_conf) 
       19     from pylons import config 
       20     conf = config 
       21  
       22     # import ConfigParser 
       23     # conf = ConfigParser.SafeConfigParser() 
       24     # conf.read(confPath) 
       25  
    14    return conf 26    return conf 
    15 27      
    16   
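
The lookup order used by conf() in revision 150 (environment variable first, then ./development.ini) can be sketched on its own, without the Paste and Pylons loading step. The function name resolve_conf_path and its environ parameter are hypothetical, introduced here only so the logic can be exercised in isolation.

```python
import os


def resolve_conf_path(app_name='shakespeare', default='./development.ini',
                      environ=None):
    """Return the config path, preferring the <APPNAME>CONF variable."""
    # Allow a substitute mapping so the lookup can be tested without
    # touching the real process environment.
    environ = os.environ if environ is None else environ
    env_var_name = app_name.upper() + 'CONF'  # e.g. SHAKESPEARECONF
    default_path = os.path.abspath(default)
    conf_path = environ.get(env_var_name, default_path)
    if not os.path.exists(conf_path):
        raise ValueError('No Configuration file exists at: %s' % conf_path)
    return conf_path
```

Keeping the SHAKESPEARECONF override means existing deployments can point at an ini file anywhere on disk, while a fresh checkout falls back to development.ini in the working directory.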
  • trunk/shakespeare/cache.py

    Revision 139 Revision 150
    1import os 1import os 
    2import urllib 2import urllib 
    3 3 
    4import shakespeare 4import shakespeare 
    5conf = shakespeare.conf() 5conf = shakespeare.conf() 
    6 6 
    7class Cache(object): 7class Cache(object): 
    8    """Provide a local filesystem cache for material. 8    """Provide a local filesystem cache for material. 
    9    """ 9    """ 
    10 10 
    11    def __init__(self, cache_path): 11    def __init__(self, cache_path): 
    12        self.cache_path = cache_path 12        self.cache_path = cache_path 
    13 13 
    14    def path(self, remote_url, version=''): 14    def path(self, remote_url, version=''): 
    15        """Get local path to text of remote url. 15        """Get local path to text of remote url. 
    16        @type: string giving version of text (''|'cleaned') 16        @type: string giving version of text (''|'cleaned') 
    17        """ 17        """ 
    18        protocolEnd = remote_url.index(':') + 3  # add 3 for :// 18        protocolEnd = remote_url.index(':') + 3  # add 3 for :// 
    19        path = remote_url[protocolEnd:] 19        path = remote_url[protocolEnd:] 
    20        base, name = os.path.split(path) 20        base, name = os.path.split(path) 
    21        name = version + name 21        name = version + name 
    22        offset = os.path.join(base, name) 22        offset = os.path.join(base, name) 
    23        localPath = self.path_from_offset(offset) 23        localPath = self.path_from_offset(offset) 
    24        return localPath 24        return localPath 
    25 25 
    26    def download_url(self, url, overwrite=False): 26    def download_url(self, url, overwrite=False): 
    27        """Download a url to the local cache 27        """Download a url to the local cache 
    28        @overwrite: if True overwrite an existing local copy otherwise don't 28        @overwrite: if True overwrite an existing local copy otherwise don't 
    29        """ 29        """ 
    30        localPath = self.path(url) 30        localPath = self.path(url) 
    31        dirpath = os.path.dirname(localPath) 31        dirpath = os.path.dirname(localPath) 
    32        if overwrite or not(os.path.exists(localPath)): 32        if overwrite or not(os.path.exists(localPath)): 
    33            if not os.path.exists(dirpath): 33            if not os.path.exists(dirpath): 
    34                os.makedirs(dirpath) 34                os.makedirs(dirpath) 
    35            # use wget as it seems to work more reliably on wikimedia 35            # use wget as it seems to work more reliably on wikimedia 
    36            # see extensive comments on issue in shakespeare.eb.Wikimedia class 36            # see extensive comments on issue in shakespeare.eb.Wikimedia class 
    37            # rgrp: 2008-03-18 use urllib rather than wget despite these issues 37            # rgrp: 2008-03-18 use urllib rather than wget despite these issues 
    38            # as wget is fairly specific to linux/unix and even there may not 38            # as wget is fairly specific to linux/unix and even there may not 
    39            # be installed. 39            # be installed. 
    40            # cmd = 'wget -O %s %s' % (localPath, url)  40            # cmd = 'wget -O %s %s' % (localPath, url)  
    41            # os.system(cmd) 41            # os.system(cmd) 
    42            urllib.urlretrieve(url, localPath) 42            urllib.urlretrieve(url, localPath) 
    43 43 
    44    def path_from_offset(self, offset): 44    def path_from_offset(self, offset): 
    45        "Get full path of file in cache given by offset." 45        "Get full path of file in cache given by offset." 
    46        return os.path.join(self.cache_path, offset) 46        return os.path.join(self.cache_path, offset) 
    47 47 
    48 48 
    49default_path = shakespeare.conf().get('misc', 'cachedir') 49default_path = shakespeare.conf()['cachedir'] 
    50default = Cache(default_path) 50default = Cache(default_path) 
    51 51 
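
For reference, the URL-to-path mapping performed by Cache.path can be sketched as a standalone function. The name cache_path and the /tmp/cache root are hypothetical; the logic follows the method shown in the diff above (strip the protocol, then prefix the file name with the version string).

```python
import os


def cache_path(cache_root, remote_url, version=''):
    """Map a remote URL to a file path under cache_root."""
    # Skip past the '://' that follows the protocol name.
    protocol_end = remote_url.index(':') + 3
    path = remote_url[protocol_end:]
    # Prefix only the final path component with the version label.
    base, name = os.path.split(path)
    return os.path.join(cache_root, base, version + name)


cleaned = cache_path('/tmp/cache',
                     'http://www.gutenberg.org/dirs/GUTINDEX.ALL',
                     'cleaned')
```

This is the behaviour checked by test_path_2 in cache_test.py, where the 'cleaned' version of GUTINDEX.ALL is expected at www.gutenberg.org/dirs/cleanedGUTINDEX.ALL under the cache directory.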
  • trunk/shakespeare/cache_test.py

    Revision 50 Revision 150
    1import os 1import os 
    2import shutil 2import shutil 
    3import tempfile 3import tempfile 
    4 4 
    5import shakespeare.cache 5import shakespeare.cache 
    6 6 
    7class TestCache(object): 7class TestCache(object): 
    8 8 
      9    @classmethod 
    9    def setup_class(cls): 10    def setup_class(cls): 
    10        cls.cache_path = tempfile.mkdtemp() 11        cls.cache_path = tempfile.mkdtemp() 
    11        cls.cache = shakespeare.cache.Cache(cls.cache_path) 12        cls.cache = shakespeare.cache.Cache(cls.cache_path) 
    12        cls.url = 'http://www.gutenberg.org/dirs/GUTINDEX.ALL' 13        cls.url = 'http://www.gutenberg.org/dirs/GUTINDEX.ALL' 
    13        cls.url2 = 'http://project.knowledgeforge.net/shakespeare/svn/trunk/CHANGELOG.txt' 14        cls.url2 = 'http://project.knowledgeforge.net/shakespeare/svn/trunk/CHANGELOG.txt' 
    14 15 
      16    @classmethod 
    15    def teardown_class(cls): 17    def teardown_class(cls): 
    16        shutil.rmtree(cls.cache_path) 18        shutil.rmtree(cls.cache_path) 
    17 19 
    18    def test_path(self): 20    def test_path(self): 
    19        exp = os.path.join(self.cache_path, self.url[7:]) 21        exp = os.path.join(self.cache_path, self.url[7:]) 
    20        out = self.cache.path(self.url) 22        out = self.cache.path(self.url) 
    21        assert out == exp 23        assert out == exp 
    22 24 
    23    def test_path_2(self): 25    def test_path_2(self): 
    24        exp = os.path.join(self.cache_path, 26        exp = os.path.join(self.cache_path, 
    25                'www.gutenberg.org/dirs/cleanedGUTINDEX.ALL') 27                'www.gutenberg.org/dirs/cleanedGUTINDEX.ALL') 
    26        out = self.cache.path(self.url, 'cleaned') 28        out = self.cache.path(self.url, 'cleaned') 
    27        assert exp == out 29        assert exp == out 
    28 30 
    29    def test_download_url(self): 31    def test_download_url(self): 
    30        exp = os.path.join(self.cache_path, self.url2[7:]) 32        exp = os.path.join(self.cache_path, self.url2[7:]) 
    31        self.cache.download_url(self.url2, overwrite=True) 33        self.cache.download_url(self.url2, overwrite=True) 
    32        assert os.path.exists(exp) 34        assert os.path.exists(exp) 
    33  35  
  • trunk/shakespeare/concordance.py

    Revision 74 Revision 150
    1""" 1""" 
    2Concordance (and statistics) for texts in database. 2Concordance (and statistics) for texts in database. 
    3 3 
    4To build concordance use ConcordanceBuilder.  To access concordance/statistics 4To build concordance use ConcordanceBuilder.  To access concordance/statistics 
    5use Concordance/Statistics class.  Concordance and statistics are provided as 5use Concordance/Statistics class.  Concordance and statistics are provided as 
    6dictionaries keyed by words. 6dictionaries keyed by words. 
    7 7 
    8NB: all word keys have been lower-cased in order to render them 8NB: all word keys have been lower-cased in order to render them 
    9case-insensitive 9case-insensitive 
    10""" 10""" 
    11import re 11import re 
    12 12 
    13import sqlobject 13import sqlobject 
    14 14 
    15import shakespeare.index 15import shakespeare.index 
    16import shakespeare.cache 16import shakespeare.cache 
    17 17 
    18 18 
    19class ConcordanceBase(object): 19class ConcordanceBase(object): 
    20    """ 20    """ 
    21    TODO: caching?? 21    TODO: caching?? 
    22    """ 22    """ 
    23    sqlcc = shakespeare.dm.Concordance 23    sqlcc = shakespeare.model.Concordance 
    24    sqlstat = shakespeare.dm.Statistic 24    sqlstat = shakespeare.model.Statistic 
    25 25 
    26    def __init__(self, filter_names=None): 26    def __init__(self, filter_names=None): 
    27        """ 27        """ 
    28        @param filter_names: a list of id names with which to filter results 28        @param filter_names: a list of id names with which to filter results 
    29            (i.e. only return results relating to those texts) 29            (i.e. only return results relating to those texts) 
    30        """ 30        """ 
    31        self._filter_names = filter_names 31        self._filter_names = filter_names 
    32        self.sqlcc_filter = self._make_filter(self.sqlcc) 32        self.sqlcc_filter = self._make_filter(self.sqlcc) 
    33        self.sqlstat_filter = self._make_filter(self.sqlstat) 33        self.sqlstat_filter = self._make_filter(self.sqlstat) 
    34 34 
    35    def _make_filter(self, sqlobj): 35    def _make_filter(self, sqlobj): 
    36        sql_filter = True 36        sql_filter = True 
    37        if self._filter_names is not None: 37        if self._filter_names is not None: 
    38            arglist = [] 38            arglist = [] 
    39            for name in self._filter_names: 39            for name in self._filter_names: 
    40                newarg = sqlobj.q.textID == self._name2id(name) 40                newarg = sqlobj.q.textID == self._name2id(name) 
    41                arglist.append(newarg) 41                arglist.append(newarg) 
    42            sql_filter = sqlobject.OR(*arglist) 42            sql_filter = sqlobject.OR(*arglist) 
    43        return sql_filter 43        return sql_filter 
    44     44     
    45    def _name2id(self, name): 45    def _name2id(self, name): 
    46        return shakespeare.dm.Material.byName(name).id 46        return shakespeare.model.Material.byName(name).id 
    47 47 
    48    def keys(self): 48    def keys(self): 
    49        """Return list of *distinct* words in concordance/statistics 49        """Return list of *distinct* words in concordance/statistics 
    50        """ 50        """ 
    51        all = self.sqlstat.select(self.sqlstat_filter, 51        all = self.sqlstat.select(self.sqlstat_filter, 
    52                           orderBy=self.sqlstat.q.word, 52                           orderBy=self.sqlstat.q.word, 
    53                           ) 53                           ) 
    54        words = [ xx.word for xx in list(all) ] 54        words = [ xx.word for xx in list(all) ] 
    55        distinct = list(set(words)) 55        distinct = list(set(words)) 
    56        distinct.sort() 56        distinct.sort() 
    57        return distinct 57        return distinct 
    58 58 
    59 59 
    60class Concordance(ConcordanceBase): 60class Concordance(ConcordanceBase): 
    61    """Concordance by word for a set of texts 61    """Concordance by word for a set of texts 
    62    """ 62    """ 
    63 63 
    64    def get(self, word): 64    def get(self, word): 
    65        """Get list of occurrences for word 65        """Get list of occurrences for word 
    66        @return: sqlobject query list  66        @return: sqlobject query list  
    67        """ 67        """ 
    68        select = self.sqlcc.select(sqlobject.AND(self.sqlcc_filter, self.sqlcc.q.word==word)) 68        select = self.sqlcc.select(sqlobject.AND(self.sqlcc_filter, self.sqlcc.q.word==word)) 
    69        return select 69        return select 
    70 70 
    71class Statistics(ConcordanceBase): 71class Statistics(ConcordanceBase): 
    72 72 
    73    def get(self, word): 73    def get(self, word): 
    74        select = self.sqlstat.select( 74        select = self.sqlstat.select( 
    75            sqlobject.AND(self.sqlstat_filter, self.sqlstat.q.word==word) 75            sqlobject.AND(self.sqlstat_filter, self.sqlstat.q.word==word) 
    76            ) 76            ) 
    77        total = 0 77        total = 0 
    78        for stat in select: 78        for stat in select: 
    79            total += stat.occurrences 79            total += stat.occurrences 
    80        return total 80        return total 
    81 81 
    82class ConcordanceBuilder(object): 82class ConcordanceBuilder(object): 
    83    """Build a concordance and associated statistics for a set of texts. 83    """Build a concordance and associated statistics for a set of texts. 
    84     84     
    85    """ 85    """ 
    86 86 
    87    # multiline, unicode and ignorecase 87    # multiline, unicode and ignorecase 
    88    word_regex = re.compile(r'\b(\w+)\b', re.U | re.M | re.I) 88    word_regex = re.compile(r'\b(\w+)\b', re.U | re.M | re.I) 
    89 89 
    90    words_to_ignore = [  90    words_to_ignore = [  
    91        # 'a', 'the', 'and', 'as', 'are', 'be', 'but', 'in' 91        # 'a', 'the', 'and', 'as', 'are', 'be', 'but', 'in' 
    92                        ] 92                        ] 
    93    non_words = [  93    non_words = [  
    94            'd', # accus'd 94            'd', # accus'd 
    95            't', 95            't', 
    96            ] 96            ] 
    97 97 
    98    def is_roman_numeral(self, word): 98    def is_roman_numeral(self, word): 
    99        digits = [ 'i', 'ii', 'iii', 'iv', 'v', 'vi', 'vii', 'viii', 'ix' ] 99        digits = [ 'i', 'ii', 'iii', 'iv', 'v', 'vi', 'vii', 'viii', 'ix' ] 
    100        others = [ 'l', 'x', 'c' ] 100        others = [ 'l', 'x', 'c' ] 
    101        if word == 'i': return False # exception because this conflicts with I 101        if word == 'i': return False # exception because this conflicts with I 
    102        while word[0] in others: 102        while word[0] in others: 
    103            if len(word) == 1: 103            if len(word) == 1: 
    104                return True 104                return True 
    105            else: 105            else: 
    106                word = word[1:] 106                word = word[1:] 
    107        return word in digits 107        return word in digits 
    108 108 
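Note on is_roman_numeral: the check is a loose heuristic rather than a numeral parser. It strips any leading run of 'l', 'x' and 'c' and then looks the remainder up in the list for one to nine, with 'i' excluded so that the pronoun is kept. The sketch below copies that logic into a standalone Python 3 function (the standalone form is an assumption made for illustration; in the changeset it is a method on ConcordanceBuilder) to show which inputs it accepts:

```python
def is_roman_numeral(word):
    """Standalone copy of the heuristic shown in the listing above."""
    digits = ['i', 'ii', 'iii', 'iv', 'v', 'vi', 'vii', 'viii', 'ix']
    others = ['l', 'x', 'c']
    if word == 'i':
        return False  # kept as a word because it collides with the pronoun "I"
    # Strip leading l/x/c characters; a word made only of them counts as a numeral.
    while word[0] in others:
        if len(word) == 1:
            return True
        word = word[1:]
    return word in digits

for candidate in ['i', 'iv', 'xiv', 'x', 'xxxx', 'mix', 'king']:
    print(candidate, is_roman_numeral(candidate))
```

Because the prefix is not validated, a malformed string such as 'xxxx' is still treated as a numeral, and numerals using 'm' are not recognised at all. The function also assumes a non-empty word, which the \w+ pattern guarantees for its matches.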
    109    def ignore_word(self, word): 109    def ignore_word(self, word): 
    110        "Return True if this word should not be added to the concordance." 110        "Return True if this word should not be added to the concordance." 
    111        bool1 = word in self.words_to_ignore 111        bool1 = word in self.words_to_ignore 
    112        bool2 = word in self.non_words 112        bool2 = word in self.non_words 
    113        # do roman numerals 113        # do roman numerals 
    114        bool3 = self.is_roman_numeral(word) 114        bool3 = self.is_roman_numeral(word) 
    115        return bool1 or bool2 or bool3 115        return bool1 or bool2 or bool3 
    116 116 
    117    def _text_already_done(self, text): 117    def _text_already_done(self, text): 
    118        numrecs = shakespeare.dm.Concordance.select( 118        numrecs = shakespeare.model.Concordance.select( 
    119                shakespeare.dm.Concordance.q.textID==text.id 119                shakespeare.model.Concordance.q.textID==text.id 
    120                ).count() 120                ).count() 
    121        return numrecs > 0 121        return numrecs > 0 
    122 122 
    123    def add_text(self, name, text=None): 123    def add_text(self, name, text=None): 
    124        """Add a text to the concordance. 124        """Add a text to the concordance. 
    125        @param name: name of text to add 125        @param name: name of text to add 
    126        @param text: [optional] a file-like object containing text data. If not 126        @param text: [optional] a file-like object containing text data. If not 
    127            provided will default to using file in cache associated with named 127            provided will default to using file in cache associated with named 
    128            text 128            text 
    129        """ 129        """ 
    130        dmText = shakespeare.dm.Material.byName(name) 130        dmText = shakespeare.model.Material.byName(name) 
    131        if self._text_already_done(dmText): 131        if self._text_already_done(dmText): 
    132            msg = 'Have already added to concordance text: %s' % dmText 132            msg = 'Have already added to concordance text: %s' % dmText 
    133            # raise ValueError(msg) 133            # raise ValueError(msg) 
    134            print msg 134            print msg 
    135            print 'Skipping' 135            print 'Skipping' 
    136            return 136            return 
    137        if text is None: 137        if text is None: 
    138            tpath = dmText.get_cache_path('plain') 138            tpath = dmText.get_cache_path('plain') 
    139            text = file(tpath) 139            text = file(tpath) 
    140        lineCount = 0 140        lineCount = 0 
    141        charIndex = 0 141        charIndex = 0 
    142        stats = {} 142        stats = {} 
    143        trans = shakespeare.dm.Concordance._connection.transaction() 143        trans = shakespeare.model.Concordance._connection.transaction() 
    144        for line in text.readlines(): 144        for line in text.readlines(): 
    145            for match in self.word_regex.finditer(line): 145            for match in self.word_regex.finditer(line): 
    146                word = match.group().lower() # case insensitive 146                word = match.group().lower() # case insensitive 
    147                if self.ignore_word(word): 147                if self.ignore_word(word): 
    148                    continue 148                    continue 
    149                shakespeare.dm.Concordance(connection=trans, 149                shakespeare.model.Concordance(connection=trans, 
    150                                           text=dmText, 150                                           text=dmText, 
    151                                           word=word, 151                                           word=word, 
    152                                           line=lineCount, 152                                           line=lineCount, 
    153                                           char_index=charIndex+match.start()) 153                                           char_index=charIndex+match.start()) 
    154                stats[word] = stats.get(word, 0) + 1 154                stats[word] = stats.get(word, 0) + 1 
    155            lineCount += 1 155            lineCount += 1 
    156            charIndex += len(line) 156            charIndex += len(line) 
    157        trans.commit() 157        trans.commit() 
    158        trans = shakespeare.dm.Concordance._connection.transaction() 158        trans = shakespeare.model.Concordance._connection.transaction() 
    159        for word, value in stats.items(): 159        for word, value in stats.items(): 
    160            tresults  = shakespeare.dm.Statistic.select( 160            tresults  = shakespeare.model.Statistic.select( 
    161                sqlobject.AND( 161                sqlobject.AND( 
    162                    shakespeare.dm.Statistic.q.textID == dmText.id, 162                    shakespeare.model.Statistic.q.textID == dmText.id, 
    163                    shakespeare.dm.Statistic.q.word == word 163                    shakespeare.model.Statistic.q.word == word 
    164                    )) 164                    )) 
    165            try: 165            try: 
    166                dbstat = list(tresults)[0] 166                dbstat = list(tresults)[0] 
    167                dbstat.occurrences += value 167                dbstat.occurrences += value 
    168            except: 168            except: 
    169                shakespeare.dm.Statistic( 169                shakespeare.model.Statistic( 
    170                        connection=trans, 170                        connection=trans, 
    171                        text=dmText, 171                        text=dmText, 
    172                        word=word, 172                        word=word, 
    173                        occurrences=value 173                        occurrences=value 
    174                        ) 174                        ) 
    175        trans.commit() 175        trans.commit() 
    176 176 
    177 177 
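Note on add_text: the method makes one pass over the text, recording each word with its line number and absolute character offset, and counting occurrences per word for the statistics table. The following is a minimal in-memory sketch of that loop in Python 3. It assumes plain Python containers in place of the SQLObject Concordance and Statistic tables, omits the Roman numeral filter for brevity, and the name build_concordance is hypothetical:

```python
import io
import re
from collections import Counter

WORD_REGEX = re.compile(r'\b(\w+)\b')
NON_WORDS = {'d', 't'}  # same fragments as non_words in the listing

def build_concordance(fileobj, ignore=NON_WORDS):
    """Return (entries, stats) for a file-like object of text.

    entries: list of (word, line_number, char_index) tuples
    stats:   Counter mapping word -> number of occurrences
    Hypothetical stand-in for the database writes in add_text.
    """
    entries = []
    stats = Counter()
    char_index = 0  # offset of the current line's first character in the text
    for line_no, line in enumerate(fileobj):
        for match in WORD_REGEX.finditer(line):
            word = match.group().lower()  # case insensitive
            if word in ignore:
                continue
            entries.append((word, line_no, char_index + match.start()))
            stats[word] += 1
        char_index += len(line)
    return entries, stats

sample = io.StringIO("To be, or not to be\nthat is the question\n")
entries, stats = build_concordance(sample)
print(entries[:3])
print(stats.most_common(2))
```

As in the changeset, line numbers start at zero and the character index is measured from the start of the whole text, not from the start of the line.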
    178    def remove_text(self, name): 178    def remove_text(self, name): 
    179        """Remove a text from the concordance. 179        """Remove a text from the concordance. 
    180 180 
    181        @param name: as for add_text 181        @param name: as for add_text 
    182        """ 182        """ 
    183        dmText = shakespeare.dm.Material.byName(name) 183        dmText = shakespeare.model.Material.byName(name) 
    184        recs = shakespeare.dm.Concordance.select( 184        recs = shakespeare.model.Concordance.select( 
    185                shakespeare.dm.Concordance.q.textID==dmText.id 185                shakespeare.model.Concordance.q.textID==dmText.id 
    186                ) 186                ) 
    187        for rec in recs: 187        for rec in recs: 
    188