Changeset 154
- Timestamp: 05/21/08 01:56:27 (6 months ago)
- Files:
- trunk/README.txt (modified) (1 diff)
- trunk/contrib/size.py (modified) (1 diff)
- trunk/contrib/view_raw.py (modified) (1 diff)
- trunk/shakespeare/cli.py (modified) (1 diff)
- trunk/shakespeare/concordance.py (modified) (1 diff)
- trunk/shakespeare/controllers/site.py (modified) (1 diff)
- trunk/shakespeare/model/dm.py (modified) (1 diff)
- trunk/shakespeare/tests/test_model.py (modified) (1 diff)
trunk/README.txt
--- trunk/README.txt (revision 150)
+++ trunk/README.txt (revision 154)
@@ -1,108 +1,115 @@
 Introduction
 ************

 The Open Shakespeare package provides a full open set of shakespeare's works
 (often in multiple versions) along with ancillary material, a variety of tools
 and a python API.

 Specifically in addition to the works themselves (often in multiple versions)
 there is an introduction, a chronology, explanatory notes, a concordance and
 search facilities.

 All material is open source/open knowledge so that anyone can use, redistribute
 and reuse these materials freely. For exact details of the license under which
 this package is made available please see COPYING.txt.

 Open Shakespeare has been developed under the aegis of the Open Knowledge
 Foundation (http://www.okfn.org/).

 Contact the Project
 *******************

 Please mail info@okfn.org or join the okfn-discuss mailing list:

     http://lists.okfn.org/listinfo/okfn-discuss


 Installation and Setup
 **********************

 1. Install the code
 ===================

 1.1: (EITHER) Install using setup.py (preferred)
 ------------------------------------------------

 Install ``shakespeare`` using easy_install::

     easy_install shakespeare

 NB: If you don't have easy_install you can get from here:

 <http://peak.telecommunity.com/DevCenter/EasyInstall#installation-instructions>

 Make a config file as follows::

     paster make-config shakespeare config.ini

 Tweak the config file as appropriate and then setup the application::

     paster setup-app config.ini

 1.2 (OR) Get the code straight from subversion
 ------------------------------------------------

 1. Check out the subversion trunk::

     svn co https://knowledgeforge.net/shakespeare/svn/trunk

 2. Do::

     sudo python setup.py develop


 2. Cache Directory
 ==================

 Create a cache directory where texts and other material can be stored

 This directory needs to be semi-permanent so do *not* put under a location such
 as /tmp.



 5. Initialize the system
 ========================

-Run: $ bin/shakespeare-admin init
+Run::

-This may take some time to run so be patient
+    $ shakespeare-admin db create
+    $ shakespeare-admin db init

-TIP: using sqlite building the concordance really **does** seem to run forever
-so recommend using postgresql or mysql if you are going to build the
-concordance.
+If you want to build the concordance do::
+
+    $ shakespeare-admin concordance
+
+NB: This may take some time to run so be patient. TIP: using sqlite building
+the concordance really **does** seem to run forever so recommend using
+postgresql or mysql if you are going to build the concordance.


 Getting Started
 ***************

 As a user:
 ==========

-Start up the web interface by running the webserver:
+Start up the web interface by running the webserver::

-    $ bin/shakespeare-admin runserver
+    $ paster serve {your-config.ini}

-Then visit http://localhost:8080/ using your favourite web browser.
+NB: {your-config.ini} should be replaced with the name of the config file you
+created earlier.
+

 As a developer:
 ===============

 0. Copy development.ini.tmpl to development.ini and edit to your taste.

 1. Check out the administrative commands: $ bin/shakespeare-admin help.

 2. Run the tests using either py.test of nosetests::

     $ nosetests shakespeare

trunk/contrib/size.py
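For orientation, size.py totals whitespace-separated words per text. The counting step can be sketched on its own in Python 3 (the changeset itself is Python 2). This is only an illustration: it assumes the text arrives as an open file-like object, which is how controllers/site.py in this changeset uses the result of `get_text()`, and the `io.StringIO` sample is a hypothetical stand-in rather than project data.

```python
import io


def count_words(fileobj):
    """Count whitespace-separated words in a file-like object."""
    return sum(len(line.split()) for line in fileobj)


# Hypothetical stand-in for a text object; not taken from the project.
sample = io.StringIO("To be, or not to be,\nthat is the question.\n")
print(count_words(sample))
```

Because the function only iterates over lines, it works unchanged with a real file handle or any other line iterator.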
--- trunk/contrib/size.py (revision 61)
+++ trunk/contrib/size.py (revision 154)
@@ -1,31 +1,31 @@
 #!/usr/bin/env python
 """
 Print shakespeare plays and their sizes.

 Use Gutenberg plain versions
 """
 import shakespeare.index

 def count_words(fileobj):
     """Count the number of words in a file."""
     count = 0
     for line in fileobj:
         words = line.split()
         count += len(words)
     return count

 numtexts = 0
 totalwords = 0
 for text in shakespeare.index.all:
     # if you wanted the title it would be text.title
     name = text.name
     # want gutenberg version but not folios
     # if you want to include folios remove the second condition
     if '_gut' in name and not '_gut_f' in name:
         numtexts += 1
-        fileobj = file(text.get_cache_path('plain'))
+        fileobj = file(text.get_text())
         numwords = count_words(fileobj)
         print name.ljust(60), numwords
         totalwords += numwords
 print '-------------------------'
 print 'Total: %s works, %s words' % (numtexts, totalwords)
trunk/contrib/view_raw.py
--- trunk/contrib/view_raw.py (revision 91)
+++ trunk/contrib/view_raw.py (revision 154)
@@ -1,12 +1,12 @@
 #!/usr/bin/env python
 import sys

 import shakespeare.dm

 name = sys.argv[1]
 work = shakespeare.dm.Material.byName(name)
-path = work.get_cache_path('plain')
+path = work.get_text()
 ff = file(path)
 print path
 indata = unicode(ff.read(), 'utf-8')
 print indata.encode('utf-8')
trunk/shakespeare/cli.py
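The cli.py change adds an `init` action to the `db` command and routes it separately from the other actions. A minimal Python 3 sketch of that dispatch pattern, built on the standard `cmd` module, is given here for orientation. `AdminSketch` and its `calls` list are hypothetical names used only for illustration; the real command calls into `shakespeare.model` and `pkg_resources`, which this sketch deliberately does not touch.

```python
import cmd


class AdminSketch(cmd.Cmd):
    """Illustrative stand-in for ShakespeareAdmin (hypothetical class)."""

    actions = ('create', 'clean', 'rebuild', 'init')

    def __init__(self):
        super().__init__()
        # Record what would be run instead of touching a database.
        self.calls = []

    def do_db(self, line=None):
        if not line or line not in self.actions:
            self.help_db()
            return
        if line == 'init':
            # The changeset loads texts/metadata.txt from the shksprdata package here.
            self.calls.append('load_metadata')
        else:
            # The changeset looks up createdb/cleandb/rebuilddb by name here.
            self.calls.append(line + 'db')

    def help_db(self, line=None):
        print('db { create | clean | rebuild | init }')


admin = AdminSketch()
admin.onecmd('db create')
admin.onecmd('db init')
print(admin.calls)
```

`onecmd` splits the command line into the command name and its argument string, so `db init` reaches `do_db` with `line == 'init'`, which mirrors how `main()` in cli.py passes the joined arguments through.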
--- trunk/shakespeare/cli.py (revision 151)
+++ trunk/shakespeare/cli.py (revision 154)
@@ -1,213 +1,205 @@
 #!/usr/bin/env python

 import cmd
 import os
 import StringIO

 class ShakespeareAdmin(cmd.Cmd):
     """
     TODO: self.verbose option and associated self._print
     """

     prompt = 'The Bard > '

     def run_interactive(self, line=None):
         """Run an interactive session.
         """
         print 'Welcome to shakespeare-admin interactive mode\n'
         self.do_about()
         print 'Type: "?" or "help" for help on commands.\n'
         while 1:
             try:
                 self.cmdloop()
                 break
             except KeyboardInterrupt:
                 raise

     def do_help(self, line=None):
         cmd.Cmd.do_help(self, line)

     def do_about(self, line=None):
         import shakespeare
         version = shakespeare.__version__
         about = \
 '''Open Shakespeare version %s. Copyright the Open Knowledge Foundation.
 Open Shakespeare is open-knowledge and open-source. See COPYING for details.
 ''' % version
         print about

     def do_quit(self, line=None):
         sys.exit()

     def do_EOF(self, *args):
         print ''
         sys.exit()

     # =================
     # Commands

     def do_db(self, line=None):
-        actions = [ 'create', 'clean', 'rebuild' ]
+        actions = [ 'create', 'clean', 'rebuild', 'init' ]
         if line is None or line not in actions:
             self.help_db()
             return 1
-        import shakespeare.dm
-        shakespeare.dm.__dict__[line+'db']()
+        import shakespeare.model
+        if line == 'init':
+            import pkg_resources
+            pkg = 'shksprdata'
+            meta = pkg_resources.resource_stream(pkg, 'texts/metadata.txt')
+            shakespeare.model.Material.load_from_metadata(meta)
+        else:
+            shakespeare.model.__dict__[line+'db']()

     def help_db(self, line=None):
         usage = \
-'''db <action>
-
-Where action is one of create, clean, rebuild.'''
+'''db { create | clean | rebuild | init }
+'''
         print usage

     def do_gutenberg(self, line=None):
         import shakespeare.gutenberg
         helper = shakespeare.gutenberg.Helper(verbose=True)
         if not line:
             helper.execute()
         elif line == 'print_index':
             import pprint
             pprint.pprint(helper.get_index())
         else:
             msg = 'Unknown argument %s' % line
             raise Exception(msg)

     def help_gutenberg(self, line=None):
         usage = \
 """
 Download and process all Project Gutenberg shakespeare texts"""
         print usage

     def do_moby(self, line=None):
         import shakespeare.moby
         helper = shakespeare.moby.Helper(verbose=True)
         if not line:
             helper.execute()
         elif line == 'print_index':
             import pprint
             pprint.pprint(helper.get_index())
         else:
             msg = 'Unknown argument %s' % line
             raise Exception(msg)

     def help_moby(self, line=None):
         usage = \
 '''
 Download and process all Moby/Bosak shakespeare texts'''
         print usage

     def _init_index(self):
         import shakespeare.index
         self._index = shakespeare.index.all

     def _filter_index(self, line):
         """Filter items in index return only those whose id (url) is in line
         If line is empty or None return all items
         """
         if line:
             textsToAdd = []
             textNames = line.split()
             for item in self._index:
                 if item.name in textNames:
                     textsToAdd.append(item)
             return textsToAdd
         else:
             self._init_index()
             return self._index

-    def do_print_index(self, line):
+    def do_index(self, line):
         self._init_index()
         header = \
 '''      +-------------------+
       | Index of Material |
       +-------------------+

 '''
         print header
         for row in self._index:
             print row.name.ljust(35), row.title

-    def help_print_index(self, line=None):
+    def help_index(self, line=None):
         usage = \
 '''Print index of Shakespeare texts to stdout'''
         print usage

-    def do_make_concordance(self, line=None):
+    def do_concordance(self, line=None):
         self._init_index()
         print 'Making concordance (this may take some time ...):'
         from shakespeare.concordance import ConcordanceBuilder
         import time
         start = end = 0
         start = time.time()
         cc = ConcordanceBuilder()
         textsToAdd = []
         if line is not None:
             textsToAdd = self._filter_index(line)
         else:
             def gut_non_folio(material):
                 return '_gut' in material.name and 'gut_f' not in material.name
             textsToAdd = filter(gut_non_folio, self._index)
         for item in textsToAdd:
             print 'Adding: %s (%s)' % (item.name, item.title)
             cc.add_text(item.name)
         end = time.time()
         timetaken = end - start
         print 'Finished. Time taken was %ss' % timetaken

-    def help_make_concordance(self, line=None):
+    def help_concordance(self, line=None):
         usage = \
 '''Create a concordance

 If no arguments supplied then use all non-folio gutenberg shakespeare texts.
 Otherwise arguments should be a space seperated list of work name ids
 '''
         print usage
-
-    def do_init(self, line=None):
-        self.do_gutenberg(line)
-        self.do_moby(line)
-        self.do_make_concordance(line)
-
-    def help_init(self, line=None):
-        usage = \
-'''Convenience function that sets everything up by running:
-    1. gutenberg
-    2. moby
-    3. make_concordance'''
-        print usage

     def do_runserver(self, line=None):
         self.help_runserver()

     def help_runserver(self, line=None):
         usage = \
 '''This command has been DEPRECATED.

 Please use `paster serve` to run a server now, e.g.::

     paster serve <my-config.ini>
 '''
         print usage


 def main():
     import optparse
     usage = \
 '''%prog [options] <command>

 Run about or help for details.'''
     parser = optparse.OptionParser(usage)
     parser.add_option('-v', '--verbose', dest='verbose', help='Be verbose',
             action='store_true', default=False)
     options, args = parser.parse_args()

     if len(args) == 0:
         parser.print_help()
         return 1
     else:
         cmd = ShakespeareAdmin()
         args = ' '.join(args)
         args = args.replace('-','_')
         cmd.onecmd(args)

trunk/shakespeare/concordance.py
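For orientation, concordance.py extracts words with a regular expression, lower-cases them, skips fragments and roman numerals, and records each occurrence with its line number and character offset. A minimal in-memory Python 3 sketch of that logic is given here. It swaps the SQLObject tables for plain dictionaries, and `build_concordance` is a hypothetical helper name introduced for illustration; it is not part of the module.

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r'\b(\w+)\b', re.U | re.M | re.I)
NON_WORDS = {'d', 't'}  # fragments such as the "d" left over from accus'd


def is_roman_numeral(word):
    digits = {'i', 'ii', 'iii', 'iv', 'v', 'vi', 'vii', 'viii', 'ix'}
    if word == 'i':
        return False  # conflicts with the pronoun "I"
    # Drop leading l/x/c; a word made only of those letters counts as a numeral.
    stripped = word.lstrip('lxc')
    return stripped == '' or stripped in digits


def build_concordance(lines):
    """Return (occurrences, counts) for an iterable of text lines."""
    occurrences = defaultdict(list)  # word -> [(line number, char index)]
    counts = defaultdict(int)        # word -> number of occurrences
    char_index = 0
    for line_no, line in enumerate(lines):
        for match in WORD_RE.finditer(line):
            word = match.group().lower()  # case-insensitive keys
            if word in NON_WORDS or is_roman_numeral(word):
                continue
            occurrences[word].append((line_no, char_index + match.start()))
            counts[word] += 1
        char_index += len(line)
    return occurrences, counts


occ, counts = build_concordance(["To be, or not to be\n", "Accus'd in Act IV\n"])
print(counts['to'], occ['be'])
```

Character offsets are counted from the start of the whole text rather than from the start of the line, matching the running `charIndex` in `add_text`.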
--- trunk/shakespeare/concordance.py (revision 150)
+++ trunk/shakespeare/concordance.py (revision 154)
@@ -1,194 +1,194 @@
 """
 Concordance (and statistics) for texts in database.

 To build concordance use ConcordanceBuilder. To access concordance/statistics
 use Concordance/Statistics class. Concordance and statistics are provided as
 dictionaries keyed by words.

 NB: all word keys have been lower-cased in order to render them
 case-insensitive
 """
 import re

 import sqlobject

 import shakespeare.index
 import shakespeare.cache


 class ConcordanceBase(object):
     """
     TODO: caching??
     """
     sqlcc = shakespeare.model.Concordance
     sqlstat = shakespeare.model.Statistic

     def __init__(self, filter_names=None):
         """
         @param filter_names: a list of id names with which to filter results
             (i.e. only return results relating to those texts)
         """
         self._filter_names = filter_names
         self.sqlcc_filter = self._make_filter(self.sqlcc)
         self.sqlstat_filter = self._make_filter(self.sqlstat)

     def _make_filter(self, sqlobj):
         sql_filter = True
         if self._filter_names is not None:
             arglist = []
             for name in self._filter_names:
                 newarg = sqlobj.q.textID == self._name2id(name)
                 arglist.append(newarg)
             sql_filter = sqlobject.OR(*arglist)
         return sql_filter

     def _name2id(self, name):
         return shakespeare.model.Material.byName(name).id

     def keys(self):
         """Return list of *distinct* words in concordance/statistics
         """
         all = self.sqlstat.select(self.sqlstat_filter,
                 orderBy=self.sqlstat.q.word,
                 )
         words = [ xx.word for xx in list(all) ]
         distinct = list(set(words))
         distinct.sort()
         return distinct


 class Concordance(ConcordanceBase):
     """Concordance by word for a set of texts
     """

     def get(self, word):
         """Get list of occurrences for word
         @return: sqlobject query list
         """
         select = self.sqlcc.select(sqlobject.AND(self.sqlcc_filter, self.sqlcc.q.word==word))
         return select

 class Statistics(ConcordanceBase):

     def get(self, word):
         select = self.sqlstat.select(
                 sqlobject.AND(self.sqlstat_filter, self.sqlstat.q.word==word)
                 )
         total = 0
         for stat in select:
             total += stat.occurrences
         return total

 class ConcordanceBuilder(object):
     """Build a concordance and associated statistics for a set of texts.

     """

     # multiline, unicode and ignorecase
     word_regex = re.compile(r'\b(\w+)\b', re.U | re.M | re.I)

     words_to_ignore = [
         # 'a', 'the', 'and', 'as', 'are', 'be', 'but', 'in'
         ]
     non_words = [
         'd', # accus'd
         't',
         ]

     def is_roman_numeral(self, word):
         digits = [ 'i', 'ii', 'iii', 'iv', 'v', 'vi', 'vii', 'viii', 'ix' ]
         others = [ 'l', 'x', 'c' ]
         if word == 'i': return False # exception because this conflicts with I
         while word[0] in others:
             if len(word) == 1:
                 return True
             else:
                 word = word[1:]
         return word in digits

     def ignore_word(self, word):
         "Return True if this word should not be added to the concordance."
         bool1 = word in self.words_to_ignore
         bool2 = word in self.non_words
         # do roman numerals
         bool3 = self.is_roman_numeral(word)
         return bool1 or bool2 or bool3

     def _text_already_done(self, text):
         numrecs = shakespeare.model.Concordance.select(
                 shakespeare.model.Concordance.q.textID==text.id
                 ).count()
         return numrecs > 0

     def add_text(self, name, text=None):
         """Add a text to the concordance.
         @param name: name of text to add
         @param text: [optional] a file-like object containing text data. If not
             provided will default to using file in cache associated with named
             text
         """
         dmText = shakespeare.model.Material.byName(name)
         if self._text_already_done(dmText):
             msg = 'Have already added to concordance text: %s' % dmText
             # raise ValueError(msg)
             print msg
             print 'Skipping'
             return
         if text is None:
-            tpath = dmText.get_cache_path('plain')
+            tpath = dmText.get_text()
             text = file(tpath)
         lineCount = 0
         charIndex = 0
         stats = {}
         trans = shakespeare.model.Concordance._connection.transaction()
         for line in text.readlines():
             for match in self.word_regex.finditer(line):
                 word = match.group().lower() # case insensitive
                 if self.ignore_word(word):
                     continue
                 shakespeare.model.Concordance(connection=trans,
                         text=dmText,
                         word=word,
                         line=lineCount,
                         char_index=charIndex+match.start())
                 stats[word] = stats.get(word, 0) + 1
             lineCount += 1
             charIndex += len(line)
         trans.commit()
         trans = shakespeare.model.Concordance._connection.transaction()
         for word, value in stats.items():
             tresults = shakespeare.model.Statistic.select(
                     sqlobject.AND(
                         shakespeare.model.Statistic.q.textID == dmText.id,
                         shakespeare.model.Statistic.q.word == word
                     ))
             try:
                 dbstat = list(tresults)[0]
                 dbstat.occurrences += value
             except:
                 shakespeare.model.Statistic(
                         connection=trans,
                         text=dmText,
                         word=word,
                         occurrences=value
                         )
         trans.commit()


     def remove_text(self, name):
         """Remove a text from the concordance.

         @param name: as for add_text
         """
         dmText = shakespeare.model.Material.byName(name)
         recs = shakespeare.model.Concordance.select(
                 shakespeare.model.Concordance.q.textID==dmText.id
                 )
         for rec in recs:
             shakespeare.model.Concordance.delete(rec.id)
         stats = shakespeare.model.Statistic.select(
                 shakespeare.model.Statistic.q.textID==dmText.id
                 )
         for stat in stats:
             shakespeare.model.Statistic.delete(stat.id)

trunk/shakespeare/controllers/site.py
--- trunk/shakespeare/controllers/site.py (revision 150)
+++ trunk/shakespeare/controllers/site.py (revision 154)
@@ -1,47 +1,45 @@
 import logging

 import genshi

 from shakespeare.lib.base import *

 import shakespeare
 import shakespeare.index
 import shakespeare.format
 import shakespeare.concordance
 import shakespeare.model as model

 # import this after dm so that db connection is set
 import annotater.store
 import annotater.marginalia

 log = logging.getLogger(__name__)


 class SiteController(BaseController):

     def index(self):
         c.works_index = shakespeare.index.all
         return render('index')

     def guide(self):
         return render('guide')

     def view(self):
         name = request.params.get('name', '')
         format = request.params.get('format', 'plain')
         if format == 'annotate':
             return self.view_annotate(name)
         namelist = name.split()
         numtexts = len(namelist)
         textlist = [model.Material.byName(tname) for tname in namelist]
         # special case (only return the first text)
         if format == 'raw':
-            tpath = textlist[0].get_cache_path('plain')
-            result = file(tpath).read()
+            result = textlist[0].get_text().read()
             status = '200 OK'
             response.headers['Content-Type'] = 'text/plain'
             return result
         texts = []
         for item in textlist:
-            tpath = item.get_cache_path('plain')
-            tfileobj = file(tpath)
+            tfileobj = item.get_text()
