Existing Material That We Could Use
Obviously restricted to open material only.
General Possibilities
- wikipedia
- geonames and placeopedia: for geotagging of locations and wikipedia entries
- dbpedia
- freebase:
  - api: completely json oriented (including its own bespoke MQL language).
  - data: almost all from wikipedia
WW2 specific
- http://www.ibiblio.org/hyperwar/
- http://www.ww2db.com/ -- not clear about openness
Interface Howtos
DBPedia
It can be a little difficult to get started with dbpedia. When looking to actually get data out, one is confronted with the wealth of options presented on <http://wiki.dbpedia.org/Downloads> and <http://wiki.dbpedia.org/OnlineAccess>.
Since, at least at the start when just experimenting, one wants to avoid downloading dumps and parsing the raw RDF into some db, it seemed best to opt for the online access route. That means looking at http://wiki.dbpedia.org/OnlineAccess. After some experimentation[^1] it becomes clear that one should be using the public SPARQL endpoint and that one could probably reverse engineer the basic schema by playing around with the SNORQL online interface at: <http://dbpedia.org/snorql/>[^2][^3]. So what we need is a nice wrapper that allows us to do SPARQL queries. As we're Python oriented, a quick google gives <http://esw.w3.org/topic/SparqlImplementations> and from that:
As the second of these seems a little better documented and had been recently updated, let's go for that:
# unfortunately a simple easy_install sparql does not work ...
sudo easy_install http://downloads.sourceforge.net/sparql-wrapper/sparql-wrapper-python-1.1.0.tar.gz
Now it's time to do some coding ... [some while later] We have some code to QUERY and DESCRIBE resources: browser:trunk/microfacts/getdata/dbpedia.py
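For orientation, here is a rough sketch of what querying and describing a resource over the online access route could look like. It is not the code in dbpedia.py, and it uses only the Python standard library rather than the sparql-wrapper package installed above. The endpoint URL `http://dbpedia.org/sparql` and the helper names `build_query_url`, `run_select` and `describe_query` are assumptions made for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint URL; the notes above only say "the public SPARQL endpoint".
DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"


def build_query_url(query, endpoint=DBPEDIA_ENDPOINT):
    """Return a GET URL that carries the query in the 'query' parameter."""
    params = urllib.parse.urlencode({"query": query})
    return endpoint + "?" + params


def describe_query(resource_uri):
    """Build a DESCRIBE query for a single resource URI."""
    return "DESCRIBE <%s>" % resource_uri


def run_select(query, endpoint=DBPEDIA_ENDPOINT, timeout=30):
    """Send a SELECT query and return the list of result bindings.

    Asks for the standard SPARQL JSON results format via the Accept header.
    """
    request = urllib.request.Request(
        build_query_url(query, endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["results"]["bindings"]
```

Calling `run_select("SELECT * WHERE { ?s ?p ?o } LIMIT 5")` would need network access; keeping the URL building separate means the query strings can be checked offline first.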
[^1]: Tip for dbpedia: queries, such as those with the Leipzig query builder, which take more than 3 minutes to complete are a problem ...
[^2]: Tip: you can browse standard wikipedia categories by doing searches of the form (replace ${wikipedia ...} with your category):
SELECT * WHERE {
?subject skos:subject <http://dbpedia.org/resource/${wikipedia_category_name}>.
}
[^3]: Looking through some of the results it is clear that, unfortunately (and probably as a result of wikipedia's "inconsistencies"), things like dates do not always end up under the same property names or with the same formatting.
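The category search from the second tip can be filled in programmatically. The `${...}` placeholder happens to match the syntax of Python's `string.Template`, so a minimal sketch looks like the following. The helper name `category_query` is made up for illustration, the `skos:` prefix is assumed to be predeclared as it is in SNORQL, and whether the category name needs a `Category:` prefix is also an assumption.

```python
from string import Template

# Query text taken from the tip above; ${wikipedia_category_name} is the
# placeholder that gets replaced.
CATEGORY_QUERY = Template(
    "SELECT * WHERE {\n"
    "  ?subject skos:subject "
    "<http://dbpedia.org/resource/${wikipedia_category_name}>.\n"
    "}"
)


def category_query(wikipedia_category_name):
    """Return the category search with the placeholder substituted."""
    return CATEGORY_QUERY.substitute(
        wikipedia_category_name=wikipedia_category_name
    )


# Example category name (assumed form, including the "Category:" prefix).
print(category_query("Category:World_War_II"))
```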
Freebase
Freebase's main source of information also seems to be wikipedia. However, it is not RDF based but uses a JSON oriented triple schema of its own invention.
Can get stuff like:
MQL (metaweb query language) is documented at: http://www.freebase.com/view/guid/9202a8c04000641f800000000544e13e
Have some example scripts (e.g. [1]) but had trouble modding this to do anything other than give back albums ...
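To give a feel for the JSON orientation: an MQL query is itself a JSON object that mirrors the shape of the answer wanted, with empty values (`null` or `[]`) marking the fields to be filled in. The sketch below only builds and serializes such a query; it does not contact any service. The artist name is an arbitrary example and the helper name `to_mql` is made up for illustration.

```python
import json

# Illustrative MQL query: "for the music artist with this name, list albums".
# The empty list asks the service to fill in the album names.
album_query = {
    "type": "/music/artist",
    "name": "The Police",
    "album": [],
}


def to_mql(query):
    """Serialize a query dict to the JSON text that would be sent."""
    return json.dumps(query, sort_keys=True)


print(to_mql(album_query))
```

Asking for something other than albums would, under the same assumptions, mean changing the `type` and replacing `album` with another property of that type left empty.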
[1]: http://www.freebase.com/view/guid/9202a8c04000641f8000000005c79adb
