[deepamehta-devel] Translations DM-style
Malte Reissig
mre at deepamehta.de
Wed Dec 14 11:06:09 CET 2016
Hi Robert,
For a proposal using properties for storing multilingual topic values
see also jri's reply to my posting to this list from June 2015.
http://lists.deepamehta.de/pipermail/devel-lists.deepamehta.de/2015-June/000599.html
As I understood jri's reply, we could simply
- declare a topic's value (the thing the dm4 storage stores in "value")
as the value in some default language, e.g. "en_UK"
- and introduce a property such as "value_de_DE" for each translated value of a topic.
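
To make the idea concrete, here is a minimal sketch in plain Python. It does not use the DM4 API: the `Topic` class, the `localized_value` method and the "value_" + locale property naming are assumptions made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical stand-in for a DM4 topic; not the real DM4 API."""
    value: str                                      # value in the default language
    properties: dict = field(default_factory=dict)  # e.g. {"value_de_DE": "..."}

    def localized_value(self, locale=None):
        """Return the translated value, falling back to the default value."""
        if locale is None:
            return self.value
        # Assumed naming scheme: one property per locale, "value_<locale>".
        return self.properties.get("value_" + locale, self.value)


topic = Topic("Algeria", {"value_de_DE": "Algerien"})
print(topic.localized_value("de_DE"))  # translated property is present
print(topic.localized_value("fr_FR"))  # no such property, default value is used
```

The fallback to the default value means a client never gets an empty result for a topic that has not been translated yet.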
As jri points out, this approach would have some advantages over the
way I explored when trying to integrate multilingual topic values for
items (and properties) stored in Wikidata.
I hope this helps you identify the best track in advance.
Kind Regards,
Malte
--
What I can say now about the design decisions in the two Wikidata plugins:
1. About representing languages in DM4:
* Wikipedia, and thus Wikidata, uses a two-letter code to identify its
Wikipedias in various languages. In the dm4-wikidata plugin you can find
a migration that creates 43 curated language identifier topics, each
representing a Wikipedia languageCode (with its human-readable label).
I think their language identifier system is based on the (now obsolete)
ISO-639-1 (https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes), but I
am not completely sure and could not confirm this with a quick Google
search of the Wikimedia or Wikidata API pages.
* The latest IETF recommendation for identifying languages by strings
is, to my knowledge, RFC 5646, though I have seen some
developers/systems implement it with an underscore (like "de_DE"
instead of "de-DE").
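
If both spellings show up in the same data, they can be normalized before comparison. The following is only a simplified sketch: the function name `normalize_language_tag` is made up, and it handles just "language" or "language-region", while RFC 5646 also allows script, variant and extension subtags.

```python
def normalize_language_tag(tag):
    """Turn e.g. "de_DE" or "DE-de" into "de-DE".

    Simplified on purpose: accepts only "language" or "language-region".
    RFC 5646 tags are case-insensitive; lowercase language and uppercase
    region is just the conventional spelling.
    """
    parts = tag.replace("_", "-").split("-")
    if len(parts) > 2 or not parts[0].isalpha():
        raise ValueError("unsupported language tag: " + repr(tag))
    language = parts[0].lower()
    if len(parts) == 1:
        return language
    if not parts[1].isalnum():
        raise ValueError("unsupported language tag: " + repr(tag))
    return language + "-" + parts[1].upper()


print(normalize_language_tag("de_DE"))
print(normalize_language_tag("en"))
```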
2. About modeling multilingual values in DM4:
* In the dm4-wikidata-toolkit plugin I developed an "Entity Value" topic
type which aggregates the above-mentioned Wikipedia language identifier
topics. A "Wikidata Item" or "Wikidata Property" then has many "Entity
Value" topics (in various languages).
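
Roughly, the shape of that model looks like the sketch below. This is plain Python and not the plugin's code: the class names, field names and the `value_in` helper are assumptions, and the entity id is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LanguageIdentifier:
    code: str   # Wikipedia language code, e.g. "de"
    label: str  # human-readable label, e.g. "German"


@dataclass
class EntityValue:
    text: str
    language: LanguageIdentifier  # each value aggregates one language identifier


@dataclass
class WikidataEntity:
    """Hypothetical stand-in for a "Wikidata Item" or "Wikidata Property"."""
    entity_id: str
    values: list = field(default_factory=list)  # many EntityValue objects

    def value_in(self, code):
        """Return the text for the given language code, or None if missing."""
        for entity_value in self.values:
            if entity_value.language.code == code:
                return entity_value.text
        return None


german = LanguageIdentifier("de", "German")
english = LanguageIdentifier("en", "English")
item = WikidataEntity("Q_EXAMPLE", [  # placeholder id, not a real Wikidata id
    EntityValue("Algerien", german),
    EntityValue("Algeria", english),
])
print(item.value_in("de"))
```

Compared with the property-based approach, every translation is a separate object here, which makes the language explicit but adds one more hop per lookup.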
On 13.12.2016 09:05, Robert Schuster wrote:
> Hi all together,
> soon I'll have the task of providing translation support to DM-managed.
> In a first shot the project was realized without translation support and
> in a 2nd phase this is going to be added. Of course, some design
> decisions were made to make this transition smooth (e.g. not relying on
> data that is later translated as keys to data).
>
> The data that needs to be translated can have the following form:
> (I write topics like table entries)
>
> a) Algeria | http://link-to-a-german-article.html | 3 | 12 | 3432
>
> or
>
> b) Benin | Die Geschichte des Landes beginnt [...]
>
> So what I want to indicate is that I have basically 3 types of data:
> - facts, figures, statistics that do not need to be translated (a date
> stays a date, some statistics value too)
> - links to external resources that need to be provided in translated
> form (e.g. a link to the English variant of an article)
> - text in one language that needs to be provided in another directly in DM
>
> Has anyone done something similar and can share the approach?
>
> Is there already consensus about this?
>
> What I am having in mind is introducing a custom association
> "translation" which hosts a language specifier and which I then
> associate manually with all the data that is translated. My goal is to
> provide the data in DM via a REST interface. During the transformation
> of the DM-data I'll follow the translation associations to provide the
> translated value of an item.
>
> Does that sound good?
>
> All the best,
> Robert
>
>