[deepamehta-devel] Translations DM-style

Malte Reissig mre at deepamehta.de
Wed Dec 14 11:06:09 CET 2016


Hi Robert,

For a proposal using properties for storing multilingual topic values 
see also jri's reply to my posting to this list from June 2015.

http://lists.deepamehta.de/pipermail/devel-lists.deepamehta.de/2015-June/000599.html

As I understood jri's reply, we could simply

- declare a topic's value (the thing the dm4 storage stores in "value") 
as the value in some default language, e.g. "en_GB"

- and introduce one property per language, e.g. "value_de_DE", for each 
topic's translated value.
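
A minimal sketch of how such a lookup could work, assuming a topic is 
modeled as a plain Python dict with a "value" field and a "properties" 
map. The dict layout and the localized_value helper are hypothetical and 
are not the DM4 API; only the "value_<locale>" property naming comes 
from the proposal above.

```python
def localized_value(topic, locale):
    """Return the translated value for `locale`, or the default value."""
    # Assumption: translations live in properties named "value_<locale>".
    properties = topic.get("properties", {})
    return properties.get("value_" + locale, topic["value"])


# Illustrative data only; the default value is taken to be English.
topic = {
    "value": "Germany",
    "properties": {"value_de_DE": "Deutschland"},
}

print(localized_value(topic, "de_DE"))  # translated value is stored
print(localized_value(topic, "fr_FR"))  # nothing stored, falls back to default
```

The fallback to the default value means a topic without translations 
keeps working unchanged.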

As jri points out, this approach would have some advantages over the 
way I explored when trying to integrate multilingual topic values for 
items (and properties) stored in Wikidata.

I hope this helps you identify the best track in advance.

Kind Regards,

Malte

-- 

What I can now say about the decisions for the two Wikidata plugins is:

1. About representing languages in DM4:

* Wikipedia, and thus Wikidata, uses a two-letter code to identify its 
Wikipedias in various languages. In the dm4-wikidata plugin you can find 
a migration that creates 43 curated language identifier topics, each 
representing a Wikipedia languageCode (with its human-readable label).

I think their language identifier system is based on the (now obsolete) 
ISO 639-1 (https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes), but I 
am not completely sure and could not confirm this with a quick Google 
search of the Wikimedia or Wikidata API pages.

* The latest IETF recommendation for identifying languages by strings 
is, to my knowledge, RFC 5646, though I have seen some 
developers/systems implementing it with an underscore (like "de_DE" 
instead of "de-DE").
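
To illustrate the underscore/hyphen difference, here is a small sketch 
that rewrites an underscore-style identifier into the hyphenated form 
with the casing conventions recommended by RFC 5646. The to_bcp47 helper 
is hypothetical; it is a simplification that does not validate subtags 
against the IANA registry and ignores extension and private-use rules.

```python
def to_bcp47(tag):
    """Convert e.g. 'de_DE' to 'de-DE' (simplified, no validation)."""
    parts = tag.replace("_", "-").split("-")
    result = [parts[0].lower()]               # language subtag, e.g. "de"
    for part in parts[1:]:
        if len(part) == 2 and part.isalpha():
            result.append(part.upper())       # region subtag, e.g. "DE"
        elif len(part) == 4 and part.isalpha():
            result.append(part.title())       # script subtag, e.g. "Latn"
        else:
            result.append(part.lower())       # anything else: lowercase
    return "-".join(result)


print(to_bcp47("de_DE"))
print(to_bcp47("zh_hans_cn"))
```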

2. About modeling multi-lingual values in DM 4:

* In the dm4-wikidata-toolkit plugin I developed an "Entity Value" topic 
type which aggregates the above-mentioned Wikipedia language identifier 
topics. A "Wikidata Item" or "Wikidata Property" then has many "Entity 
Value" topics (in various languages).
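
The shape of that model could be sketched roughly as follows. This is 
only an illustration in plain Python, not the plugin's actual type 
definitions: the class and field names (LanguageIdentifier, EntityValue, 
WikidataItem, value_in) and the sample data are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class LanguageIdentifier:
    code: str    # Wikipedia language code, e.g. "de"
    label: str   # human-readable label, e.g. "German"


@dataclass
class EntityValue:
    text: str                       # the value in one language
    language: LanguageIdentifier    # aggregated language identifier


@dataclass
class WikidataItem:
    entity_id: str
    values: List[EntityValue] = field(default_factory=list)

    def value_in(self, code: str) -> Optional[str]:
        """Return the value for a language code, or None if absent."""
        for value in self.values:
            if value.language.code == code:
                return value.text
        return None


# Illustrative data only; "example-item" is a placeholder identifier.
german = LanguageIdentifier("de", "German")
english = LanguageIdentifier("en", "English")
item = WikidataItem("example-item", [
    EntityValue("Deutschland", german),
    EntityValue("Germany", english),
])

print(item.value_in("de"))
```

Compared with the property-based proposal, each translated value here 
is a separate object, so a lookup has to traverse the item's values.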


On 13.12.2016 09:05, Robert Schuster wrote:
> Hi alltogether,
> soon I'll have the task of providing translation support to DM-managed.
> In a first shot the project was realized without translation support and
> in a 2nd phase this is going to be added. Of course, some design
> decisions were made to make this transition smooth (e.g. not relying on
> data that is later translated as keys to data).
>
> The data that needs to be translated can have the following form:
> (I write Topic like table entries)
>
> a) Algeria | http://link-to-a-german-article.html | 3 | 12 | 3432
>
> or
>
> b) Benin | Die Geschichte des Landes beginnt [...]
>
> So what I want to indicate is that I have basically 3 types of data:
> - facts, figures, statistics that do not need to be translated (a date
> stays a date, some statistics value too)
> - links to external resources that need to be provided in translated
> form (e.g. a link to the English variant of an article)
> - text in one language that needs to be provided in another directly in DM
>
> Has anyone done something similar and can share the approach?
>
> Is there already consensus about this?
>
> What I have in mind is introducing a custom association
> "translation" which hosts a language specifier and which I then
> associate manually with all the data that is translated. My goal is to
> provide the data in DM via a REST interface. During the transformation
> of the DM-data I'll follow the translation associations to provide the
> translated value of an item.
>
> Does that sound good?
>
> All the best,
> Robert
>
>


