Better multilingual entry support
Title:
Better multilingual entry support
Area:
entries, search
Summary:
Allow entries to be tagged with the language(s) that they are composed of. This can be used to power more interesting things around the site.
Description:
Entries composed of written or spoken material (text, images of writing, audio, video) usually have one or more languages in which the material is presented. Allowing entries to be voluntarily tagged by their owners to describe the language(s) they are using might allow some interesting features to be developed based on entry tagging.
If a particular spelling appears in more than one language, specifying the language of the entry in site search could help find the thing someone's looking for.
Statistics on actual use of the site by users who speak different languages might be helpful to staff, especially if the technical barriers to offering the site in translation are overcome.
It could help users better connect with people who speak the same language, especially users whose preferred language is in a minority on the site.
What would the user interface be like? A long list of possible languages could a) be unwieldy, and b) still leave out languages used by actual site users. Sign languages and constructed languages spring to mind as ones that might be missing from even a fairly exhaustive list: entries with embedded video might include sign language, fannish communities are reasonably likely to include Tengwar and Klingon, and goodness knows there are probably more use cases that I know nothing of.
One way to do it might be like the tags interface: the user types something in, and the field attempts to autofill from a preset list but accepts new entries gracefully. If designed properly, unique data entered here on public entries could be logged, collated, and presented to an administrator on a regular basis for review; items found to be actual common languages not present on the list could then be added.
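A minimal sketch of that autofill-plus-review idea, in Python for illustration only. The preset list, the function names (`suggest`, `accept`, `review_report`) and the in-memory counter are all assumptions made for the example, not anything that exists on the site.

```python
from collections import Counter

# Hypothetical preset list; a real one would be much longer.
KNOWN_LANGUAGES = ["English", "Esperanto", "Estonian", "French", "Klingon"]

# Counts of free-text values that were not on the preset list.
unknown_entries = Counter()


def suggest(prefix, known=KNOWN_LANGUAGES, limit=5):
    """Return preset languages that start with the typed prefix (case-insensitive)."""
    p = prefix.strip().casefold()
    if not p:
        return []
    return [name for name in known if name.casefold().startswith(p)][:limit]


def accept(value, is_public, known=KNOWN_LANGUAGES):
    """Accept any value; record unknown values from public entries for review."""
    cleaned = value.strip()
    known_folded = {name.casefold() for name in known}
    if is_public and cleaned and cleaned.casefold() not in known_folded:
        unknown_entries[cleaned] += 1
    return cleaned


def review_report(min_count=2):
    """List unknown values seen at least min_count times, most frequent first."""
    return [(v, n) for v, n in unknown_entries.most_common() if n >= min_count]
```

Here `accept` never rejects anything, so a language missing from the list can still be used; only values typed on public entries are counted, and the administrator sees the ones that recur.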
Any site function that involves searching by language should allow for synonyms -- three different people might use "tlhIngan Hol", "pIqaD", and "Klingon" to mean the same language -- to say nothing of the typos. There should be a way to bundle known synonyms and known typos -- and also a way to override this bundling.
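One way the bundling and its override might look, as a rough Python sketch. The `BUNDLES` table, the example typo "Klignon", and the `exact` flag are hypothetical names chosen for illustration.

```python
# Hypothetical synonym bundles: canonical name -> known synonyms and typos.
BUNDLES = {
    "Klingon": {"tlhIngan Hol", "pIqaD", "Klingon", "Klignon"},
    "French": {"French", "français", "Francais"},
}

# Reverse index: case-folded synonym -> canonical name.
INDEX = {syn.casefold(): canon for canon, syns in BUNDLES.items() for syn in syns}


def canonical(term, exact=False):
    """Map a term to its bundle's canonical name; exact=True overrides bundling."""
    cleaned = term.strip()
    if exact:
        return cleaned
    return INDEX.get(cleaned.casefold(), cleaned)


def search(entries, term, exact=False):
    """Return ids of entries with a language tag matching term.

    entries is a list of (entry_id, [language tags]) pairs.
    """
    wanted = canonical(term, exact)
    return [
        entry_id
        for entry_id, tags in entries
        if any(canonical(tag, exact) == wanted for tag in tags)
    ]
```

With `exact=True` the search compares the tags as typed, which is the override: someone who specifically wants entries tagged "pIqaD" can get only those.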
Another challenge is that people might not tag all their entries (to say nothing of back entries). How hard/expensive would it be to autodetect languages? Failing autodetection, could a default be set by user, like the last language they used?
This suggestion:
Should be implemented as-is.
38 (57.6%)
Should be implemented with changes. (please comment)
4 (6.1%)
Shouldn't be implemented.
2 (3.0%)
(I have no opinion)
20 (30.3%)
(Other: please comment)
2 (3.0%)
no subject
- Autodiscovery doesn't deal well with multiple-language entries, and the poster may want to correct it
- Even with single-language entries, autodiscovery may either be wrong or not give the desired amount of detail (e.g., it guesses pt but the poster wants to specify pt_BR)
- The poster may want to leave the language unspecified, e.g., if the point is for readers to figure it out by themselves
no subject
It's possible that you could handle most of the 20% by using the user's list of languages from their profile page (*), either simply (we can't be 100% certain of the auto detect, so let's use the default / not set it) or smarter (auto detect says it's X or Y, the user speaks B, M and X, so X it is).
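A small Python sketch of the "simple" and "smarter" options side by side. The detector output format (a list of code and confidence pairs), the `threshold` value and the function name are assumptions for illustration; no particular detection library is implied.

```python
def pick_language(detected, profile_languages, default=None, threshold=0.9):
    """Choose a language code for an entry.

    detected: list of (language_code, confidence) pairs from some detector
              (hypothetical format).
    profile_languages: set of codes the user lists on their profile.
    """
    if not detected:
        return default
    ranked = sorted(detected, key=lambda pair: pair[1], reverse=True)
    best_code, best_conf = ranked[0]
    # Confident detection: use it as-is.
    if best_conf >= threshold:
        return best_code
    # "Smarter": among the uncertain candidates, prefer one the user speaks.
    for code, _conf in ranked:
        if code in profile_languages:
            return code
    # "Simple": not certain enough, so use the default or leave it unset.
    return default
```

For example, if detection is torn between "sr" and "hr" and the profile lists "en" and "hr", this returns "hr"; if the profile lists neither, it returns the default.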
Personally, while I agree that the post entry page is already busy enough, I would prefer to see it as an enterable field. I think that ultimately, the user should be able to control it. Actually, let me take a step back. I think it depends on how it's to be used.
I think the already linked suggestion for languages on profiles is a good precursor to this, and has all the same questions for how you manage languages and the entry and listing thereof.
If this is something that is supposed to be a user choice and/or will be displayed on individual user entries, then it needs to be something that the user can enter, so it becomes the user's choice whether they specify a single language, multiple languages, or leave it blank by default. Leaving it blank is fine for display here, but if it's to be used for other tools, you still have the problem of selecting a default language for the entry, which may mean forcing the user to have a primary language, and then other languages, which I'm not keen on.
If it's to be an invisible field, used mostly in the background to allow more choices on the Latest Things page (I'd love to be able to browse for Japanese / Korean posts), then auto detection is probably good enough, within the provisos above. It takes away the onus on the user of having to key in the same language every single time, but it also takes away their choice.
no subject
I'm not sure how big of an inconvenience that would be for people who post in multiple languages, or whether it might lead to mis-tagging, but for those of us posting in only one it would be a nice convenience. I'm not sure how to weigh those two needs against each other, though.
no subject
(Tangent: I would still want /latest/ to display all entries in all languages; if I wanted to see only entries in French, I could bookmark something like /latest/?lang=fr to see all the French entries. Or maybe even ?lang=fr,ko,en for multiple languages? But I like seeing the linguistic variety on /latest/; it's a good demonstration that DW isn't just for one particular niche.)
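A rough Python sketch of how a `?lang=fr,ko,en` filter might be read, using only the standard library. The entry format and function names are assumptions for the example; with no `lang` parameter everything is shown, as described above.

```python
from urllib.parse import urlparse, parse_qs


def requested_languages(url):
    """Return the set of codes from ?lang=..., or None when no filter is given."""
    query = parse_qs(urlparse(url).query)
    values = query.get("lang", [])
    codes = {c.strip().lower() for v in values for c in v.split(",") if c.strip()}
    return codes or None


def filter_latest(entries, url):
    """Return ids of entries to show.

    entries is a list of (entry_id, set of language codes) pairs.
    """
    wanted = requested_languages(url)
    if wanted is None:
        # No filter: show all entries in all languages.
        return [entry_id for entry_id, _langs in entries]
    return [entry_id for entry_id, langs in entries if wanted & set(langs)]
```

An entry tagged with several languages matches if any one of them was requested.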
Back to the original suggestion:
- Language tags should constitute a separate entry area from regular tags.
- An entry can contain multiple languages, so tagging an entry with multiple languages should be part of this feature.
- I'd like to be able to set a default language tag going forward; I will write predominantly in English, and I wouldn't want to have to remember to add the language tag every time. I'm already bad enough at consistently using regular tags >_>
- Going backwards, it would be awesome to retroactively apply a language tag as part of the mass-entry-editor, even if this had to be a paid-only or time-delayed feature.
- Synonym-bundling of language tags, yes, absolutely, especially across languages.
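The points in the list above could be modelled roughly like this. It is a Python sketch only; the `Entry` and `Journal` classes and their method names are hypothetical and are not the site's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    subject: str
    tags: list = field(default_factory=list)       # regular tags
    languages: list = field(default_factory=list)  # language tags, kept separately


@dataclass
class Journal:
    default_languages: list = field(default_factory=list)
    entries: list = field(default_factory=list)

    def post(self, subject, tags=(), languages=None):
        """Create an entry; fall back to the journal default when languages is None."""
        chosen = list(self.default_languages) if languages is None else list(languages)
        entry = Entry(subject, list(tags), chosen)
        self.entries.append(entry)
        return entry

    def tag_untagged(self, languages):
        """Retroactively tag entries that have no language tags; return the count."""
        count = 0
        for entry in self.entries:
            if not entry.languages:
                entry.languages = list(languages)
                count += 1
        return count
```

Passing `languages=[]` explicitly still leaves an entry untagged, so a default does not stop someone from deliberately leaving the language unspecified.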
no subject
It's also an elegant solution to an issue now that we're getting a fair few non-English posts. I'd also like to see a way of filtering them off my network page if possible, but that'd probably need a separate suggestion after there's a way to tag entries.
no subject
I'm a bit disillusioned today, I guess :/
no subject
What with the thousands and thousands of languages spoken/written today, yeah, I'd guess so.
I am not sure how I feel about this yet, will have to think about it some more.
no subject
I think canonicalisation would be a good idea, and I think that BCP 47 tags (aka IETF language tags) are a good thing to canonicalise to.
Perhaps something like "Please enter the language(s) used in this post, separated by commas. Please use language tags such as 'en' for English if you know them.", and then if someone enters something else, say something along the lines of "Your value of 'tlhIngan Hol' was recognised as 'tlh' (Klingon, tlhIngan-Hol). Use this standard value or keep your own?", with the language names taken from the "Description" field(s) in the IANA language subtag registry.
The names aren't always the prettiest (for example, "el" is "Modern Greek (1453-)"), but should be recognisable.
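A minimal Python sketch of that prompt flow. The `REGISTRY` dictionary is a tiny hand-copied excerpt standing in for the IANA language subtag registry, and `ALIASES` is a hypothetical site-maintained table for names that are not registry descriptions; the function names are made up for the example.

```python
# Tiny excerpt standing in for the registry: subtag -> "Description" values.
REGISTRY = {
    "en": ["English"],
    "el": ["Modern Greek (1453-)"],
    "tlh": ["Klingon", "tlhIngan-Hol"],
}

# Hypothetical extra aliases that are not registry descriptions.
ALIASES = {"tlhingan hol": "tlh", "greek": "el"}


def find_tag(value):
    """Return the subtag whose description or alias matches value, else None."""
    key = value.casefold()
    for tag, names in REGISTRY.items():
        if key in {name.casefold() for name in names}:
            return tag
    return ALIASES.get(key)


def check_input(text):
    """Split comma-separated input into accepted values and prompts to show."""
    accepted, prompts = [], []
    for raw in text.split(","):
        value = raw.strip()
        if not value:
            continue
        if value.lower() in REGISTRY:
            accepted.append(value.lower())  # already a known tag
            continue
        tag = find_tag(value)
        if tag is None:
            accepted.append(value)  # unrecognised: keep the user's own value
        else:
            names = ", ".join(REGISTRY[tag])
            prompts.append(
                f"Your value of '{value}' was recognised as '{tag}' ({names}). "
                "Use this standard value or keep your own?"
            )
    return accepted, prompts
```

Unrecognised values are kept as typed rather than rejected, so the canonical tags stay a suggestion and not a requirement.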
no subject
I'm envisioning something like the icon choice menu, populated by my previous choices, with the option to add a language as I want it - and being asked to identify the new language in the masterlist so the latest things could display it correctly.
I'm not happy about autodetect. I've had to use Google Translate quite a bit recently, and the detection often fails. I can see it being particularly problematic for pairings like Serbian and Croatian, which are probably close enough to be misdetected, and where users tend to care a whole lot about being misidentified.
no subject
Something like "I write mostly in "
This setting could then be used to set the default language tag in the post form, while still letting the user edit it.