Wikipedie:WikiProjekt Strojové zpracování/Skripty

Translating image parameters

The script translates image parameters into Czech and thus partially replaces the translateMagicWords function. In addition, it also translates from German and removes redundant parameters, e.g. vpravo (right) on a thumbnail, as well as empty or duplicate parameters.
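
As a rough illustration of the translation described above, the following standalone sketch maps a few English and German parameter names to their Czech equivalents for a single file link. The function name `translate_simple_file_link` and the reduced mapping table are assumptions made for this example; it ignores named parameters, nested links and templates, so it is not a substitute for the script in the Code section.

```python
# Simplified, standalone sketch; not the script itself.
# Mapping of bare (valueless) parameters: English/German -> Czech.
BARE = {
    'thumb': 'náhled', 'thumbnail': 'náhled',
    'mini': 'náhled', 'miniatur': 'náhled',
    'right': 'vpravo', 'rechts': 'vpravo',
    'left': 'vlevo', 'links': 'vlevo',
    'center': 'střed', 'zentriert': 'střed',
    'frameless': 'bezrámu', 'rahmenlos': 'bezrámu',
}


def translate_simple_file_link(link: str) -> str:
    """Translate parameters of one file link without nested brackets."""
    parts = [part.strip() for part in link[2:-2].split('|')]
    # Normalize the namespace prefix to the Czech "Soubor:".
    _, _, name = parts[0].partition(':')
    result = ['Soubor:' + name.lstrip()]
    for part in parts[1:]:
        part = BARE.get(part, part)
        # Skip empty parameters and exact duplicates.
        if part and part not in result:
            result.append(part)
    # "vpravo" is the default position of a thumbnail, so it is redundant.
    if 'náhled' in result and 'vpravo' in result:
        result.remove('vpravo')
    return '[[' + '|'.join(result) + ']]'


print(translate_simple_file_link('[[Datei:Haus.jpg|mini|rechts||Ein Haus]]'))
# prints [[Soubor:Haus.jpg|náhled|Ein Haus]]
```

The German `mini` and `rechts` are translated, the empty parameter is dropped, and `vpravo` disappears because a thumbnail floats right by default.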

Tested on

Pywikibot 3 and newer

Required libraries

re
regex
pywikibot.textlib

Placement

written to be inserted into pywikibot/cosmetic_changes.py
after modification, also usable in custom scripts (see Required libraries)

Sample edit

[1]

Known shortcomings

Cannot handle images whose caption contains an external link, a table or a template. For now, it is not recommended to use it without careful review of the edits.
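
The limitation with templates can be seen from how the link is split: a plain `split('|')` also cuts at the pipes inside a template in the caption, so a template argument such as `right` ends up looking like an image parameter. The sketch below first shows that effect and then one possible way around it, a hypothetical helper `split_top_level` (not part of the script) that only splits at pipes outside `{{...}}` and `[[...]]`. The template name `align` is made up for the example.

```python
def split_top_level(inner: str) -> list:
    """Split on '|' only where no {{...}} or [[...]] is open."""
    parts, current, depth, i = [], '', 0, 0
    while i < len(inner):
        pair = inner[i:i + 2]
        if pair in ('{{', '[['):
            # Entering a nested template or link.
            depth += 1
            current += pair
            i += 2
        elif pair in ('}}', ']]') and depth > 0:
            # Leaving a nested template or link.
            depth -= 1
            current += pair
            i += 2
        elif inner[i] == '|' and depth == 0:
            # A top-level pipe separates two image parameters.
            parts.append(current)
            current = ''
            i += 1
        else:
            current += inner[i]
            i += 1
    parts.append(current)
    return parts


inner = 'File:Haus.jpg|thumb|{{align|right|Haus}}'
naive = inner.split('|')
careful = split_top_level(inner)
print(naive)    # 'right' shows up as a separate part
print(careful)  # the template stays in one piece
```

This sketch does not cover external links or tables in captions, which would need their own handling.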

Author

Dvorapa (talk)

Last modified

12 November 2021

Code

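    # Needs "import re", "import regex" and "from pywikibot import textlib"
    # at module level.
    def translateImageParameters(self, text: str) -> str:
        """Use localized image parameters."""
        # Recursive pattern (regex module only): match [[...]] links together
        # with any links nested inside them.
        links = regex.findall(r'\[\[(?:[^\[\]]|(?R))+\]\]', text)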

        for link in links:
            if not re.match(r'\[\[ *(?:File|Image|Obrázek|Soubor|Datei) *:', link, re.I):
                continue

            parts = [part.strip() for part in link[2:-2].split('|')]

            pre, sep, post = parts[0].partition(':')
            parts[0] = 'Soubor:' + post.lstrip()

            right = False
            position = False
            for i, part in enumerate(parts):
                # Split "name=value"; for bare parameters sep and post are empty.
                pre, sep, post = part.partition('=')
                # Compare the exact parameter name so that captions which merely
                # start with "page", "lang", "mini", ... are left untouched.
                key = pre.rstrip()
                value = '=' + post.lstrip() if sep else ''
                if key in ('thumb', 'thumbnail', 'mini', 'miniatur'):
                    part = 'náhled' + value
                elif part in ('right', 'rechts'):
                    part = 'vpravo'
                elif part in ('left', 'links'):
                    part = 'vlevo'
                elif part in ('none', 'ohne'):
                    part = 'žádné'
                elif key == 'hochkant':
                    part = 'upright' + value
                elif part in ('center', 'centre', 'zentriert'):
                    part = 'střed'
                elif part in ('frame', 'framed', 'enframed', 'gerahmt'):
                    part = 'rám'
                elif part in ('frameless', 'rahmenlos'):
                    part = 'bezrámu'
                elif key in ('lang', 'sprache'):
                    part = 'jazyk' + value
                elif key in ('page', 'seite'):
                    part = 'strana' + value
                elif part in ('border', 'rand'):
                    part = 'okraj'
                elif sep:
                    # Any other "name=value": only tidy the spacing around "=".
                    part = key + value
                # elif part.isdigit():
                #     part += 'px'

                if part == 'rám' or part.startswith('náhled'):
                    right = True
                elif part in ('vpravo', 'vlevo', 'střed', 'žádné'):
                    if not position:
                        position = True
                    else:
                        part = None

                parts[i] = part

            # parts = list(dict.fromkeys(parts).keys())

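            # "vpravo" is the default for thumbnails and frames, so drop it.
            if right and 'vpravo' in parts:
                parts.remove('vpravo')

            # Rebuild the link without empty or dropped (None) parameters.
            result = '[[' + '|'.join([part for part in parts if part]) + ']]'
            # Leave links inside disabled parts (comments, nowiki, ...) untouched.
            if not textlib.isDisabled(text, text.find(link)):
                text = text.replace(link, result)
        return text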

Tidying up infoboxes

The script adds spaces or line breaks at the right places inside infoboxes, or conversely removes redundant spaces or line breaks. It also handles citation templates, provided they are not written on a single line.
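
To make the intended result concrete, here is a minimal standalone sketch that applies the central substitutions used by the script (one line per parameter, single spaces around the equals sign) to one flat infobox. It uses plain `re.sub` instead of `textlib.replaceExcept`, so it has no notion of exceptions such as `nowiki`, and the helper name `tidy_flat_template` is made up for this example.

```python
import re


def tidy_flat_template(text: str) -> str:
    """Reformat one template that contains no nested templates or links."""
    # Put every parameter on its own line, introduced by " | ".
    text = re.sub(r'\s*\|\s*', '\n | ', text)
    # Move the closing braces to a line of their own.
    text = re.sub(r'\s*\}\}', '\n}}', text)
    # Normalize spacing around the first "=" of each parameter.
    text = re.sub(r'\|([^=\|\}]*?)\s*=[ \t]*', r'|\1 = ', text)
    return text


messy = '{{Infobox - osoba|jméno=Jan Novák |  narození= 1900}}'
print(tidy_flat_template(messy))
# {{Infobox - osoba
#  | jméno = Jan Novák
#  | narození = 1900
# }}
```

The full script additionally tracks nesting so that pipes inside links, tables, tags and inline templates are not reformatted.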

Tested on

Pywikibot 3 and newer

Required libraries

re (or alternatively regex)
pywikibot.textlib

Placement

written to be inserted into pywikibot/cosmetic_changes.py
after modification, also usable in custom scripts (see Required libraries)

Sample edit

[2]

Known shortcomings

Currently works with a hand-maintained list of templates and exceptions; it would be good if the script could work with TemplateData (beyond my abilities so far, or rather it could probably be done by having the bot generate a list of block templates at the very start)
It also assumes that the article does not contain a sequence of three sharp s characters (ßßß) and that a sharp s appears neither at the very beginning nor at the very end (the character can be changed to any other)
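
The marker assumption above can be checked before the script runs. The sketch below shows the splitting idea on a small scale, using plain `re.sub` rather than `textlib.replaceExcept`, together with a hypothetical guard `marker_is_safe` that tests the two stated preconditions. Both function names are illustrative and not part of the script.

```python
import re

MARKER = 'ßßß'


def marker_is_safe(text: str, marker: str = MARKER) -> bool:
    """Check the preconditions: no marker sequence, no marker char at the edges."""
    char = marker[0]
    return (marker not in text
            and not text.startswith(char)
            and not text.endswith(char))


def split_on_templates(text: str, marker: str = MARKER) -> list:
    """Cut the text before every '{{' and after every '}}'."""
    text = re.sub(r'\{\{', marker + '{{', text)
    text = re.sub(r'\}\}', '}}' + marker, text)
    # Empty pieces arise where two markers meet; the script keeps them,
    # here they are dropped to keep the output short.
    return [piece for piece in text.strip(marker[0]).split(marker) if piece]


sample = 'Úvod {{Infobox|a=1}} text'
print(marker_is_safe(sample))      # True
print(split_on_templates(sample))  # ['Úvod ', '{{Infobox|a=1}}', ' text']
```

A page for which `marker_is_safe` returns False could be skipped, or processed with a different marker character.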

Author

Dvorapa (talk)

Last modified

8 April 2022

Code

    def beautifyInfoboxes(self, text: str) -> str:
        """Format infoboxes and block templates."""
        exceptions = ['nowiki']

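        def prepare(text):
            # First letter matches in either case; spaces may be underscores.
            return r'[' + text[0].upper() + text[0].lower() + r']' + re.escape(text[1:]).replace('\\ ', '[ _]')

        def match(templates, part):
            # Truthy if part starts with "{{" plus one of the template names.
            if not templates:
                return False
            return re.match(r'\{\{\s*(?:' + '|'.join([prepare(i) for i in templates]) + ')', part)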

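        # Mark every template, link, table and tag boundary with "ßßß" and
        # split the page at those markers.
        text = textlib.replaceExcept(text, r'\{\{', r'ßßß{{', exceptions)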
        text = textlib.replaceExcept(text, r'\}\}', r'}}ßßß', exceptions)
        text = textlib.replaceExcept(text, r'(\[+)', r'ßßß\1', exceptions)
        text = textlib.replaceExcept(text, r'(\]{1,2})', r'\1ßßß', exceptions)
        text = textlib.replaceExcept(text, r'\{\|', r'ßßß{|', exceptions)
        text = textlib.replaceExcept(text, r'\|\}([^\}])', r'|}ßßß\1', exceptions)
        text = textlib.replaceExcept(text, r'\<', r'ßßß<', exceptions)
        text = textlib.replaceExcept(text, r'\>', r'>ßßß', exceptions)
        pageParts = text.strip('ß').split('ßßß')

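        # Stacks tracking the nesting; the first entry is the page level.
        # inTemplate: 0 = none, 1 = inline template, 2 = block template.
        inTemplate = [0]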
        inLink = [False]
        inTable = [False]
        inTag = [False]
        lines_after = [r'\n']
        initial_spaces = [r'\n ']
        newPageParts = []

        no_line_after = ['Infobox - chemický prvek/Nestabilní izotop', 'Infobox - chemický prvek/Stabilní izotop']
        one_line_after = ['Infobox ', 'NFPA 704', 'Studenti píší Wikipedii', 'Taxobox', 'Singly', 'Kosmické těleso-dceřiné těleso', 'Kosmické těleso-teleskop']
        two_lines_after = []
        lines_after_as_is = ['Citace ']
        block_templates = no_line_after + one_line_after + two_lines_after + lines_after_as_is
        skip_templates = ['Infobox začátek', 'Infobox hlavička', 'Infobox obrázek', 'Infobox dvojitá', 'Infobox jednoduchá', 'Infobox konec', 'Infobox položka', 'Infobox chybí', 'Infobox - ročník Eurovize/Legenda', 'Infobox - animanga/Patička', 'Infobox - železniční trať/legenda', 'Infobox - železniční trať/hlavička', 'Infobox - číslo/řada', 'Infobox - politická strana/mandáty', 'Infobox - letiště/RWY', 'Infobox - letiště/Konec', 'Infobox - budova/kodbarvy', 'Infobox - chemický prvek/Legenda', 'Infobox - chemický prvek/Barva', 'Infobox - chemický prvek/Text', 'Infobox - chemický prvek/Skupina', 'Infobox - chemický prvek/Izotopy', 'Taxobox/barva', 'Taxobox/cat', 'Taxobox/compare', 'Taxobox/Stupeň ohrožení', 'Taxobox/statusWD']
        block_if_block = ['Citace ']

        after_block_template = False
        for part in pageParts:
            block = re.match(r'[^\n\|]+(\n+ *)\|', part)
            if ((match(block_templates, part) and not match(block_if_block, part)) or (block and match(block_if_block, part))) and not match(skip_templates, part):
                inTemplate.append(2)
                if match(no_line_after, part):
                    lines_after.append(r'')
                elif match(one_line_after, part):
                    lines_after.append(r'\n')
                elif match(two_lines_after, part):
                    lines_after.append(r'\n'*2)
                else:  # lines_after_as_is
                    lines_after.append(None)

                if block:
                    initial_spaces.append(block.group(1))
                else:
                    initial_spaces.append(initial_spaces[0])
            elif part[:2] == '{{':
                inTemplate.append(1)
            elif part[:1] == '[':
                inLink.append(True)
            elif part[:2] == '{|':
                inTable.append(True)
            elif part[:1] == '<':
                inTag.append(True)

            if after_block_template:
                if not (self.template or lines_after[-1] is None):
                    part = textlib.replaceExcept(part, r'^\s*', lines_after[-1], exceptions)
                lines_after.pop()
                after_block_template = False
            if inTemplate[-1] == 2 and not inLink[-1] and not inTable[-1] and not inTag[-1]:
                part = textlib.replaceExcept(part, r'\|\s*(?=\||\})', r'', exceptions)
                part = textlib.replaceExcept(part, r'\s*\|\s*', initial_spaces[-1] + r'| ', exceptions)
                part = textlib.replaceExcept(part, r'\{\{\s*', r'{{', exceptions)
                if part[:2] == '{{' and '|' not in part:
                    part = textlib.replaceExcept(part, r'\s*\}\}', r'}}', exceptions)
                else:
                    part = textlib.replaceExcept(part, r'\s*\}\}', initial_spaces[-1].rstrip(' ') + r'}}', exceptions)
                part = textlib.replaceExcept(part, r'\|([^=\|\}]*?)\s*=[ \t]*', r'|\1 = ', exceptions)
                if not re.search(r'\#[0-9a-fA-F]{3,6}', part) and not re.search(r'odkaz na (?:konečné pořadí|statistiky turnaje)', part):
                    part = textlib.replaceExcept(part, r'(\|[^=\|\}]*?=)\s*(\*|\#)', r'\1\n\2', exceptions)
            newPageParts.append(part)

            if part[-2:] == '}}' and inTemplate[-1] > 0:
                if inTemplate[-1] == 2:
                    initial_spaces.pop()
                    after_block_template = True
                inTemplate.pop()
            elif part[-1:] == ']' and inLink[-1]:
                inLink.pop()
            elif part[-2:] == '|}' and inTable[-1]:
                inTable.pop()
            elif part[-1:] == '>' and inTag[-1]:
                inTag.pop()

        return ''.join(newPageParts)
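
The helper `prepare` in the code above builds a pattern in which the first letter of a template name matches in either case and spaces may also be written as underscores. The following standalone sketch reproduces that idea so it can be tried in isolation; the names `name_pattern` and `starts_with_template` are chosen for this example only.

```python
import re


def name_pattern(name: str) -> str:
    """Regex for a template name: first letter any case, space or underscore."""
    first = '[' + name[0].upper() + name[0].lower() + ']'
    # re.escape() escapes spaces as "\ "; replace them with a small class.
    rest = re.escape(name[1:]).replace('\\ ', '[ _]')
    return first + rest


def starts_with_template(names: list, part: str) -> bool:
    """True if part opens with '{{' followed by one of the given names."""
    if not names:
        return False
    pattern = r'\{\{\s*(?:' + '|'.join(name_pattern(n) for n in names) + ')'
    return re.match(pattern, part) is not None


print(starts_with_template(['NFPA 704'], '{{ nFPA_704 | zdraví = 1'))  # True
print(starts_with_template(['Taxobox'], '{{Infobox - osoba'))          # False
```

Because the match is a prefix match, an entry with a trailing space such as `'Infobox '` covers every template whose name starts with that word.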