path: root/pkgs/applications/graphics/tesseract
Age | Commit message | Author | Files | Lines
2024-05-21 | pkgs/applications: remove uneeded fetchpatch arguments | Sigmanificient | 2 | -2/+2
2024-02-11 | treewide: add `mainProgram` | h7x4 | 2 | -0/+2
2024-01-19 | tesseract: 5.3.3 -> 5.3.4 | R. Ryantm | 1 | -2/+2
2024-01-03 | tesseract4: fix `gcc-13` build failure | Sergei Trofimovich | 1 | -0/+8
Without the change `tesseract4` fails the build on `staging-next` as https://hydra.nixos.org/build/245356748:

    In file included from points.cpp:24:
    ../../src/ccutil/helpers.h:40:17: error: 'uint64_t' has not been declared
       40 | void set_seed(uint64_t seed) {
          |               ^~~~~~~~
2023-11-19 | Merge pull request #226660 from stweil/tesseract | Artturi | 1 | -3/+4
2023-10-20 | tesseract5: 5.3.2 -> 5.3.3 | R. Ryantm | 1 | -2/+2
2023-07-31 | treewide: Add meta.mainProgram | Robert Hensing | 1 | -0/+1
This should fix most `getExe` warnings, based on grepping `nixos/`.
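A minimal sketch of what setting `meta.mainProgram` looks like in a package expression. The package name `example-ocr` and its attributes are hypothetical and only serve as an illustration; they are not taken from the tesseract expressions.

```nix
# Hypothetical package, for illustration only.
{ lib, stdenv }:

stdenv.mkDerivation {
  pname = "example-ocr";
  version = "0.1.0";
  src = ./.;

  meta = {
    description = "Hypothetical package illustrating meta.mainProgram";
    # Names the primary executable under $out/bin.
    # lib.getExe reads this attribute to build the path to the binary.
    mainProgram = "example-ocr";
  };
}
```

With the attribute set, `lib.getExe` applied to this package evaluates to the store path of the package followed by `/bin/example-ocr`.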
2023-07-15 | tesseract5: 5.3.1 -> 5.3.2 | R. Ryantm | 1 | -2/+2
2023-04-23 | tesseract5: update dependencies | Stefan Weil | 1 | -3/+4
* remove unneeded dependency autoconf-archive
* add missing build input libarchive
* add missing build input curl

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2023-04-07 | tesseract5: 5.3.0 -> 5.3.1 | R. Ryantm | 1 | -2/+2
2023-01-28 | tesseract: 3.05.00 -> 3.05.02 | Robert Scott | 1 | -2/+10
fix build with leptonica 1.83
2023-01-28 | tesseract4: 4.1.1 -> 4.1.3 | Robert Scott | 1 | -13/+7
fix build with leptonica 1.83
2022-12-25 | tesseract5: 5.2.0 -> 5.3.0 | R. Ryantm | 1 | -2/+2
2022-08-25 | Merge pull request #186611 from viraptor/tesseract-5-darwin | Sandro | 2 | -4/+12
2022-08-25 | tesseract5: fix darwin compilation | Stanisław Pitucha | 2 | -4/+12
2022-08-14 | tesseract5: 5.1.0 -> 5.2.0 | R. Ryantm | 1 | -2/+2
2022-06-26 | treewide: rename maintainer `earvstedt` -> `erikarvstedt` | Erik Arvstedt | 2 | -2/+2
The maintainer name now matches the GitHub username, which simplifies maintainer notifications.
2022-05-02 | tesseract5: init at 5.1.0 | Anselm Schüler | 2 | -0/+45
2022-05-02 | tesseract4.tessdata: 4.0.0 -> 4.1.0 | Anselm Schüler | 1 | -133/+132
2022-05-02 | tesseract: add wrapper test | Erik Arvstedt | 1 | -3/+29
2022-05-02 | tesseract: use multi-line build inputs format | Erik Arvstedt | 2 | -4/+27
2022-05-02 | tesseract: switch to SRI hash format | Erik Arvstedt | 4 | -115/+116
2022-05-02 | tesseract: fix `fetch-language-hashes` | Erik Arvstedt | 1 | -1/+1
- Don't match spaces in language names
- Remove duplicate languages
2021-11-14 | tesseract4: apply patches to fix build on aarch64-darwin | Artturin | 1 | -1/+13
2021-02-19 | treewide: makeWrapper buildInputs to nativeBuildInputs | Ben Siraphob | 1 | -1/+1
2021-01-25 | treewide: remove stdenv where not needed | Pavol Rusnak | 1 | -1/+1
2021-01-19 | treewide: pkgs.pkgconfig -> pkgs.pkg-config, move pkgconfig to alias.nix | Jonathan Ringer | 2 | -4/+4
continuation of #109595

pkgconfig was aliased in 2018; however, it remained in all-packages.nix due to its wide usage. This cleans up the remaining references to pkgs.pkgconfig and moves the entry to aliases.nix.

python3Packages.pkgconfig remained unchanged because it's the canonical name of the upstream package on PyPI.
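For a package expression, the rename amounts to requesting `pkg-config` instead of `pkgconfig`. The sketch below uses a hypothetical package named `example`; only the `pkg-config` attribute name comes from the commit message.

```nix
# Hypothetical package, for illustration only.
# The function argument is now `pkg-config`; the old `pkgconfig` name
# only survives as an alias.
{ stdenv, pkg-config }:

stdenv.mkDerivation {
  pname = "example";
  version = "1.0";
  src = ./.;

  # pkg-config is a build-time tool, so it is listed in nativeBuildInputs.
  nativeBuildInputs = [ pkg-config ];
}
```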
2021-01-16 | treewide: stdenv.lib -> lib | Ben Siraphob | 3 | -10/+10
2020-04-10 | treewide: Per RFC45, remove all unquoted URLs | Michael Reilly | 2 | -2/+2
2020-01-17 | tesseract: 4.1.0 -> 4.1.1 | Erik Arvstedt | 1 | -2/+2
2019-08-15 | treewide: name -> pname (easy cases) (#66585) | volth | 2 | -2/+2
treewide replacement of

    stdenv.mkDerivation rec {
      name = "*-${version}";
      version = "*";

to pname
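A before/after sketch of this replacement, using a hypothetical package called `example`. The attribute names `before` and `after` exist only to put both forms in one valid expression.

```nix
# Hypothetical package, for illustration only.
{ stdenv }:

{
  # Before: the version is interpolated into `name` by hand.
  before = stdenv.mkDerivation rec {
    name = "example-${version}";
    version = "1.0";
    src = ./.;
  };

  # After: `pname` carries the bare package name and mkDerivation
  # derives the full name as "${pname}-${version}".
  after = stdenv.mkDerivation {
    pname = "example";
    version = "1.0";
    src = ./.;
  };
}
```

Both forms yield the derivation name `example-1.0`; the second also makes the package name available separately from the version.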
2019-07-17 | tesseract4: 4.0.0 -> 4.1.0 | R. RyanTM | 1 | -2/+2
Semi-automatic update generated by https://github.com/ryantm/nixpkgs-update tools. This update was made based on information from https://repology.org/metapackage/tesseract/versions
2019-06-16 | treewide: remove unused variables (#63177) | volth | 1 | -1/+1
* treewide: remove unused variables
* making ofborg happy
2018-12-19 | tesseract: add tesseract3 top-level attr | Erik Arvstedt | 1 | -1/+1
2018-12-19 | tesseract: rename to tesseract4, add alias | Erik Arvstedt | 1 | -1/+1
This is more consistent with the naming of the most popular versioned pkgs.
2018-12-19 | tesseract: add separate language derivations | Erik Arvstedt | 3 | -20/+290
This frees users from downloading all languages when building Tesseract with a custom set of languages. `enableLanguagesHash` is now obsolete.
2018-12-19 | tesseract: add a wrapper to setup languages | Erik Arvstedt | 5 | -83/+130
Tesseract is now decoupled from the tessdata language corpus. This avoids recompilation when building Tesseract with a custom set of languages. Update k2pdfopt to use the new wrapper interface.
2018-12-19 | tesseract: change file layout | Erik Arvstedt | 3 | -63/+74
Rename default.nix -> tesseract3.nix
Rename 4.x.nix -> tesseract4.nix

This is needed for the following commits.
2018-11-24 | tesseract_4: 4.00.00alpha-git-20170410 -> 4.0.0 | Ryan Mulligan | 1 | -5/+5
The 4.0.0 stable release is out. Changelog: https://github.com/tesseract-ocr/tesseract/wiki/4.0x-Changelog
2018-06-19 | tesseract: make tessdata a fix output derivation (#41227) | symphorien | 1 | -30/+34
the full tessdata is nearly a GB, so sparing a copy each time we need to rebuild tesseract without updating tessdata is worth it.
2017-08-22 | treewide: homepage URL fixes (#28475) | Matthew Justin Bauer | 2 | -2/+2
* pgadmin: use https homepage
* msn-pecan: move homepage to github
  google code is now unavailable
* pidgin-latex: use https for homepage
* pidgin-opensteamworks: use github for homepage
  google code is unavailable
* putty: use https for homepage
* ponylang: use https for homepage
* picolisp: use https for homepage
* phonon: use https for homepage
* pugixml: use https for homepage
* pioneer: use https for homepage
* packer: use https for homepage
* pokerth: use https for homepage
* procps-ng: use https for homepage
* pycaml: use https for homepage
* proot: move homepage to .github.io
* pius: use https for homepage
* pdfread: use https for homepage
* postgresql: use https for homepage
* ponysay: move homepage to new site
* prometheus: use https for homepage
* powerdns: use https for homepage
* pm-utils: use https for homepage
* patchelf: move homepage to https
* tesseract: move homepage to github
* quodlibet: move homepage from google code
* jbrout: move homepage from google code
* eiskaltdcpp: move homepage to github
* nodejs: use https to homepage
* nix: use https for homepage
* pdf2djvu: move homepage from google code
* game-music-emu: move homepage from google code
* vacuum: move homepage from google code
2017-04-23 | tesseract: supports darwin | Matthew Bauer | 1 | -1/+1
2017-04-11 | tesseract: Package version 4.x from Git master | aszlig | 1 | -0/+61
Tesseract 4 has a new OCR engine based on long short-term memory (LSTM) neural networks, which really helps a lot in terms of accuracy and our VM tests.

I ran the new version across a bunch of different screenshots and compared the results to the 3.x branch, and it really makes a big difference, especially with various font rendering settings.

The only downside of this is that version 4 hasn't been released yet and is in alpha state right now, but it will eventually get there, and the only solutions that came to my mind for sticking to version 3 were really sub-par:

* Use several passes with different color negation on the screenshots.
* Train Tesseract 3 specifically for screenshots. This is sub-par because we'd need to do it for Tesseract 4 from scratch again.
* Change the test systems so that they specifically use *only* an OCR font when displaying. I've actually tried this but this also isn't accurate enough with our default font rendering setup.
* Turn off special font rendering settings for our tests. In conjunction with changing to an OCR font this might work but it won't catch all the cases, because applications might use their own font rendering.

Given that version 4 is faster[1] when it comes to OCR detection and also the points just mentioned, I think even using the alpha version just for tests isn't going to hurt anybody.

[1]: https://github.com/tesseract-ocr/tesseract/wiki/4.0-Accuracy-and-Performance

Signed-off-by: aszlig <aszlig@redmoonstudios.org>
2017-04-11 | tesseract: 3.04.01 -> 3.05.00 | aszlig | 1 | -5/+5
Upstream changelog:

* Made some fine tuning to the hOCR output.
* Added TSV as another optional output format.
* Fixed ABI break introduced in 3.04.00 with the AnalyseLayout() method.
* text2image tool - Enable all OpenType ligatures available in a font. This feature requires Pango 1.38 or newer.
* Training tools - Replaced asserts with tprintf() and exit(1).
* Fixed Cygwin compatibility.
* Improved multipage tiff processing.
* Improved the embedded pdf font (pdf.ttf).
* Enable selection of OCR engine mode from command line.
* Changed tesseract command line parameter '-psm' to '--psm'.
* Added new C API for orientation and script detection, removed the old one.
* Increased minimum autoconf version to 2.59.
* Removed dead code.
* Fixed many compiler warnings.
* Fixed memory and resource leaks.
* Fixed some issues with the 'Cube' OCR engine.
* Fixed some openCL issues.
* Added option to build Tesseract with CMake build system.
* Implemented CPPAN support for easy Windows building.

The upstream URL of the change log is:

https://github.com/tesseract-ocr/tesseract/releases/tag/3.05.00

Tested by building against the following packages that directly depend on it:

* vapoursynth (with ocrSupport = true)
* pyocr (fails)
* vobsub2srt

Also tested against the following NixOS VM tests that have OCR enabled:

* nixos/tests/chromium.nix -A stable
* nixos/tests/emacs-daemon.nix
* nixos/tests/installer.nix -A luksroot
* nixos/tests/lightdm.nix
* nixos/tests/plasma5.nix
* nixos/tests/sddm.nix

All of the packages and tests except pyocr build/succeed on x86_64-linux.

Fixing pyocr is outside of the scope of this commit and will happen very soon.

Signed-off-by: aszlig <aszlig@redmoonstudios.org>
2017-04-11 | tesseract: Reintroduce enableLanguages | aszlig | 1 | -1/+27
I've removed that attribute in 68bc260ca2d71a676dd6afdb3524d4fff483016b, because the language files were no longer distributed as separate files, but if we for example only want to use the English training data, the closure size of Tesseract gets quite large (around 1.2 GB), which is a bit much just to be able to run NixOS VM tests.

For this reason I've also switched the VM tests back to using only the English language.

Tested using the following VM tests (the ones that have OCR enabled) on x86_64-linux:

* nixos/tests/chromium.nix -A stable
* nixos/tests/emacs-daemon.nix
* nixos/tests/installer.nix -A luksroot
* nixos/tests/lightdm.nix
* nixos/tests/plasma5.nix
* nixos/tests/sddm.nix

Signed-off-by: aszlig <aszlig@redmoonstudios.org>
2016-12-19 | tesseract: 3.02.02 -> 3.04.01 | aszlig | 1 | -40/+18
From the upstream changelog:

* Tesseract development is now done with Git and hosted at github.com (Previously we used Subversion as a VCS and code.google.com for hosting).

So let's move over to the GitHub repository, where the organisation also includes a full repository for tessdata, so we no longer need to fetch it one-by-one.

The build also got significantly simpler, because we no longer need to run autoconf, neither do we need to patch the configure script for Leptonica headers.

This also has the advantage that we don't need to use the enableLanguages attribute for the test runner anymore.

Full upstream changelog can be found at:

https://github.com/tesseract-ocr/tesseract/blob/c4d273d33cc36e/ChangeLog

Tested against all NixOS tests with enabled OCR (chromium, emacs-daemon, installer.luksroot and lightdm).

Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Cc: @viric
2016-03-05 | Use general hardening flag toggle lists | Franz Pletz | 1 | -1/+1
The following parameters are now available:

* hardeningDisable
  To disable specific hardening flags
* hardeningEnable
  To enable specific hardening flags

Only the cc-wrapper supports this right now, but these may be reused by other wrappers, builders or setup hooks.

cc-wrapper supports the following flags:

* fortify
* stackprotector
* pie (disabled by default)
* pic
* strictoverflow
* format
* relro
* bindnow
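A minimal sketch of how the two toggle lists might be used in a package expression. The package `example` is hypothetical, and the choice of flags is only for illustration; the parameter and flag names are the ones listed in the commit message.

```nix
# Hypothetical package, for illustration only.
{ stdenv }:

stdenv.mkDerivation {
  pname = "example";
  version = "1.0";
  src = ./.;

  # Switch off a hardening flag that is on by default,
  # here the format string checks.
  hardeningDisable = [ "format" ];

  # Switch on a flag that is off by default.
  hardeningEnable = [ "pie" ];
}
```

Each list takes flag names as strings, so several flags can be toggled at once by adding more entries.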
2016-02-20 | tesseract: turn off format hardening | Robin Gloster | 1 | -0/+2
2015-05-23 | tesseract: fix postInstall | Mateusz Kowalczyk | 1 | -1/+1
We needed to separate each of the unpack commands.
2015-05-22 | tesseract: Allow to specify a subset of languages. | aszlig | 1 | -19/+24
Especially useful for our OCR based VM tests, where we only need the English language.

By default the argument is null, so all languages are included. If a list of language names is passed, only those languages are enabled, for example:

    tesseract.override { enableLanguages = [ "eng" "spa" ]; };

to only enable support for the English and Spanish languages.

Signed-off-by: aszlig <aszlig@redmoonstudios.org>
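As a sketch of where such an override could live, the following hypothetical `shell.nix` provides a Tesseract limited to English and Spanish. The file name and the use of `pkgs.mkShell` are assumptions for illustration; only the `enableLanguages` argument comes from the commit message.

```nix
# Hypothetical shell.nix, for illustration only.
{ pkgs ? import <nixpkgs> { } }:

pkgs.mkShell {
  packages = [
    # Restrict the language data to English and Spanish.
    # Leaving enableLanguages at its default (null) includes all languages.
    (pkgs.tesseract.override { enableLanguages = [ "eng" "spa" ]; })
  ];
}
```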