
Impact of log4j CVE-2021-44228 on heritrix3? #522 - GitHub
Dec 10, 2021 · This is an issue to track the impact of a recent log4j remote exploit (CVE-2021-44228) in the context of heritrix3. My brief read of the situation is that log4j versions 2.0.x through 2.14.x (see …
Summarize web archive capture index (CDX) files. - GitHub
Summarize web archive capture index (CDX) files. Repository: internetarchive/cdx-summary.
Impact of log4j CVE-2021-44228 on heritrix3? · Issue #451 ... - GitHub
Dec 10, 2021 · This is an issue to track the impact of a recent log4j remote exploit (CVE-2021-44228) in the context of heritrix3. My brief read of the situation is that log4j versions 2.0.x through 2.14.x (see …
Long-lived cookies might have unintended consequences on a
In a recent investigation, we found that about half of the Twitter pages are archived in non-English languages, and about half of those non-English captures are in the Kannada language alone. The root...
Re-instate biblio.com affiliate link(s) · Issue #960 - GitHub
May 11, 2018 · Some of them seemed to simply exploit OL links to harvest user data/drop cookies, etc. Not sure if that was the case with biblio. There ought to be a less invasive sort of affiliation possible …
KeyError warc-target-uri · Issue #19 - GitHub
Improve OCR quality · Issue #348 · internetarchive/openlibrary
Oct 11, 2016 · ebook edition quality (generally terrible, but not usually due to the underlying technology) and devising strategies to improve it, based on some of the work done at the UMass CIIR to exploit …
Flagging Edition constraint violation for automated or manual split
A data clean-up task which I have not yet found an open issue addressing is developing and checking constraints for unique identifiers / unique values: This specifically targets all OL*M values tha...
Warcprox - WARC writing MITM HTTP/S proxy - GitHub
Warcprox is an HTTP proxy designed for web archiving applications. When used in parallel with brozzler it supports a comprehensive, modern, and distributed archival web capture system. Warcprox stores …
Improve markup accessibility: Aria, Schema.org, fb open graph
May 28, 2018 · Corporate actors are duty bound to exploit any information they get in the interest of their shareholders; we couldn't expect them not to advertise based on what they can learn from library …