FOSSology: Advancing open source analysis and development
Interpreting the License Analysis Report (0.6.1)

License Hierarchy Description

License analysis is performed by comparing an unknown file (which contains zero or more license sections) with a set of license templates. The comparison algorithm used by the license agent looks for groups of similar words in a similar ordering. The algorithm does not mind if individual words are replaced. For example, “This is the Gnu public license and you can share it” matches “This is the Neal public license and you can share it”. Changing a single word in the license usually does not change the meaning of the license. Individual word changes (or changes to small groups of words) are very common. (It appears that plagiarism does not apply to license text.)

While some open source projects use “standard” licenses, such as GPL or BSD, other projects create their own licenses by merging parts from different licenses. For example, a license may contain the three requirements found in the BSD license along with the warranty disclaimer from GPL and the distribution requirements from the MIT license.

There are also many cases where a well-known license is simply renamed. One of the most common involves the LGPL, where “GNU Lesser General Public License” is renamed after a company or project. Similarly, many projects take the GNU Library General Public License (the LGPL's earlier name) and replace “library” with “program” or an application’s name. None of this changes the license requirements or the template that it matches; it only changes the license name and the percentage of the match.

The worst-case scenarios happen when projects take a non-GPL license and simply replace the license name with “GPL”. The question becomes: did the author mean “GPL”, or did they want their own license rules? Fortunately, this is an issue for the lawyers to resolve. The license analyzer makes no legal interpretation about the semantic meaning of the license. It only matches text against license templates and identifies the percentage of the match.

In the user interface, you can select a project and click on the license tab. This shows a histogram of the discovered licenses. Each type of license is listed, along with the number of files containing the license. For example:
Each of the licenses has a distinct name and identifies a distinct license. However, “Phrase” is a catch-all category. Licenses that are unknown to the analysis system are usually identified by common phrases, such as “is distributed under...”. Phrases that are potentially associated with licenses are listed in the Phrase category. You can click on each of the license types to see a list of files that contain the license.
The files are ordered by the percentage of match. In the example, the file “COPYING” has a section of text that includes a 97% match with a section of the Intel-OSL license. By clicking on the file name, you can see the actual text of the file with the matching license text highlighted. At the top of the file contents is an index table that lists the licenses in the file, a link to the instance (click on “view”), a link to the actual license (click on “ref”), and a color; each identified license is color coded. Items without a “ref” denote Phrases that are identified as possible license text.

The actual matched text within the document is highlighted to match the license key. Words that are not included in the match are not highlighted. In this example, the attribution of the license has been changed to say “University of Cambridge” and the owner’s name has been replaced with “COPYRIGHT OWNER”. Outside of these specific changes, the license text matches the Intel-OSL license.

Templates

The license templates are categorized into families with similar text. It is important to note that “similar text” does not mean “similar purpose”. Your company may consider one type of license to be “bad” but a license with similar text to be “good”. Any interpretation is left up to you; the groupings are strictly by similar text.

For example, the generic Academic Free License (AFL) appears to have similar text to the Open Software License (OSL). Based on the word usage, OSL was probably derived from AFL (or vice versa). This creates the hierarchy “AFL/AFL/” and “AFL/OSL/”. Under AFL/AFL/ are different versions of the AFL license: 1.1, 1.2, 2.0, etc. Similarly, the BSD license comes in two flavors: the old and the new BSD license. BSD/BSD.new/ contains the actual BSD “new” license (BSD/BSD.new/BSD_new) as well as derivatives such as the Apache, Cryptix, and SSLeay licenses. Each license is different, but they contain enough similar text to be considered derivatives of a central license.
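The family hierarchy can be pictured as a simple grouping of template paths by everything before the final slash. The following is only an illustrative sketch: the function name `group_by_family` is hypothetical and is not part of FOSSology, and the sample paths are taken from the template list in this document.

```python
from collections import defaultdict

# A few template paths copied from the template list in this document.
TEMPLATES = [
    "AFL/AFL/Academic Free License 1.1",
    "AFL/AFL/Academic Free License 1.2",
    "AFL/OSL/Open Software License 1.0",
    "BSD/BSD.new/BSD new",
    "BSD/BSD.new/Cryptix",
    "BSD/BSD.new/SSLeay",
]


def group_by_family(paths):
    """Group template names under their family path (hypothetical helper)."""
    families = defaultdict(list)
    for path in paths:
        # Everything before the last "/" is the family; the rest is the name.
        family, _, name = path.rpartition("/")
        families[family].append(name)
    return dict(families)


if __name__ == "__main__":
    for family, names in group_by_family(TEMPLATES).items():
        print(f"{family}/: {', '.join(names)}")
```

Grouping this way reflects only textual similarity between templates, as described above; it says nothing about whether two licenses in the same family have similar terms.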
The names for the license text attempt to describe the license. For example, GPL-based licenses contain “GPL” in the template name or family, and the Free Software Foundation’s license family is denoted by “FSF”. However, some licenses do not have formal names. For the templates, these have been named after the general purpose, such as “Free/Free Use No Change” and “Free/Beerware”. (Beerware is a real license, but it is categorized under the general-purpose “Free” family.) To re-emphasize: the naming is relatively arbitrary and should not be interpreted as legal advice.

When there are multiple ways to present a license, the different variants are numbered. For example, “Corporate/Sun/Sun Microsystems variant 1” and “Corporate/Sun/Sun Microsystems variant 2” include different text that means the same thing.

Licenses are not always included in their entirety. Files frequently contain references to the license rather than the actual license. License references are included in the templates, such as “AFL/OSL/Open Software License 1.0 reference”. In some cases, there may be multiple common reference templates, so these are numbered, such as “GPL/v2/GPLv2 reference 2” and “GPL/v2/GPLv2 reference 3”. Besides references, licenses may include shortened versions that summarize the license. For example, “Adobe/Adobe short” is a variation of “Adobe/Adobe”. Similarly, licenses may contain sections such as supplements and appendices.

License Phrases

When possible, text is matched against license templates. However, not every license matches a template. This can happen when a new license (not in the list of templates) is encountered. Similarly, licenses may be single sentences, such as “This is free, enjoy.” Single sentences usually have legal meanings even if there is no formal license associated with the file. The license analyzer (technically, the filter_license agent) identifies potential license phrases.
Any sentence containing these phrases and not found in a license template is flagged as a potential license phrase:
As an aside, there is an intentional spelling error “licenced” in this mix because it appears in far too many files. (Nobody ever said that engineers could spell.)

While matches against license templates are very accurate (very few false positives and very few false negatives), license phrase matching is less accurate. For example, the sentence “mimic existing proprietary applications for instance” likely comes from source code, while “license from proprietary software” is probably important for a legal interpretation. (And “Throw away proprietary and site licenses” could be from source code or be legally important.) Each of these examples comes from a real code analysis. Due to the lower accuracy level for phrases, it is important to review each case.

Percent of Match

Licenses are matched based on a percentage of similar tokens. Tokens are simply words or punctuation. For example, consider a file that has a potential license section that contains 500 tokens. If 400 of the tokens match a section of a license template that contains 2000 tokens, then it matches 400/500 tokens, or an 80% match. Since 20% of the text does not match, it could indicate a new license clause, alternate wording, or simply a replaced term.

When viewing the license in the UI, the matched tokens are highlighted. Any word (or character) not highlighted was not part of the match. The highlighting allows users to quickly determine what was changed. It could be as simple as spelling out “General Public License” instead of “GPL”, or it could be the inclusion of the word “not” (a small but very critical word for legal interpretation).

A “100%” match indicates that the entire potential section matched something in the template, but does not necessarily mean that the entire template matched the section. For example, a license section may have a 100% match with BSD/BSD.new/BSD_new, but only match the warranty clause.
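The percentage described above is the number of matched tokens divided by the number of tokens in the file's section, not in the template. The sketch below illustrates that arithmetic only. It uses Python's `difflib.SequenceMatcher` as a stand-in for the license agent's actual comparison algorithm, which this document does not specify, and the names `tokenize` and `percent_match` are hypothetical.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Split text into lowercase words and single punctuation characters."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def percent_match(section, template):
    """Percentage of the section's tokens that also appear, in order, in the template.

    SequenceMatcher is only a stand-in here; it is not the agent's algorithm.
    """
    section_tokens = tokenize(section)
    template_tokens = tokenize(template)
    if not section_tokens:
        return 0.0
    matcher = SequenceMatcher(None, section_tokens, template_tokens, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    # The denominator is the section length, so a short section can reach
    # 100% while covering only one clause of a much longer template.
    return 100.0 * matched / len(section_tokens)


if __name__ == "__main__":
    found = "This is the Gnu public license and you can share it"
    known = "This is the Neal public license and you can share it"
    print(f"{percent_match(found, known):.1f}%")
    # The worked example from the text: 400 of 500 tokens matched.
    print(f"{100.0 * 400 / 500:.0f}%")
```

Because the template length does not enter the calculation, a section consisting solely of a warranty clause would score 100% against a template that contains that clause, which is the behavior described in the last paragraph above.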
License Templates

License templates are arranged in directories that denote similar text. The organization is strictly based on text similarities and not semantics. Each template has a unique name; the user interface displays only the name and not the hierarchical path. The current list of license templates is as follows:

Adaptive/Adaptive 1.0
Adaptive/Adaptive 1.0 Appendix A
Adobe/Adobe
Adobe/Adobe short
AFL/AFL/Academic Free License 1.1
AFL/AFL/Academic Free License 1.2
AFL/AFL/Academic Free License 2.0
AFL/AFL/Academic Free License 2.1
AFL/AFL/Academic Free License 3.0
AFL/OSL/Open Software License 1.0
AFL/OSL/Open Software License 1.0 reference
AFL/OSL/Open Software License 1.1
AFL/OSL/Open Software License 2.0
AFL/OSL/Open Software License 2.1
AFL/OSL/Open Software License 3.0
APSL/Apple Public Source License 1.0
APSL/Apple Public Source License 1.1
APSL/Apple Public Source License 1.2
APSL/Apple Public Source License 2.0
Artistic/Artistic 1.0
Artistic/Artistic 1.0 short
Artistic/Artistic 2.0
Artistic/Artistic 2.0beta4
BSD/BSD.new/Apache/Apache Software License 1.0
BSD/BSD.new/Apache/Apache Software License 1.1
BSD/BSD.new/Apache/Apache Software License 2.0
BSD/BSD.new/Apache/Apache Software License 2.0 reference
BSD/BSD.new/BSD new
BSD/BSD.new/BSD new short
BSD/BSD.new/Cryptix
BSD/BSD.new/Entessa Public License
BSD/BSD.new/Maia Mailguard License
BSD/BSD.new/Naumen Public License
BSD/BSD.new/OpenPBS
BSD/BSD.new/Phorum
BSD/BSD.new/PHP/PHP 3.0
BSD/BSD.new/SSLeay
BSD/BSD.new/Vovida Software License 1.0
BSD/BSD.new/Zend
BSD/BSD.old/Attribution Assurance License
BSD/BSD.old/BSD As-Is clause
BSD/BSD.old/BSD Harvard
BSD/BSD.old/BSD NRL
BSD/BSD.old/BSD old
BSD/BSD.old/BSD UCRegents
BSD/BSD.old/BSD UCRegents 2
BSD/BSD.old/BSD zlib
BSD/BSD.old/FreeBSD
BSD/BSD.old/Intel-OSL
BSD/BSD.old/OpenLDAP
BSD/BSD.old/OpenSSL
BSD/BSD.old/Sleepycat
BSD/BSD.old/Sleepycat short
BSD/BSD.old/Zope/Zope 1.0
BSD/BSD.old/Zope/Zope 2.0
CDDL/CDDL 1.0
Corporate/Apple/Apple Common Documentation License 1.0
Corporate/Apple/Apple Squeak
Corporate/CA/TOSL/Computer Associates Trusted Open Source License 1.1
Corporate/HP/Hewlett-Packard
Corporate/HP/HP-UX Java
Corporate/HP/HP-UX JRE
Corporate/IBM/IBM JRE
Corporate/IBM/IBM reciprocal
Corporate/Logica/Logica Open Source License Version 1.0
Corporate/Lucent/Lucent Public License 1.0
Corporate/Lucent/Lucent Public License 1.02
Corporate/Microsoft/Microsoft EULA
Corporate/Microsoft/Microsoft EULA 2003
Corporate/Microsoft/Microsoft EULA Software
Corporate/Motorola
Corporate/NCD/Network Computing Devices 1993
Corporate/NetComponents/NetComponents
Corporate/Nokia/Nokia Open Source License 1.0a
Corporate/Nvidia
Corporate/RSA/RSA MD5
Corporate/SGI/SGI CID 1.0
Corporate/SGI/SGI GPX 1.0
Corporate/Skype
Corporate/Sun/Bigelow&Holmes
Corporate/Sun/Sun Microsystems Binary Code License
Corporate/Sun/Sun Microsystems Binary Code License supplement
Corporate/Sun/Sun Microsystems Free with Copyright 1
Corporate/Sun/Sun Microsystems Free with Copyright 2
Corporate/Sun/Sun Microsystems Sun Public License
Corporate/Sun/Sun Microsystems variant 1
Corporate/Sun/Sun Microsystems variant 2
Corporate/Sun/Sun Solaris Source Code License Foundation Release
CPL/Common Public License 1.0
CPL/IBM/IBM_PL/IBM Public License 1.0
Creative_Commons/Creative Commons GPL
Creative_Commons/Creative Commons LGPL
Creative_Commons/Creative Commons Public Domain
Creative_Commons/Creative Commons Public License
Edu/CMU/Carnegie Mellon University 1998
Edu/CMU/Carnegie Mellon University 2000
Edu/CWI (Center for Mathematics and Computer Science, Netherlands)
Edu/Educational Community License
Edu/University of Utah Public License
Edu/Univ of Cambridge
Edu/Univ of Edinburgh
Edu/Univ of Notre Dame
Eiffel/Eiffel Forum License 1
Eiffel/Eiffel Forum License 2
FreeArtLicense/Free Art License 1.2
Free/Beerware
Free/Fair License
Free/Free clause
Free/Free clause variant 2
Free/Free clause variant 3
Free/Free use no change clause
Free/FreeWithCopyright/Free with copyright clause variant 1
Free/FreeWithCopyright/Free with copyright clause variant 10
Free/FreeWithCopyright/Free with copyright clause variant 3
Free/FreeWithCopyright/Free with copyright clause variant 4
Free/FreeWithCopyright/Free with copyright clause variant 5
Free/FreeWithCopyright/Free with copyright clause variant 8
Free/FreeWithCopyright/Free with copyright clause variant 9
Free/FreeWithCopyright/UC Regents free with copyright clause
Free/FreeWithCopyright/Unidex
Free/FreeWithCopyright/variant.11
Free/Free with files clause
FreeType/FreeType
FreeType/FreeType reference
Free/WTFPL
FSF/FSF
FSF/FSF variant 1
FSF/FSF variant 2
FSF/FSF variant 3
FSF/FSF variant 4
Gov/CeCILL-B_V1-en
Gov/CeCILL-B_V1-fr
Gov/CeCILL-C_V1-en
Gov/CeCILL-C_V1-fr
Gov/CeCILL_V1.1-US
Gov/CeCILL_V1-fr
Gov/CeCILL_V2-en
Gov/CeCILL_V2-fr
Gov/Government clause
Gov/MITRE Collaborative Virtual Workspace License
Gov/NASA Open Source 1.3
Gov/Starndard ML of New Jersey
GPL/Affero/Affero GPL
GPL/CopyLeft reference
GPL/Dual MPL GPL
GPL/Exception/GPL exception clause 1
GPL/Exception/GPL exception clause 2
GPL/GFDL/GNU Free Documentation License 1.1 reference 1
GPL/GFDL/GNU Free Documentation License 1.1 reference 2
GPL/GFDL/GNU Free Documentation License 1.2
GPL/GFDL/GNU Free Documentation License 1.2 reference
GPL/GPL for Computer Programs of the Public Administration
GPL/GPL from FSF reference
GPL/GPL reference
GPL/LGPL/LGPL 2.0
GPL/LGPL/LGPL 2.0 reference
GPL/LGPL/LGPL 2.0 with exceptions
GPL/LGPL/LGPL 2.1
GPL/LGPL/LGPL 2.1 reference
GPL/LGPL/LGPL 3.0
GPL/LGPL/LGPL gettext library variant
GPL/LGPL/LGPL GNU C Library variant
GPL/LGPL/LGPL wxWindows Library Licence 3.0 variant
GPL/v1/GPLv1
GPL/v1/GPLv1 reference
GPL/v2/eCos
GPL/v2/Free with copyright clause
GPL/v2/GPL from FSF reference 1
GPL/v2/GPL from FSF reference 2
GPL/v2/GPLv2
GPL/v2/GPLv2 Java Index Serialization Package variant
GPL/v2/GPLv2 reference
GPL/v2/GPLv2 reference 2
GPL/v2/GPLv2 reference 3
GPL/v2/GPLv2 reference 4
GPL/v2/McKornik Jr. Public License
GPL/v2/RealNetworks/RealNetworks Community Source Licensing
GPL/v2/RealNetworks/RealNetworks Public Source License 1.0
GPL/v2/RealNetworks/RealNetworks Public Source License 1.0 reference
GPL/v2/Sybase Open Watcom Public License 1.0
GPL/v3/GPLv3
GPL/v3/GPLv3 reference 1
GPL/v3/GPLv3 reference 2
GPL/W3C/World Wide Web Consortium 2001
GPL/W3C/World Wide Web Consortium 2002
Historical/Historical free with copyright clause
Historical/Historical Permission Notice and Disclaimer
ICU/ICU 1.8.1
ICU/ICU 1.8.1 variant
IETF/IETF
IETF/IETF variant
MiscOSS/Aladdin Free Public License
MiscOSS/Bitstream
MiscOSS/BitTorrent
MiscOSS/BitTorrent reference
MiscOSS/Catharon Open Source License
MiscOSS/C_Migemo License
MiscOSS/Condor
MiscOSS/Copy clause
MiscOSS/EU DataGrid Software License
MiscOSS/Frameworx Open License 1.0
MiscOSS/Giftware
MiscOSS/Glide
MiscOSS/gnuplot
MiscOSS/Hacktivismo Enhanced-Source Software License Agreement
MiscOSS/IJG
MiscOSS/iMatix
MiscOSS/Internet Software Consortium
MiscOSS/Jabber Open Source License 1.0
MiscOSS/Jahia Community Source License
MiscOSS/LaTeX Project Public License 1.3a
MiscOSS/mecab-ipadic
MiscOSS/Motosoto Open Source License
MiscOSS/MSNTP License
MiscOSS/Nethack General Public License
MiscOSS/OpenContent License
MiscOSS/Open Motif Public End User License
MiscOSS/Pine License
MiscOSS/qmail License
MiscOSS/Q Public License 1.0
MiscOSS/Ruby
MiscOSS/Scilab License
MiscOSS/TCL
MiscOSS/Vim
MiscOSS/zlib/InfoZip
MiscOSS/zlib/zLib
MIT/Imlib2
MIT/JasPer
MIT/MIT Bigelow&Holmes Luxi font variant
MIT/MIT CMU style
MIT/MIT Free with copyright clause
MIT/MIT HP-DEC variant
MIT/MIT MLton variant
MIT/MIT (modern)
MIT/MIT (modern) with sublicense
MIT/MIT New Jersey variant
MIT/MIT (oldstyle)
MIT/MIT (oldstyle) no ads clause
MIT/MIT (oldstyle) with disclaimer 1
MIT/MIT (oldstyle) with disclaimer 2
MIT/MIT (oldstyle) with disclaimer 3
MIT/MIT Unicode variant
MIT/NCSA
MIT/X11
MIT/X.Net License
MPL/CUA Office Public License 1.0
MPL/Dual MPL MIT
MPL/Interbase
MPL/MPL 1.0
MPL/MPL 1.1
MPL/MPL 1.1 reference
MPL/MPL contributor clause with dual license
MPL/Netizen Open Source License
MPL/NPL 1.1
MPL/NPL 1.1 reference
MPL/NPL contributor clause with dual license
MPL/Ricoh Source Code Public License
MPL/SISSL/SISSL 1.1
MPL/SISSL/SISSL 1.1 reference 1
MPL/SISSL/SISSL 1.1 reference 2
OCLC/OCLC Research Public License 2.0
OpenGroup/Open Group
OpenGroup/Open Group Test Suite License
OpenPublicationLicense/Open Publication License 1.0
OpenPublicationLicense/Open Publication License reference
Python/PSF/Python Software Foundation 2.1.1
Python/PSF/Python Software Foundation 2.2
Python/Python BeOpen
Python/Python CNRI
Python/Python CWI
Python/Python InfoSeek variant
RedHat/Red Hat EULA
RedHat/Red Hat reference

Adding a License Template

The license templates are a set of known licenses used for the analysis. When any part of a known license (called a raw template) matches, the license is marked as a match. However, new or unknown licenses may not match well. In some cases, you may want to add your own license. Currently, adding licenses is not user-friendly. It requires re-running the build and modifying the database.
There are some general guidelines when selecting raw license text:
Adding a License Phrase

One-sentence license (1SL) phrases are phrases commonly associated with licenses. When no license template is found, 1SL phrases are displayed. Currently, all 1SL templates are hard-coded into the Filter_License agent. The file is:

trunk/fossology/agents/foss_license_agent/Filter_License/1sl.c

At the very beginning of the code is a global array named “List1SL”. The array ends with NULL,NULL. To add your own one-sentence license (1SL), add the pattern before the NULLs. Each 1SL array entry has two parts:
As an example one-sentence license phrase: < * * proprietary % > *.*|*,*|*;*|*:*|*$*|*(*|*)*|*{*|*}*
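To make the behavior of this pattern more concrete, here is a rough Python approximation using a regular expression. It is not the 1SL pattern syntax and not the Filter_License implementation. It assumes the pattern means "up to two words, then the keyword, then everything up to the first end-of-phrase character", and it trims a trailing "?" or "!" purely so that the result agrees with the documented example. The names `PHRASE` and `find_phrase` are hypothetical.

```python
import re

# End-of-phrase characters taken from the example pattern above.
END_CHARS = ".,;:$(){}"

# Assumed reading of the pattern: up to two preceding words, the keyword
# "proprietary", then any text that contains no end-of-phrase character.
PHRASE = re.compile(
    r"\b(?:\w+\s+){0,2}proprietary\b[^" + re.escape(END_CHARS) + r"\n]*",
    re.IGNORECASE,
)


def find_phrase(sentence):
    """Return the candidate license phrase in a sentence, or None."""
    match = PHRASE.search(sentence)
    if match is None:
        return None
    # Trimming "?" and "!" is an assumption of this sketch, not documented behavior.
    return match.group(0).strip().rstrip("?!")


if __name__ == "__main__":
    for text in (
        "This software contains proprietary source code.",
        "This gets around proprietary Microsoft APIs.",
        "Maybe this should be proprietary?",
    ):
        print(find_phrase(text))
```

A regular expression is used here only because it is compact. The real agent compiles its patterns from the List1SL array in 1sl.c, so any change to actual behavior has to be made there.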
This expression looks for any set of words containing the word “proprietary”. It returns the 2 words before “proprietary” and then any number of words until it finds an end-of-phrase character (period, comma, semicolon, etc.). This will match phrases like:

This software contains proprietary source code. (The expression returns “software contains proprietary source code”)
This gets around proprietary Microsoft APIs. (A real phrase from some open source projects.)
Maybe this should be proprietary? (Returns “should be proprietary”)

After changing this file, use “make” in the Filter_License/ directory to make sure that it builds. Then use “make” from trunk/fossology/ to build the code and “make install” to install it.

Re-analyzing Licenses

Changes to the license templates or the 1SL system are not currently applied to previous analysis results. As a result, files that have already been processed will not be reprocessed with the new license. There are a few manual workarounds to reset the database so that files will be re-analyzed. Choose one of these two options:
FOSSology Project documentation is licensed under the GNU Free Documentation License Version 1.2