
Signed-off-by: Meco Man <920369182@qq.com>

Meco Man 5 years ago
commit
dfe2dda003
6 changed files with 2161 additions and 0 deletions
  1. LICENCE (+339 -0)
  2. README.md (+13 -0)
  3. qfpio.h (+32 -0)
  4. qfpio.s (+868 -0)
  5. qfplib.h (+54 -0)
  6. qfplib.s (+855 -0)

+ 339 - 0
LICENCE

@@ -0,0 +1,339 @@
+		    GNU GENERAL PUBLIC LICENSE
+		       Version 2, June 1991
+
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
+ 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+			    Preamble
+
+  The licenses for most software are designed to take away your
+freedom to share and change it.  By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users.  This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it.  (Some other Free Software Foundation software is covered by
+the GNU Lesser General Public License instead.)  You can apply it to
+your programs, too.
+
+  When we speak of free software, we are referring to freedom, not
+price.  Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+
+  To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+  For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have.  You must make sure that they, too, receive or can get the
+source code.  And you must show them these terms so they know their
+rights.
+
+  We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+  Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software.  If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+  Finally, any free program is threatened constantly by software
+patents.  We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary.  To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+  The precise terms and conditions for copying, distribution and
+modification follow.
+
+		    GNU GENERAL PUBLIC LICENSE
+   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+  0. This License applies to any program or other work which contains
+a notice placed by the copyright holder saying it may be distributed
+under the terms of this General Public License.  The "Program", below,
+refers to any such program or work, and a "work based on the Program"
+means either the Program or any derivative work under copyright law:
+that is to say, a work containing the Program or a portion of it,
+either verbatim or with modifications and/or translated into another
+language.  (Hereinafter, translation is included without limitation in
+the term "modification".)  Each licensee is addressed as "you".
+
+Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope.  The act of
+running the Program is not restricted, and the output from the Program
+is covered only if its contents constitute a work based on the
+Program (independent of having been made by running the Program).
+Whether that is true depends on what the Program does.
+
+  1. You may copy and distribute verbatim copies of the Program's
+source code as you receive it, in any medium, provided that you
+conspicuously and appropriately publish on each copy an appropriate
+copyright notice and disclaimer of warranty; keep intact all the
+notices that refer to this License and to the absence of any warranty;
+and give any other recipients of the Program a copy of this License
+along with the Program.
+
+You may charge a fee for the physical act of transferring a copy, and
+you may at your option offer warranty protection in exchange for a fee.
+
+  2. You may modify your copy or copies of the Program or any portion
+of it, thus forming a work based on the Program, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+    a) You must cause the modified files to carry prominent notices
+    stating that you changed the files and the date of any change.
+
+    b) You must cause any work that you distribute or publish, that in
+    whole or in part contains or is derived from the Program or any
+    part thereof, to be licensed as a whole at no charge to all third
+    parties under the terms of this License.
+
+    c) If the modified program normally reads commands interactively
+    when run, you must cause it, when started running for such
+    interactive use in the most ordinary way, to print or display an
+    announcement including an appropriate copyright notice and a
+    notice that there is no warranty (or else, saying that you provide
+    a warranty) and that users may redistribute the program under
+    these conditions, and telling the user how to view a copy of this
+    License.  (Exception: if the Program itself is interactive but
+    does not normally print such an announcement, your work based on
+    the Program is not required to print an announcement.)
+
+These requirements apply to the modified work as a whole.  If
+identifiable sections of that work are not derived from the Program,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works.  But when you
+distribute the same sections as part of a whole which is a work based
+on the Program, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Program.
+
+In addition, mere aggregation of another work not based on the Program
+with the Program (or with a work based on the Program) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+  3. You may copy and distribute the Program (or a work based on it,
+under Section 2) in object code or executable form under the terms of
+Sections 1 and 2 above provided that you also do one of the following:
+
+    a) Accompany it with the complete corresponding machine-readable
+    source code, which must be distributed under the terms of Sections
+    1 and 2 above on a medium customarily used for software interchange; or,
+
+    b) Accompany it with a written offer, valid for at least three
+    years, to give any third party, for a charge no more than your
+    cost of physically performing source distribution, a complete
+    machine-readable copy of the corresponding source code, to be
+    distributed under the terms of Sections 1 and 2 above on a medium
+    customarily used for software interchange; or,
+
+    c) Accompany it with the information you received as to the offer
+    to distribute corresponding source code.  (This alternative is
+    allowed only for noncommercial distribution and only if you
+    received the program in object code or executable form with such
+    an offer, in accord with Subsection b above.)
+
+The source code for a work means the preferred form of the work for
+making modifications to it.  For an executable work, complete source
+code means all the source code for all modules it contains, plus any
+associated interface definition files, plus the scripts used to
+control compilation and installation of the executable.  However, as a
+special exception, the source code distributed need not include
+anything that is normally distributed (in either source or binary
+form) with the major components (compiler, kernel, and so on) of the
+operating system on which the executable runs, unless that component
+itself accompanies the executable.
+
+If distribution of executable or object code is made by offering
+access to copy from a designated place, then offering equivalent
+access to copy the source code from the same place counts as
+distribution of the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+  4. You may not copy, modify, sublicense, or distribute the Program
+except as expressly provided under this License.  Any attempt
+otherwise to copy, modify, sublicense or distribute the Program is
+void, and will automatically terminate your rights under this License.
+However, parties who have received copies, or rights, from you under
+this License will not have their licenses terminated so long as such
+parties remain in full compliance.
+
+  5. You are not required to accept this License, since you have not
+signed it.  However, nothing else grants you permission to modify or
+distribute the Program or its derivative works.  These actions are
+prohibited by law if you do not accept this License.  Therefore, by
+modifying or distributing the Program (or any work based on the
+Program), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Program or works based on it.
+
+  6. Each time you redistribute the Program (or any work based on the
+Program), the recipient automatically receives a license from the
+original licensor to copy, distribute or modify the Program subject to
+these terms and conditions.  You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties to
+this License.
+
+  7. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License.  If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Program at all.  For example, if a patent
+license would not permit royalty-free redistribution of the Program by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Program.
+
+If any portion of this section is held invalid or unenforceable under
+any particular circumstance, the balance of the section is intended to
+apply and the section as a whole is intended to apply in other
+circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system, which is
+implemented by public license practices.  Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+  8. If the distribution and/or use of the Program is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Program under this License
+may add an explicit geographical distribution limitation excluding
+those countries, so that distribution is permitted only in or among
+countries not thus excluded.  In such case, this License incorporates
+the limitation as if written in the body of this License.
+
+  9. The Free Software Foundation may publish revised and/or new versions
+of the General Public License from time to time.  Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+Each version is given a distinguishing version number.  If the Program
+specifies a version number of this License which applies to it and "any
+later version", you have the option of following the terms and conditions
+either of that version or of any later version published by the Free
+Software Foundation.  If the Program does not specify a version number of
+this License, you may choose any version ever published by the Free Software
+Foundation.
+
+  10. If you wish to incorporate parts of the Program into other free
+programs whose distribution conditions are different, write to the author
+to ask for permission.  For software which is copyrighted by the Free
+Software Foundation, write to the Free Software Foundation; we sometimes
+make exceptions for this.  Our decision will be guided by the two goals
+of preserving the free status of all derivatives of our free software and
+of promoting the sharing and reuse of software generally.
+
+			    NO WARRANTY
+
+  11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.  SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+  12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+		     END OF TERMS AND CONDITIONS
+
+	    How to Apply These Terms to Your New Programs
+
+  If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+  To do so, attach the following notices to the program.  It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+    <one line to give the program's name and a brief idea of what it does.>
+    Copyright (C) <year>  <name of author>
+
+    This program is free software; you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation; either version 2 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License along
+    with this program; if not, write to the Free Software Foundation, Inc.,
+    51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+    Gnomovision version 69, Copyright (C) year name of author
+    Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+    This is free software, and you are welcome to redistribute it
+    under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License.  Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary.  Here is a sample; alter the names:
+
+  Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+  `Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+  <signature of Ty Coon>, 1 April 1989
+  Ty Coon, President of Vice
+
+This General Public License does not permit incorporating your program into
+proprietary programs.  If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library.  If this is what you want to do, use the GNU Lesser General
+Public License instead of this License.

+ 13 - 0
README.md

@@ -0,0 +1,13 @@
+Qfplib is open source, licensed under version 2 of the GNU GPL. A copy
+of that licence is included in this archive. The archive also contains:
+
+- qfplib.s, the source code to qfplib. The GNU assembler syntax is used.
+
+- qfplib.h, a C header file giving prototypes for the qfplib functions.
+
+- qfpio.s, the source code to qfpio, routines for converting between
+strings and floating-point values.
+
+- qfpio.h, a C header file giving prototypes for the qfpio functions.
+
+Visit http://www.quinapalus.com/qfplib.html for more information.

+ 32 - 0
qfpio.h

@@ -0,0 +1,32 @@
+// Copyright 2015 Mark Owen
+// http://www.quinapalus.com
+// E-mail: qfp@quinapalus.com
+//
+// This file is free software: you can redistribute it and/or modify
+// it under the terms of version 2 of the GNU General Public License
+// as published by the Free Software Foundation.
+//
+// This file is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this file.  If not, see <http://www.gnu.org/licenses/> or
+// write to the Free Software Foundation, Inc., 51 Franklin Street,
+// Fifth Floor, Boston, MA  02110-1301, USA.
+
+#ifndef _QFPIO_H_
+#define _QFPIO_H_
+
+#ifdef __cplusplus
+  extern "C" {
+#endif
+
+extern void qfp_float2str(float f,char*s,unsigned int fmt);
+extern int qfp_str2float(float*f,char*p,char**endptr);
+
+#ifdef __cplusplus
+  } // extern "C"
+#endif
+#endif

+ 868 - 0
qfpio.s

@@ -0,0 +1,868 @@
+@ Copyright 2015 Mark Owen
+@ http://www.quinapalus.com
+@ E-mail: qfp@quinapalus.com
+@
+@ This file is free software: you can redistribute it and/or modify
+@ it under the terms of version 2 of the GNU General Public License
+@ as published by the Free Software Foundation.
+@
+@ This file is distributed in the hope that it will be useful,
+@ but WITHOUT ANY WARRANTY; without even the implied warranty of
+@ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+@ GNU General Public License for more details.
+@
+@ You should have received a copy of the GNU General Public License
+@ along with this file.  If not, see <http://www.gnu.org/licenses/> or
+@ write to the Free Software Foundation, Inc., 51 Franklin Street,
+@ Fifth Floor, Boston, MA  02110-1301, USA.
+
+.syntax unified
+.cpu cortex-m0
+.thumb
+
+@ exported symbols
+
+.global qfp_float2str
+.global qfp_str2float
+
+@ C code in comments is intended to give an idea of the function
+@ of the following assembler code. The translation is not exact.
+
+@ // multiply by 128/125: used by conversions in both directions
+@ unsigned int div125(unsigned int u) {
+@   unsigned int a,b,c,k0=0x4189; // 0x4189~=128/125 Q14
+@   a=u>>14;
+@   a=a*k0;               // calculate first approximation to answer, good to about 14 bits
+@   b=((a>>1)+(a>>2))>>4;
+@   b=a-(b>>1)-(b&1);     // find error in approximation
+@   c=(u-b)*k0;
+@   return a+(c>>14)+1;   // result good to about 28 bits
+@   }
+
+div125:
+ push {r1-r4,r14}
+ ldr r4,=#0x4189  @ k0=0x4189;
+ lsrs r1,r0,#14   @ a=u>>14;
+ muls r1,r4       @ a=a*k0;
+ lsrs r2,r1,#1    @ a>>1
+ lsrs r3,r1,#2    @ a>>2
+ add r2,r3        @ (a>>1)+(a>>2)
+ lsrs r2,#4       @ b=((a>>1)+(a>>2))>>4;
+ subs r0,r1       @ u-a
+ lsrs r2,#1       @ b>>1
+ adcs r0,r2       @ u-a+(b>>1)+(b&1)
+ muls r0,r4       @ c=(u-b)*k0;
+ lsrs r0,#14      @ c>>14
+ add r0,r1        @ a+(c>>14)
+ adds r0,#1       @ a+(c>>14)+1
+ pop {r1-r4,r15}
+
+.ltorg
+
+opoint: @ output decimal point
+ adds r5,#2
+ movs r3,#'.'
+ b och
+ozero: @ output '0'
+ movs r3,#0
+odig:  @ output one digit from r3
+ adds r3,#'0'
+och:   @ output one character from r3
+ strb r3,[r1]
+ adds r1,#1
+ bx r14
+
+naninf: @ r4=0 for Inf, otherwise NaN
+ ldr r3,=#0x00666e49 @ "fnI"
+ cmp r4,#0
+ beq 1f
+ ldr r3,=#0x004e614e @ "NaN"
+1:
+ bl och
+ lsrs r3,#8
+ bne 1b
+ b 10f
+
+@ fmt is format control word:
+@ b7..b0: number of significant figures
+@ b15..b8: -(minimum exponent printable in F format)
+@ b23..b16: (maximum exponent printable in F format)+1
+@ b24: output positive mantissas with ' '
+@ b25: output positive mantissas with '+'
+@ b26: output positive exponents with ' '
+@ b27: output positive exponents with '+'
+@ b28: suppress trailing zeros in fraction
+@ b29: fixed-point output: b7..0 give number of decimal places
+@ default: 0x18060406
+@ Note that if b28 is set (as it is in the default format value) the code will
+@ write the trailing decimal point and zeros to the output buffer before truncating
+@ the string. Thus it is essential that the output buffer is large enough to accommodate
+@ these characters temporarily.
+@
+@ Overall accuracy is sufficient to print all exactly-representable integers up to 10^8 correctly
+@ in 0x18160408 format.
+@ 
+@ void float2str(float f,char*s,unsigned int fmt) {
+
+qfp_float2str:
+ push {r4-r7,r14}
+
+@   if(fmt==0) fmt=0x18060406; // default format
+
+ cmp r2,#0
+ bne 1f
+ ldr r2,=#0x18060406
+1:
+
+@   i=*(int*)&f;
+@   if(i&0x80000000) { // output sign of mantissa
+@     *p++='-';
+@     i&=0x7fffffff;
+@   } else {
+@     if(fmt&0x01000000) *p++=' ';
+@     else if(fmt&0x02000000) *p++='+';
+@     }
+
+ movs r3,#'-'
+ lsls r0,#1
+ bcs 2f
+ movs r3,#' '
+ lsrs r4,r2,#25
+ bcs 2f
+ movs r3,#'+'
+ lsrs r4,r2,#26
+ bcc 3f
+2:
+ bl och
+3:
+
+@   e2=(i>>23)-127; // get binary exponent e2
+
+ movs r4,#0
+ lsrs r3,r0,#24
+ beq 1f  @ treat zero case specially
+ subs r3,#127
+
+@   m=((i&0x7fffff)|0x800000)<<8; // get mantissa, restore implied 1, make Q31
+
+ lsls r4,r0,#8
+ cmp r3,#128
+ beq naninf @ handle NaN/Inf cases
+ adds r4,#1
+
+@   if(e2==-127) {e2=0; m=0;} // flush denormals to zero
+
+ movs r5,#1
+ rors r4,r5
+1:
+ movs r0,r4
+
+@ now binary exponent e2 in r3, mantissa in r0
+
+@   e10=0;  // decimal exponent
+@ overall plan is to manipulate m, e2 and e10 so as to take e2 to zero, while keeping the value
+@ m * 2^e2 * 10^e10 invariant
+
+ movs r4,#0
+
+@   while(e2>0) { // add 3 to e10, take 10 off e2, multiply m by 1024/1000=128/125
+@     if(m>=0xf0000000) m>>=1,e2++;
+@     m=div125(m);
+@     e2-=10;
+@     e10+=3;
+@     } // now e2<=0
+
+ b 2f
+1:
+ lsrs r5,r0,#28
+ cmp r5,#0x0f
+ blo 3f
+ lsrs r0,#1
+ adds r3,#1
+3:
+ bl div125
+ subs r3,#10
+ adds r4,#3
+2:
+ cmp r3,#0
+ bgt 1b
+
+@   while(e2<=-10) { // take 3 off e10, add 10 to e2, multiply m by 1000/1024=125/128
+@     m0=(m>>5)+(m>>6);
+@     m-=(m0>>1)+(m0&1); // *125/128, more accurate than using multiply instruction
+@     e2+=10;
+@     e10-=3;
+@     } // now -10 < e2 <= 0
+
+ b 2f
+1:
+ lsrs r5,r0,#5
+ lsrs r6,r0,#6
+ add r5,r6
+ movs r6,#0
+ lsrs r5,#1
+ adcs r5,r6
+ subs r0,r5
+ subs r4,#3
+2:
+ adds r3,#10
+ ble 1b
+ subs r3,#10
+
+@   m>>=1; // Q30; make sure m will not overflow
+
+ lsrs r0,#1
+
+@   while(e2<=-3) { // take 1 off e10, add 3 to e2, multiply m by 10/8
+@     m0=m>>1;
+@     m+=(m0>>1)+(m0&1); // *10/8
+@     e2+=3;
+@     e10--;
+@     } // now -3 < e2 <=0
+
+ b 2f
+1:
+ lsrs r5,r0,#1
+ lsrs r5,#1
+ adcs r0,r5
+ subs r4,#1
+2:
+ adds r3,#3
+ ble 1b
+ subs r3,#3
+
+@   while(e2<0) { // add 1 to e2, halve m
+@     m>>=1; // *1/2
+@     e2++;
+@     } // now e2==0
+
+ b 2f
+1:
+ lsrs r0,#1
+2:
+ adds r3,#1
+ ble 1b
+ subs r3,#1
+
+@   if(m>=0x40000000) m>>=2; // convert Q30 to Q28
+@   else {
+@     m=(m<<1)+(m>>1)+(m&1); // m<1: multiply by 10 and convert Q30 to Q28 in one step (net factor 2.5), compensate e10
+@     e10--;
+@     }
+
+ lsrs r5,r0,#30
+ beq 1f
+ lsrs r0,#2
+ b 2f
+1:
+ lsls r5,r0,#1
+ lsrs r0,#1
+ adcs r0,r5
+ subs r4,#1
+2:
+
+@ now all of binary exponent has been transferred to decimal exponent
+@ we have 
+@ r0: mantissa m, Q28, 1<=m<10
+@ r1: output pointer
+@ r2: format
+@ r3: 0 (was binary exponent)
+@ r4: decimal exponent e10
+
+@   sf=fmt&0xff; // number of significant figures
+
+ uxtb r3,r2 @ e2 is no longer used
+
+@   ff=0; // flag to indicate that output is in "F" format (i.e., will not use "E" notation)
+
+ movs r5,#0
+
+@   d0=e10; // first digit output has significance 10^d0 wrt output '.'
+@   d1=d0-sf; // last digit output has significance 10^(d1+1) wrt output '.'
+
+ movs r6,r4
+ subs r7,r6,r3
+
+@ r0: mantissa m, Q28, 1<=m<10
+@ r1: output pointer
+@ r2: format
+@ r3: sf
+@ r4: decimal exponent e10
+@ r5b0: ff
+@ r6: d0
+@ r7: d1
+
+@   if(fmt&0x20000000) { // forced "F" output format?
+@     d1=-(fmt&0xff)-1;
+@     sf=d0-d1;
+@     ff=1;
+@     }
+
+ push {r1,r2}
+ lsrs r1,r2,#30
+ bcc 1f
+ mvns r7,r3
+ subs r3,r6,r7
+ movs r5,#1
+1:
+
+@   m0=0x08000000; // 0.5 Q28
+@   for(i=1;i<sf;i++) { // calculate amount to add to m for decimal rounding
+@     m0+=m0>>1; // multiply by 0.1
+@     m0+=m0>>4;
+@     m0+=m0>>8;
+@     m0+=m0>>16;
+@     m0>>=4;
+@     }
+@   m+=m0; // rounding
+
+ push {r3}
+ movs r1,#8
+ lsls r1,#24
+2:
+ subs r3,#1
+ ble 1f
+ lsrs r2,r1,#1
+ add r1,r2
+ lsrs r2,r1,#4
+ add r1,r2
+ lsrs r2,r1,#8
+ add r1,r2
+ lsrs r2,r1,#16
+ add r1,r2
+ lsrs r1,#4
+ b 2b
+1:
+ add r0,r1
+ pop {r3}
+
+@   if(m>=0xa0000000) { // has rounding pushed m to 10 (Q28)? if so, set to 1 and increment decimal exponent
+@     m=0x10000000;
+@     e10++;
+@     d0++;
+@     if((fmt&0x20000000)==0) d1++;
+@     }
+
+ lsrs r1,r0,#28
+ cmp r1,#0x0a
+ pop {r1,r2}
+ blo 1f
+ lsrs r0,r2,#30
+ bcs 2f
+ adds r7,#1
+2:
+ movs r0,#0x10
+ lsls r0,#24
+ adds r4,#1
+ adds r6,#1
+1:
+
+@   if(d0>=-(int)((fmt>>8)&0xff)&&d0<(int)((fmt>>16)&0xff)) ff=1; // in range for F format?
+
+ push {r4}
+ lsrs r4,r2,#8
+ uxtb r4,r4
+ adds r4,r6
+ blt 1f
+ lsrs r4,r2,#16
+ uxtb r4,r4
+ cmp r6,r4
+ bge 1f
+ movs r5,#1
+1:
+
+@   if(!ff) d0=0,d1=-sf; // for E format we have one digit before the decimal point
+
+ cmp r5,#0
+ bne 1f
+ movs r6,#0
+ rsbs r7,r3,#0
+1:
+
+@ sf (r3) no longer used
+
+@   f0=0; // flag to indicate whether we have output a '.'
+
+@ f0 in r5b1
+
+@   if(d0<0) *p++='0',*p++='.',f0=1,i=-1; // value <1, so output "0."
+@   else i=d0;
+
+ mov r4,r6
+ cmp r6,#0
+ bge 1f
+ bl ozero
+ bl opoint
+ movs r4,#0
+ mvns r4,r4
+1:
+
+@   while(i>d0&&i>d1) *p++='0',i--; // output leading zeros before significand as necessary
+
+2:
+ cmp r4,r6
+ ble 1f
+ cmp r4,r7
+ ble 1f
+ bl ozero
+ subs r4,#1
+ b 2b
+1:
+
+@ d0 (r6) no longer used
+
+@   for(;i>d1;i--) {          // now output digits of significand
+@     *p++='0'+(m>>28);       // output integer part of Q28 value
+@     m&=0x0fffffff;          // fractional part of Q28 value
+@     m=(m<<1)+(m<<3);        // multiply by 10
+@     if(i==0) *p++='.',f0=1; // output decimal point as significance goes through 10^0
+@     }
+
+2:
+ cmp r4,r7
+ ble 1f
+ lsrs r3,r0,#28
+ bl odig
+ lsls r0,#4
+ lsrs r0,#1
+ lsrs r3,r0,#2
+ add r0,r3
+ subs r4,#1
+ bcs 2b
+ bl opoint
+ b 2b
+1:
+
+@ m (r0) no longer used
+@ d1 (r7) no longer used
+
+@   for(;i>=0;i--) *p++='0'; // output remaining zeros of integer part
+
+2:
+ cmp r4,#0
+ blt 1f
+ bl ozero
+ subs r4,#1
+ b 2b
+1:
+
+@ i (r4) no longer used
+
+@   if(f0) { // remove trailing zeros and decimal point?
+@     if(fmt&0x10000000) while(p[-1]=='0') p--;
+@     if(p[-1]=='.') p--;
+@     *p=0;
+@     }
+
+ lsrs r4,r5,#2
+ bcc 1f
+ lsrs r4,r2,#29
+ bcc 2f
+3:
+ subs r1,#1
+ ldrb r4,[r1]
+ cmp r4,#'0'
+ beq 3b
+ adds r1,#1
+2:
+ subs r1,#1
+ ldrb r4,[r1]
+ cmp r4,#'.'
+ beq 4f
+ adds r1,#1
+4:
+1:
+ pop {r4}
+
+@ now:
+@ r0
+@ r1: output pointer
+@ r2: format
+@ r3
+@ r4: decimal exponent e10
+@ r5b0: ff
+@ r6:
+@ r7:
+
+@   if(!ff) { // output exponent?
+
+ lsrs r5,#1
+ bcs 10f
+
+@     *p++='E';
+
+ movs r3,#'E'
+ bl och
+
+@     if(e10<0) *p++='-',e10=-e10; // output exponent sign
+@     else {
+@            if(fmt&0x04000000) *p++=' ';
+@       else if(fmt&0x08000000) *p++='+';
+@       }
+
+ cmp r4,#0
+ bge 2f
+ rsbs r4,#0
+ movs r3,#'-'
+ b 3f
+2:
+ movs r3,#' '
+ lsrs r6,r2,#27
+ bcs 3f
+ movs r3,#'+'
+ lsrs r6,r2,#28
+ bcc 4f
+3:
+ bl och
+4:
+
+@     m=(e10*0xcd)>>11; // tens digit of exponent
+@     *p++='0'+m;
+@     e10-=m*10;        // units digit of exponent
+@     *p++='0'+e10;
+
+ movs r3,#0xcd
+ muls r3,r4
+ lsrs r3,#11
+ movs r0,#10
+ muls r0,r3
+ bl odig
+ subs r3,r4,r0
+ bl odig
+
+@     }
+
+10:
+
+@   *p++=0;
+
+ movs r3,#0
+ bl och
+
+@   }
+
+ pop {r4-r7,r15}
+
+
+
+
+
+@ Convert string pointed to by p into float, stored at f. On failure
+@ return 1; on success, return 0 and store pointer to first non-converted
+@ character at endptr if endptr!=0.
+
+@ #define ISDIG(x) ((x)>='0'&&(x)<='9')
+
+isdig: @ convert ASCII to digit
+ subs r2,#'0'
+ cmp r2,#10 @ clear carry if digit
+ bx r14
+
+@ int str2float(float*f,char*p,char**endptr) {
+
+qfp_str2float:
+
+@   if(*p=='+') p++;
+@   else if(*p=='-') sm=0x80000000,p++; // capture mantissa sign
+
+ push {r0,r2,r4-r7,r14}
+ movs r7,#0
+ ldrb r2,[r1]
+ cmp r2,#'+'
+ beq 1f
+ cmp r2,#'-'
+ bne 2f
+ movs r7,#1
+1:
+ adds r1,#1
+2:
+ movs r0,#0 @ mantissa
+ movs r3,#0 @ f0: have we seen a '.'?
+ movs r5,#0 @ exponent
+ movs r6,#0 @ count of mantissa digits processed
+
+@ r0: m
+@ r1: input pointer
+@ r3: f0
+@ r5: e
+@ r6: d
+@ r7b0: sm
+@ stack: output pointer, end pointer
+
+@   for(;;) {
+@     if(f0==0&&*p=='.') {f0=1; p++; continue;}
+@     if(!ISDIG(*p)) goto l0; // break out on non-digit
+@     if(m<0x10000000) { // accumulate digits (up to about 8 significant figures)
+@       m=m*10+*p-'0';
+@       if(f0==1) e--; // decrement exponent if we are past the decimal point
+@     } else if(f0==0) e++; // just increment exponent after we have captured enough significance in m
+@     d++;
+@     p++;
+@     }
+@ l0:
+
+2:
+ ldrb r2,[r1]
+ cmp r2,#'.'
+ bne 1f
+ cmp r3,#0
+ bne 1f
+ movs r3,#1
+ b 3f
+1:
+ bl isdig
+ bcs 4f
+ lsrs r4,r0,#28
+ bne 5f
+ movs r4,#10
+ muls r0,r4
+ add r0,r2
+ subs r5,#1
+5:
+ adds r5,#1
+ subs r5,r3
+ adds r6,#1
+3:
+ adds r1,#1
+ b 2b
+4:
+
+@   if(d==0) return 1; // no digits seen: error
+
+ cmp r6,#0
+ bne 1f
+ movs r0,#1
+ pop {r2-r7,r15}
+
+@ f0 (r3) no longer used
+@ d (r6) no longer used
+
+@   e10=0; // decimal exponent
+
+1:
+ movs r3,#0
+
+@   if(*p=='e'||*p=='E') { // exponent given?
+@     se=0;
+@     p++;
+@     if(*p=='+') p++; 
+@     else if(*p=='-') se=1,p++; // capture exponent sign
+@     while(ISDIG(*p)) { // capture exponent digits
+@       if(e10<0x01000000) e10=e10*10+*p-'0'; // prevent overflow
+@       p++;
+@       }
+@     if(se) e10=-e10; // apply exponent sign
+@     }
+
+ mov r6,r1 @ save r1
+ ldrb r2,[r1]
+ cmp r2,#'e'
+ beq 1f
+ cmp r2,#'E'
+ bne 2f
+1:
+ adds r1,#1
+ ldrb r2,[r1]
+ cmp r2,#'+'
+ beq 3f
+ cmp r2,#'-'
+ bne 4f
+ adds r7,#2 @ se in r7b1
+3:
+ adds r1,#1
+ ldrb r2,[r1]
+4:
+ bl isdig
+ bcc 6f
+ mov r1,r6 @ E without following digits: restore r1
+ b 2f
+6:
+ lsrs r4,r3,#24
+ bne 5f
+ movs r4,#10
+ muls r3,r4
+ add r3,r2
+5:
+ adds r1,#1
+ ldrb r2,[r1]
+ bl isdig
+ bcc 6b
+ cmp r7,#2
+ blo 2f
+ rsbs r3,#0
+2:
+
+@   if(m==0) goto l2; // zero? then we have finished
+
+ movs r2,#0
+ cmp r0,#0
+ beq 11f
+
+@   e10+=e; // offset e by captured exponent
+@   if(e10> 127) e10=127; // clip overflows: 10^127 will be converted later to Inf, 10^-128 to zero
+@   if(e10<-128) e10=-128;
+
+ add r3,r5
+ lsls r4,r3,#2 @ temporarily set e2 to e10*4: this will cause subsequent conversion to Inf/zero if required
+ sxtb r5,r3
+ cmp r5,r3
+ bne 12f @ not equal to its sign-extended version?
+
+@ e (r5) no longer used
+
+@ r0: m
+@ r1: input pointer
+@ r3: e10
+@ r7b0: sm
+@ stack: output pointer, end pointer
+
+@   e2=31; // binary exponent
+@ plan is to manipulate m, e2 and e10 so as to take e10 to zero, while maintaining the
+@ invariant m * 2^e2 * 10^e10
+
+ movs r4,#31
+
+@   while(m<0x40000000) m+=m,e2--; // normalise so m is now 0x40000000..0xa0000000
+
+2:
+ lsrs r2,r0,#30
+ bne 1f
+ lsls r0,#1
+ subs r4,#1
+ b 2b
+1:
+
+@   while(e10<0) { // add 3 to e10, take 10 off e2 and multiply m by 1024/1000=128/125
+@     m=div125(m);
+@     e10+=3; e2-=10;
+@     if(m>=0x80000000) m>>=1,e2++;
+@     } // now e10 >= 0
+
+2:
+ cmp r3,#0
+ bge 1f
+ bl div125
+ adds r3,#3
+ subs r4,#10
+ lsrs r2,r0,#31
+ beq 2b
+ lsrs r0,#1
+ adds r4,#1
+ b 2b
+1:
+
+@   while(e10>2) { // take 3 off e10, add 10 to e2 and multiply m by 1000/1024=125/128
+@     m0=(m>>6)+(m>>5);
+@     m-=(m0>>1)+(m0&1); // *125/128
+@     e10-=3; e2+=10;
+@     } // now 0 <= e10 < 3
+
+2:
+ cmp r3,#2
+ ble 1f
+ lsrs r2,r0,#6
+ lsrs r5,r0,#5
+ add r2,r5
+ movs r5,#0
+ lsrs r2,#1
+ adcs r2,r5
+ subs r0,r2
+ subs r3,#3
+ adds r4,#10
+ b 2b
+1:
+
+@   while(e10>0) { // take 1 off e10, add 3 to e2 and multiply m by 10/8 = 5/4
+@     m0=(m>>1);
+@     m+=(m0>>1)+(m0&1); // *5/4
+@     e10-=1; e2+=3;
+@     } // now e10==0
+
+2:
+ cmp r3,#0
+ ble 1f
+ lsrs r2,r0,#1
+ lsrs r2,#1
+ adcs r0,r2
+ subs r3,#1
+ adds r4,#3
+ b 2b
+1:
+
+@ e10 (r3) no longer used
+
+@   while(m<0x80000000) m+=m,e2--; // renormalise m so MSB is set
+
+ cmp r0,#0
+ blt 1f
+2:
+ subs r4,#1
+ adds r0,r0
+ bpl 2b
+1:
+
+@   m=((m>>7)+1)>>1; // to 24 bits, with rounding
+
+ lsrs r0,#7
+ adds r0,#1
+ lsrs r0,#1
+
+@   if(m==0x01000000) m>>=1,e2++; // has rounding pushed m to 25 bits? renormalise if so
+
+ lsrs r2,r0,#24
+ beq 1f
+ lsrs r0,#1
+ adds r4,#1
+1:
+
+@   e2+=127; // add exponent offset
+
+12:
+ movs r2,#0
+ movs r3,#0
+ adds r4,#127
+
+@   if(e2<=0) {m=0; goto l1;} // too small? flush to zero
+
+ ble 10f
+
+@   if(e2>=255) {m=0x7f800000; goto l1;} // too big? make infinity
+
+ movs r3,#255
+ cmp r4,#255
+ bge 10f
+
+@   m&=0x007fffff; // remove implied 1
+
+ lsls r2,r0,#9
+ lsrs r2,#9
+ mov r3,r4
+
+@   m|=e2<<23; // insert exponent bits
+
+10:
+ lsls r3,#23
+ orrs r2,r3
+
+@   m|=sm; // apply mantissa sign
+
+11:
+ lsls r7,#31
+ orrs r2,r7
+
+@   *f=*(float*)&m; // write output
+@   if(end) *end=p;
+
+ pop {r0,r3}
+ str r2,[r0]
+ cmp r3,#0
+ beq 1f
+ str r1,[r3]
+1:
+
+@   return 0;
+
+ movs r0,#0
+ pop {r4-r7,r15}
+
+@   }
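The C pseudocode in the comments of `qfp_str2float` can be collected into a host-side function for checking the scaling stage. This stage turns the captured digits `m` and the decimal exponent `e10` into an IEEE single. What follows is a sketch, not the library's code, and it rests on these assumptions:

- `str2float_scale_sketch`, `scale_magnitude` and `div125_sketch` are hypothetical names.
- `div125` itself is not part of this excerpt. The sketch assumes it multiplies by 128/125 with truncation, as its comment suggests.
- Digit and exponent parsing are omitted, so the caller supplies `m`, `e10` and the sign.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for div125 (not shown in this excerpt): assumed to
   return m*128/125, truncated, using a 64-bit intermediate. */
static uint32_t div125_sketch(uint32_t m) {
    return (uint32_t)(((uint64_t)m << 7) / 125u);
}

/* Magnitude bits for m * 10^e10, following the commented pseudocode:
   m * 2^(e2-31) * 10^e10 is kept constant while e10 is driven to zero. */
static uint32_t scale_magnitude(uint32_t m, int e10) {
    int e2 = 31;
    if (m == 0) return 0;                          /* zero: nothing to scale */
    if (e10 > 127) e10 = 127;                      /* ends up as Inf */
    if (e10 < -128) e10 = -128;                    /* ends up as zero */
    while (m < 0x40000000u) m += m, e2--;          /* normalise */
    while (e10 < 0) {                              /* *10^-3 as *2^-10 * 128/125 */
        m = div125_sketch(m);
        e10 += 3; e2 -= 10;
        if (m >= 0x80000000u) m >>= 1, e2++;
    }
    while (e10 > 2) {                              /* *10^3 as *2^10 * 125/128 */
        uint32_t m0 = (m >> 6) + (m >> 5);
        m -= (m0 >> 1) + (m0 & 1);
        e10 -= 3; e2 += 10;
    }
    while (e10 > 0) {                              /* *10 as *2^3 * 5/4 */
        uint32_t m0 = m >> 1;
        m += (m0 >> 1) + (m0 & 1);
        e10 -= 1; e2 += 3;
    }
    while (m < 0x80000000u) m += m, e2--;          /* renormalise so MSB is set */
    m = ((m >> 7) + 1) >> 1;                       /* to 24 bits, with rounding */
    if (m == 0x01000000u) m >>= 1, e2++;           /* rounding carried out */
    e2 += 127;                                     /* IEEE exponent bias */
    if (e2 <= 0) return 0;                         /* too small: flush to zero */
    if (e2 >= 255) return 0x7f800000u;             /* too big: infinity */
    return (m & 0x007fffffu) | ((uint32_t)e2 << 23);
}

/* sm is the mantissa sign, 0 or 1, as captured by the parser. */
static float str2float_scale_sketch(uint32_t m, int e10, int sm) {
    uint32_t bits = scale_magnitude(m, e10) | ((uint32_t)(sm & 1) << 31);
    float f;
    memcpy(&f, &bits, sizeof f);                   /* reinterpret the IEEE bits */
    return f;
}
```

For example, the input "1.5" reaches this stage as `m=15, e10=-1`, and `str2float_scale_sketch(15, -1, 0)` yields `1.5f`. Each multiplication by a power of ten is split into an exact power of two, which goes into `e2`, and a factor close to 1 applied to `m`. That keeps the arithmetic in 32-bit integers.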

+ 54 - 0
qfplib.h

@@ -0,0 +1,54 @@
+// Copyright 2015 Mark Owen
+// http://www.quinapalus.com
+// E-mail: qfp@quinapalus.com
+//
+// Thanks to Bill Westfield
+//
+// This file is free software: you can redistribute it and/or modify
+// it under the terms of version 2 of the GNU General Public License
+// as published by the Free Software Foundation.
+//
+// This file is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this file.  If not, see <http://www.gnu.org/licenses/> or
+// write to the Free Software Foundation, Inc., 51 Franklin Street,
+// Fifth Floor, Boston, MA  02110-1301, USA.
+
+#ifndef _QFPLIB_H_
+#define _QFPLIB_H_
+
+#ifdef __cplusplus
+  extern "C" {
+#endif
+
+extern          float qfp_fadd(float x,float y);
+extern          float qfp_fsub(float x,float y);
+extern          float qfp_fmul(float x,float y);
+extern          float qfp_fdiv(float x,float y);
+extern          float qfp_fdiv_fast(float x,float y);
+extern          int   qfp_float2int(float x);
+extern          int   qfp_float2fix(float x,int y);
+extern unsigned int   qfp_float2uint(float x);
+extern unsigned int   qfp_float2ufix(float x,int y);
+extern          float qfp_int2float(int x);
+extern          float qfp_fix2float(int x,int y);
+extern          float qfp_uint2float(unsigned int x);
+extern          float qfp_ufix2float(unsigned int x,int y);
+extern          int   qfp_fcmp(float x,float y);
+extern          float qfp_fcos(float x);
+extern          float qfp_fsin(float x);
+extern          float qfp_ftan(float x);
+extern          float qfp_fatan2(float y,float x);
+extern          float qfp_fexp(float x);
+extern          float qfp_fln(float x);
+extern          float qfp_fsqrt(float x);
+extern          float qfp_fsqrt_fast(float x);
+
+#ifdef __cplusplus
+  } // extern "C"
+#endif
+#endif
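`qfplib.h` declares the fixed-point conversions without documenting the second argument. The comments in `qfplib.s` describe `qfp_float2fix` as converting "to signed fixed point ... clamping" and `qfp_float2int` as "rounding towards -Inf, clamping"; `qfp_float2int` simply falls through into `qfp_float2fix` with `r1=0`. On that basis, `y` appears to be the number of fractional bits.

The C model below is a sketch of that reading, and it makes these assumptions:

- It covers finite, normal inputs only. Zero, denormal and NaN handling in the assembly is not claimed to match.
- `ref_float2fix`, `ref_float2ufix` and `scale_pow2` are hypothetical names.

```c
#include <stdint.h>

/* Multiply by 2^y; exact for finite float inputs and moderate y. */
static double scale_pow2(double v, int y) {
    for (; y > 0; y--) v *= 2.0;
    for (; y < 0; y++) v *= 0.5;
    return v;
}

/* Hypothetical model of qfp_float2fix: floor(x * 2^y), clamped to int32_t. */
static int32_t ref_float2fix(float x, int y) {
    double v = scale_pow2((double)x, y);
    if (v >= 2147483648.0) return INT32_MAX;   /* positive overflow */
    if (v < -2147483648.0) return INT32_MIN;   /* negative overflow */
    int64_t t = (int64_t)v;                    /* truncates toward zero */
    if ((double)t > v) t--;                    /* step down: round towards -Inf */
    return (int32_t)t;
}

/* Hypothetical model of qfp_float2ufix: negative inputs give 0,
   overflow gives 0xffffffff, otherwise floor(x * 2^y). */
static uint32_t ref_float2ufix(float x, int y) {
    double v = scale_pow2((double)x, y);
    if (v < 1.0) return 0;                     /* negatives and [0,1) give 0 */
    if (v >= 4294967296.0) return UINT32_MAX;  /* overflow */
    return (uint32_t)v;                        /* truncation == floor for v >= 1 */
}
```

For instance, under this reading `qfp_float2fix(1.5f, 4)` would correspond to `ref_float2fix(1.5f, 4) == 24`, which is 1.5 in a format with four fractional bits.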

+ 855 - 0
qfplib.s

@@ -0,0 +1,855 @@
+@ Copyright 2015-2020 Mark Owen
+@ http://www.quinapalus.com
+@ E-mail: qfp@quinapalus.com
+@
+@ This file is free software: you can redistribute it and/or modify
+@ it under the terms of version 2 of the GNU General Public License
+@ as published by the Free Software Foundation.
+@
+@ This file is distributed in the hope that it will be useful,
+@ but WITHOUT ANY WARRANTY; without even the implied warranty of
+@ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+@ GNU General Public License for more details.
+@
+@ You should have received a copy of the GNU General Public License
+@ along with this file.  If not, see <http://www.gnu.org/licenses/> or
+@ write to the Free Software Foundation, Inc., 51 Franklin Street,
+@ Fifth Floor, Boston, MA  02110-1301, USA.
+
+@ Thanks to Li Ling for optimisation suggestions and discussion.
+
+@.equ include_faster,0        @ include fast divide and square root?
+@.equ include_conversions,1   @ include float <-> fixed point conversion functions?
+@.equ include_scientific,1    @ include trigonometric, exponential etc. functions?
+
+.ifndef include_faster
+.equ include_faster,1
+.endif
+
+.ifndef include_conversions
+.equ include_conversions,1
+.endif
+
+.ifndef include_scientific
+.equ include_scientific,1
+.endif
+
+.if include_scientific
+.equ include_conversions,1
+.endif
+
+.syntax unified
+.cpu cortex-m0
+.thumb
+
+@ exported symbols
+
+.global qfp_fadd
+.global qfp_fsub
+.global qfp_fmul
+.global qfp_fdiv
+.global qfp_fcmp
+.if include_conversions
+.global qfp_float2int
+.global qfp_float2fix
+.global qfp_float2uint
+.global qfp_float2ufix
+.global qfp_int2float
+.global qfp_fix2float
+.global qfp_uint2float
+.global qfp_ufix2float
+.endif
+.if include_scientific
+.global qfp_fcos
+.global qfp_fsin
+.global qfp_ftan
+.global qfp_fatan2
+.global qfp_fexp
+.global qfp_fln
+.global qfp_fsqrt
+.endif
+
+.if include_faster
+.global qfp_fdiv_fast
+.global qfp_fsqrt_fast
+.endif
+
+@ exchange r0<->r1, r2<->r3
+xchxy:
+ push {r0,r2,r14}
+ mov r0,r1
+ mov r2,r3
+ pop {r1,r3,r15}
+
+@ IEEE single precision floats in r0,r1-> mantissae in r1,r0 exponents in r3,r2 *respectively*
+@ trashes r4
+unpackxy:
+ push {r14}
+ bl unpackx
+ bl xchxy
+ pop {r4}
+ mov r14,r4
+
+@ IEEE single in r0-> signed (two's complement) mantissa in r0 9Q23 (24 significant bits), signed exponent (bias removed) in r2
+@ trashes r4; zero, denormal -> mantissa=+/-1, exponent=-380; Inf, NaN -> mantissa=+/-1, exponent=+640
+unpackx:
+ lsrs r2,r0,#23 @ save exponent and sign
+ lsls r0,#9     @ extract mantissa
+ lsrs r0,#9
+ movs r4,#1
+ lsls r4,#23
+ orrs r0,r4     @ reinstate implied leading 1
+ cmp r2,#255    @ test sign bit
+ uxtb r2,r2     @ clear it
+ bls 1f         @ branch on positive
+ rsbs r0,#0     @ negate mantissa
+1:
+ subs r2,#1
+ cmp r2,#254    @ zero/denormal/Inf/NaN?
+ bhs 2f
+ subs r2,#126   @ remove exponent bias: can now be -126..+127
+ bx r14
+
+2:              @ here with special-case values
+ cmp r0,#0
+ mov r0,r4      @ set mantissa to +1
+ bpl 3f
+ rsbs r0,#0     @ zero/denormal/Inf/NaN: mantissa=+/-1
+3:
+ subs r2,#126   @ zero/denormal: exponent -> -127; Inf, NaN: exponent -> 128
+ lsls r2,#2     @ zero/denormal: exponent -> -508; Inf, NaN: exponent -> 512
+ adds r2,#128   @ zero/denormal: exponent -> -380; Inf, NaN: exponent -> 640
+ bx r14
+
+@ normalise and pack signed mantissa in r0 nominally 3Q29, signed exponent in r2-> IEEE single in r0
+@ trashes r4, preserves r1,r3
+@ r5: "sticky bits", must be zero iff all result bits below r0 are zero for correct rounding
+packx:
+ lsrs r4,r0,#31 @ save sign bit
+ lsls r4,r4,#31 @ sign now in b31
+ bpl 2f         @ skip if positive
+ cmp r5,#0
+ beq 11f
+ adds r0,#1     @ fiddle carry in to following rsb if sticky bits are non-zero
+11:
+ rsbs r0,#0     @ can now treat r0 as unsigned
+packx0:
+ bmi 3f         @ catch r0=0x80000000 case
+2:
+ subs r2,#1     @ normalisation loop
+ adds r0,r0
+ beq 1f         @ zero? special case
+ bpl 2b         @ normalise so leading "1" in bit 31
+3:
+ adds r2,#129   @ (mis-)offset exponent
+ bne 12f        @ special case: highest denormal can round to lowest normal
+ adds r0,#0x80  @ in special case, need to add 256 to r0 for rounding
+ bcs 4f         @ tripped carry? then have leading 1 in C as required
+12:
+ adds r0,#0x80  @ rounding
+ bcs 4f         @ tripped carry? then have leading 1 in C as required (and result is even so can ignore sticky bits)
+ cmp r5,#0
+ beq 7f         @ sticky bits zero?
+8:
+ lsls r0,#1     @ remove leading 1
+9:
+ subs r2,#1     @ compensate exponent on this path
+4:
+ cmp r2,#254
+ bge 5f         @ overflow?
+ adds r2,#1     @ correct exponent offset
+ ble 10f        @ denormal/underflow?
+ lsrs r0,#9     @ align mantissa
+ lsls r2,#23    @ align exponent
+ orrs r0,r2     @ assemble exponent and mantissa
+6:
+ orrs r0,r4     @ apply sign
+1:
+ bx r14
+
+5:
+ movs r0,#0xff  @ create infinity
+ lsls r0,#23
+ b 6b
+
+10:
+ movs r0,#0     @ create zero
+ bx r14
+
+7:              @ sticky bit rounding case
+ lsls r5,r0,#24 @ check bottom 8 bits of r0
+ bne 8b         @ in rounding-tie case?
+ lsrs r0,#9     @ ensure even result
+ lsls r0,#10
+ b 9b
+
+@ unpack two arguments (r0,r1) and shift one down to have common exponent, returned in r2; note that arguments are exchanged
+@ sticky bits shifted off bottom of smaller argument in r5
+@ following code is unnecessarily general for fadd, but is shared with atan2
+unpackxyalign:
+ push {r14}
+ bl unpackxy
+ lsls r0,r0,#6  @ Q29
+ lsls r1,r1,#6  @ Q29
+ subs r4,r2,r3  @ calculate shift
+ bge 1f         @ x>=y?
+ mov r2,r3      @ no: take common exponent from y
+ mov r5,r0      @ potential sticky bits from x
+ rsbs r4,#0     @ make shift positive
+ asrs r0,r4
+ cmp r4,#32
+ blo 2f
+ movs r0,#0    @ large shift, so all bits are sticky and result is zero
+ pop {r15}
+1:
+ mov r5,r1     @ common exponent from x; potential sticky bits from y
+ asrs r1,r4
+ cmp r4,#32
+ blo 2f
+ movs r1,#0    @ large shift, so all bits are sticky and result is zero
+ pop {r15}
+2:
+ rsbs r4,#0
+ adds r4,#32
+ lsls r5,r4    @ extract sticky bits
+ pop {r15}
+
+.thumb_func
+qfp_fsub:
+ movs r2,#1    @ subtract: flip sign bit of second argument and fall through to fadd
+ lsls r2,#31
+ eors r1,r2
+.thumb_func
+qfp_fadd:
+ push {r4,r5,r14}
+ bl unpackxyalign
+ adds r0,r1    @ do addition
+ bne 2f        @ not in Inf-Inf case?
+ cmp r2,#200
+ blt 2f
+ movs r0,#1
+ lsls r0,#29   @ for Inf-Inf, set mantissa to +1 to prevent zero result
+2:
+packret:       @ common return point: "pack and return"
+ bl packx
+ pop {r4,r5,r15}
+
+@ signed multiply r0 1Q23 by r1 4Q23, result in r0 7Q25, sticky bits in r5
+@ trashes r3,r4
+mul0:
+ uxth r3,r0      @ Q23
+ asrs r4,r1,#16  @ Q7
+ muls r3,r4      @ L*H, Q30 signed
+ asrs r4,r0,#16  @ Q7
+ uxth r5,r1      @ Q23
+ muls r4,r5      @ H*L, Q30 signed
+ adds r3,r4      @ sum of middle partial products
+ uxth r4,r0
+ muls r4,r5      @ L*L, Q46 unsigned
+ lsls r5,r4,#16  @ initialise sticky bits from low half of low partial product
+ lsrs r4,#16     @ Q25
+ adds r3,r4      @ add high half of low partial product to sum of middle partial products
+                 @ (cannot generate carry by limits on input arguments)
+ asrs r0,#16     @ Q7
+ asrs r1,#16     @ Q7
+ muls r0,r1      @ H*H, Q14 signed
+ lsls r0,#11     @ high partial product Q25
+ lsls r1,r3,#27  @ sticky
+ orrs r5,r1      @ collect further sticky bits
+ asrs r1,r3,#5   @ middle partial products Q25
+ adds r0,r1      @ final result
+ bx r14
+
+.thumb_func
+qfp_fcmp:
+ movs r2,#1      @ initialise result
+ lsls r3,r2,#31  @ r3=0x80000000
+ tst r0,r3       @ check sign of first argument
+ beq 1f
+ subs r0,r3,r0   @ convert to 2's complement form for direct comparison
+1:
+ tst r1,r3       @ repeat for second argument
+ beq 2f
+ subs r1,r3,r1
+2:
+ subs r0,r1     @ perform comparison
+ beq 4f         @ equal? return 0
+ bgt 3f         @ r0>r1? return +1
+ rsbs r2,#0     @ r0<r1: return -1
+3:
+ mov r0,r2
+4:
+ bx r14
+
+.thumb_func
+qfp_fmul:
+ push {r4,r5,r14}
+ bl unpackxy
+ add r2,r3      @ sum exponents
+ adds r2,#4     @ adjust exponent for pack which expects Q29
+fmul0:
+ bl mul0
+ b packret
+
+.thumb_func
+qfp_fdiv:
+ push {r4,r5,r14}
+fdiv_n:
+ bl unpackxy
+ movs r5,#1      @ result cannot fall exactly half-way between two representable numbers (exercise for reader - note that
+                 @ we do not handle denormals) so there will always be sticky bits
+ cmp r0,#0       @ check divisor sign
+ bpl 1f
+ rsbs r0,#0      @ ensure divisor positive
+ rsbs r1,#0      @ preserve sign of result
+1:
+ movs r4,#0
+ cmp r1,#0       @ check sign of dividend
+ bpl 2f
+ rsbs r1,#0      @ result will be negative
+ mvns r4,r4      @ save sign as 0x00000000 or 0xffffffff
+2:               @ now do unsigned division on unpacked values {r1,r3}/{r0,r2}
+ cmp r3,#200     @ dividend is an infinity? return it
+ bge 3f
+ rsbs r2,#0
+ cmp r2,#200     @ divisor is zero? return infinity
+ bge 3f
+ adds r2,r3      @ difference of exponents
+ movs r3,#0x40   @ termination marker (calculate enough bits to do rounding correctly)
+2:               @ division loop
+ subs r1,r0      @ trial subtraction
+ bcs 1f
+ add r1,r0       @ restore if failed
+1:
+ adcs r3,r3      @ shift in result bit
+ add r1,r1       @ shift up dividend
+ bcc 2b          @ loop until marker appears in carry
+ lsls r0,r3,#4   @ align for packing
+4:
+ eors r0,r4      @ apply sign
+ b packret
+infret:
+ movs r4,#0
+3:
+ mov r0,r1
+ movs r2,#255    @ return infinity
+ b 4b
+
+.if include_faster
+
+@ The fast divide routine uses an initial approximation to the reciprocal of the divisor based on the top four bits of the mantissa
+@ followed by three Newton-Raphson iterations, resulting in about 27 bits of accuracy. This reciprocal is then multiplied by
+@ the dividend.
+@ The (fixed-point) reciprocal calculation is carefully implemented to preserve the necessary accuracy throughout. In places
+@ the implied binary point is not within the stored value. For example where "Q47" is shown below it means that the least
+@ significant bit of the value has significance 2^-47 and hence the most significant bit has significance 2^-16. In these
+@ cases the value is known to be very close to an integer (usually 1) and so the bits of greater significance do not need
+@ to be stored.
+@ The reciprocal calculation has been tested for all possible input mantissa values.
+.thumb_func
+qfp_fdiv_fast:
+ push {r4,r5,r14}
+fdiv_fast_n:
+ bl unpackxy
+ cmp r0,#0
+ bpl 1f
+ rsbs r0,#0      @ ensure divisor positive
+ rsbs r1,#0      @ preserve sign of result
+1:
+ cmp r3,#200
+ bge infret      @ dividend is an infinity? return it
+ rsbs r2,#0
+ cmp r2,#200     @ divisor is zero?
+ bge infret      @ return infinity
+ adds r2,r3      @ difference of exponents
+ adr r4,rcpapp-8 @ the first 8 elements of the table are never accessed because of the mantissa's leading 1
+ lsrs r3,r0,#20  @ y Q23; y>>20 Q7
+ ldrb r4,[r4,r3] @ m=rcpapp[(y>>20)&7]; // Q8, .5<m<1
+
+ lsls r3,r4,#2   @ m<<2         // Q10  first Newton-Raphson iteration
+ muls r3,r0      @ s=y*(m<<2);  // Q33
+ lsls r4,#8      @ m<<=8;       // Q16
+ asrs r3,#21     @ s>>=21;      // Q12
+ muls r3,r4      @ s*=m;        // Q28
+ asrs r3,#12     @ s>>=12;      // Q16
+ subs r4,r3      @ m=m-s;       // Q16
+
+ mov r3,r4       @ s=y*m        // Q39 second Newton-Raphson iteration
+ muls r4,r0      @ ...
+ asrs r4,#16     @ s>>=16;      // Q23
+ muls r4,r3      @ s*=m;        // Q39
+ lsls r3,#8      @ m<<=8;       // Q24
+ asrs r4,#15     @ s>>=15;      // Q24
+ subs r3,r4      @ m=m-s;       // Q24
+
+ lsls r4,r3,#7   @ \/ s=y*m;    // Q47 third Newton-Raphson iteration
+ muls r3,r0      @ /\ m<<=7;    // Q31
+ asrs r3,#15     @ s>>=15;      // Q32
+ lsrs r0,r4,#16  @ s*=(m>>16);  // Q47
+ muls r3,r0      @ ...
+ asrs r3,#16     @ s>>=16;      // Q31
+ subs r0,r4,r3   @ m=m-s;       // Q31
+div0:
+ adds r0,#7      @ rounding; reduce systematic error
+ lsrs r0,#4      @ Q27
+ b fmul0         @ drop into multiplication code to calculate result
+
+@ The fast square root routine uses an initial approximation to the reciprocal of the square root of the argument based
+@ on the top four bits of the mantissa (possibly shifted one place to make the exponent even). It then performs three
+@ Newton-Raphson iterations, resulting in about 28-29 bits of accuracy. This reciprocal is then multiplied by
+@ the original argument to produce the result.
+@ Again, the fixed-point calculation is carefully implemented to preserve accuracy, and similar comments to those
+@ made above on the fast division routine apply.
+@ The reciprocal square root calculation has been tested for all possible (possibly shifted) input mantissa values.
+.thumb_func
+qfp_fsqrt_fast:
+ push {r4,r5,r14}
+ bl unpackx
+ movs r1,r0
+ bmi infret       @ negative? return -Inf
+ asrs r0,r2,#1    @ check LSB of exponent
+ bcc 1f
+ lsls r1,#1       @ was odd: double mantissa; mantissa y now 1..4 Q23
+1:
+ adds r2,#4       @ correction for packing
+ adr r4,rsqrtapp-4@ first four table entries are never accessed because of the mantissa's leading 1
+ lsrs r3,r1,#21   @ y>>21 Q2
+ ldrb r4,[r4,r3]  @ initial approximation to reciprocal square root m Q8
+
+ lsrs r0,r1,#7    @ y>>7             // Q16 first Newton-Raphson iteration
+ muls r0,r4       @ m*y
+ muls r0,r4       @ s=m*y*y          // Q32
+ asrs r0,#12      @ s>>12
+ muls r0,r4       @ m*s              // Q28
+ asrs r0,#13      @ m*s              // Q15
+ lsls r4,#8       @ m                // Q16
+ subs r4,r0       @ m=(m<<8)-(s>>13) // Q16-Q15/2 -> Q16
+
+ mov r0,r4        @                  // second Newton-Raphson iteration
+ muls r0,r0       @ u=m*m            // Q32
+ lsrs r0,#16      @ u>>16            // Q16
+ lsrs r3,r1,#7    @ y>>7             // Q16
+ muls r0,r3       @ s=u*(y>>7)       // Q32
+ asrs r0,#12      @ s>>12            // Q20
+ muls r0,r4       @ s*m              // Q36
+ asrs r0,#21      @ s*m              // Q15
+ subs r4,r0       @ m=m-s            // Q16-Q15/2
+
+ mov r0,r4        @                  // third Newton-Raphson iteration
+ muls r0,r0       @ u=m*m            // Q32
+ lsrs r3,r0,#12   @ now multiply u and y in two parts: u>>12
+ muls r3,r1       @ first partial product (u>>12)*y Q43
+ lsls r0,#20
+ lsrs r0,#20      @ u&0xfff
+ lsrs r5,r1,#12   @ y>>12
+ muls r0,r5       @ second partial product (u&0xfff)*(y>>12) Q43
+ add r0,r3        @ s=u*y            // Q43
+ asrs r0,#15      @ s>>15            // Q28
+ muls r0,r4       @ (s>>15)*m        // Q44
+ lsls r4,#13      @ m<<13            // Q29
+ asrs r0,#16      @ s>>16            // Q28
+ subs r0,r4,r0    @                  // Q29-Q28/2
+
+ asrs r2,#1       @ halve exponent
+ bcc div0         @ was y shifted?
+ lsrs r0,#1
+ lsls r1,#1       @ shift y back
+ b div0           @ round and complete with multiplication
+
+.align 2
+
+@ round(2^15./[136:16:248])
+rcpapp:
+.byte 0xf1,0xd8,0xc3,0xb2, 0xa4,0x98,0x8d,0x84
+
+@ round(sqrt(2^22./[72:16:248]))
+rsqrtapp:
+.byte 0xf1,0xda,0xc9,0xbb, 0xb0,0xa6,0x9e,0x97, 0x91,0x8b,0x86,0x82
+
+.endif
+
+.if include_conversions
+
+@ convert float to signed int, rounding towards -Inf, clamping
+.thumb_func
+qfp_float2int:
+ movs r1,#0      @ fall through
+
+@ convert float in r0 to signed fixed point in r0, clamping
+.thumb_func
+qfp_float2fix:
+ push {r4,r14}
+ bl unpackx
+ add r2,r1       @ incorporate binary point position into exponent
+ subs r2,#23     @ r2 is now amount of left shift required
+ blt 1f          @ requires right shift?
+ cmp r2,#7       @ overflow?
+ ble 4f
+3:               @ overflow
+ asrs r1,r0,#31  @ +ve:0 -ve:0xffffffff
+ mvns r1,r1      @ +ve:0xffffffff -ve:0
+ movs r0,#1
+ lsls r0,#31
+5:
+ eors r0,r1      @ +ve:0x7fffffff -ve:0x80000000 (unsigned path: 0xffffffff)
+ pop {r4,r15}
+1:
+ rsbs r2,#0      @ right shift for r0, >0
+ cmp r2,#32
+ blt 2f          @ more than 32 bits of right shift?
+ movs r2,#32
+2:
+ asrs r0,r0,r2
+ pop {r4,r15}
+
+@ unsigned version
+.thumb_func
+qfp_float2uint:
+ movs r1,#0      @ fall through
+.thumb_func
+qfp_float2ufix:
+ push {r4,r14}
+ bl unpackx
+ add r2,r1       @ incorporate binary point position into exponent
+ movs r1,r0
+ bmi 5b          @ negative? return zero
+ subs r2,#23     @ r2 is now amount of left shift required
+ blt 1b          @ requires right shift?
+ mvns r1,r0      @ ready to return 0xffffffff
+ cmp r2,#8       @ overflow?
+ bgt 5b
+4:
+ lsls r0,r0,r2   @ result fits, left shifted
+ pop {r4,r15}
+
+@ convert signed int to float, rounding
+.thumb_func
+qfp_int2float:
+ movs r1,#0      @ fall through
+
+@ convert signed fix to float, rounding; number of r0 bits after point in r1
+.thumb_func
+qfp_fix2float:
+ push {r4,r5,r14}
+1:
+ movs r2,#29
+ subs r2,r1      @ fix exponent
+packretns:       @ pack and return, sticky bits=0
+ movs r5,#0
+ b packret
+
+@ unsigned version
+.thumb_func
+qfp_uint2float:
+ movs r1,#0      @ fall through
+.thumb_func
+qfp_ufix2float:
+ push {r4,r5,r14}
+ cmp r0,#0
+ bge 1b          @ treat <2^31 as signed
+ movs r2,#30
+ subs r2,r1      @ fix exponent
+ lsls r5,r0,#31  @ one sticky bit
+ lsrs r0,#1
+ b packret
+
+.endif
+
+.if include_scientific
+
+@ All the scientific functions are implemented using the CORDIC algorithm. For notation,
+@ for details not explained in the comments below, and for a good overall survey, see
+@ "50 Years of CORDIC: Algorithms, Architectures, and Applications" by Meher et al.,
+@ IEEE Transactions on Circuits and Systems Part I, Volume 56 Issue 9.
+
+@ Register use:
+@ r0: x
+@ r1: y
+@ r2: z/omega
+@ r3: coefficient pointer
+@ r4,r12: m
+@ r5: i (shift)
+
+cordic_start: @ initialisation
+ movs r5,#0   @ initial shift=0
+ mov r12,r4
+ b 5f
+
+cordic_vstep: @ one step of algorithm in vector mode
+ cmp r1,#0    @ check sign of y
+ bgt 4f
+ b 1f
+cordic_rstep: @ one step of algorithm in rotation mode
+ cmp r2,#0    @ check sign of angle
+ bge 1f
+4:
+ subs r1,r6   @ negative rotation: y=y-(x>>i)
+ rsbs r7,#0
+ adds r2,r4   @ accumulate angle
+ b 2f
+1:
+ adds r1,r6   @ positive rotation: y=y+(x>>i)
+ subs r2,r4   @ accumulate angle
+2:
+ mov r4,r12
+ muls r7,r4   @ apply sign from m
+ subs r0,r7   @ finish rotation: x=x{+/-}(y>>i)
+5:
+ ldmia r3!,{r4}   @ fetch next angle from table and bump pointer
+ lsrs r4,#1   @ repeated angle?
+ bcs 3f
+ adds r5,#1   @ adjust shift if not
+3:
+ mov r6,r0
+ asrs r6,r5   @ x>>i
+ mov r7,r1
+ asrs r7,r5   @ y>>i
+ lsrs r4,#1   @ shift end flag into carry
+ bx r14
+
+@ CORDIC rotation mode
+cordic_rot:
+ push {r6,r7,r14}
+ bl cordic_start   @ initialise
+1:
+ bl cordic_rstep
+ bcc 1b            @ step until table finished
+ asrs r6,r0,#14    @ remaining small rotations can be linearised: see IV.B of paper referenced above
+ asrs r7,r1,#14
+ asrs r2,#3
+ muls r6,r2        @ all remaining CORDIC steps in a multiplication
+ muls r7,r2
+ mov r4,r12
+ muls r7,r4
+ asrs r6,#12
+ asrs r7,#12
+ subs r0,r7        @ x=x{+/-}(yz>>k)
+ adds r1,r6        @ y=y+(xz>>k)
+cordic_exit:
+ pop {r6,r7,r15}
+
+@ CORDIC vector mode
+cordic_vec:
+ push {r6,r7,r14}
+ bl cordic_start   @ initialise
+1:
+ bl cordic_vstep
+ bcc 1b            @ step until table finished
+4:
+ cmp r1,#0         @ continue as in cordic_vstep but without using table; x is not affected as y is small
+ bgt 2f            @ check sign of y
+ adds r1,r6        @ positive rotation: y=y+(x>>i)
+ subs r2,r4        @ accumulate angle
+ b 3f
+2:
+ subs r1,r6        @ negative rotation: y=y-(x>>i)
+ adds r2,r4        @ accumulate angle
+3:
+ asrs r6,#1
+ asrs r4,#1        @ next "table entry"
+ bne 4b
+ b cordic_exit
+
+.thumb_func
+qfp_fsin:            @ calculate sin and cos using CORDIC rotation method
+ push {r4,r5,r14}
+ movs r1,#24
+ bl qfp_float2fix    @ range reduction by repeated subtraction/addition in fixed point
+ ldr r4,pi_q29
+ lsrs r4,#4          @ 2pi Q24
+1:
+ subs r0,r4
+ bge 1b
+1:
+ adds r0,r4
+ bmi 1b              @ now in range 0..2pi
+ lsls r2,r0,#2       @ z Q26
+ lsls r5,r4,#1       @ pi Q26 (r4=pi/2 Q26)
+ ldr r0,=#0x136e9db4 @ initialise CORDIC x,y with scaling
+ movs r1,#0
+1:
+ cmp r2,r4           @ >pi/2?
+ blt 2f
+ subs r2,r5          @ reduce range to -pi/2..pi/2
+ rsbs r0,#0          @ rotate vector by pi
+ b 1b
+2:
+ lsls r2,#3          @ Q29
+ adr r3,tab_cc       @ circular coefficients
+ movs r4,#1          @ m=1
+ bl cordic_rot
+ adds r1,#9          @ fiddle factor to make sin(0)==0
+ movs r2,#0          @ exponents to zero
+ movs r3,#0
+ movs r5,#0          @ no sticky bits
+ bl packx            @ pack cosine
+ bl xchxy
+ b packretns         @ pack sine
+
+.thumb_func
+qfp_fcos:
+ push {r14}
+ bl qfp_fsin
+ mov r0,r1           @ extract cosine result
+ pop {r15}
+
+
+.thumb_func
+qfp_ftan:
+ push {r4,r5,r14}
+ bl qfp_fsin         @ sine in r0/r2, cosine in r1/r3
+.if include_faster
+ b fdiv_fast_n       @ sin/cos
+.else
+ b fdiv_n
+
+.endif
+
+.thumb_func
+qfp_fexp:            @ calculate cosh and sinh using rotation method; add to obtain exp
+ push {r4,r5,r14}
+ movs r1,#24
+ bl qfp_float2fix    @ Q24: covers entire valid input range
+ asrs r1,r0,#16      @ Q8
+ ldr r2,=#5909       @ log_2(e) Q12
+ muls r1,r2          @ estimate exponent of result Q20
+ asrs r1,#19         @ Q1
+ adds r1,#1          @ rounding
+ asrs r1,#1          @ rounded estimate of exponent of result
+ push {r1}           @ save for later
+ lsls r2,r0,#5       @ Q29
+ ldr r0,=#0x162e42ff @ ln(2) Q29
+ muls r1,r0          @ accurate contribution of estimated exponent
+ subs r2,r1          @ residual to be exponentiated, approximately -.5..+.5 Q29
+ ldr r0,=#0x2c9e15ca @ initialise CORDIC x,y with scaling
+ movs r1,#0
+ adr r3,tab_ch       @ hyperbolic coefficients
+ mvns r4,r1          @ m=-1
+ bl cordic_rot       @ calculate cosh and sinh
+ add r0,r1           @ exp=cosh+sinh
+ pop {r2}            @ recover exponent
+ b packretns         @ pack result
+
+.thumb_func
+qfp_fsqrt:           @ calculate sqrt and ln using vector method
+ push {r4,r5,r14}
+ bl unpackx
+ movs r1,r0          @ -ve argument?
+ bmi 3f              @ return -Inf, -Inf
+ ldr r1,=#0x0593C2B9 @ scale factor for CORDIC
+ bl mul0             @ Q29
+ asrs r1,r2,#1       @ halve exponent
+ bcc 1f
+ adds r1,#1          @ was odd: add 1 and shift mantissa
+ asrs r0,#1
+1:
+ push {r1}           @ save exponent/2 for later
+ mov r1,r0
+ ldr r3,=#0x0593C2B9 @ re-use constant
+ lsls r3,#2
+ adds r0,r3          @ "a+1"
+ subs r1,r3          @ "a-1"
+ movs r2,#0
+ adr r3,tab_ch       @ hyperbolic coefficients
+ mvns r4,r2          @ m=-1
+ bl cordic_vec
+ mov r1,r2           @ keep ln result
+ pop {r2}            @ retrieve exponent/2
+2:
+ movs r3,r2
+ b packretns         @ pack sqrt result
+
+3:
+ movs r2,#255
+ b 2b
+
+.thumb_func
+qfp_fln:
+ push {r4,r5,r14}
+ bl qfp_fsqrt            @ get unpacked ln in r1/r3; exponent has been halved
+ cmp r3,#70              @ ln(Inf)?
+ bgt 3b                  @ return Inf
+ rsbs r3,#0
+ cmp r3,#70
+ bgt 1f                  @ ln(0)? return -Inf
+ ldr r0,=#0x0162e430     @ ln(4) Q24
+ muls r0,r3              @ contribution from negated, halved exponent
+ adds r1,#8              @ round result of ln
+ asrs r1,#4              @ Q24
+ subs r0,r1,r0           @ add in contribution from (negated) exponent
+ movs r2,#5              @ pack expects Q29
+ b packretns
+1:
+ mvns r0,r0              @ make result -Inf
+ b 3b
+
+.thumb_func
+qfp_fatan2:
+ push {r4,r5,r14}
+ bl unpackxyalign        @ convert to fixed point (ensure common exponent, which is discarded)
+ asrs r0,#1
+ asrs r1,#1
+ movs r2,#0              @ initial angle
+ cmp r0,#0               @ x negative
+ bge 5f
+ rsbs r0,#0              @ rotate to 1st/4th quadrants
+ rsbs r1,#0
+ ldr r2,pi_q29           @ pi Q29
+5:
+ adr r3,tab_cc           @ circular coefficients
+ movs r4,#1              @ m=1
+ bl cordic_vec           @ also produces magnitude (with scaling factor 1.646760119), which is discarded
+ mov r0,r2               @ result here is -pi/2..3pi/2 Q29
+ ldr r2,pi_q29           @ pi Q29
+ adds r4,r0,r2           @ attempt to fix -3pi/2..-pi case
+ bcs 6f                  @ -pi/2..0? leave result as is
+ subs r4,r0,r2           @ <pi? leave as is
+ bmi 6f
+ subs r0,r4,r2           @ >pi: take off 2pi
+6:
+ subs r0,#1              @ fiddle factor so atan2(0,1)==0
+ movs r2,#0              @ exponent for pack
+ b packretns
+
+.align 2
+.ltorg
+
+@ first entry in following table is pi Q29
+pi_q29:
+@ circular CORDIC coefficients: atan(2^-i), b0=flag for preventing shift, b1=flag for end of table
+tab_cc:
+.word 0x1921fb54*4+1     @ no shift before first iteration
+.word 0x0ed63383*4+0
+.word 0x07d6dd7e*4+0
+.word 0x03fab753*4+0
+.word 0x01ff55bb*4+0
+.word 0x00ffeaae*4+0
+.word 0x007ffd55*4+0
+.word 0x003fffab*4+0
+.word 0x001ffff5*4+0
+.word 0x000fffff*4+0
+.word 0x0007ffff*4+0
+.word 0x00040000*4+0
+.word 0x00020000*4+0+2   @ +2 marks end
+
+@ hyperbolic CORDIC coefficients: atanh(2^-i), flags as above
+tab_ch:
+.word 0x1193ea7b*4+0
+.word 0x1193ea7b*4+1   @ repeat i=1
+.word 0x082c577d*4+0
+.word 0x04056247*4+0
+.word 0x0200ab11*4+0
+.word 0x0200ab11*4+1   @ repeat i=4
+.word 0x01001559*4+0
+.word 0x008002ab*4+0
+.word 0x00400055*4+0
+.word 0x0020000b*4+0
+.word 0x00100001*4+0
+.word 0x00080001*4+0
+.word 0x00040000*4+0
+.word 0x00020000*4+0
+.word 0x00020000*4+1+2 @ repeat i=12
+
+.endif
+
+qfp_lib_end:
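The rotation-mode CORDIC used by `qfp_fsin` is `cordic_rot` with the circular table `tab_cc` and m=1. It can be illustrated in floating point, and the sketch below makes these simplifications:

- It uses `double` arithmetic rather than Q29 integers.
- It takes the twelve angles actually stepped through from `tab_cc`, with the flag bits removed.
- It starts from x = 0x136e9db4 (1/K in Q29), as `qfp_fsin` does.
- It finishes with the linearised rotation described at `cordic_rot`.
- It omits the range reduction to -pi/2..pi/2 and the final packing.

`cordic_sin_cos` and `atan_q29` are hypothetical names.

```c
#include <stdint.h>

/* atan(2^-i) in Q29 for i = 0..11, taken from tab_cc with the flag bits
   removed (the table's last entry is loaded but only ends the loop). */
static const int32_t atan_q29[12] = {
    0x1921fb54, 0x0ed63383, 0x07d6dd7e, 0x03fab753, 0x01ff55bb, 0x00ffeaae,
    0x007ffd55, 0x003fffab, 0x001ffff5, 0x000fffff, 0x0007ffff, 0x00040000
};

/* Rotation-mode CORDIC for z in [-pi/2, pi/2] radians: each step rotates
   (x, y) by +/- atan(2^-i) in the direction that drives z towards zero. */
static void cordic_sin_cos(double z, double *s, double *c) {
    const double q29 = 536870912.0;               /* 2^29 */
    double x = 0x136e9db4 / q29;                  /* 1/K, cancels CORDIC gain */
    double y = 0.0, p = 1.0;                      /* p = 2^-i */
    for (int i = 0; i < 12; i++, p *= 0.5) {
        double a = atan_q29[i] / q29;
        double dx = x * p, dy = y * p;            /* x>>i, y>>i */
        if (z >= 0) { y += dx; x -= dy; z -= a; } /* positive rotation */
        else        { y -= dx; x += dy; z += a; } /* negative rotation */
    }
    /* residual z is now small: remaining steps linearised into one rotation */
    *c = x - y * z;
    *s = y + x * z;
}
```

Because each rotation uses only shifts, adds and a table lookup, the loop maps directly onto the Cortex-M0 instruction set. The 1/K starting value removes the need for a final multiplication to undo the gain of the rotations.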