PyFITS FAQ

Contents

General Questions

What is PyFITS?

PyFITS is a library written in, and for use with the Python programming language for reading, writing, and manipulating FITS formatted files. It includes a high-level interface to FITS headers with the ability for high and low-level manipulation of headers, and it supports reading image and table data as Numpy arrays. It also supports more obscure and non-standard formats found in some FITS files.

PyFITS includes two command-line utilities for working with FITS files: fitscheck, which can verify and write FITS checksums; and fitsdiff, which can analyze and display the differences between two FITS files.

Although PyFITS is written mostly in Python, it includes an optional module written in C that’s required to read/write compressed image data. However, the rest of PyFITS functions without this extension module.

What is the development status of PyFITS?

PyFITS is written and maintained by the Science Software Branch at the Space Telescope Science Institute, and is licensed by AURA under a 3-clause BSD license (see LICENSE.txt in the PyFITS source code).

PyFITS’ current primary developer and active maintainer is Erik Bray, though patch submissions are welcome from anyone. It has a Trac site where the source code can be browsed, and where bug reports may be submitted. The source code resides primarily in an SVN repository which allows anonymous checkouts, though a Git mirror also exists. PyFITS also has a GitHub site.

The current stable release series is 3.0.x. Each 3.0.x release tries to contain only bug fixes, and to not introduce any significant behavioral or API changes (though this isn’t guaranteed to be perfect). The upcoming 3.1 release will contain new features and some API changes, though will try maintain as much backwards-compatibility as possible. After the 3.1 release there may be further 3.0.x releases for bug fixes only where possible. Older versions of PyFITS (2.4 and earlier) are no longer actively supported.

PyFITS is also included as a major component of upcoming Astropy project as the astropy.io.fits module. The goal is for Astropy to eventually serve as a drop-in replacement for PyFITS (it even includes a legacy-compatibility mode where the astropy.io.fits module can still be imported as pyfits. However, for the time being PyFITS will still be released as an independent product as well, until such time that the Astropy project proves successful and widely-adopted.

Build and Installation Questions

Is PyFITS available on Windows?

Yes–the majority of PyFITS is pure Python, and can be installed and used on any platform that supports Python (>=2.5). However, PyFITS includes an optional C extension module for reading/writing compressed image HDUs. As most Windows systems are not configured to compile C source code, binary installers are also available for Windows. Though PyFITS can also be installed from source even on Windows systems without a compiler by disabling the compression module. See How do I install PyFITS from source on Windows? for more details.

Where is the Windows installer for version X of PyFITS on version Y of Python?

Every official PyFITS build for Windows is eventually uploaded to PyPI. This includes builds for every major Python release from 2.5.x and up, except for 3.0 as there is no official Numpy release for Python 3.0 on Windows. The one binary module included in these builds was linked with Numpy 1.6.1, though it should work with other recent Numpy versions.

Sometimes the Windows binary installers don’t go up immediately after every PyFITS release. But if they appear missing they should go up within another business day or two. This has gotten better with recent releases thanks to some automation.

Why is the PyFITS installation failing on Windows?

The most likely cause of installation failure on Windows is if building/ installing from source fails due to the lack of a compiler for the optional C extension module. Such a failure would produce an error that looks something like:

building 'pyfits.compression' extension
error: Unable to find vcvarsall.bat

Your best bet in cases like this is to install from one of the binary executable installers available for Windows on PyPI. However, there are still cases where you may need to install from source: For example, it’s difficult to use the binary installers with virtualenv. See How do I install PyFITS from source on Windows? for more detailed instructions on building on Windows.

See below for a few answers to other specific questions about Windows installation. For other installation errors not mentioned by this FAQ, please contact help@stsci.edu with a description of the problem.

On Windows Vista or later why can’t the installer find Python in the registry?

This is a common issue with Windows installers for Python packages that do not support the new User Access Control (UAC) framework added in Windows Vista and later. In particular, when a Python is installed “for all users” (as opposed to for a single user) it adds entries for that Python installation under the HKEY_LOCAL_MACHINE (HKLM) hierarchy and not under the HKEY_CURRENT_USER (HKCU) hierarchy. However, depending on your UAC settings, if the PyFITS installer is not executed with elevated privileges it will not be able to check in HKLM for the required information about your Python installation.

In short: If you encounter this problem it’s because you need the appropriate entries in the Windows registry for Python. You can download this script and execute it with the same Python as the one you want to install PyFITS into. For example to add the missing registry entries to your Python 2.7:

C:\>C:\Python27\python.exe C:\Path\To\Downloads\win_register_python.py

How do I install PyFITS from source on Windows?

There are a few options for building/installing PyFITS from source on Windows.

First of all, as mentioned elsewhere, most of PyFITS is pure-Python. Only the C extension module for reading/writing compressed images needs to be compiled. If you don’t need compressed image support, PyFITS can be installed without it.

In future releases this will hopefully be even easier, but for now it’s necessary to edit one file in order to disable the extension module. Locate the setup.cfg file at the root of the PyFITS source code. This is the file that describes what needs to be installed for PyFITS. Find the line that reads [extension=pyfits.compression]. This is the section that lists what needs to be compiled for the extension module. Comment out every line in the extension section by prepending it with a # character (stopping at the [build_ext] line). It should look like this:

...
scripts = scripts/fitscheck

#[extension=pyfits.compression]
#sources =
#    src/compress.c
#    src/fits_hcompress.c
#    src/fits_hdecompress.c
#    src/fitsio.c
#    src/pliocomp.c
#    src/compressionmodule.c
#    src/quantize.c
#    src/ricecomp.c
#    src/zlib.c
#    src/inffast.c
#    src/inftrees.c
#    src/trees.c
#include_dirs = numpy
# Squelch a handful of warnings (which actually cause pip to break in tox and
# other environments due to gcc outputting non-ASCII characters in some
# terminals; see python issue6135)
#extra_compile_args =
#    -Wno-unused-function
#    -Wno-strict-prototypes

[build_ext]
...

With these lines properly commented out, rerun python setup.py install, and it should skip building/installing the compression module. PyFITS will work fine with out it, but will issue warnings when encountering a compressed image that it can’t read.

If you do need to compile the compression module, this can still be done on Windows with just a little extra work. By default, Python tries to compile extension modules with the same compiler that Python itself was compiled with.

To check what compiler Python was built with, the easiest way is to run:

python -c "import platform; print platform.python_compiler()"

For the official builds of recent Python versions this should be something like:

MSC v.1500 32 bit (Intel)

For unofficial Windows distributions of Python, such as ActiveState, EPD, or Cygwin, your mileage may vary.

As it so happens, MSC v.15xx is the compiler version included with Visual C++ 2008. Luckily, Microsoft distributes a free version of this as Visual C++ Express Edition. So for building Python extension modules on Windows this is one of the simpler routes. Just install the free VC++ 2008. It should install a link to the Start Menu at All Programs->Microsoft Visual C++ Express Edition->Visual Studio Tools->Visual Studio 2008 Command Prompt.

If you run that link, it should launch a command prompt with reasonable environment variables set up for using Visual C++. Then change directories to your copy of the PyFITS source code and re-run python setup.py install. You may also need to comment out the extra_compile_args option in the setup.cfg file (its value is the two lines under it after the equal sign). Though the need to manually disable this option for MSC will be fixed in a future PyFITS version.

Another option is to use gcc through MinGW, which is in fact how the PyFITS releases for Windows are currently built. This article provides a good overview of how to set this up: http://seewhatever.de/blog/?p=217

Is PyFITS available for Mac OSX?

Yes, but there is no binary package specifically for OSX (such as a .dmg, for example). For OSX just download, build, and install the source package. This is generally easier on OSX than it is on Windows, thanks to the more developer-friendly environment.

The only major problem with building on OSX seems to occur for some users of 10.7 Lion, with misconfigured systems. See the next question for details on that.

To build PyFITS without the optional compression module, follow the instructions in How do I install PyFITS from source on Windows?.

Why is the PyFITS installation failing on OSX Lion (10.7)?

There is a common problem that affects all Python packages with C extension modules (not just PyFITS) for some users of OSX 10.7. What usually occurs is that when building the package several errors will be output, ending with something like:

unable to execute gcc-4.2: No such file or directory
error: command 'gcc-4.2' failed with exit status 1

There are a few other errors like it that can occur. The problem is that when you build a C extension, by default it will try to use the same compiler that your Python was built with. In this case, since you’re using the 32-bit universal build of Python it’s trying to use the older gcc-4.2 and is trying to build with PPC support, which is no longer supported in Xcode.

In this case the best solution is to install the x86-64 build of Python for OSX (http://www.python.org/ftp/python/2.7.2/python-2.7.2-macosx10.6.dmg for 2.7.2). In fact, this is the build version officially supported for use on Lion. Other, unofficial Python builds such as from MacPorts may also work.

How do I find out what version of PyFITS I have installed?

To output the PyFITS version from the command line, run:

$ python -c 'import pyfits; print(pyfits.__version__)'

When PyFITS is installed with stsci_python, it is also possible to check the installed SVN revision by importing pyfits.svn_version. Then use dir(pyfits.svn_version) to view a list of available attributes. A feature like this will be available soon in standalone versions of PyFITS as well.

How do I run the tests for PyFITS?

Currently the best way to run the PyFITS tests is to download the source code, either from a source release or from version control, and to run the tests out of the source. It is not necessary to install PyFITS to run the tests out of the source code.

The PyFITS tests require nose to run. nose can be installed on any Python version using pip or easy_install. See the nose documentation for more details.

With nose installed, it is simple to run the tests on Python 2.x:

$ python setup.py nosetests

If PyFITS has not already been built, this will build it automatically, then run the tests. This does not cause PyFITS to be installed.

On Python 3.x the situation is a little more complicated. This is due to the fact that PyFITS’ source code is not Python 3-compatible out of the box, but has to be run through the 2to3 converter. Normally when you build/install PyFITS on Python 3.x, the 2to3 conversion is performed automatically. Unfortunately, nose does not know to use the 2to3’d source code, and will instead try to import and test the unconverted source code.

To work around this, it is necessary to first build PyFITS (which will run the source through 2to3):

$ python setup.py build

Then run the nosetests command, but pointing it to the build tree where the 2to3’d source code and tests reside, using the -w switch:

$ python setup.py nosetests -w build/lib.linux-x86_64-3.2

where the exact path of the build/lib.* directory will vary depending on your platform and Python version.

How can I build a copy of the PyFITS documentation for my own use?

First of all, it’s worth pointing out that the documentation for the latest version of PyFITS can always be downloaded in PDF form or browsed online in HTML. There are also plans to make the docs for older versions of PyFITS, as well as up-to-date development docs available online.

Otherwise, to build your own version of the docs either for offline use, or to build the development version of the docs there are a few reqirements. The most import requirement is Sphinx, which is the toolkit used to generate the documentation. Use pip install sphinx or easy_install sphinx to install Sphinx. Using pip or easy_install will install the correct versions of Sphinx’s basic dependencies, which include docutils, Jinja2, and Pygments.

Next, the docs require STScI’s custom Sphinx theme, stsci.sphinxext. It’s a simple pure-Python pacakge and can be installed with pip or easy_install.

The next requirement is numpydoc, which is not normally installed with Numpy itself. Install it with pip or easy_install. Numpy is also required, though it is of course a requirement of PyFITS itself.

Finally, it is necessary to have matplotlib, specifically for matplotlib.sphinxext. This is perhaps the most onerous requirement if you do not already have it instaled. Please refer to the matplotlib documentation for details on downloading and installing matplotlib.

It is also necessary to install PyFITS itself in order to generate the API documentation. For this reason, it is a good idea to install Sphinx and PyFITS into a virtualenv in order to build the development version of the docs (see below).

With all the requirements installed, change directories into the docs/ directory in the PyFITS source code, and run:

$ make html

to build the HTML docs, which will be output to build/html. To build the docs in other formats, please refer to the Sphinx documentation.

To summarize, assuming that you already have Numpy and Matplotlib on your Python installation, perform the following steps from within the PyFITS source code:

$ virtualenv --system-site-packages pyfits-docs
$ source pyfits-docs/bin/activate
$ pip install sphinx
$ pip install numpydoc
$ pip install stsci.sphinxext
$ python setup.py install pyfits
$ cd docs/
$ make html

Usage Questions

Something didn’t work as I expected. Did I do something wrong?

Possibly. But if you followed the documentation and things still did not work as expected, it is entirely possible that there is a mistake in the documentation, a bug in the code, or both. So feel free to report it as a bug. There are also many, many corner cases in FITS files, with new ones discovered almost every week. PyFITS is always improving, but does not support all cases perfectly. There are some features of the FITS format (scaled data, for example) that are difficult to support correctly and can sometimes cause unexpected behavior.

For the most common cases, however, such as reading and updating FITS headers, images, and tables, PyFITS should be very stable and well-tested. Before every PyFITS release it is ensured that all its tests pass on a variety of platforms, and those tests cover the majority of use-cases (until new corner cases are discovered).

PyFITS crashed and output a long string of code. What do I do?

This listing of code is what is knows as a stack trace (or in Python parlance a “traceback”). When an unhandled exception occurs in the code, causing the program to end, this is a way of displaying where the exception occurred and the path through the code that led to it.

As PyFITS is meant to be used as a piece in other software projects, some exceptions raised by PyFITS are by design. For example, one of the most common exceptions is a KeyError when an attempt is made to read the value of a non-existent keyword in a header:

>>> import pyfits
>>> h = pyfits.Header()
>>> h['NAXIS']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/to/pyfits/header.py", line 125, in __getitem__
    return self._cards[self._cardindex(key)].value
  File "/path/to/pyfits/header.py", line 1535, in _cardindex
    raise KeyError("Keyword %r not found." % keyword)
KeyError: "Keyword 'NAXIS' not found."

This indicates that something was looking for a keyword called “NAXIS” that does not exist. If an error like this occurs in some other software that uses PyFITS, it may indicate a bug in that software, in that it expected to find a keyword that didn’t exist in a file.

Most “expected” exceptions will output a message at the end of the traceback giving some idea of why the exception occurred and what to do about it. The more vague and mysterious the error message in an exception appears, the more likely that it was caused by a bug in PyFITS. So if you’re getting an exception and you really don’t know why or what to do about it, feel free to report it as a bug.

Why does opening a file work in CFITSIO, ds9, etc. but not in PyFITS?

As mentioned elsewhere in this FAQ, there are many unusual corner cases when dealing with FITS files. It’s possible that a file should work, but isn’t support due to a bug. Sometimes it’s even possible for a file to work in an older version of PyFITS, but not a newer version due to a regression that isn’t tested for yet.

Another problem with the FITS format is that, as old as it is, there are many conventions that appear in files from certain sources that do not meet the FITS standard. And yet they are so common-place that it is necessary to support them in any FITS readers. CONTINUE cards are one such example. There are non-standard conventions supported by PyFITS that are not supported by CFITSIO and vice-versa. You may have hit one of those cases.

If PyFITS is having trouble opening a file, a good way to rule out whether not the problem is with PyFITS is to run the file through the fitsverify. For smaller files you can even use the online FITS verifier. These use CFITSIO under the hood, and should give a good indication of whether or not there is something erroneous about the file. If the file is malformatted, fitsverify will output errors and warnings.

If fitsverify confirms no problems with a file, and PyFITS is still having trouble opening it (especially if it produces a traceback) then it’s likely there is a bug in PyFITS.

How do I turn off the warning messages PyFITS keeps outputting to my console?

PyFITS uses Python’s built-in warnings subsystem for informating about exceptional conditions in the code that are recoverable, but that the user may way to be informed of. One of the most common warnings in PyFITS occurs when updating a header value in such a way that the comment must be truncated to preserve space:

Card is too long, comment is truncated.

Any console output generated by PyFITS can be assumed to be from the warnings subsystem. Fortunately there are two easy ways to quiet these warnings:

  1. Using the -W option to the python executable. Just start Python like:

    $ python -Wignore <scriptname>
    

    or for short:

    $ python -Wi <scriptname>
    

    and all warning output will be silenced.

  2. Warnings can be silenced programatically from anywhere within a script. For example, to disable all warnings in a script, add something like:

    import warnings
    warnings.filterwarnings('ignore')
    

Another option, instead of ignore is once, which causes any warning to be output only once within the session, rather than repeatedly (such as in a loop). There are many more ways to filter warnings with -W and the warnings module. For example, it is possible to silence only specific warning messages. Please refer to the Python documentation for more details, or ask at help@stsci.edu.

How can I check if my code is using deprecated PyFITS features?

PyFITS 3.0 included a major reworking of the code and some of the APIs. Most of the differences are just renaming functions to use a more consistent naming scheme. For example the createCard() function was renamed to create_card() for consistency with a lower_case_underscore naming scheme for functions.

There are a few other functions and attributes that were deprecated either because they were renamed to something simpler or more consistent, or because they were redundant or replaced.

Eventually all deprecated features will be removed in future PyFITS versions (though there will be significant warningsin advance). It is best to check whether your code is using deprecatd features sooner rather than later.

On Python 2.5, all deprecation warnigns are displayed by default, so you may have already discovered them. However, on Python 2.6 and up, deprecation warnings are not displayed by default. To show all deprecation warnings, start Python like:

$ python -Wd <scriptname>

Most deprecation issues can be fixed with a simple find/replace. The warnings displayed will let you know how to replace the old interface.

If you have a lot of old code that was written for older versions of PyFITS it would be worth doing this. PyFITS 3.1 introduces a significant rewrite of the Header interface, and contains even more deprecations.

What convention does PyFITS use for indexing, such as of image coordinates?

All arrays and sequences in PyFITS use a zero-based indexing scheme. For example, the first keyword in a header is header[0], not header[1]. This is in accordance with Python itself, as well as C, on which PyFITS is based.

This may come as a surprise to veteran FITS users coming from IRAF, where 1-based indexing is typically used, due to its origins in FORTRAN.

Likewise, the top-left pixel in an N x N array is data[0,0]. The indices for 2-dimensional arrays are row-major order, in that the first index is the row number, and the second index is the column number. Or put in terms of axes, the first axis is the y-axis, and the second axis is the x-axis. This is the opposite of column-major order, which is used by FORTRAN and hence FITS. For example, the second index refers to the axis specified by NAXIS1 in the FITS header.

In general, for N-dimensional arrays, row-major orders means that the right-most axis is the one that varies the fastest while moving over the array data linearly. For example, the 3-dimensional array:

[[[1, 2],
  [3, 4]],
 [[5, 6],
  [7, 8]]]

is represented linearly in row-major order as:

[1, 2, 3, 4, 5, 6, 7, 8]

Since 2 immediately follows 1, you can see that the right-most (or inner-most) axis is the one that varies the fastest.

The discrepancy in axis-ordering may take some getting used to, but it is a necessary evil. Since most other Python and C software assumes row-major ordering, trying to enforce column-major ordering in arrays returned by PyFITS is likely to cause more difficulties than it’s worth.

How do I open a very large image that won’t fit in memory?

Prior to PyFITS 3.1, when the data portion of an HDU is accessed, the data is read into memory in its entirety. For example:

>>> hdul = pyfits.open('myimage.fits')
>>> hdul[0].data
...

reads the entire image array from disk into memory. For very large images or tables this is clearly undesirable, if not impossible given the available resources.

However, pyfits.open() has an option to access the data portion of an HDU by memory mapping using mmap. What this means is that accessing the data as in the example above only reads portions of the data into memory on demand. For example, if I request just a slice of the image, such as hdul[0].data[100:200], then just rows 100-200 will be read into memory. This happens transparently, as though the entire image were already in memory. This works the same way for tables. For most cases this is your best bet for working with large files.

To use memory mapping, just add the memmap=True argument to pyfits.open().

In PyFITS 3.1, the mmap support is improved enough that memmap=True is the default for all pyfits.open() calls. The default can also be controlled through an environment variable called PYFITS_USE_MEMMAP. Setting this to 0 will disable mmap by default.

Unfortunately, memory mapping does not currently work as well with scaled image data, where BSCALE and BZERO factors need to be applied to the data to yield physical values. Currently this requires enough memory to hold the entire array, though this is an area that will see improvement in the future.

An alternative, which currently only works for image data (that is, non-tables) is the sections interface. It is largely replaced by the better support for memmap, but may still be useful on systems with more limited virtual-memory space, such as on 32-bit systems. Support for scaled image data is flakey with sections too, though that will be fixed. See the PyFITS documentation for more details on working with sections.

How can I create a very large FITS file from scratch?

This is a very common issue, but unfortunately PyFITS does not come with any built-in facilities for creating large files (larger than will fit in memory) from scratch (though it may in the future).

Normally to create a single image FITS file one would do something like:

>> data = numpy.zeros((40000, 40000), dtype=numpy.float64)
>> hdu = pyfits.PrimaryHDU(data=data)
>> hdu.writeto('large.fits')

However, a 40000 x 40000 array of doubles is nearly twelve gigabytes! Most systems won’t be able to create that in memory just to write out to disk. In order to create such a large file efficiently requires a little extra work, and a few assumptions.

First, it is helpful to anticpate about how large (as in, how many keywords) the header will have in it. FITS headers must be written in 2880 byte blocks–large enough for 36 keywords per block (including the END keyword in the final block). Typical headers have somewhere between 1 and 4 blocks, though sometimes more.

Since the first thing we write to a FITS file is the header, we want to write enough header blocks so that there is plenty of padding in which to add new keywords without having to resize the whole file. Say you want the header to use 4 blocks by default. Then, excluding the END card which PyFITS will add automatically, create the header and pad it out to 36 * 4 cards like so:

>>> data = numpy.zeros((100, 100), dtype=numpy.float64)
# This is a stub array that we'll be using the initialize the HDU; its
# exact size is irrelevant, as long as it has the desired number of
# dimensions
>>> hdu = pyfits.PrimaryHDU(data=data)
>>> header = hdu.header
>>> while len(header) < (36 * 4 - 1):
...     header.append()  # Adds a blank card to the end

Now adjust the NAXISn keywords to the desired size of the array, and write only the header out to a file. Using the hdu.writeto() method will cause PyFITS to “helpfully” reset the NAXISn keywords to match the size of the dummy array:

>>> header['NAXIS1'] = 40000
>>> header['NAXIS2'] = 40000
>>> header.tofile('large.fits')

Finally, we need to grow out the end of the file to match the length of the data (plus the length of the header). This can be done very efficiently on most systems by seeking past the end of the file and writing a single byte, like so:

>>> with open('large.fits', 'rb+') as fobj:
...     fobj.seek(len(header.tostring()) + (40000 * 40000 * 8) - 1)
...     # The -1 is to account for the final byte that we are about to
...     # write
...     fobj.write('\0')

On modern operating systems this will cause the file (past the header) to be filled with zeros out to the ~12GB needed to hold a 40000 x 40000 image. On filesystems that support sparse file creation (most Linux filesystems, but not HFS+) this is a very fast, efficient operation. On other systems your mileage may vary.

This isn’t the only way to build up a large file, but probably one of the safest. This method can also be used to create large multi-extension FITS files, with a little care.

For creating very large tables, this method may also be used. Though it can be difficult to determine ahead of time how many rows a table will need. In general, use of PyFITS is discouraged for the creation and manipulation of large tables. The FITS format itself is not designed for efficient on-disk or in-memory manipulation of table structures. For large, heavy-duty table data it might be better too look into using HDF5 through the PyTables library.

PyTables makes use of Numpy under the hood, and can be used to write binary table data to disk in the same format required by FITS. It is then possible to serialize your table to the FITS format for distribution. At some point this FAQ might provide an example of how to do this.

How do I create a multi-extension FITS file from scratch?

When you open a FITS file with pyfits.open(), a pyfits.HDUList object is returned, which holds all the HDUs in the file. This HDUList class is a subclass of Python’s builtin list, and can be created from scratch and used as such:

>>> new_hdul = pyfits.HDUList()
>>> new_hdul.append(pyfits.ImageHDU())
>>> new_hdul.append(pyfits.ImageHDU())
>>> new_hdul.writeto('test.fits')

That will create a new multi-extension FITS file with two empty IMAGE extensions (a default PRIMARY HDU is prepended automatically if one was not provided manually).

Why is an image containing integer data being converted unexpectedly to floats?

If the header for your image contains non-trivial values for the optional BSCALE and/or BZERO keywords (that is, BSCALE != 1 and/or BZERO != 0), then the raw data in the file must be rescaled to its physical values according to the formula:

physical_value = BZERO + BSCALE * array_value

As BZERO and BSCALE are floating point values, the resulting value must be a float as well. If the original values were 16-bit integers, the resulting values are single-precision (32-bit) floats. If the original values were 32-bit integers the resulting values are double-precision (64-bit floats).

This automatic scaling can easily catch you of guard if you’re not expecting it, because it doesn’t happen until the data portion of the HDU is accessed (to allow things like updating the header without rescaling the data). For example:

>>> hdul = pyfits.open('scaled.fits')
>>> image = hdul['SCI', 1]
>>> image.header['BITPIX']
32
>>> image.header['BSCALE']
2.0
>>> data = image.data  # Read the data into memory
>>> data.dtype
dtype('float64')  # Got float64 despite BITPIX = 32 (32-bit int)
>>> image.header['BITPIX']  # The BITPIX will automatically update too
-64
>>> 'BSCALE' in image.header  # And the BSCALE keyword removed
False

The reason for this is that once a user accesses the data they may also manipulate it and perform calculations on it. If the data were forced to remain as integers, a great deal of precision is lost. So it is best to err on the side of not losing data, at the cost of causing some confusion at first.

If the data must be returned to integers before saving, use the scale() method:

>>> image.scale('int32')
>>> image.header['BITPIX']
32

Alternatively, if a file is opened with mode='update' along with the scale_back=True argument, the original BSCALE and BZERO scaling will be automatically re-applied to the data before saving. Usually this is not desirable, especially when converting from floating point back to unsigned integer values. But this may be useful in cases where the raw data needs to be modified corresponding to changes in the physical values.

To prevent rescaling from occurring at all (good for updating headers–even if you don’t intend for the code to access the data, it’s good to err on the side of caution here), use the do_not_scale_image_data argument when opening the file:

>>> hdul = pyfits.open('scaled.fits', do_not_scale_image_data=True)
>>> image = hdul['SCI', 1]
>>> image.data.dtype
dtype('int32')

Why am I losing precision when I assign floating point values in the header?

The FITS standard allows two formats for storing floating-point numbers in a header value. The “fixed” format requires the ASCII representation of the number to be in bytes 11 through 30 of the header card, and to be right-justified. This leaves a standard number of characters for any comment string.

The fixed format is not wide enough to represent the full range of values that can be stored in a 64-bit float with full precision. So FITS also supports a “free” format in which the ASCII representation can be stored anywhere, using the full 70 bytes of the card (after the keyword).

Currently PyFITS only supports writing fixed format (it can read both formats), so all floating point values assigned to a header are stored in the fixed format. There are plans to add support for more flexible formatting.

In the meantime it is possible to add or update cards by manually formatting the card image:

>>> c = pyfits.Card.fromstring('FOO     = 1234567890.123456789')
>>> h = pyfits.Header()
>>> h.append(c)
>>> h
FOO     = 1234567890.123456789

As long as you don’t assign new values to ‘FOO’ via h['FOO'] = 123, PyFITS will maintain the header value exactly as you formatted it.

I’m opening many FITS files in a loop and getting OSError: Too many open files

Say you have some code like:

>>> import pyfits
>>> for filename in filenames:
...     hdul = pyfits.open(filename)
...     for hdu in hdul:
...         hdu_data = hdul.data
...         # Do some stuff with the data
...     hdul.close()
...

The details may differ, but the qualitative point is that the data to many HDUs and/or FITS files are being accessed in a loop. This may result in an exception like:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
OSError: [Errno 24] Too many open files: 'my_data.fits'

As explained in the note on working with large files, because PyFITS uses mmap by default to read the data in a FITS file, even if you correctly close a file with HDUList.close a handle is kept open to that file so that the memory-mapped data array can still be continued to be read transparently.

The way Numpy supports mmap is such that the file mapping is not closed until the overlying ndarray object has no references to it and is freed memory. However, when looping over a large number of files (or even just HDUs) rapidly, this may not happen immediately. Or in some cases if the HDU object persists, the data array attached to it may persist too. The easiest workaround is to manually delete the .data attribute on the HDU object so that the ndarray reference is freed and the mmap can be closed:

>>> for filename in filenames:
...     hdul = pyfits.open(filename)
...     for hdu in hdul:
...         hdu_data = hdul.data
...         # Do some stuff with the data
...         # ...
...         # Don't need the data anymore; delete all references to it
...         # so that it can be garbage collected
...         del hdu_data
...         del hdu.data
...     hdul.close()
...

In some extreme cases files are opened and closed fast enough that Python’s garbage collector does not free them (and hence free the file handles) often enough. To mitigate this your code can manually force a garbage collection by calling gc.collect() at the end of the loop.

In a future release it will be easier to automatically perform this sort of cleanup when closing FITS files, where needed.