pymarc package

Submodules

pymarc.constants module

Constants for pymarc.

pymarc.exceptions module

Exceptions for pymarc.

exception pymarc.exceptions.BaseAddressInvalid

Bases: pymarc.exceptions.PymarcException

exception pymarc.exceptions.BaseAddressNotFound

Bases: pymarc.exceptions.PymarcException

exception pymarc.exceptions.FieldNotFound

Bases: pymarc.exceptions.PymarcException

exception pymarc.exceptions.NoActiveFile

Bases: pymarc.exceptions.PymarcException

exception pymarc.exceptions.NoFieldsFound

Bases: pymarc.exceptions.PymarcException

exception pymarc.exceptions.PymarcException

Bases: exceptions.Exception

exception pymarc.exceptions.RecordDirectoryInvalid

Bases: pymarc.exceptions.PymarcException

exception pymarc.exceptions.RecordLeaderInvalid

Bases: pymarc.exceptions.PymarcException

exception pymarc.exceptions.RecordLengthInvalid

Bases: pymarc.exceptions.PymarcException

exception pymarc.exceptions.WriteNeedsRecord

Bases: pymarc.exceptions.PymarcException

pymarc.field module

The pymarc.field file.

class pymarc.field.Field(tag, indicators=None, subfields=None, data=u'')

Bases: six.Iterator

To create a Field, pass in the field tag, its indicators, and its subfields as a flat list of alternating codes and values:

    field = Field(
        tag='245',
        indicators=['0', '1'],
        subfields=[
            'a', 'The pragmatic programmer : ',
            'b', 'from journeyman to master /',
            'c', 'Andrew Hunt, David Thomas.',
        ])

If you want to create a control field, don’t pass in the indicators, and use a data parameter rather than a subfields parameter:

    field = Field(tag='001', data='fol05731351')

add_subfield(code, value)

Adds a subfield code/value pair to the field:

    field.add_subfield('u', 'http://www.loc.gov')

as_marc(encoding)

Used during conversion of a field to raw MARC.

as_marc21(encoding)

Used during conversion of a field to raw MARC.

delete_subfield(code)

Deletes the first subfield with the specified code and returns its value:

    field.delete_subfield('a')

If no subfield is found with the specified code, None is returned.

format_field()

Returns the field as a string without tag, indicators, and subfield indicators. Like pymarc.Field.value(), but prettier (adds spaces, formats subject headings).

get_subfields(*codes)

get_subfields() accepts one or more subfield codes and returns a list of subfield values. The values are listed in the order they appear in the field, not in the order the codes are passed in.

    print(field.get_subfields('a'))
    print(field.get_subfields('a', 'b', 'z'))

is_control_field()

Returns True if the field is considered a control field, False otherwise. Control fields lack indicators and subfields.

is_subject_field()

Returns True if the field is considered a subject field, False otherwise. Used by format_field().

value()

Returns the field as a string without tag, indicators, and subfield indicators.

class pymarc.field.RawField(tag, indicators=None, subfields=None, data=u'')

Bases: pymarc.field.Field

MARC field that keeps data in raw, undecoded byte strings.

Should only be used when input records are wrongly encoded.

as_marc(encoding=None)

Used during conversion of a field to raw MARC.

pymarc.field.map_marc8_field(f)

pymarc.marc8 module

pymarc marc8.py file.

class pymarc.marc8.MARC8ToUnicode(G0=66, G1=69, quiet=False)

Converts MARC-8 to Unicode. Note that currently, Unicode strings aren’t normalized, and some codecs (e.g. iso8859-1) will fail on such strings. When I can require Python 2.3, this will go away.

Warning: MARC-8 EACC (East Asian characters) makes some distinctions which aren’t captured in Unicode. The LC tables give the option of mapping such characters either to a Unicode private use area, or to a substitute character which (usually) gives the sense. I’ve picked the second, so the MARC data should be treated as primary and the Unicode data used for display purposes only. (If you know of any fonts designed for use with LC’s private-use Unicode assignments, or of attempts to standardize Unicode characters to allow round-trips from EACC, or if you need the private-use Unicode character translations, please inform me at asl2@pobox.com.)

ansel = 69
basic_latin = 66

translate(marc8_string)

pymarc.marc8.marc8_to_unicode(marc8, hide_utf8_warnings=False)

Pass in a MARC-8 encoded string, and get back a Unicode string:

    print(marc8_to_unicode(record.title()))

pymarc.marc8_mapping module

pymarc.marcxml module

pymarc marcxml file.

class pymarc.marcxml.XmlHandler(strict=False, normalize_form=None)

Bases: xml.sax.handler.ContentHandler

You can subclass XmlHandler and add your own process_record method that will be passed a pymarc.Record as each record becomes available. This is useful if you want to stream the records elsewhere (for example, to an RDBMS) without having to hold them all in memory.

characters(chars)

endElementNS(name, qname)

process_record(record)

startElementNS(name, qname, attrs)

pymarc.marcxml.map_xml(function, *files)

Map a function onto one or more files, so that the function is called with each record as it is parsed:

    def do_it(r):
        print(r)

    map_xml(do_it, 'marc.xml')

pymarc.marcxml.parse_xml(xml_file, handler)

Parse a file with a given subclass of xml.sax.handler.ContentHandler.

pymarc.marcxml.parse_xml_to_array(xml_file, strict=False, normalize_form=None)

Parse an XML file and return the records as a list. If you would like the parser to explicitly check that elements are in the MARCXML slim namespace, use the strict=True option. Valid values for normalize_form are 'NFC', 'NFKC', 'NFD', and 'NFKD'; see unicodedata.normalize for details.

pymarc.marcxml.record_to_xml(record, quiet=False, namespace=False)

pymarc.marcxml.record_to_xml_node(record, quiet=False, namespace=False)

Converts a record object to a chunk of XML:

    # include the MARCXML namespace in the root tag (default: False)
    record_to_xml(record, namespace=True)

pymarc.reader module

class pymarc.reader.JSONReader(marc_target, encoding='utf-8', stream=False)

Bases: pymarc.reader.Reader

class pymarc.reader.MARCReader(marc_target, to_unicode=True, force_utf8=False, hide_utf8_warnings=False, utf8_handling='strict', file_encoding='iso8859-1')

Bases: pymarc.reader.Reader

An iterator class for reading a file of MARC21 records.

Simple usage:

    from pymarc import MARCReader

    # pass in a file object opened in binary mode
    reader = MARCReader(open('file.dat', 'rb'))
    for record in reader:
        ...

    # pass in MARC in transmission format
    reader = MARCReader(rawmarc)
    for record in reader:
        ...

If you would like your Record objects to contain Unicode strings, use the to_unicode parameter (it already defaults to True):

    reader = MARCReader(open('file.dat', 'rb'), to_unicode=True)

This will decode from MARC-8 or UTF-8 depending on the value in the MARC leader at position 9.

If you find yourself in the unfortunate position of having data that is UTF-8 encoded without the leader set appropriately, you can use the force_utf8 parameter:

    reader = MARCReader(open('file.dat', 'rb'), to_unicode=True,
                        force_utf8=True)

If you find yourself in the unfortunate position of having data that is mostly UTF-8 encoded but with a few non-UTF-8 characters, you can also use the utf8_handling parameter, which takes the same values ('strict', 'replace', and 'ignore') as the Python Unicode codecs (see http://docs.python.org/library/codecs.html for more info).

Although it is not legal in MARC 21 to use anything but MARC-8 or UTF-8, if you have a file in an incorrect encoding and you know what that encoding is, you can pass it in the file_encoding parameter.

close()

class pymarc.reader.Reader

Bases: six.Iterator

A base class for all iterating readers in the pymarc package.

pymarc.reader.map_records(f, *files)

Applies a given function to each record in a batch. You can pass in multiple batches.

>>> def print_title(r):
...     print(r['245'])
...
>>> map_records(print_title, open('marc.dat', 'rb'))

pymarc.record module

class pymarc.record.Record(data='', to_unicode=True, force_utf8=False, hide_utf8_warnings=False, utf8_handling='strict', leader=' ', file_encoding='iso8859-1')

Bases: six.Iterator

A class for representing a MARC record. Each Record object is made up of multiple Field objects. You’ll probably want to look at the docs for Field to see how to fully use a Record object.

Basic usage:

    record = Record()
    field = Field(
        tag='245',
        indicators=['0', '1'],
        subfields=[
            'a', 'The pragmatic programmer : ',
            'b', 'from journeyman to master /',
            'c', 'Andrew Hunt, David Thomas.',
        ])
    record.add_field(field)

Or create a record from a chunk of MARC in transmission format:

    record = Record(data=chunk)

Or get a record serialized as MARC21:

    raw = record.as_marc()

You’ll normally want to use a MARCReader object to iterate through MARC records in a file.

add_field(*fields)

add_field() will add pymarc.Field objects to a Record object. Optionally you can pass in multiple fields.

add_grouped_field(*fields)

add_grouped_field() will add pymarc.Field objects to a Record object, attempting to maintain a loose numeric order per the MARC standard for “Organization of the record” (http://www.loc.gov/marc/96principl.html). Optionally you can pass in multiple fields.

add_ordered_field(*fields)

add_ordered_field() will add pymarc.Field objects to a Record object, attempting to maintain a strict numeric order. Optionally you can pass in multiple fields.

addedentries()

Note: Fields 790-799 are considered “local” added entry fields but occur with some frequency in OCLC and RLIN records.

as_dict()

Turns a MARC record into a dictionary, which is used by as_json().

as_json(**kwargs)

Serialize a record as JSON according to http://dilettantes.code4lib.org/blog/2010/09/a-proposal-to-serialize-marc-in-json/

as_marc()

Returns the record serialized as MARC21.

as_marc21()

Returns the record serialized as MARC21.

author()

decode_marc(marc, to_unicode=True, force_utf8=False, hide_utf8_warnings=False, utf8_handling='strict', encoding='iso8859-1')

decode_marc() accepts a MARC record in transmission format as a string argument, and will populate the object based on the data found. The Record constructor actually uses decode_marc() behind the scenes when you pass in a chunk of MARC data to it.

get_fields(*args)

When passed a tag ('245'), get_fields() will return a list of all the fields in a record with that tag.

    title = record.get_fields('245')

If no fields with the specified tag are found, then an empty list is returned. If you are interested in more than one tag, you can pass in several tags:

    subjects = record.get_fields('600', '610', '650')

If no tag is passed in to get_fields(), a list of all the fields will be returned.

isbn()

Returns the first ISBN in the record or None if one is not present. The returned ISBN will be all numeric, except for an x/X which may occur in the checksum position. Dashes and extraneous information will be automatically removed. If you need this information, you’ll want to look directly at the 020 field, e.g. record['020']['a'].

location()

notes()

Returns all 5XX fields in a list.

physicaldescription()

Returns all 300 fields in a list.

publisher()

Note: a 264 field with second indicator '1' indicates the publisher.

pubyear()

remove_field(*fields)

remove_field() will remove one or more pymarc.Field objects from a Record object.

remove_fields(*tags)

Removes all the fields with the tags passed to the function:

    record.remove_fields('200', '899')

will remove all the fields marked with tags '200' or '899'.

series()

Note: 490 supersedes the 440 series statement, which was both a series statement and an added entry. 8XX fields are added entries.

subjects()

Note: Fields 690-699 are considered “local” subject access fields but occur with some frequency in OCLC and RLIN records.

sudoc()

Returns a Superintendent of Documents (SuDoc) classification number held in the 086 MARC tag. The classification number will be made up of a variety of dashes, dots, slashes, and colons. More information can be found at the following URL: https://www.fdlp.gov/file-repository/gpo-cataloging/1172-gpo-classification-manual

title()

Returns the title of the record (245 $a and $b).

uniformtitle()

pymarc.record.map_marc8_record(r)

pymarc.writer module

class pymarc.writer.JSONWriter(file_handle)

Bases: pymarc.writer.Writer

A class for writing records as an array of MARC-in-JSON objects.

IMPORTANT: You must close a JSONWriter, otherwise you will not get valid JSON.

Simple usage:

>>> from pymarc import JSONWriter

### writing to a file
>>> writer = JSONWriter(open('file.json','wt'))
>>> writer.write(record)
>>> writer.close()  # Important!

### writing to a string
>>> from io import StringIO
>>> string = StringIO()
>>> writer = JSONWriter(string)
>>> writer.write(record)
>>> writer.close(close_fh=False)  # Important!
>>> print(string.getvalue())

close(close_fh=True)

Closes the writer.

If close_fh is True (the default), close will also close the underlying file handle that was passed in to the constructor. Pass close_fh=False to leave it open.

write(record)

Writes a record.

class pymarc.writer.MARCWriter(file_handle)

Bases: pymarc.writer.Writer

A class for writing MARC21 records in transmission format.

Simple usage:

>>> from pymarc import MARCWriter

### writing to a file
>>> writer = MARCWriter(open('file.dat','wb'))
>>> writer.write(record)
>>> writer.close()

### writing to a string (Python 2 only)
>>> from StringIO import StringIO
>>> string = StringIO()
>>> writer = MARCWriter(string)
>>> writer.write(record)
>>> writer.close(close_fh=False)
>>> print string.getvalue()

### writing to memory (Python 3 only)
>>> from io import BytesIO
>>> memory = BytesIO()
>>> writer = MARCWriter(memory)
>>> writer.write(record)
>>> writer.close(close_fh=False)

write(record)

Writes a record.

class pymarc.writer.TextWriter(file_handle)

Bases: pymarc.writer.Writer

A class for writing records in prettified text MARCMaker format.

A blank line separates each record.

Simple usage:

>>> from pymarc import TextWriter

### writing to a file
>>> writer = TextWriter(open('file.txt','wt'))
>>> writer.write(record)
>>> writer.close()

### writing to a string
>>> from io import StringIO
>>> string = StringIO()
>>> writer = TextWriter(string)
>>> writer.write(record)
>>> writer.close(close_fh=False)
>>> print(string.getvalue())

write(record)

Writes a record.

class pymarc.writer.Writer(file_handle)

Bases: object

close(close_fh=True)

Closes the writer.

If close_fh is True (the default), close will also close the underlying file handle that was passed in to the constructor. Pass close_fh=False to leave it open.

write(record)

class pymarc.writer.XMLWriter(file_handle)

Bases: pymarc.writer.Writer

A class for writing records as a MARCXML collection.

IMPORTANT: You must close an XMLWriter, otherwise you will not get a valid XML document.

Simple usage:

>>> from pymarc import XMLWriter

### writing to a file
>>> writer = XMLWriter(open('file.xml','wb'))
>>> writer.write(record)
>>> writer.close()  # Important!

### writing to a string (Python 2 only)
>>> from StringIO import StringIO
>>> string = StringIO()
>>> writer = XMLWriter(string)
>>> writer.write(record)
>>> writer.close(close_fh=False)  # Important!
>>> print string.getvalue()

### writing to memory (Python 3 only)
>>> from io import BytesIO
>>> memory = BytesIO()
>>> writer = XMLWriter(memory)
>>> writer.write(record)
>>> writer.close(close_fh=False)  # Important!

close(close_fh=True)

Closes the writer.

If close_fh is True (the default), close will also close the underlying file handle that was passed in to the constructor. Pass close_fh=False to leave it open.

write(record)

Writes a record.

Module contents

The pymarc module provides an API for reading, writing and modifying MARC records. MARC (MAchine Readable Cataloging) is a metadata format for bibliographic data. More about MARC can be found at the Library of Congress: http://lcweb.loc.gov/marc

Below are some common examples of how you might want to use pymarc. If you run across an example that you think should be here please contribute it by writing to the author.

  1. Reading a batch of records and printing out the 245 subfield a. If you are curious, this example uses the batch file available in the distribution.

    >>> from pymarc import MARCReader
    >>> reader = MARCReader(open('test/marc.dat', 'rb'))
    >>> for record in reader:
    ...     print(record['245']['a'])
    The pragmatic programmer :
    Programming Python /
    Learning Python /
    Python cookbook /
    Python programming for the absolute beginner /
    Web programming :
    Python programming on Win32 /
    Python programming :
    Python Web programming /
    Core python programming /
    Python and Tkinter programming /
    Game programming with Python, Lua, and Ruby /
    Python programming patterns /
    Python programming with the Java class libraries :
    Learn to program using Python :
    Programming with Python /
    BSD Sockets programming from a multi-language perspective /
    Design patterns :
    Introduction to algorithms /
    ANSI Common Lisp /
    
  2. Creating a record and writing it out to a file.

    >>> from pymarc import Record, Field
    >>> record = Record()
    >>> record.add_field(
    ...     Field(
    ...         tag = '245',
    ...         indicators = ['0','1'],
    ...         subfields = [
    ...             'a', 'The pragmatic programmer : ',
    ...             'b', 'from journeyman to master /',
    ...             'c', 'Andrew Hunt, David Thomas.'
    ...         ]))
    >>> out = open('file.dat', 'wb')
    >>> out.write(record.as_marc())
    >>> out.close()