Pymarc

Release v

Pymarc is a Python 3 library for working with bibliographic data encoded in MARC21.

Starting with version 5.0.0 it requires Python 3.7 or later. It provides an API for reading, writing and modifying MARC records. It was mostly designed to be an emergency ejection seat for getting your data assets out of MARC and into some kind of saner representation. However, over the years it has also been used to create and modify MARC records, since, despite repeated calls for it to die as a format, MARC seems to be living quite happily as a zombie.

Below are some common examples of how you might want to use pymarc. If you run across an example that you think should be here, please send a pull request.

Reading

Most often you will have some MARC data and will want to extract data from it. Here's an example of reading a batch of records and printing out the title. If you are curious, this example uses the batch file test/marc.dat from the pymarc repository:

from pymarc import MARCReader

with open('test/marc.dat', 'rb') as fh:
    reader = MARCReader(fh)
    for record in reader:
        print(record.title)

The pragmatic programmer : from journeyman to master /
Programming Python /
Learning Python /
Python cookbook /
Python programming for the absolute beginner /
Web programming : techniques for integrating Python, Linux, Apache, and MySQL /
Python programming on Win32 /
Python programming : an introduction to computer science /
Python Web programming /
Core python programming /
Python and Tkinter programming /
Game programming with Python, Lua, and Ruby /
Python programming patterns /
Python programming with the Java class libraries : a tutorial for building Web and Enterprise applications /
Learn to program using Python : a tutorial for hobbyists, self-starters, and all who want to learn the art of computer programming /
Programming with Python /
BSD Sockets programming from a multi-language perspective /
Design patterns : elements of reusable object-oriented software /
Introduction to algorithms /
ANSI Common Lisp /

Sometimes MARC data contains errors of some kind. In this case the reader returns None instead of a record object, and two reader properties, current_exception and current_chunk, help you decide whether to take corrective action and continue, or to stop reading:

from pymarc import MARCReader
from pymarc import exceptions as exc

with open('test/marc.dat', 'rb') as fh:
    reader = MARCReader(fh)
    for record in reader:
        if record:
            # consume the record:
            print(record.title)
        elif isinstance(reader.current_exception, exc.FatalReaderError):
            # data file format error
            # reader will raise StopIteration
            print(reader.current_exception)
            print(reader.current_chunk)
        else:
            # fix the record data, skip or stop reading:
            print(reader.current_exception)
            print(reader.current_chunk)
            # break/continue/raise

A FatalReaderError occurs when the reader can't determine a record's boundaries in the data stream; to avoid misinterpreting the data, the reader stops. For other errors (wrong encoding, etc.) the reader continues with the next record.

A pymarc.Record object has a few handy properties like title for getting at bits of a bibliographic record; others include author, isbn, subjects, location, notes, physicaldescription, publisher and pubyear. But really, to work with MARC data you need to understand the numeric field tags and subfield codes that are used to designate various bits of information. There is a lot more hiding in a MARC record than these properties provide access to. For example, the title property extracts the information from the 245 field, subfields a and b. You can access 245 $a like so:

print(record['245']['a'])

Some fields, like subjects, can repeat. In such cases you will want to use get_fields to get all of them as pymarc.Field objects, which you can then interact with further:

for f in record.get_fields('650'):
    print(f)

If you are new to MARC fields, “Understanding MARC” (http://www.loc.gov/marc/umb/) is a pretty good primer, and the “MARC 21 Formats” (http://www.loc.gov/marc/marcdocz.html) page at the Library of Congress is a good reference once you understand the basics.

Note: New in v5.0.0, Subfield is used to create subfields. Prior to v5, subfields were constructed as a flat list of strings, e.g., [code, value, code, value]. This has been changed to organize the subfields into a list of tuples, e.g., [(code, value), (code, value)]. Subfield is implemented as a NamedTuple, so the tuples can be constructed as Subfield(code=code, value=value). See the code below for an example of how this is used.

The old style of creating subfields is no longer supported. Passing a list of strings to the subfields parameter for the Field constructor will raise a ValueError.

For convenience, the class method Field.convert_legacy_subfields is provided to convert the legacy list of strings into a list of Subfield objects. An example is given in the Field API documentation below.

Writing

Here’s an example of creating a record and writing it out to a file.

from pymarc import Record, Field, Subfield, Indicators

record = Record()
record.add_field(
    Field(
        tag = '245',
        indicators = Indicators('0','1'),
        subfields = [
            Subfield(code='a', value='The pragmatic programmer : '),
            Subfield(code='b', value='from journeyman to master /'),
            Subfield(code='c', value='Andrew Hunt, David Thomas.')
        ]))
with open('file.dat', 'wb') as out:
    out.write(record.as_marc())

Updating

Updating works the same way: you read the record in, modify it, and then write it out again:

from pymarc import MARCReader

with open('test/marc.dat', 'rb') as fh:
    reader = MARCReader(fh)
    record = next(reader)
    record['245']['a'] = 'The Zombie Programmer : '
with open('file.dat', 'wb') as out:
    out.write(record.as_marc())

JSON, XML and Text

If you find yourself using MARC data a fair bit, and distributing it, you may make other developers a bit happier by using the JSON or XML serializations. pymarc supports both. The main benefit here is that the UTF-8 character encoding is used, rather than the frustratingly archaic MARC-8 encoding. Developers will also be able to use JSON and XML tools to get at the data they want instead of some crazy MARC processing library like, ahem, pymarc.

MARCMaker is also supported as a reading and writing format.

API Docs

Reader

Pymarc Reader.

class pymarc.reader.JSONReader(marc_target: bytes | str, encoding: str = 'utf-8', stream: bool = False)

Bases: Reader

JSON Reader.

file_handle: IO

class pymarc.reader.MARCMakerReader(target: bytes | str, encoding: str = 'utf-8')

Bases: Reader

MARCMaker Reader.

Converts a MARCMaker textual representation of a MARC 21 record into a pymarc Record. See Record.__str__() for more information.

Simple usage:

from pymarc import MARCMakerReader

## pass in a file object
reader = MARCMakerReader(open('file.mrk', 'r'))
for record in reader:
    ...

## pass a string
reader = MARCMakerReader("=LDR xxx\n=022  ##$a0000-0000\n\n=LDR yyy")
for record in reader:
    ...

class pymarc.reader.MARCReader(marc_target: BinaryIO | bytes, to_unicode: bool = True, force_utf8: bool = False, hide_utf8_warnings: bool = False, utf8_handling: str = 'strict', file_encoding: str = 'iso8859-1', permissive: bool = False)

Bases: Reader

An iterator class for reading a file of MARC21 records.

Simple usage:

from pymarc import MARCReader

## pass in a file object
reader = MARCReader(open('file.dat', 'rb'))
for record in reader:
    ...

## pass in marc in transmission format
reader = MARCReader(rawmarc)
for record in reader:
    ...

If you would like your Record object to contain Unicode strings, use the to_unicode parameter (it defaults to True):

reader = MARCReader(open('file.dat', 'rb'), to_unicode=True)

This will decode from MARC-8 or utf-8 depending on the value in the MARC leader at position 9. Upon serialization of the Record object to MARC21, the resulting output will be utf-8 encoded and the value in the MARC leader at position 9 will be set appropriately to indicate the change of character encoding.

If you find yourself in the unfortunate position of having data that is utf-8 encoded without the leader set appropriately you can use the force_utf8 parameter:

reader = MARCReader(open('file.dat', 'rb'), to_unicode=True,
    force_utf8=True)

If you find yourself in the unfortunate position of having data that is mostly utf-8 encoded but with a few non-utf-8 characters, you can also use the utf8_handling parameter, which takes the same values (‘strict’, ‘replace’, and ‘ignore’) as the Python Unicode codecs (see http://docs.python.org/library/codecs.html for more info).

Although MARC 21 does not allow any encoding other than MARC-8 or UTF-8, if you have a file in some other encoding and you know what it is, you can try passing that encoding in the file_encoding parameter.

MARCReader parses data permissively and gives you full control over what to do when a malformed record is encountered. Whenever an error is found, the reader returns None instead of a Record object. The exception information and the corresponding data are available through the reader.current_exception and reader.current_chunk properties:

reader = MARCReader(open('file.dat', 'rb'))
for record in reader:
    if record is None:
        print(
            "Current chunk: ",
            reader.current_chunk,
            " was ignored because the following exception was raised: ",
            reader.current_exception
        )
    else:
        # do something with record
        print(record.title)

close() → None: Close the handle.

property current_chunk: Current chunk.

property current_exception: Current exception.

file_handle: IO

class pymarc.reader.Reader

Bases: object

A base class for all iterating readers in the pymarc package.

pymarc.reader.map_records(f: Callable, *files: BytesIO) → None

Applies a given function to each record in a batch.

You can pass in multiple batches.

from pymarc.reader import map_records

def print_title(r):
    print(r['245'])
with open('marc.dat', 'rb') as fh:
    map_records(print_title, fh)

Record

Pymarc Record.

class pymarc.record.Record(data: str = '', to_unicode: bool = True, force_utf8: bool = False, hide_utf8_warnings: bool = False, utf8_handling: str = 'strict', leader: str = ' ', file_encoding: str = 'iso8859-1')

Bases: object

A class for representing a MARC record.

Each Record object is made up of multiple Field objects. You’ll probably want to look at the docs for Field to see how to fully use a Record object.

Basic usage:

from pymarc import Record, Field, Subfield, Indicators

record = Record()
field = Field(
    tag = '245',
    indicators = Indicators('0','1'),
    subfields = [
        Subfield(code='a', value='The pragmatic programmer : '),
        Subfield(code='b', value='from journeyman to master /'),
        Subfield(code='c', value='Andrew Hunt, David Thomas.'),
    ])

record.add_field(field)

Or creating a record from a chunk of MARC in transmission format:

record = Record(data=chunk)

Or getting a record as serialized MARC21.

raw = record.as_marc()

You’ll normally want to use a MARCReader object to iterate through MARC records in a file.

add_field(*fields)

Add pymarc.Field objects to a Record object.

Optionally you can pass in multiple fields.

add_grouped_field(*fields) → None

Add pymarc.Field objects to a Record object and sort them “grouped”.

This attempts to maintain a loose numeric order per the MARC standard for “Organization of the record” (http://www.loc.gov/marc/96principl.html). Optionally you can pass in multiple fields.

add_ordered_field(*fields) → None

Add pymarc.Field objects to a Record object and sort them “ordered”.

This attempts to maintain a strict numeric order. Optionally you can pass in multiple fields.

property addedentries: List[Field]

Returns added entry fields.

Note: Fields 790-799 are considered “local” added entry fields but occur with some frequency in OCLC and RLIN records.

as_dict() → Dict[str, str]: Turn a MARC record into a dictionary, which is used for as_json.

as_json(**kwargs) → str

Serialize a record as JSON.

See: https://web.archive.org/web/20151112001548/http://dilettantes.code4lib.org/blog/2010/09/a-proposal-to-serialize-marc-in-json

as_marc() → bytes: Returns the record serialized as MARC21.

as_marc21() → bytes: Returns the record serialized as MARC21.

property author: str | None: Returns the author from field 100, 110 or 111.

decode_marc(marc, to_unicode: bool = True, force_utf8: bool = False, hide_utf8_warnings: bool = False, utf8_handling: str = 'strict', encoding: str = 'iso8859-1') → None

Populate the object based on the marc record in transmission format.

The Record constructor actually uses decode_marc() behind the scenes when you pass in a chunk of MARC data to it.

fields: List

force_utf8: bool

get(tag: str, default: Field | None = None) → Field | None

Implements a dict-like get with a default value.

If tag is not found, then the default value will be returned. The default value should be a Field instance.

get_fields(*args) → List[Field]

Return a list of all the fields in the record whose tags match args.

title = record.get_fields('245')

If no fields with the specified tag are found then an empty list is returned. If you are interested in more than one tag you can pass it as multiple arguments.

subjects = record.get_fields('600', '610', '650')

If no tag is passed in to get_fields() a list of all the fields will be returned.

get_linked_fields(field: Field) → List[Field]: Given a field that is not an 880, retrieve a list of any linked 880 fields.

property isbn: str | None

Returns the first ISBN in the record or None if one is not present.

The returned ISBN will be all numeric, except for an x/X which may occur in the checksum position. Dashes and extraneous information will be automatically removed. If you need this information you’ll want to look directly at the 020 field, e.g. record['020']['a']. Values that do not match the internal ISBN regex will not be returned.

property issn: str | None: Returns the ISSN number [022]['a'] in the record or None.

property issn_title: str | None: Returns the key title of the record (222 $a and $b).

property issnl: str | None: Returns the ISSN-L number [022]['l'] of the record or None.

leader: Any

property location: List[Field]: Returns location field (852).

property notes: List[Field]: Return notes fields (all 5xx fields).

property physicaldescription: List[Field]: Return physical description fields (300).

pos: int

property publisher: str | None

Return publisher from 260 or 264.

Note: 264 field with second indicator ‘1’ indicates publisher.

property pubyear: str | None: Returns publication year from 260 or 264.

remove_field(*fields) → None: Remove one or more pymarc.Field objects from a Record object.

remove_fields(*tags) → None

Remove all the fields with the tags passed to the function.

# remove all the fields marked with tags '200' or '899'.
record.remove_fields('200', '899')

property series: List[Field]

Returns series fields.

Note: 490 supersedes the 440 series statement which was both series statement and added entry. 8XX fields are added entries.

property subjects: List[Field]

Returns subjects fields.

Note: Fields 690-699 are considered “local” subject access fields but occur with some frequency in OCLC and RLIN records.

property sudoc: str | None

Returns a Superintendent of Documents (SuDoc) classification number.

Note: More information can be found at the following URL: https://www.fdlp.gov/classification-guidelines/introduction-to-the-classification-guidelines

property title: str | None: Returns the title of the record (245 $a and $b).

to_unicode: bool

property uniformtitle: str | None: Returns the uniform title from field 130 or 240.

pymarc.record.map_marc8_record(record: Record) → Record: Map MARC-8 record.

pymarc.record.normalize_subfield_code(subfield) → Tuple[Any, int]: Normalize subfield code.

Writer

Pymarc Writer.

class pymarc.writer.JSONWriter(file_handle: IO)

Bases: Writer

A class for writing records as an array of MARC-in-JSON objects.

IMPORTANT: You must close a JSONWriter, otherwise you will not get valid JSON.

Simple usage:

from io import StringIO
from pymarc import JSONWriter

# writing to a file
writer = JSONWriter(open('file.json','wt'))
writer.write(record)
writer.close()  # Important!

# writing to a string
string = StringIO()
writer = JSONWriter(string)
writer.write(record)
writer.close(close_fh=False)  # Important!
print(string.getvalue())

close(close_fh: bool = True) → None

Closes the writer.

If close_fh is True (the default), close will also close the underlying file handle that was passed in to the constructor. Pass close_fh=False to keep it open.

write(record: Record) → None: Writes a record.

class pymarc.writer.MARCWriter(file_handle: IO)

Bases: Writer

A class for writing MARC21 records in transmission format.

Simple usage:

from io import BytesIO
from pymarc import MARCWriter

# writing to a file
writer = MARCWriter(open('file.dat','wb'))
writer.write(record)
writer.close()

# writing to memory; MARCWriter writes bytes, so use BytesIO
memory = BytesIO()
writer = MARCWriter(memory)
writer.write(record)
writer.close(close_fh=False)

write(record: Record) → None: Writes a record.

class pymarc.writer.TextWriter(file_handle: IO)

Bases: Writer

A class for writing records in prettified text MARCMaker format.

A blank line separates each record.

Simple usage:

from io import StringIO
from pymarc import TextWriter

# writing to a file
writer = TextWriter(open('file.txt','wt'))
writer.write(record)
writer.close()

# writing to a string
string = StringIO()
writer = TextWriter(string)
writer.write(record)
writer.close(close_fh=False)
print(string.getvalue())

write(record: Record) → None: Writes a record.

class pymarc.writer.Writer(file_handle: IO)

Bases: object

Base Writer object.

close(close_fh: bool = True) → None

Closes the writer.

If close_fh is True (the default), close will also close the underlying file handle that was passed in to the constructor. Pass close_fh=False to keep it open.

write(record: Record) → None: Write.

class pymarc.writer.XMLWriter(file_handle: IO)

Bases: Writer

A class for writing records as a MARCXML collection.

IMPORTANT: You must close an XMLWriter, otherwise you will not get a valid XML document.

Simple usage:

from io import BytesIO
from pymarc import XMLWriter

# writing to a file
writer = XMLWriter(open('file.xml','wb'))
writer.write(record)
writer.close()  # Important!

# writing to memory; XMLWriter writes bytes, so use BytesIO
memory = BytesIO()
writer = XMLWriter(memory)
writer.write(record)
writer.close(close_fh=False)  # Important!

close(close_fh: bool = True) → None

Closes the writer.

If close_fh is True (the default), close will also close the underlying file handle that was passed in to the constructor. Pass close_fh=False to keep it open.

write(record: Record) → None: Writes a record.

Field

The pymarc field file.

class pymarc.field.Field(tag: str, indicators: Indicators | None = None, subfields: List[Subfield] | None = None, data: str | None = None)

Bases: object

To create a Field, pass in the field tag, indicators and subfields for the tag:

field = Field(
    tag = '245',
    indicators = Indicators('0','1'),
    subfields = [
        Subfield(code='a', value='The pragmatic programmer : '),
        Subfield(code='b', value='from journeyman to master /'),
        Subfield(code='c', value='Andrew Hunt, David Thomas.'),
    ])

If you want to create a control field, don’t pass in the indicators and use a data parameter rather than a subfields parameter:

field = Field(tag='001', data='fol05731351')

add_subfield(code: str, value: str, pos=None) → None

Adds a subfield code/value to the end of a field or at a position (pos).

If pos is not supplied or out of range, the subfield will be added at the end.

If the field is a control field, nothing will happen.

field.add_subfield('u', 'http://www.loc.gov')
field.add_subfield('u', 'http://www.loc.gov', 0)

as_marc(encoding: str) → bytes: Used during conversion of a field to raw marc.

as_marc21(encoding: str) → bytes: Used during conversion of a field to raw marc.

control_field: bool

classmethod convert_legacy_subfields(subfields: List[str]) → List[Subfield]

Converts older-style subfield lists into Subfield lists.

Converts the old-style list of strings into a list of Subfields. As a class method this does not actually set any fields; it simply takes a list of strings and returns a list of Subfields.

from typing import List
from pymarc import Field, Indicators, Subfield

legacy_fields: List[str] = ['a', 'The pragmatic programmer : ',
                            'b', 'from journeyman to master /',
                            'c', 'Andrew Hunt, David Thomas']

coded_fields: List[Subfield] = Field.convert_legacy_subfields(legacy_fields)

myfield = Field(
    tag="245",
    indicators=Indicators('0', '1'),
    subfields=coded_fields
)

Parameters: subfields – A list of [code, value, code, value]
Returns: A list of Subfield named tuples

data: str | None

delete_subfield(code: str) → str | None

Deletes the first subfield with the specified ‘code’ and returns its value.

value = field.delete_subfield('a')

If no subfield is found with the specified code None is returned.

format_field() → str

Returns the field’s subfields (or data in the case of control fields) as a string.

Like Field.value(), but prettier (adds spaces, formats subject headings).

get(code: str, default=None)

A dict-like get method with a default value.

Implements a non-raising getter for a subfield code that will return the value of the first subfield with that code. Returns the default value if no such subfield exists or if the field is a control field.

get_subfields(*codes) → List[str]

Get subfields matching codes.

get_subfields() accepts one or more subfield codes and returns a list of subfield values. The order of the subfield values in the list will be the order that they appear in the field.

print(field.get_subfields('a'))
print(field.get_subfields('a', 'b', 'z'))

property indicator1: str

Indicator 1.

Returns an empty string if this is a control field.

property indicator2: str

Indicator 2.

Returns an empty string if this is a control field.

property indicators: Indicators | None: Return the field’s indicators.

is_control_field() → bool

Returns True if the field is considered a control field, False otherwise.

Prefer using the control_field property directly instead of this, which has been retained for legacy compatibility.

Control fields lack indicators and subfields.

is_subject_field() → bool

Returns True if the field is considered a subject field, False otherwise.

Used by format_field().

linkage_occurrence_num() → str | None: Return the ‘occurrence number’ part of subfield 6, or None if not present.

subfields: List[Subfield]

subfields_as_dict() → Dict[str, List]

Returns the subfields as a dictionary.

Returns an empty dictionary if the field is a control field.

The dictionary maps subfield codes to values. Since subfield codes can repeat, the values are lists.

tag

value() → str: Returns the field’s subfields (or data in the case of control fields) as a string.

class pymarc.field.Indicators(first: str, second: str)

Bases: NamedTuple

A named tuple representing the indicators for a non-control field.

first: str: Alias for field number 0

second: str: Alias for field number 1

class pymarc.field.RawField(tag: str, indicators: Indicators | None = None, subfields: List[Subfield] | None = None, data: str | None = None)

Bases: Field

MARC field that keeps data in raw, un-decoded byte strings.

Should only be used when input records are wrongly encoded.

as_marc(encoding: str | None = None): Used during conversion of a field to raw MARC.

control_field: bool

data: str | None

subfields: List[Subfield]

tag

class pymarc.field.Subfield(code, value)

Bases: tuple

code: str: Alias for field number 0

value: str: Alias for field number 1

pymarc.field.map_marc8_field(f: Field) → Field: Map MARC8 field.

Exceptions

Exceptions for pymarc.

exception pymarc.exceptions.BadLeaderValue

Bases: PymarcException

Error when setting a leader value.

exception pymarc.exceptions.BadSubfieldCodeWarning

Bases: Warning

Warning about a non-ASCII subfield code.

exception pymarc.exceptions.BaseAddressInvalid

Bases: PymarcException

Base address exceeds size of record.

exception pymarc.exceptions.BaseAddressNotFound

Bases: PymarcException

Unable to locate base address of record.

exception pymarc.exceptions.EndOfRecordNotFound

Bases: FatalReaderError

Unable to locate end of record marker.

exception pymarc.exceptions.FatalReaderError

Bases: PymarcException

Error preventing further reading.

exception pymarc.exceptions.FieldNotFound

Bases: PymarcException

Record does not contain the specified field.

exception pymarc.exceptions.MissingLinkedFields(field)

Bases: PymarcException

Error when a non-880 field has a subfield 6 that cannot be matched to an 880.

exception pymarc.exceptions.NoActiveFile

Bases: PymarcException

There is no active file to write to in call to write.

exception pymarc.exceptions.NoFieldsFound

Bases: PymarcException

Unable to locate fields in record data.

exception pymarc.exceptions.PymarcException

Bases: Exception

Base pymarc Exception.

exception pymarc.exceptions.RecordDirectoryInvalid

Bases: PymarcException

Invalid directory.

exception pymarc.exceptions.RecordLeaderInvalid

Bases: PymarcException

Unable to extract record leader.

exception pymarc.exceptions.RecordLengthInvalid

Bases: FatalReaderError

Invalid record length.

exception pymarc.exceptions.TruncatedRecord

Bases: FatalReaderError

Truncated record data.

exception pymarc.exceptions.WriteNeedsRecord

Bases: PymarcException

Write requires a pymarc.Record object as an argument.

MarcXML

From XML to MARC21 and back again.

class pymarc.marcxml.XmlHandler(strict=False, normalize_form=None)

Bases: ContentHandler

XML Handler.

You can subclass XmlHandler and add your own process_record method that’ll be passed a pymarc.Record as it becomes available. This could be useful if you want to stream the records elsewhere (like to a rdbms) without having to store them all in memory.

characters(chars): Append chars to _text.

endElementNS(name, qname): End element NS.

process_record(record): Append record to records.

startElementNS(name, qname, attrs): Start element NS.

pymarc.marcxml.map_xml(function, *files)

Map a function onto the file.

For each record that is parsed, the function is called with the extracted record:

def do_it(r):
    print(r)

map_xml(do_it, 'marc.xml')

pymarc.marcxml.parse_xml(xml_file, handler): Parse a file with a given subclass of xml.sax.handler.ContentHandler.

pymarc.marcxml.parse_xml_to_array(xml_file, strict=False, normalize_form=None)

Parse an XML file and return the records as an array.

Instead of passing in a file path you can also pass in an open file handle, or a file like object like StringIO. If you would like the parser to explicitly check the namespaces for the MARCSlim namespace use the strict=True option. Valid values for normalize_form are ‘NFC’, ‘NFKC’, ‘NFD’, and ‘NFKD’. See unicodedata.normalize for more info on these.

pymarc.marcxml.record_to_xml(record, quiet=False, namespace=False): From MARC to XML.

pymarc.marcxml.record_to_xml_node(record, quiet=False, namespace=False)

Converts a record object to a chunk of XML.

If you would like to include the marcxml namespace in the root tag set namespace to True.

Constants

Constants for pymarc.

MARC-8

Handle MARC-8 files.

See http://www.loc.gov/marc/specifications/speccharmarc8.html

class pymarc.marc8.MARC8ToUnicode(G0: int = 66, G1: int = 69, quiet: bool = False)

Bases: object

Converts MARC-8 to Unicode.

Note that currently, unicode strings aren’t normalized, and some codecs (e.g. iso8859-1) will fail on such strings. When I can require python 2.3, this will go away.

Warning: MARC-8 EACC (East Asian characters) makes some distinctions which aren’t captured in Unicode. The LC tables give the option of mapping such characters either to a Unicode private use area, or to a substitute character which (usually) gives the sense. I’ve picked the second, so this means that the MARC data should be treated as primary and the Unicode data used for display purposes only. (If you know of either fonts designed for use with LC’s private-use Unicode assignments, or of attempts to standardize Unicode characters to allow round-trips from EACC, or if you need the private-use Unicode character translations, please inform me, asl2@pobox.com.)

ansel = 69

basic_latin = 66

translate(marc8_string): Translate.

pymarc.marc8.marc8_to_unicode(marc8, hide_utf8_warnings: bool = False) → str

Pass in a string, and get back a Unicode object.

from pymarc.marc8 import marc8_to_unicode
print(marc8_to_unicode(b'Python programming'))

MARC-8 mapping

MARC-8 mapping.

Leader

The pymarc.leader file.

class pymarc.leader.Leader(leader: str)

Bases: object

Mutable leader.

A class to manipulate a Record’s leader.

You can use the properties (record_status, bibliographic_level, etc.) or their slices/index equivalent (leader[5], leader[7], etc.) to read and write values.

See the LoC documentation for more information about those fields.

from pymarc.leader import Leader

leader = Leader("00475cas a2200169 i 4500")
leader[0:5]  # returns "00475"
leader.record_status  # returns "c"
leader.record_status = "a"  # sets the record status to "a"
leader[5] # returns the record status "a"
leader[5] = "b" # sets the record status to "b"
str(leader)  # "00475bas a2200169 i 4500"

Usually the leader is accessed through the leader property of a record.

from pymarc import MARCReader
with open('test/marc.dat', 'rb') as fh:
    reader = MARCReader(fh)
    for record in reader:
        print(record.leader)

When creating or updating a Record, note that record_length and base_address are only generated in the MARC21 output of record.as_marc().

property base_address: str: Base address of data (12-16).

property bibliographic_level: str: Bibliographic level (07).

property cataloging_form: str: Descriptive cataloging form (18).

property coding_scheme: str: Character coding scheme (09).

property encoding_level: str: Encoding level (17).

property implementation_defined_length: str: Length of the implementation-defined portion (22).

property indicator_count: str: Indicator count (10).

property length_of_field_length: str: Length of the length-of-field portion (20).

property multipart_ressource: str: Multipart resource record level (19).

property record_length: str: Record length (00-04).

property record_status: str: Record status (05).

property starting_character_position_length: str: Length of the starting-character-position portion (21).

property subfield_code_count: str: Subfield code count (11).

property type_of_control: str: Type of control (08).

property type_of_record: str: Type of record (06).
