pymarc package
Submodules
pymarc.constants module
Constants for pymarc.
pymarc.exceptions module
Exceptions for pymarc.
pymarc.field module
The pymarc.field file.
class pymarc.field.Field(tag, indicators=None, subfields=None, data=u'')
Bases: six.Iterator
Field() takes the field tag, the indicators, and the subfields for the tag:
    field = Field(
        tag='245',
        indicators=['0', '1'],
        subfields=[
            'a', 'The pragmatic programmer : ',
            'b', 'from journeyman to master /',
            'c', 'Andrew Hunt, David Thomas.',
        ])
To create a control field, omit the indicators and pass a data parameter rather than a subfields parameter:
    field = Field(tag='001', data='fol05731351')
add_subfield(code, value)
Adds a subfield code/value pair to the field.
    field.add_subfield('u', 'http://www.loc.gov')
as_marc21(encoding)
Used during conversion of a field to raw MARC.
delete_subfield(code)
Deletes the first subfield with the specified code and returns its value:
    field.delete_subfield('a')
If no subfield with the specified code is found, None is returned.
format_field()
Returns the field as a string without tag, indicators, and subfield indicators. Like pymarc.Field.value(), but prettier (adds spaces, formats subject headings).
get_subfields(*codes)
get_subfields() accepts one or more subfield codes and returns a list of subfield values. The values are listed in the order in which they appear in the field.
    print(field.get_subfields('a'))
    print(field.get_subfields('a', 'b', 'z'))
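The order-preserving lookup described above can be pictured with a small standalone sketch. It is not pymarc's implementation: it assumes only that subfields are kept as the flat [code, value, code, value, ...] list shown in the Field constructor example, and the helper name pick_subfields is hypothetical.

```python
def pick_subfields(subfields, *codes):
    """Return the values whose code is in `codes`, in field order.

    Hypothetical helper, not part of pymarc. `subfields` is assumed to be
    a flat [code, value, code, value, ...] list.
    """
    # Pair every even-indexed code with the value that follows it.
    pairs = zip(subfields[0::2], subfields[1::2])
    return [value for code, value in pairs if code in codes]


subfields = [
    'a', 'The pragmatic programmer : ',
    'b', 'from journeyman to master /',
    'c', 'Andrew Hunt, David Thomas.',
]

# The order of the requested codes does not matter; field order wins.
selected = pick_subfields(subfields, 'c', 'a')
missing = pick_subfields(subfields, 'z')
```

Here selected lists the 'a' value before the 'c' value, and missing is an empty list because no subfield has the code 'z'.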
is_control_field()
Returns True if the field is considered a control field and False otherwise. Control fields lack indicators and subfields.
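As a rough illustration of what counts as a control field: in MARC 21 the control fields are the 00X tags (001 through 009). The sketch below checks a tag string on that basis. The helper name is_control_tag is hypothetical, and the check is an assumption about the rule, not a copy of pymarc's code.

```python
def is_control_tag(tag):
    """Hypothetical helper: True for tags in the MARC 21 00X control range.

    Assumes tags are three-character strings such as '001' or '245'.
    """
    # String comparison is safe here because numeric tags are zero-padded.
    return tag.isdigit() and tag < '010'


control = is_control_tag('001')
data = is_control_tag('245')
```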
class pymarc.field.RawField(tag, indicators=None, subfields=None, data=u'')
Bases: pymarc.field.Field
MARC field that keeps data in raw, undecoded byte strings.
Should only be used when input records are wrongly encoded.
pymarc.marc8 module
pymarc marc8.py file.
class pymarc.marc8.MARC8ToUnicode(G0=66, G1=69, quiet=False)
Converts MARC-8 to Unicode. Note that currently, unicode strings aren’t normalized, and some codecs (e.g. iso8859-1) will fail on such strings. When I can require python 2.3, this will go away.
Warning: MARC-8 EACC (East Asian characters) makes some distinctions which aren’t captured in Unicode. The LC tables give the option of mapping such characters either to a Unicode private use area, or to a substitute character which (usually) gives the sense. I’ve picked the second, so the MARC data should be treated as primary and the Unicode data used for display purposes only. (If you know of either fonts designed for use with LC’s private-use Unicode assignments, or of attempts to standardize Unicode characters to allow round-trips from EACC, or if you need the private-use Unicode character translations, please inform me, asl2@pobox.com.)
ansel = 69
basic_latin = 66
pymarc.marc8_mapping module
pymarc.marcxml module
pymarc marcxml file.
class pymarc.marcxml.XmlHandler(strict=False, normalize_form=None)
Bases: xml.sax.handler.ContentHandler
You can subclass XmlHandler and add your own process_record method, which will be passed a pymarc.Record as each record becomes available. This can be useful if you want to stream the records elsewhere (for example, to an RDBMS) without having to store them all in memory.
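The streaming pattern can be sketched with the standard library alone. The handler below is a hypothetical stand-in, not pymarc's XmlHandler: it subclasses xml.sax.handler.ContentHandler directly, treats each record element as one unit, and hands a plain string (rather than a pymarc.Record) to process_record as soon as the element closes. The element names and sample XML are invented for the illustration.

```python
import io
import xml.sax
import xml.sax.handler


class StreamingHandler(xml.sax.handler.ContentHandler):
    """Hypothetical stand-in showing the process_record callback pattern."""

    def __init__(self):
        super().__init__()
        self.seen = []        # stands in for "somewhere else", e.g. a database
        self._chunks = None   # text collected for the record being parsed

    def startElement(self, name, attrs):
        if name == 'record':
            self._chunks = []

    def characters(self, content):
        if self._chunks is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == 'record':
            # Hand the finished record over immediately, then forget it.
            self.process_record(''.join(self._chunks).strip())
            self._chunks = None

    def process_record(self, record):
        self.seen.append(record)


xml_data = b'<collection><record>first</record><record>second</record></collection>'
handler = StreamingHandler()
xml.sax.parse(io.BytesIO(xml_data), handler)
```

Only one record's worth of text is buffered at a time, which is the point of overriding process_record instead of collecting everything into a list.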
pymarc.marcxml.map_xml(function, *files)
Maps a function onto each file, so that the function is called with every record as it is parsed:
    def do_it(r):
        print(r)

    map_xml(do_it, 'marc.xml')
pymarc.marcxml.parse_xml(xml_file, handler)
Parses a file with a given subclass of xml.sax.handler.ContentHandler.
pymarc.marcxml.parse_xml_to_array(xml_file, strict=False, normalize_form=None)
Parses an XML file and returns the records as an array. If you would like the parser to explicitly check for the MARCSlim namespace, use the strict=True option. Valid values for normalize_form are ‘NFC’, ‘NFKC’, ‘NFD’, and ‘NFKD’. See the unicodedata.normalize documentation for details.
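To show what the normalize_form values mean, here is a short standard-library sketch using unicodedata.normalize directly. It does not call pymarc; the sample strings are chosen for the illustration.

```python
import unicodedata

composed = '\u00e9'      # "é" as a single code point
decomposed = 'e\u0301'   # "e" followed by a combining acute accent

# NFC composes characters; NFD decomposes them.
nfc = unicodedata.normalize('NFC', decomposed)
nfd = unicodedata.normalize('NFD', composed)

# The "K" forms additionally replace compatibility characters,
# for example the "fi" ligature (U+FB01) becomes two letters.
nfkc = unicodedata.normalize('NFKC', '\ufb01')
```

The two spellings of "é" look identical on screen but compare unequal until they are normalized to the same form, which is why a consistent normalize_form matters when matching or indexing record data.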
pymarc.reader module
class pymarc.reader.JSONReader(marc_target, encoding='utf-8', stream=False)
Bases: pymarc.reader.Reader
class pymarc.reader.MARCReader(marc_target, to_unicode=True, force_utf8=False, hide_utf8_warnings=False, utf8_handling='strict', file_encoding='iso8859-1')
Bases: pymarc.reader.Reader
An iterator class for reading a file of MARC21 records.
Simple usage:
    from pymarc import MARCReader

    ## pass in a file object
    reader = MARCReader(open('file.dat', 'rb'))
    for record in reader:
        ...

    ## pass in marc in transmission format
    reader = MARCReader(rawmarc)
    for record in reader:
        ...
If you would like your Record objects to contain unicode strings, use the to_unicode parameter:
    reader = MARCReader(open('file.dat', 'rb'), to_unicode=True)
This will decode from MARC-8 or UTF-8 depending on the value in the MARC leader at position 9.
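For orientation, the MARC 21 leader is a 24-character string, and position 9 (counting from zero) holds the character coding scheme: a blank means MARC-8 and 'a' means UCS/Unicode. The sketch below reads that flag from a leader string. The helper name coding_scheme and the sample leaders are made up for the illustration, and it is not the code pymarc runs.

```python
def coding_scheme(leader):
    """Hypothetical helper: name the encoding flagged at leader position 9."""
    flag = leader[9]
    if flag == 'a':
        return 'utf-8'
    # A blank at position 9 means MARC-8 in MARC 21.
    return 'marc-8'


# Illustrative 24-character leaders that differ only at position 9.
unicode_leader = '00714cam a2200205 a 4500'
marc8_leader = '00714cam  2200205 a 4500'

unicode_scheme = coding_scheme(unicode_leader)
marc8_scheme = coding_scheme(marc8_leader)
```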
If you find yourself in the unfortunate position of having data that is utf-8 encoded without the leader set appropriately, you can use the force_utf8 parameter:
    reader = MARCReader(open('file.dat', 'rb'), to_unicode=True,
                        force_utf8=True)
If you find yourself in the unfortunate position of having data that is mostly utf-8 encoded but with a few non-utf-8 characters, you can also use the utf8_handling parameter, which takes the same values (‘strict’, ‘replace’, and ‘ignore’) as the Python Unicode codecs (see http://docs.python.org/library/codecs.html for more info).
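The three handler names behave as they do in Python's own decoding. The sketch below uses bytes.decode directly rather than pymarc, with a sample byte string that is valid UTF-8 except for one stray byte.

```python
# "café " in UTF-8 followed by 0xFF, which is never valid in UTF-8.
raw = b'caf\xc3\xa9 \xff'

# 'replace' substitutes U+FFFD for the undecodable byte.
replaced = raw.decode('utf-8', 'replace')

# 'ignore' silently drops it.
ignored = raw.decode('utf-8', 'ignore')

# 'strict' (the default) raises instead of guessing.
try:
    raw.decode('utf-8', 'strict')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

Both 'replace' and 'ignore' lose information, so it is worth keeping the original file if the records need to be repaired later.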
Although it is not legal in MARC-21 to use anything but MARC-8 or UTF-8, if you have a file in some other encoding and you know what it is, you can try passing that encoding in the file_encoding parameter.
pymarc.record module
class pymarc.record.Record(data='', to_unicode=True, force_utf8=False, hide_utf8_warnings=False, utf8_handling='strict', leader=' ', file_encoding='iso8859-1')
Bases: six.Iterator
A class for representing a MARC record. Each Record object is made up of multiple Field objects. You’ll probably want to look at the docs for Field to see how to fully use a Record object.
Basic usage:
    from pymarc import Record, Field

    record = Record()
    field = Field(
        tag='245',
        indicators=['0', '1'],
        subfields=[
            'a', 'The pragmatic programmer : ',
            'b', 'from journeyman to master /',
            'c', 'Andrew Hunt, David Thomas.',
        ])
    record.add_field(field)
Or creating a record from a chunk of MARC in transmission format:
    record = Record(data=chunk)
Or getting a record as serialized MARC21:
    raw = record.as_marc()
You’ll normally want to use a MARCReader object to iterate through MARC records in a file.
add_field(*fields)
add_field() will add pymarc.Field objects to a Record object. Optionally you can pass in multiple fields.
add_grouped_field(*fields)
add_grouped_field() will add pymarc.Field objects to a Record object, attempting to maintain a loose numeric order per the MARC standard for “Organization of the record” (http://www.loc.gov/marc/96principl.html). Optionally you can pass in multiple fields.
add_ordered_field(*fields)
add_ordered_field() will add pymarc.Field objects to a Record object, attempting to maintain a strict numeric order. Optionally you can pass in multiple fields.
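One way to picture the difference between the "loose" and "strict" orderings is the sketch below. It works on bare tag strings, and it assumes that loose order compares only the first digit of the tag (the hundreds block) while strict order compares the whole tag. That reading, the helper name insert_tag, and the sample tags are assumptions made for the illustration, not pymarc's code.

```python
def insert_tag(tags, new_tag, strict):
    """Hypothetical helper: insert new_tag before the first tag that sorts after it.

    strict=True compares whole tags (245 < 650);
    strict=False compares only the leading digit (6XX == 6XX).
    """
    def key(tag):
        return int(tag) if strict else int(tag[0])

    result = list(tags)
    for index, tag in enumerate(result):
        if key(tag) > key(new_tag):
            result.insert(index, new_tag)
            return result
    # Nothing sorts after the new tag, so it goes at the end.
    result.append(new_tag)
    return result


existing = ['100', '245', '650']
ordered = insert_tag(existing, '600', strict=True)
grouped = insert_tag(existing, '600', strict=False)
```

Under these assumptions the strict insert places 600 before 650, while the loose insert only guarantees that 600 stays inside the 6XX group and so leaves it after 650.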
addedentries()
Note: Fields 790-799 are considered “local” added entry fields but occur with some frequency in OCLC and RLIN records.
as_json(**kwargs)
Serializes a record as JSON according to http://dilettantes.code4lib.org/blog/2010/09/a-proposal-to-serialize-marc-in-json/
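For readers unfamiliar with the MARC-in-JSON proposal, the sketch below builds the general shape by hand with the json module: a leader string plus a fields array in which control fields map a tag to a string and data fields map a tag to an object with ind1, ind2, and a subfields array. It does not call as_json(); the leader value is illustrative and the dictionary is assembled manually.

```python
import json

record = {
    'leader': '00714cam a2200205 a 4500',   # illustrative leader
    'fields': [
        # Control field: tag -> data string.
        {'001': 'fol05731351'},
        # Data field: tag -> indicators plus ordered subfields.
        {'245': {
            'ind1': '0',
            'ind2': '1',
            'subfields': [
                {'a': 'The pragmatic programmer : '},
                {'b': 'from journeyman to master /'},
            ],
        }},
    ],
}

serialized = json.dumps(record)
round_tripped = json.loads(serialized)
```

Fields and subfields are kept as arrays of single-key objects so that repeated tags and subfield order survive serialization.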
as_marc21()
Returns the record serialized as MARC21.
decode_marc(marc, to_unicode=True, force_utf8=False, hide_utf8_warnings=False, utf8_handling='strict', encoding='iso8859-1')
decode_marc() accepts a MARC record in transmission format as a string argument, and will populate the object based on the data found. The Record constructor uses decode_marc() behind the scenes when you pass in a chunk of MARC data to it.
get_fields(*args)
When passed a tag (‘245’), get_fields() returns a list of all the fields in the record with that tag:
    title = record.get_fields('245')
If no fields with the specified tag are found, an empty list is returned. If you are interested in more than one tag, you can pass in several tags:
    subjects = record.get_fields('600', '610', '650')
If no tag is passed in to get_fields(), a list of all the fields is returned.
isbn()
Returns the first ISBN in the record, or None if one is not present. The returned ISBN will be all numeric, except for an x/X which may occur in the checksum position. Dashes and extraneous information will be automatically removed. If you need this information, look directly at the 020 field, e.g. record['020']['a'].
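The kind of cleanup described above can be approximated with a short regular-expression sketch. This is a guess at the behaviour, not pymarc's code: the helper name clean_isbn is hypothetical, and the sample values stand in for what an 020 subfield a might contain.

```python
import re


def clean_isbn(raw):
    """Hypothetical helper: strip dashes and trailing qualifiers from an ISBN string."""
    # Remove dashes first, then keep the leading run of digits and x/X.
    match = re.match(r'[0-9Xx]+', raw.replace('-', ''))
    return match.group(0) if match else None


with_qualifier = clean_isbn('0-201-61622-X (pbk.)')
with_punctuation = clean_isbn('9780201616224 :')
not_an_isbn = clean_isbn('n/a')
```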
remove_field(*fields)
remove_field() will remove one or more pymarc.Field objects from a Record object.
remove_fields(*tags)
Removes all the fields with the tags passed to the function. For example:
    record.remove_fields('200', '899')
will remove all the fields marked with tags ‘200’ or ‘899’.
series()
Note: 490 supersedes the 440 series statement which was both series statement and added entry. 8XX fields are added entries.
subjects()
Note: Fields 690-699 are considered “local” subject fields but occur with some frequency in OCLC and RLIN records.
sudoc()
Returns a Superintendent of Documents (SuDoc) classification number held in the 086 MARC tag. The classification number will be made up of a variety of dashes, dots, slashes, and colons. More information can be found at the following URL: https://www.fdlp.gov/file-repository/gpo-cataloging/1172-gpo-classification-manual
pymarc.writer module
class pymarc.writer.JSONWriter(file_handle)
Bases: pymarc.writer.Writer
A class for writing records as an array of MARC-in-JSON objects.
IMPORTANT: You must close a JSONWriter, otherwise you will not get valid JSON.
Simple usage:
>>> from pymarc import JSONWriter
>>> from io import StringIO  # on Python 2: from StringIO import StringIO

### writing to a file
>>> writer = JSONWriter(open('file.json','wt'))
>>> writer.write(record)
>>> writer.close()  # Important!

### writing to a string
>>> string = StringIO()
>>> writer = JSONWriter(string)
>>> writer.write(record)
>>> writer.close(close_fh=False)  # Important!
>>> print(string.getvalue())
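Why closing matters can be seen in a standard-library sketch of a streaming array writer. The class below is a hypothetical stand-in, not JSONWriter itself: it assumes the writer emits the opening bracket up front, one object per write() call, and the closing bracket only in close().

```python
import io
import json


class ArrayWriter:
    """Hypothetical stand-in that streams objects as a single JSON array."""

    def __init__(self, file_handle):
        self.file_handle = file_handle
        self.count = 0
        self.file_handle.write('[')

    def write(self, obj):
        # Separate every object after the first with a comma.
        if self.count > 0:
            self.file_handle.write(',')
        json.dump(obj, self.file_handle)
        self.count += 1

    def close(self, close_fh=True):
        # The array only becomes valid JSON once the bracket is closed.
        self.file_handle.write(']')
        if close_fh:
            self.file_handle.close()


buffer = io.StringIO()
writer = ArrayWriter(buffer)
writer.write({'leader': 'x'})
writer.write({'leader': 'y'})
unterminated = buffer.getvalue()   # snapshot before close()

writer.close(close_fh=False)       # keep the buffer open so it can be read
finished = buffer.getvalue()
```

The snapshot taken before close() lacks the final bracket and cannot be parsed, while the text read after close() is a complete array. Passing close_fh=False leaves the in-memory buffer open so its contents can still be retrieved.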
class pymarc.writer.MARCWriter(file_handle)
Bases: pymarc.writer.Writer
A class for writing MARC21 records in transmission format.
Simple usage:
>>> from pymarc import MARCWriter

### writing to a file
>>> writer = MARCWriter(open('file.dat','wb'))
>>> writer.write(record)
>>> writer.close()

### writing to a string (Python 2 only)
>>> from StringIO import StringIO
>>> string = StringIO()
>>> writer = MARCWriter(string)
>>> writer.write(record)
>>> writer.close(close_fh=False)
>>> print(string.getvalue())

### writing to memory (Python 3 only)
>>> from io import BytesIO
>>> memory = BytesIO()
>>> writer = MARCWriter(memory)
>>> writer.write(record)
>>> writer.close(close_fh=False)
class pymarc.writer.TextWriter(file_handle)
Bases: pymarc.writer.Writer
A class for writing records in prettified text MARCMaker format.
A blank line separates each record.
Simple usage:
>>> from pymarc import TextWriter
>>> from io import StringIO  # on Python 2: from StringIO import StringIO

### writing to a file
>>> writer = TextWriter(open('file.txt','wt'))
>>> writer.write(record)
>>> writer.close()

### writing to a string
>>> string = StringIO()
>>> writer = TextWriter(string)
>>> writer.write(record)
>>> writer.close(close_fh=False)
>>> print(string.getvalue())
class pymarc.writer.Writer(file_handle)
Bases: object
class pymarc.writer.XMLWriter(file_handle)
Bases: pymarc.writer.Writer
A class for writing records as a MARCXML collection.
IMPORTANT: You must close an XMLWriter, otherwise you will not get a valid XML document.
Simple usage:
>>> from pymarc import XMLWriter

### writing to a file
>>> writer = XMLWriter(open('file.xml','wb'))
>>> writer.write(record)
>>> writer.close()  # Important!

### writing to a string (Python 2 only)
>>> from StringIO import StringIO
>>> string = StringIO()
>>> writer = XMLWriter(string)
>>> writer.write(record)
>>> writer.close(close_fh=False)  # Important!
>>> print(string.getvalue())

### writing to memory (Python 3 only)
>>> from io import BytesIO
>>> memory = BytesIO()
>>> writer = XMLWriter(memory)
>>> writer.write(record)
>>> writer.close(close_fh=False)  # Important!
Module contents
The pymarc module provides an API for reading, writing and modifying MARC records. MARC (MAchine Readable Cataloging) is a metadata format for bibliographic data. More about MARC can be found at the Library of Congress: http://lcweb.loc.gov/marc
Below are some common examples of how you might want to use pymarc. If you run across an example that you think should be here please contribute it by writing to the author.
Reading a batch of records and printing out the 245 subfield a. If you are curious, this example uses the batch file available in the distribution.
>>> from pymarc import MARCReader
>>> reader = MARCReader(open('test/marc.dat', 'rb'))
>>> for record in reader:
...     print(record['245']['a'])
The pragmatic programmer :
Programming Python /
Learning Python /
Python cookbook /
Python programming for the absolute beginner /
Web programming :
Python programming on Win32 /
Python programming :
Python Web programming /
Core python programming /
Python and Tkinter programming /
Game programming with Python, Lua, and Ruby /
Python programming patterns /
Python programming with the Java class libraries :
Learn to program using Python :
Programming with Python /
BSD Sockets programming from a multi-language perspective /
Design patterns :
Introduction to algorithms /
ANSI Common Lisp /
Creating a record and writing it out to a file.
>>> from pymarc import Record, Field
>>> record = Record()
>>> record.add_field(
...     Field(
...         tag = '245',
...         indicators = ['0','1'],
...         subfields = [
...             'a', 'The pragmatic programmer : ',
...             'b', 'from journeyman to master /',
...             'c', 'Andrew Hunt, David Thomas.'
...         ]))
>>> out = open('file.dat', 'wb')
>>> out.write(record.as_marc21())
>>> out.close()