How Does Java Handle XML Documents?

Working with XML is a fundamental task in Java, and there are several ways to do it, each with its own strengths and use cases.

Here's a comprehensive guide covering the main approaches, from the classic to the modern.


Summary of Approaches

| Method | Key API | Best for | Pros | Cons |
|--------|---------|----------|------|------|
| DOM | org.w3c.dom | Small to medium XML documents that need to be modified in memory. | Full tree structure, easy to navigate, supports modification. | High memory usage, slower for large files. |
| SAX | org.xml.sax | Large XML documents where memory efficiency is critical. | Very low memory footprint, fast for read-only processing. | Read-only, more complex to implement, sequential access only. |
| StAX | javax.xml.stream | Efficient streaming for both reading and writing. | Good balance of speed and memory, pull-based model, easy to learn. | Less flexible than DOM for complex manipulations. |
| JAXB | javax.xml.bind | Data binding: converting Java objects to/from XML (marshalling/unmarshalling). | Very little code for object-centric XML. Bundled with Java SE 6 through 10; a separate dependency from Java 11 onward. | Not suitable for generic XML processing (e.g., modifying an existing document structure). |

The Classic DOM (Document Object Model) Approach

The DOM parser reads the entire XML document into memory and builds a tree-like structure of objects. You can then navigate, query, and modify this tree.

How it Works:

  1. Parse: The parser reads the XML and creates a Document object, which is the root of the tree.
  2. Traverse: You use methods like getElementsByTagName(), getChildNodes(), getParentNode() to navigate the tree.
  3. Modify: You can add, remove, or change nodes and their attributes.
  4. Write: You can serialize the modified Document object back to an XML file.
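
Steps 3 and 4 (modify and write) can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class name DomModifySketch, the method addBook, and the inline XML string are hypothetical, and the result is serialized to a String rather than a file so the sketch stays self-contained.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.StringReader;
import java.io.StringWriter;

class DomModifySketch {
    // Parses xml, appends a <book> with the given id and title, and returns the serialized tree.
    static String addBook(String xml, String id, String title) {
        try {
            // Parse: build the in-memory tree
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Modify: create a new <book id="..."><title>...</title></book> node
            Element book = doc.createElement("book");
            book.setAttribute("id", id);
            Element titleElement = doc.createElement("title");
            titleElement.setTextContent(title);
            book.appendChild(titleElement);
            doc.getDocumentElement().appendChild(book);

            // Write: serialize the tree with an identity Transformer
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not update the XML document", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical one-book catalog used only for this sketch
        String xml = "<catalog><book id=\"bk101\"><title>Midnight Rain</title></book></catalog>";
        System.out.println(addBook(xml, "bk103", "The New Java Guide"));
    }
}
```

To write to a file instead, the StreamResult can wrap a java.io.File rather than a StringWriter.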

Example: Parsing and Reading with DOM

Let's use this sample XML file: books.xml

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
   </book>
</catalog>

Java Code:

import org.w3c.dom.*;
import javax.xml.parsers.*;
import java.io.File;
public class DomParserExample {
    public static void main(String[] args) {
        try {
            // 1. Create a DocumentBuilderFactory
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // 2. Create a DocumentBuilder
            DocumentBuilder builder = factory.newDocumentBuilder();
            // 3. Parse the XML file to create a Document object
            Document document = builder.parse(new File("books.xml"));
            // 4. Normalize the document structure
            document.getDocumentElement().normalize();
            System.out.println("Root element: " + document.getDocumentElement().getNodeName());
            // 5. Get all book elements
            NodeList nodeList = document.getElementsByTagName("book");
            System.out.println("--------------------");
            // 6. Loop through the book nodes
            for (int i = 0; i < nodeList.getLength(); i++) {
                Node node = nodeList.item(i);
                if (node.getNodeType() == Node.ELEMENT_NODE) {
                    Element element = (Element) node;
                    // Get the 'id' attribute
                    String bookId = element.getAttribute("id");
                    // Get text content of child elements
                    String author = element.getElementsByTagName("author").item(0).getTextContent();
                    String title = element.getElementsByTagName("title").item(0).getTextContent();
                    String genre = element.getElementsByTagName("genre").item(0).getTextContent();
                    String price = element.getElementsByTagName("price").item(0).getTextContent();
                    System.out.println("Book ID: " + bookId);
                    System.out.println("   Author: " + author);
                    System.out.println("   Title: " + title);
                    System.out.println("   Genre: " + genre);
                    System.out.println("   Price: " + price);
                    System.out.println("--------------------");
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
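
The loop above walks the tree with getElementsByTagName(). As an alternative way to query the same kind of Document, the JDK's javax.xml.xpath API can select nodes with an expression. The sketch below is only illustrative: the class DomXPathSketch, the method selectText, and the inline SAMPLE string (a trimmed-down copy of the catalog) are hypothetical names introduced here.

```java
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class DomXPathSketch {
    // Hypothetical inline sample with the same shape as the catalog file
    static final String SAMPLE =
            "<catalog>"
          + "<book id=\"bk101\"><title>XML Developer's Guide</title><price>44.95</price></book>"
          + "<book id=\"bk102\"><title>Midnight Rain</title><price>5.95</price></book>"
          + "</catalog>";

    // Returns the text content of every node matched by the XPath expression.
    static List<String> selectText(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Titles of books cheaper than 10
        System.out.println(selectText(SAMPLE, "/catalog/book[price < 10]/title"));
    }
}
```

XPath keeps the selection logic in one expression, at the cost of still holding the whole document in memory, like any DOM-based approach.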

The SAX (Simple API for XML) Approach

SAX is an event-driven, read-only parser. It doesn't load the document into memory. Instead, it reads the XML sequentially from top to bottom and triggers events (like startElement, endElement, characters) as it encounters different parts of the document.

How it Works:

  1. Create a custom class that extends DefaultHandler and overrides its event methods.
  2. Create a SAXParserFactory and a SAXParser.
  3. Call the parse() method, passing your XML file and an instance of your custom handler.

Example: Parsing with SAX

Java Code:

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.File;
public class SaxParserExample {
    public static void main(String[] args) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser saxParser = factory.newSAXParser();
            DefaultHandler handler = new DefaultHandler() {
                // Buffers text, because characters() may be called several times for one element
                private final StringBuilder text = new StringBuilder();
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attributes) throws SAXException {
                    // XML names are case-sensitive, so compare with equals()
                    if (qName.equals("book")) {
                        System.out.println("\nFound Book. ID: " + attributes.getValue("id"));
                    }
                    text.setLength(0); // Start collecting text for this element
                }
                @Override
                public void characters(char[] ch, int start, int length) throws SAXException {
                    text.append(ch, start, length);
                }
                @Override
                public void endElement(String uri, String localName, String qName) throws SAXException {
                    // The element's full text is only known once its end tag is reached
                    if (qName.equals("author")) {
                        System.out.println("   Author: " + text.toString().trim());
                    } else if (qName.equals("title")) {
                        System.out.println("   Title: " + text.toString().trim());
                    }
                    text.setLength(0);
                }
            };
            saxParser.parse(new File("books.xml"), handler);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
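
Because SAX never builds a tree, it fits aggregations over a document that is read once. The sketch below sums the <price> values as they stream past. It is a minimal illustration under stated assumptions: SaxPriceSumSketch, PriceSumHandler, totalPrice, and the inline SAMPLE string are hypothetical names. Text is buffered because a parser may deliver a single text node through several characters() calls.

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.SAXParserFactory;
import java.io.StringReader;

class SaxPriceSumSketch {
    // Hypothetical inline sample with the same shape as the catalog file
    static final String SAMPLE =
            "<catalog>"
          + "<book id=\"bk101\"><price>44.95</price></book>"
          + "<book id=\"bk102\"><price>5.95</price></book>"
          + "</catalog>";

    // Keeps only a running total and a small text buffer, never the whole document.
    static class PriceSumHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private boolean inPrice;
        double total;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if (qName.equals("price")) {
                inPrice = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inPrice) {
                text.append(ch, start, length); // may be called more than once per element
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("price")) {
                total += Double.parseDouble(text.toString().trim());
                inPrice = false;
            }
        }
    }

    static double totalPrice(String xml) {
        try {
            PriceSumHandler handler = new PriceSumHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.total;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the XML document", e);
        }
    }

    public static void main(String[] args) {
        System.out.printf("Total price: %.2f%n", totalPrice(SAMPLE));
    }
}
```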

The Modern StAX (Streaming API for XML) Approach

StAX is a pull-parsing API. Unlike SAX where the parser "pushes" events to your handler, with StAX you "pull" events from the parser one by one. This gives you more control and makes the code easier to write and understand than SAX.

How it Works:

  1. Create an XMLInputFactory.
  2. Create an XMLEventReader from a stream (e.g., FileInputStream).
  3. Loop through the events using reader.hasNext() and reader.nextEvent().
  4. Check the event type (e.g., START_ELEMENT, CHARACTERS, END_ELEMENT) and process it.
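
The steps above describe the event-iterator API (XMLEventReader). StAX also has a cursor API, XMLStreamReader, in which next() returns the int constants named in step 4. The sketch below shows that variant. It is a rough illustration: StaxCursorSketch, readTitles, and the inline SAMPLE string are hypothetical names introduced here.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class StaxCursorSketch {
    // Hypothetical inline sample with the same shape as the catalog file
    static final String SAMPLE =
            "<catalog>"
          + "<book id=\"bk101\"><title>XML Developer's Guide</title></book>"
          + "<book id=\"bk102\"><title>Midnight Rain</title></book>"
          + "</catalog>";

    // Pulls events one at a time and collects the text of every <title> element.
    static List<String> readTitles(String xml) {
        List<String> titles = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT
                            && reader.getLocalName().equals("title")) {
                        // getElementText() reads up to the matching end tag
                        titles.add(reader.getElementText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read the XML document", e);
        }
        return titles;
    }

    public static void main(String[] args) {
        System.out.println(readTitles(SAMPLE));
    }
}
```

The cursor API avoids allocating an event object per step; the event-iterator API is often more convenient when events need to be passed around or stored.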

Example: Parsing with StAX

Java Code:

import javax.xml.namespace.QName;
import javax.xml.stream.*;
import javax.xml.stream.events.*;
import java.io.FileInputStream;
public class StaxParserExample {
    public static void main(String[] args) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLEventReader eventReader = factory.createXMLEventReader(new FileInputStream("books.xml"));
            while (eventReader.hasNext()) {
                XMLEvent event = eventReader.nextEvent();
                if (event.isStartElement()) {
                    StartElement startElement = event.asStartElement();
                    // XML names are case-sensitive, so compare with equals()
                    String name = startElement.getName().getLocalPart();
                    if (name.equals("book")) {
                        System.out.println("\nFound Book. ID: " + startElement.getAttributeByName(new QName("id")).getValue());
                    } else if (name.equals("author")) {
                        // getElementText() returns the element's full text and also copes with empty elements
                        System.out.println("   Author: " + eventReader.getElementText());
                    } else if (name.equals("title")) {
                        System.out.println("   Title: " + eventReader.getElementText());
                    }
                }
            }
            eventReader.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
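
The example above covers reading only. For the writing side of StAX, a sketch with XMLStreamWriter might look like the following. It is illustrative rather than definitive: StaxWriterSketch and writeCatalog are hypothetical names, and the output goes to a String so the sketch stays self-contained.

```java
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import java.io.StringWriter;

class StaxWriterSketch {
    // Writes a one-book catalog sequentially, element by element.
    static String writeCatalog(String id, String author, String title) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartDocument();
            writer.writeStartElement("catalog");
            writer.writeStartElement("book");
            writer.writeAttribute("id", id); // attributes must follow the start tag directly

            writer.writeStartElement("author");
            writer.writeCharacters(author);  // special characters are escaped by the writer
            writer.writeEndElement();

            writer.writeStartElement("title");
            writer.writeCharacters(title);
            writer.writeEndElement();

            writer.writeEndElement();        // </book>
            writer.writeEndElement();        // </catalog>
            writer.writeEndDocument();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write the XML document", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeCatalog("bk103", "Author, New", "The New Java Guide"));
    }
}
```

Like reading, writing is strictly sequential: each element is emitted as soon as it is written, so nothing has to be held in memory beyond the current position.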

The Data-Binding Approach with JAXB (Java Architecture for XML Binding)

JAXB is the easiest way to handle XML if your XML structure directly maps to a Java object model. It uses annotations to link Java fields to XML elements/attributes.

How it Works:

  1. Create Java Classes: Annotate your Java classes with @XmlRootElement, @XmlElement, etc.
  2. Unmarshal: Convert XML data into Java objects.
  3. Marshal: Convert Java objects into XML data.

Example: Using JAXB

Define Java Classes

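// Catalog.java
// Note: from Java 11 onward JAXB is no longer bundled with the JDK and must be added
// as a dependency (newer releases use the jakarta.xml.bind package instead).
import javax.xml.bind.annotation.*;
import java.util.ArrayList;
import java.util.List;
// This annotation maps the class to the root XML element <catalog>
@XmlRootElement(name = "catalog")
@XmlAccessorType(XmlAccessType.FIELD)
public class Catalog {
    @XmlElement(name = "book")
    private List<Book> books = new ArrayList<>();
    // Getters and Setters
    public List<Book> getBooks() { return books; }
    public void setBooks(List<Book> books) { this.books = books; }
}

// Book.java (each public class goes in its own file)
import javax.xml.bind.annotation.*;
@XmlAccessorType(XmlAccessType.FIELD)
public class Book {
    @XmlAttribute
    private String id;
    @XmlElement
    private String author;
    @XmlElement
    private String title;
    @XmlElement
    private String genre;
    @XmlElement
    private double price;
    // Getters and Setters
    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public String getAuthor() { return author; }
    public void setAuthor(String author) { this.author = author; }
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
    public String getGenre() { return genre; }
    public void setGenre(String genre) { this.genre = genre; }
    public double getPrice() { return price; }
    public void setPrice(double price) { this.price = price; }
}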

Unmarshal (XML -> Java Objects)

import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import java.io.File;
public class JaxbUnmarshalExample {
    public static void main(String[] args) {
        try {
            File file = new File("books.xml");
            JAXBContext jaxbContext = JAXBContext.newInstance(Catalog.class);
            Unmarshaller jaxbUnmarshaller = jaxbContext.createUnmarshaller();
            Catalog catalog = (Catalog) jaxbUnmarshaller.unmarshal(file);
            for (Book book : catalog.getBooks()) {
                System.out.println("Book ID: " + book.getId() + ", Title: " + book.getTitle());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Marshal (Java Objects -> XML)

import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import java.io.File;
public class JaxbMarshalExample {
    public static void main(String[] args) {
        try {
            // Create a new book object
            Book newBook = new Book();
            newBook.setId("bk103");
            newBook.setAuthor("Author, New");
            newBook.setTitle("The New Java Guide");
            newBook.setGenre("Technology");
            newBook.setPrice(29.99);
            // Create a catalog and add the book
            Catalog catalog = new Catalog();
            catalog.getBooks().add(newBook);
            // Marshal to a new XML file
            JAXBContext jaxbContext = JAXBContext.newInstance(Catalog.class);
            Marshaller marshaller = jaxbContext.createMarshaller();
            marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true); // Pretty print
            marshaller.marshal(catalog, new File("new_books.xml"));
            System.out.println("XML file created successfully!");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Which One Should I Choose?

  • Use JAXB if: Your primary goal is to serialize/deserialize data between XML and your Java application's object model. From Java 11 onward it has to be added as a separate dependency.
  • Use StAX if: You need to process large XML files efficiently, read or write them sequentially, and want more control and simpler code than SAX with far lower memory use than DOM.
  • Use DOM if: The XML document is small enough to fit in memory, and you need random access to the document or frequent modifications to its structure.
  • Use SAX if: You are working with extremely large XML files, memory is a critical constraint, and you only need to read the data once from top to bottom. Its memory footprint is very small, but it is the least flexible of the four.