XML to Avro Converter

Transform XML data into an Apache Avro schema with automatic type detection and optional sample data generation.

About Apache Avro

Apache Avro is a data serialization system that provides rich data structures, a compact binary format, and schema evolution capabilities. This tool converts XML data into Avro schema definitions with automatic type detection.

Features

  • Automatic Type Detection: Detects int, long, double, boolean, and string types
  • Nullable Fields: Option to make all fields nullable with union types
  • Custom Schema Names: Specify custom record name and namespace
  • Sample Data: Optionally include sample JSON data for testing
  • Field Sanitization: Ensures field names are valid Avro identifiers
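As a rough illustration of field sanitization, the sketch below maps an XML element name to an identifier that satisfies Avro's naming rule (start with a letter or underscore, then only letters, digits, and underscores). The function name sanitize_field_name and the choice of underscore as the replacement character are assumptions made for this example; the tool's actual replacement strategy may differ.

```python
import re

def sanitize_field_name(raw: str) -> str:
    """Return a name matching Avro's rule [A-Za-z_][A-Za-z0-9_]*.

    Hypothetical helper: the underscore replacement is an assumption,
    not necessarily what the converter does.
    """
    # Replace every character outside the allowed set with an underscore.
    name = re.sub(r"[^A-Za-z0-9_]", "_", raw)
    # A name may not be empty or start with a digit, so prefix an underscore.
    if not name or name[0].isdigit():
        name = "_" + name
    return name

print(sanitize_field_name("unit-price"))  # unit_price
print(sanitize_field_name("2nd.item"))    # _2nd_item
```

Note that two different XML names (for example unit-price and unit.price) can collapse to the same identifier under this scheme, so a real converter would also need to handle duplicates.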

Supported Data Types

int: 32-bit signed integers
long: 64-bit signed integers
double: 64-bit IEEE 754 double-precision floating point
boolean: True/false values
string: Unicode text strings
null: Null values (when nullable)
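One way the types above could be detected from raw XML text is sketched below. It classifies each value on its own, then picks the narrowest type that fits every non-empty value. The names classify and infer_avro_type are hypothetical, and several behaviors are assumptions rather than documented tool behavior: numeric types widen in the order int, long, double; only the literals true and false (case-insensitive) count as boolean; and a field with no non-empty values defaults to string.

```python
import re

INT_MIN, INT_MAX = -2**31, 2**31 - 1      # Avro int: 32-bit signed
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1    # Avro long: 64-bit signed

INTEGER_RE = re.compile(r"[+-]?[0-9]+")
DECIMAL_RE = re.compile(r"[+-]?([0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)([eE][+-]?[0-9]+)?")

# Numeric types ordered from narrowest to widest (assumed widening order).
NUMERIC = ["int", "long", "double"]

def classify(value: str) -> str:
    """Return the narrowest Avro type that can hold a single text value."""
    text = value.strip()
    if text.lower() in ("true", "false"):
        return "boolean"
    if INTEGER_RE.fullmatch(text):
        number = int(text)
        if INT_MIN <= number <= INT_MAX:
            return "int"
        if LONG_MIN <= number <= LONG_MAX:
            return "long"
        return "string"  # too large for a 64-bit integer
    if DECIMAL_RE.fullmatch(text):
        return "double"
    return "string"

def infer_avro_type(values) -> str:
    """Pick one Avro type for a field from all of its non-empty values."""
    kinds = {classify(v) for v in values if v.strip()}
    if not kinds:
        return "string"          # assumption: nothing to infer from
    if len(kinds) == 1:
        return kinds.pop()
    if kinds <= set(NUMERIC):
        return max(kinds, key=NUMERIC.index)  # widen to the widest numeric type
    return "string"              # mixed types fall back to string

print(infer_avro_type(["1", "2", "3"]))        # int
print(infer_avro_type(["1", "3000000000"]))    # long
print(infer_avro_type(["", "999.99"]))         # double
print(infer_avro_type(["1", "Laptop"]))        # string
```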

Example

Input XML:

<products>
  <product>
    <id>1</id>
    <name>Laptop</name>
    <price>999.99</price>
  </product>
</products>

Output Avro Schema:

{
  "type": "record",
  "name": "Record",
  "namespace": "com.example",
  "fields": [
    {
      "name": "id",
      "type": [
        "null",
        "int"
      ],
      "default": null
    },
    {
      "name": "name",
      "type": [
        "null",
        "string"
      ],
      "default": null
    },
    {
      "name": "price",
      "type": [
        "null",
        "double"
      ],
      "default": null
    }
  ]
}
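The conversion shown above can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not the tool's implementation: build_schema and guess_type are hypothetical names, only the first child record is inspected, and guess_type is a deliberately simplified stand-in for the tool's type detection (it distinguishes only int, double, and string).

```python
import json
import xml.etree.ElementTree as ET

XML = """<products>
  <product>
    <id>1</id>
    <name>Laptop</name>
    <price>999.99</price>
  </product>
</products>"""

def guess_type(text: str) -> str:
    """Simplified stand-in for type detection: int, then double, else string."""
    for avro_type, parse in (("int", int), ("double", float)):
        try:
            parse(text)
            return avro_type
        except ValueError:
            continue
    return "string"

def build_schema(xml_text, name="Record", namespace="com.example", nullable=True):
    """Build an Avro record schema from the first record element of the XML."""
    root = ET.fromstring(xml_text)
    first_record = root[0]  # assumption: root wraps a list of similar records
    fields = []
    for child in first_record:
        avro_type = guess_type(child.text or "")
        if nullable:
            # Nullable fields become a union with null and default to null.
            fields.append({"name": child.tag, "type": ["null", avro_type], "default": None})
        else:
            fields.append({"name": child.tag, "type": avro_type})
    return {"type": "record", "name": name, "namespace": namespace, "fields": fields}

print(json.dumps(build_schema(XML), indent=2))
```

With nullable=True, the printed schema has the same structure as the example output; passing nullable=False yields plain types such as "int" with no default.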

Use Cases

  • Apache Kafka message schemas
  • Hadoop data serialization
  • Apache Spark data processing
  • Data lake storage formats
  • Schema registry integration
  • Cross-language data exchange

FAQ

  • Q: How are Avro field types chosen from XML values?
    A: The tool scans all non-empty values of each field and picks the narrowest Avro type that fits all of them: int, long, double, boolean, or string. Fields with mixed types fall back to string.
  • Q: What does the "Make fields nullable" option do?
    A: When enabled, each field type becomes a union such as ["null", "int"] with a default of null. This matches common patterns used with schema registries and Kafka.
  • Q: How should I choose the schema name and namespace?
    A: Use a meaningful record name (for example, Product) and a reverse-DNS namespace such as com.example.catalog. These identifiers appear in your Avro files and schema registry.
  • Q: What is the purpose of the optional sample data block?
    A: The JSON array appended after the schema shows how real records would look when encoded with the generated schema. It is meant for documentation and quick testing, not as part of the .avsc file.
  • Q: Can I use the schema directly with Kafka and schema registries?
    A: Yes. Save only the top JSON object (the Avro schema) to a .avsc file and register it with your schema registry or use it in your Kafka producer/consumer configuration.
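When sample data is included, the schema has to be separated from the JSON array that follows it before it is saved as a .avsc file. The sketch below shows one way to do that with Python's standard json module, assuming the tool's output begins with the schema object. The function name extract_schema and the sample text are made up for illustration.

```python
import json

def extract_schema(tool_output: str) -> dict:
    """Parse only the first JSON value in the output and return it as the schema."""
    decoder = json.JSONDecoder()
    # raw_decode parses one JSON document from the start of the string and
    # ignores whatever follows it (here, the sample data array).
    schema, _end = decoder.raw_decode(tool_output.lstrip())
    if not isinstance(schema, dict) or schema.get("type") != "record":
        raise ValueError("output does not start with an Avro record schema")
    return schema

# Hypothetical combined output: schema object followed by a sample data array.
combined = """{
  "type": "record",
  "name": "Product",
  "namespace": "com.example.catalog",
  "fields": [{"name": "id", "type": ["null", "int"], "default": null}]
}

[{"id": 1}]"""

schema = extract_schema(combined)
avsc_text = json.dumps(schema, indent=2)
print(avsc_text)
# To persist it: pathlib.Path("product.avsc").write_text(avsc_text, encoding="utf-8")
```

The resulting text is a standalone Avro schema that can be registered or loaded by an Avro library; the sample array is left out.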