Markdown to Avro Converter

Transform Markdown tables into Apache Avro schemas with automatic type detection

About Markdown to Avro Converter

Convert Markdown tables to Apache Avro schema format with automatic type detection and data serialization. Perfect for big data applications using Hadoop, Kafka, and other Apache ecosystem tools.

Key Features

  • Automatic Type Detection: Intelligently detects int, long, double, boolean, and string types
  • Nullable Fields: Every field is emitted as a ["null", "TYPE"] union, so empty cells map cleanly to null
  • Custom Schema Names: Configure schema name and namespace
  • Sample Data Generation: Optionally include JSON data for testing
  • Field Name Sanitization: Converts headers to valid Avro field names (see the sketch after this list)
  • Documentation: Preserves original header names in field docs
  • File Download: Save as .avsc schema file
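
As a rough illustration of the sanitization step, a converter along these lines might normalize headers as shown below. This is a hypothetical sketch, not the tool's actual source; the name sanitizeFieldName is invented for illustration.

// Hypothetical sketch of header sanitization: Avro field names must match
// [A-Za-z_][A-Za-z0-9_]*, so invalid characters are replaced and a leading
// digit gets an underscore prefix.
function sanitizeFieldName(header: string): string {
  const cleaned = header
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9_]+/g, "_") // collapse invalid characters to "_"
    .replace(/^_+|_+$/g, "");     // trim leading/trailing underscores
  return /^[0-9]/.test(cleaned) ? `_${cleaned}` : cleaned || "field";
}

// "Name" -> "name", "Order #" -> "order", "2024 Sales" -> "_2024_sales"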

How to Use

  1. Input Markdown Table: Paste your Markdown table or upload a .md file
  2. Configure Schema: Set schema name and namespace
  3. Choose Options: Toggle sample data inclusion
  4. Review Output: The Avro schema is generated automatically
  5. Copy or Download: Use the Copy or Download button to save your schema

Type Detection

  • int: Positive integers that fit in a 32-bit int
  • long: Negative integers, or integers too large for a 32-bit int
  • double: Decimal numbers
  • boolean: true/false values
  • string: All other text data
  • null: Empty cells are treated as null
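
A minimal sketch of these rules, written as illustrative TypeScript rather than the tool's actual source (how zero is classified is an assumption here):

type AvroPrimitive = "null" | "boolean" | "int" | "long" | "double" | "string";

// Illustrative reimplementation of the detection rules above.
function detectType(cell: string): AvroPrimitive {
  const v = cell.trim();
  if (v === "") return "null";                     // empty cell
  if (v === "true" || v === "false") return "boolean";
  if (/^-?\d+$/.test(v)) {
    const n = Number(v);
    // Non-negative integers in 32-bit range -> int (zero included here as
    // an assumption); negative or larger values -> long.
    return n >= 0 && n <= 2147483647 ? "int" : "long";
  }
  if (/^-?\d*\.\d+$/.test(v)) return "double";     // decimal numbers
  return "string";                                 // everything else
}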

Example Conversion

Markdown Input:

| Name | Age | City | Active |
|------|-----|------|--------|
| John Doe | 28 | New York | true |
| Jane Smith | 34 | London | false |

Avro Schema Output:

{
  "type": "record",
  "name": "TableData",
  "namespace": "com.example",
  "doc": "Generated from Markdown table",
  "fields": [
    {
      "name": "name",
      "type": ["null", "string"],
      "doc": "Name"
    },
    {
      "name": "age",
      "type": ["null", "int"],
      "doc": "Age"
    },
    {
      "name": "city",
      "type": ["null", "string"],
      "doc": "City"
    },
    {
      "name": "active",
      "type": ["null", "boolean"],
      "doc": "Active"
    }
  ]
}
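
When the sample data option is enabled, the tool can also emit the table rows as JSON records for testing. The exact output format may differ; an illustrative shape for the table above, written as a TypeScript literal, is:

// Illustrative sample data for the table above; empty cells would appear
// as null, matching the nullable unions in the schema.
const sampleData = [
  { name: "John Doe", age: 28, city: "New York", active: true },
  { name: "Jane Smith", age: 34, city: "London", active: false },
];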

Common Use Cases

  • Hadoop Integration: Define schemas for Hadoop data processing
  • Kafka Streaming: Create schemas for Kafka message serialization (see the registry sketch after this list)
  • Data Lakes: Structure data for Apache Parquet and ORC formats
  • ETL Pipelines: Define data contracts for ETL processes
  • Documentation to Data: Convert documentation tables to big data formats
  • Learning Avro: Experiment with Avro schema design
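
As one hedged sketch of the Kafka workflow, a downloaded .avsc file could be registered with a Confluent-style Schema Registry before producers use it. The URL, subject name, and file path below are placeholders, not values the tool produces:

// Register a generated schema with a Confluent-style Schema Registry.
// URL, subject, and file path are placeholders.
import { readFileSync } from "node:fs";

const schema = readFileSync("TableData.avsc", "utf8");

const res = await fetch("http://localhost:8081/subjects/table-data-value/versions", {
  method: "POST",
  headers: { "Content-Type": "application/vnd.schemaregistry.v1+json" },
  body: JSON.stringify({ schema }), // the registry expects the schema as a JSON string
});
console.log(await res.json()); // e.g. { id: 1 }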

About Apache Avro

Apache Avro is a data serialization system that provides rich data structures, a compact binary format, and schema evolution capabilities. It's widely used in big data ecosystems for efficient data storage and transmission.
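
For example, assuming the avsc npm package, a schema produced by this converter can round-trip records through Avro's compact binary encoding:

// Round-trip a record through Avro binary, assuming the avsc npm package.
import avro from "avsc";

const type = avro.Type.forSchema({
  type: "record",
  name: "TableData",
  namespace: "com.example",
  fields: [
    { name: "name", type: ["null", "string"], default: null },
    { name: "age", type: ["null", "int"], default: null },
  ],
});

const buf = type.toBuffer({ name: "John Doe", age: 28 }); // compact binary
console.log(type.fromBuffer(buf)); // TableData { name: 'John Doe', age: 28 }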

FAQ

  • Does the converter infer complex Avro types like arrays or nested records?

    No. This tool focuses on simple column types inferred from Markdown table values (int, long, double, boolean, string). If you need arrays or nested records, you can use the generated schema as a starting point and manually extend it.
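
    For instance, a hand-written extension could add an optional array field next to the generated ones; the tags name here is hypothetical:

    // Hypothetical manual extension: an optional array-of-strings field
    // appended to the generated "fields" array.
    const tagsField = {
      name: "tags",
      type: ["null", { type: "array", items: "string" }],
    };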

  • Why are all fields defined as union types with "null"?

    The schema uses ["null", "TYPE"] for each field so empty cells in your Markdown table map cleanly to null values in Avro. This is a common pattern when working with real-world data that may have missing values.
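
    If you expect the schema to evolve, it is common to pair the null branch with an explicit default; per the Avro specification, a union field's default must match its first branch, which is why "null" comes first:

    // A nullable field with an explicit default. Because "null" is the
    // first union branch, the default value must be null.
    const ageField = {
      name: "age",
      type: ["null", "int"],
      default: null,
    };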

  • Can I change the field names after generation?

    Yes. The tool sanitizes headers into Avro-safe field names, but you can rename them in the generated JSON schema as long as you keep them valid identifiers and update any downstream code that relies on those names.
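
    One way to rename safely is Avro's field aliases, which let readers using the new name decode data written under the old one; full_name below is an example rename, not tool output:

    // Renaming a generated field while staying compatible with data
    // written under the old name, via Avro field aliases.
    const renamedField = {
      name: "full_name",
      aliases: ["name"], // the originally generated field name
      type: ["null", "string"],
      doc: "Name",
    };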

  • Is the sample data block required for Avro to work?

    No. The Avro schema itself is the only required part. The optional sample data array is provided for testing and documentation purposes and can be safely removed if you only need the schema.

  • Is my Markdown content uploaded anywhere?

    No. All parsing and schema generation happen locally in your browser. Your Markdown tables never leave your device.

Privacy & Security

All conversions happen locally in your browser. Your Markdown data is never uploaded to any server, ensuring complete privacy and security.