Overview

Features

  • Convert Excel/CSV/XML/YAML to JSON/Text/Bin.
  • Use Protobuf to define the structure of Excel/CSV/XML/YAML.
  • Use Golang to develop the conversion engine.
  • Support multiple programming languages, thanks to Protobuf (proto3).

Concepts

  • Importer:
    • imports a Excel/CSV file to a in-memory book of Table sheets.
    • imports a XML/YAML file to a in-memory book of Document sheets.
  • Parsers:
    • protogen: converts Excel/CSV/XML/YAML files to Protoconf files.
    • confgen: converts Excel/CSV/XML/YAML with Protoconf files to JSON/Text/Bin files.
  • Exporter:
    • protogen: exports a tableau.Workbook to a proto file.
    • confgen: exports a protobuf message to a JSON/Text/Bin file.
  • Protoconf: a dialect of Protocol Buffers (proto3) extended with tableau options, aimed to define the structure of Excel/CSV/XML/YAML.

Workflow

flowchart TD
  
  subgraph Input
    I1(Excel)
    click I1 href "https://github.com/qax-os/excelize"

    I2(CSV)
    click I2 href "https://pkg.go.dev/encoding/csv"

    I3(XML)
    click I3 href "https://github.com/antchfx/xmlquery"

    I4(YAML)
    click I4 href "https://github.com/go-yaml/yaml"
  end

  Input --> I[importer]
  click I href "https://github.com/tableauio/tableau/tree/master/internal/importer"

  subgraph Protogen
    PGP[protogen.Parser] --> E1[protogen.Exporter]
    click PGP href "https://github.com/tableauio/tableau/tree/master/internal/protogen/parser.go"
    click E1 href "https://github.com/tableauio/tableau/tree/master/internal/protogen/exporter.go"
  end

  I --> Protogen:::orangeclass

  subgraph Confgen
    CGP[confgen.Parser] --> CE[confgen.Exporter]
    click CGP href "https://github.com/tableauio/tableau/tree/master/internal/confgen/parser.go"
    click CE href "https://github.com/tableauio/tableau/tree/master/internal/confgen/mexporter"
  end

  I --> Confgen:::orangeclass
 
  Protogen --> B
  B{{Protoconf}}:::greenclass --> | protobuf descriptor | Confgen

  subgraph Output
    O1("JSON")
    click O1 href "https://developers.google.com/protocol-buffers/docs/proto3#json"

    O2("Text")
    click O2 href "https://developers.google.com/protocol-buffers/docs/text-format-spec"

    O3("Bin")
    click O3 href "https://developers.google.com/protocol-buffers/docs/encoding"
  end

  Confgen --> Output
  
  classDef orangeclass fill:#f96;
  classDef greenclass fill:#40E0D0;
  

Types

  • Scalar
  • Message(struct)
  • List
  • Map(unordered)
  • Timestamp
  • Duration

TODO

protoc plugins

  • Golang
  • C++
  • C#/.NET
  • Python
  • Lua
  • Javascript/Typescript/Node
  • Java

Metadata

  • metatable: a message to describe the worksheet’s metadata.
  • metafield: a message to describe the caption’s metadata.
  • captrow: caption row, the exact row number of captions at worksheet. Newline in caption is allowed for more readability, and will be trimmed in conversion.
  • descrow: description row, the exact row number of descriptions at worksheet.
  • datarow: data row, the start row number of data.

Newline(line break) in major operating systems:

OSAbbreviationEscape sequence
Unix (linux, OS X)LF\n
Microsoft WindowsCRLF\r\n
classic Mac OS/OS XCR\r

LF: Line Feed, CR: Carriage Return.

Mac OS X

Generator

  • generate protoconf by Excel(header): Excel -> protoconf
  • generate Excel(header) by protoconf: protoconf -> Excel

Conversion

  • Excel -> JSON(default format and human readable)
  • Excel -> protowire(small size)
  • Excel -> prototext(human debugging)
  • JSON -> Excel
  • protowire -> Excel
  • prototext -> Excel

Pretty Print

  • Multiline: every textual element on a new line
  • Indent: 4 space characters
  • JSON support
  • prototext support

EmitUnpopulated

  • JSON: EmitUnpopulated specifies whether to emit unpopulated fields.

Scalar Types

  • interger: int32, uint32, int64 and uint64
  • float: float and double
  • bool
  • string
  • bytes
  • datetime, date, time, duration

Enumerations

  • enum: The Parser accepts three enum value forms:
    • enum value number
    • enum value name
    • enum value alias name (with EnumValueOptions specified)
  • enum: validate the enum value.

Composite Types

  • message: horizontal(row direction) layout, fields located in cells.
  • message: simple in-cell message, each field must be scalar type. It is a comma-separated list of fields. E.g.: 1,test,3.0. List’s size need not to be equal to fields' size, as fields will be filled in order. Fields not configured will be filled default values due to its scalar type.
  • list: horizontal(row direction) layout, which is list’s default layout, and each item can be message or scalar.
  • list: vertical(column direction) layout. and each item should be message.
  • list: simple in-cell list, element must be scalar type. It is a comma-separated list of elements. E.g.: 1,2,3.
  • list: scalable or dynamic list size.
  • list: smart recognition of empty element at any position.
  • list
    • In-cell struct list: no need to support
    • Cross-cell horizontal scalar/enum list
    • Cross-cell horizontal incell-struct list
    • Cross-cell vertical scalar list: no need to support, use this: [Item]int32
    • Cross-cell vertical incell-struct list
  • list size
    • dynamic size: items should be present continuously, and report error if empty item is inserted.
    • fixed size
  • map: horizontal(row direction) layout.
  • map: vertical(column direction) layout, and is map’s default layout.
  • map: unordered-map or hash-map.
  • map: ordered-map supported by tableauio/loader.
    • C++
    • Golang
    • C#
  • map: simple in-cell map, both key and value must be scalar type. It is a comma-separated list of key:value pairs. E.g.: 1:10,2:20,3:30.
  • map: scalable or dynamic map size.
  • map: smart recognition of empty value at any position.
  • map
    • Cross-cell horizontal scalar map: no need to support, use this: map<int32, Item>
    • Cross-cell vertical scalar map: : no need to support, use this: map<int32, Item>
  • map size
    • dynamic size: items should be present continuously, and report error if empty item is inserted.
    • fixed size
  • nesting: unlimited nesting of message, list, and map.
  • nesting: the composite type’s first element can be composite type.

Default Values

Each scalar type’s default value is same as protobuf.

  • interger: 0
  • float: 0.0
  • bool: false
  • string: ""
  • bytes: ""
  • in-cell message: each field’s default value is same as protobuf
  • in-cell list: element’s default value is same as protobuf
  • in-cell map: both key and value’s default value are same as protobuf
  • message: all fields have default values

Empty

  • scalar: default value same as protobuf.
  • message: empty message will not be spawned if all fields are empty.
  • list: empty list will not be spawned if list’s size is 0.
  • list: empty message will not be appended if list’s element(message type) is empty.
  • map: empty map will not be spawned if map’s size is 0.
  • map: empty message will not be inserted if map’s value(message type) is empty.
  • nesting: recursively empty.

Merge

  • merge multiple workbooks
  • merge multiple worksheets

Workbook meta

workbook meta sheet @TABLEAU:

  • specify which sheets to be parsed
  • specify parser options for each sheet
SheetAliasNamelineTypeline
Sheet1ExchangeInfo22

Datetime

Understanding about RFC 3339 for Datetime and Timezone Formatting in Software Engineering

2019-10-12T07:20:50.52Z # This is acceptable in ISO 8601 and RFC 3339 (with T) 2019-10-12 07:20:50.52Z # This is only accepted in RFC 3339 (without T)

  • “Z” stands for Zero timezone or Zulu timezone UTC+0, and equal to +08:00 in the RFC 3339.
  • RFC 3339 follows the ISO 8601 DateTime format. The only difference is RFC allows us to replace “T” with “space”.

Use RFC 3339 , which is following ISO 8601.

  • Timestamp: based on google.protobuf.Timestamp, see JSON mapping
  • Timezone: see ParseInLocation
  • DST: Daylight Savings Time. There is no plan to handle this boring stuff.
  • Datetime: excel format: yyyy-MM-dd HH:mm:ss, e.g.: 2020-01-01 05:10:00
  • Date: excel format: yyyy-MM-dd or yyMMdd, e.g.: 2020-01-01 or 20200101
  • Time: excel format: HH:mm:ss or HHmmss, e.g.: 05:10:00 or 051000
  • Duration: based ongoogle.protobuf.Duration , see JSON mapping
  • Duration: excel format: form "72h3m0.5s", see golang duration string form

Transpose

  • Interchange the rows and columns of a worksheet.

Validation

  • unique: check map key uniqueness.
  • range: [left, right].
  • refer: XXXConf.ID. To be supported by tableauio/loader.

Error Message

  • Report clear and precise error messages when converter failed, please refer to the programming language compiler
  • Use golang template to define error message template
  • Multiple languages support, focused on English and Simplified Chinese

Performace

  • Stress test
  • Each goroutine process one worksheet
  • Mutiple process model