JewishGen Home Page

DIDO — Design Specification

Version 1.01
Warren Blatt
April 2005
Last Update: December 2011

JewishGen needs a way to accommodate small datasets and miscellaneous lists of names in various formats, make them soundex-searchable, and integrate them into geographically based “All Country” databases.


Introduction
Structure of the DIDO Database
    · Dataset Table
    · Data Table
    · Open Issues / Questions
Search Results Display
Procedures

Many years ago (circa 1997), we envisioned a database system for managing small sets of data, which we nicknamed “DIDO” (“Data In, Data Out”).  This system would allow for small user-contributed datasets to be placed online easily, and become searchable in a central database.

Any user can contribute a "dataset" of any size or type — even just a handful of records — as long as it conforms to the DIDO data specifications.

In some ways, DIDO is similar to Ed Rosenbaum's "Belarus Static Index", "Belarus Names Database" and "Galicia Surname Index" — except that DIDO data will be in an actual database, and thus be phonetically and soundex-searchable, sortable, filterable, etc., and with the data integrated into the JewishGen “All Country” databases.

DIDO is intended for simple lists of data — it is not for extensive lists with many data fields per record.  Examples of simple lists are: membership lists of landsmanshaftn and other organizations, voter/electors' lists, prenumerantn or other donor lists, name indexes to books, etc. — data which contains only a person's name plus perhaps one or two other data fields.  There are many such lists throughout the JewishGen web site, in static form — on the Yizkor Book Project site, JewishGen KehilaLinks pages (which the KehilaLinks coordinator is currently inventorying), on various SIG pages, etc.

[Volunteers for the JewishGen Yizkor Book Project had identified about 40,000 names on various lists on the website (see internal list) that could be included in such an index.  In July 2010, this data became the Yizkor Book Master Name Index (YBMNI), which as of July 2011 includes 57,000 names.]

DIDO is to be used only for data which does not fall into one of our pre-existing categories of “All Topic” databases, i.e.: Cemetery/burial data (JOWBR), Yizkor Book Necrologies, Russian Business Directories, Polish vital records (JRI-Poland), Duma Voter Lists, Czarist Revision Lists, Czarist Vital Records (see templates), etc.  DIDO is for miscellaneous data only.


Structure of the DIDO Database

Internally, the DIDO database consists of two related tables:

  • Dataset Table — describes the datasets.  One row for each dataset.
  • Data Table — the actual individual data.  One row for each person.

This organization is conceptually similar to that of the JOWBR database, as described in the original JOWBR Design Spec, where there is one table for the cemeteries and one table for the individual burials within the cemeteries.

Dataset Table:

The Dataset Table contains one row for each dataset.  Its columns are:
  • Dataset ID#: An ID# (arbitrary, never displayed), used only to tie the two tables together.
  • Dataset Title: A short description of the dataset, to be used for display.  Should be limited to 50 or so characters.
  • Dataset Description: A long description of the dataset.  Should contain a complete and thorough description of this dataset, its source, and any interpretation needed.
  • URL: Web address of a page with additional information about this dataset.  Optional?  Or maybe this should be part of the “Dataset Description” field.  Or maybe the entire “Dataset Description” should be an external HTML page...  TBD.
  • Region: Used to determine into which (if any) of the “All Country” databases this dataset should be incorporated, and into which sub-region(s).  See regions as defined in http://www.jewishgen.org/databases/Cemetery/JOWBR_Regions.htm.  Issue: Should this be specified as a Region Name, a set of Region Names, and/or a set of “All Country” database names?
  • Contributor: Contact information about the person who submitted this dataset.  Use the submitter's JGID Number, to link to their full contact information in CURE.

Data Table:

The Data Table contains the actual data.  Because of the wide variety of data which the DIDO database can accept (membership lists, book indexes, etc.), we are keeping the number of columns to a minimum, and having the columns be extremely generic, so that all of the varied data can fit.  Any data that does not fit should be placed in the last column, "Other".

The Data Table contains one row for each individual.  Its columns are:
  • Dataset ID#: Used to identify the dataset; a link to the information in the Dataset Table.  It will be the same number for all rows in the dataset.  Non-displayed; used only to create linkage to the Dataset Table info.
  • Name:
    · Surname: Last Name of the individual.
    · Given Name(s): First Name(s) of the individual.
    · Patronymic: ???  Could perhaps be placed in the GivenName field.
  • Location:
    · Town: The name of the locality associated with this record, as indicated in the original record.  If there are multiple towns, separate them with a slash ("/"), as per Transcription Rule I.2.d.
    · District: ???  The town's state / province / uyezd / gubernia.  Optional, as provided in the original record.
    · Country: Country where the town was located, as of the time of the record.  Optional, as provided in the original record.
    · USBGN: ???  The town's USBGN Feature Code Number, for linkage to the "JewishGen Locality Page".  Non-displayed.
  • Date: Date of the record.  Can be a complete date, or just a year, whatever is in the original record.  Can be blank if unknown.  The "Dataset Description" field of the corresponding row in the Dataset Table should state what this date represents, e.g.: a date of birth, date of voter registration, date of membership, date of immigration, date of publication, etc.
  • Other: A large text field to contain all of the other data which doesn't fit into any other column.  Maximum of 254 characters.  The contents should be described in the "Dataset Description" field of the corresponding row in the Dataset Table.  It could be a page number, an age, an address, amount of a donation, an occupation, etc.

  • The "Dataset ID#" field will be filled in by the DIDO Database Coordinator, not by the original contributor.

  • Any of the "Location" fields may be blank.  For some datasets, they may contain the same data in every row of the table.
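
As an illustration of the two-table structure described above, here is a minimal sketch using Python's built-in sqlite3 module.  The table names, column names, and column types are assumptions made for this example; the specification names the fields but does not prescribe a SQL schema or a database engine.  The "Region" column is left as free text, because its format is still an open issue.

```python
import sqlite3

# Hypothetical schema: names and types are assumptions, not part of the spec.
SCHEMA = """
CREATE TABLE dataset (
    dataset_id       TEXT PRIMARY KEY,  -- arbitrary ID#, never displayed
    title            TEXT NOT NULL,     -- short display title
    description      TEXT,              -- full description, source, interpretation
    url              TEXT,              -- optional page with more information (TBD)
    region           TEXT,              -- format is an open issue
    contributor_jgid INTEGER            -- submitter's JGID Number
);
CREATE TABLE data (
    dataset_id  TEXT NOT NULL REFERENCES dataset(dataset_id),
    surname     TEXT,
    given_name  TEXT,
    patronymic  TEXT,
    town        TEXT,
    district    TEXT,
    country     TEXT,
    usbgn       TEXT,
    record_date TEXT,  -- full date or just a year, as in the original record
    other       TEXT CHECK (other IS NULL OR length(other) <= 254)
);
"""


def create_dido_db(path=":memory:"):
    """Create an empty DIDO database and return the open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the Dataset ID# link
    conn.executescript(SCHEMA)
    return conn
```

With foreign keys enabled, a Data Table row cannot refer to a Dataset ID# that is missing from the Dataset Table, and the CHECK constraint enforces the 254-character limit on "Other".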

Open Issues / Questions:

  1. USBGN Feature Codes:
    Perhaps the "USBGN" field in the Data Table should be filled in by the DIDO Database Coordinator, rather than by the dataset's contributor.  This will be less onerous for the contributor.  Or maybe we allow only one USBGN Feature Code for each dataset, and associate it with the Dataset, rather than with each individual item in the Data Table... TBD.

  2. "Other Surnames" and "Other Towns":
    Do we want to have an "Other Surnames" and/or "Other Towns" column in the Data Table, to include items mentioned within the "Other" column? (See Transcription Rules for JewishGen Databases, Section V.)  This can complicate things for the average submitter, and we want to keep things as simple as possible in DIDO.  Perhaps these could be optional 'advanced' columns... or we could just forgo them, and assume that there is only one surname and one town for each DIDO record.  If multiple surnames/towns do exist, they could be entered into the existing fields, separated with slashes, as specified in Section I.1.d of the "Transcription Rules for JewishGen Databases".

  3. Regions:
    How should the associated Region(s) be specified?  And how do we specify in which “All Country” database(s) the dataset is placed?  Should the "Region" column in the Dataset Table be a single Region Name, a set of Region names, and/or a set of “All Country” database names?
    Or should there be a set of boolean “All Country” and “All Topic” database flags?
    This needs to be coordinated with our existing "REGIONS" SQL database.

    What about inclusion in “All Topic” databases?  As of now, the “JewishGen Holocaust Database” is the only applicable “All Topic” database... but a "Sephardic Database" is potentially in the works.  Can this also somehow be specified within the "Regions" field... or should we use separate boolean indicators in the Dataset Table?
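
Open Issue #2 above suggests entering multiple surnames or towns in the existing fields, separated with slashes.  If that approach were adopted, the search code would need to split such fields before indexing each value.  A minimal sketch follows; the helper name split_multi is hypothetical and not part of any existing JewishGen code.

```python
def split_multi(value):
    """Split a slash-separated field into its individual values.

    Blank or missing fields yield an empty list; surrounding
    whitespace around each value is dropped.
    """
    if not value:
        return []
    return [part.strip() for part in value.split("/") if part.strip()]
```

Each value returned could then be indexed separately, so that a search matches a record on any one of its surnames or towns.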


Search Results Display

The search results data display will contain the following columns:

Name (Surname, Given Name(s), Patronymic) "WHO"
Location (Town, District, Country) "WHERE"
Date "WHEN"
Other (Comments) "HOW"
Source (ID# → Dataset Title) "WHAT"

  • The "Source" column's data will display the dataset's "Dataset Title", and be a hyperlink to the "Dataset Description", the full description of the Dataset.

  • If the "USBGN" field in the Data Table is filled in, the "Location" column's data will be hyperlinked to that location's "JewishGen Locality Page", and will have the Communities Database's Ajax mouse-over feature.
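
The mapping from a Data Table row to the five display columns listed above could be sketched as follows.  This is only an illustration: the dictionary keys are hypothetical field names, and joining the parts of a column with commas is an assumption, since the specification does not say how the parts are laid out.

```python
def display_row(record, dataset_title):
    """Build the five display columns from one Data Table row (a dict)."""

    def join(*keys):
        # Skip parts that are blank or absent in this record.
        return ", ".join(record[k] for k in keys if record.get(k))

    return {
        "Name": join("surname", "given_name", "patronymic"),
        "Location": join("town", "district", "country"),
        "Date": record.get("record_date") or "",
        "Other": record.get("other") or "",
        "Source": dataset_title,  # shown as a link to the Dataset Description
    }
```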

We would need a second-level display page — similar to JOWBR's "Cemetery Information" page (for example, Vienna's "Wiener Zentralfriedhof") — to display full information about the Dataset.

Integration into “All Country” and “All Topic” Databases

Component datasets of DIDO can then be incorporated into the various “All Country” and “All Topic” systems, as appropriate, as controlled by the "Region" field of the Dataset Table. (See Open Issue #3, above).

Display of entire datasets

The facility should also include a programmatic mechanism to display an entire dataset, i.e. a search based on the "Dataset ID#" — so that a KehilaLinks page could link to a list, displayed much like one in its current static form.  This allows datasets to be "browsable" by the user, just like the static datasets are today.

In order to deter data-mining, the "Dataset ID#" should probably be a random set of alphanumeric characters, rather than a sequential integer.

Perhaps this should be an optional feature, on a case-by-case basis per dataset, as determined by the dataset's contributor and/or JewishGen.  This feature would require an additional boolean field in the Dataset Table.
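
A random alphanumeric "Dataset ID#" of the kind suggested above could be generated with Python's standard secrets module.  This is a minimal sketch: the function name and the default length of 12 characters are assumptions, not requirements of this specification.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def new_dataset_id(length=12):
    """Return a random alphanumeric ID that cannot be guessed by counting."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

Because the IDs are not sequential, a visitor cannot enumerate every dataset simply by incrementing a number in the URL.  The Admin Panel would still need to check a new ID against existing ones before use.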


Procedures

The method for submitting a dataset to DIDO should be relatively straightforward, and not as onerous as that for a full-blown JewishGen database, to encourage people to submit data.

This will require a DIDO Database Coordinator and a DIDO Admin Panel.

  • A DIDO Database Coordinator will supervise the entire operation, ongoing: for correspondence, quality control, database maintenance, etc.

  • A DIDO Admin Panel interface should be developed, for the DIDO Database Coordinator to manage the DIDO Dataset Table, similar to the JOWBR Admin Panel.  The DIDO Database Coordinator should be able to add / remove / replace datasets in the live search engine with little or no intervention by the JewishGen staff.

The procedures for data preparation and submission for DIDO could be modeled after the "Database Factory" concept that we discussed and began prototyping in 2002-2003.

There should be a downloadable data template with instructions for submitters, similar to the way we've done the JOWBR Template and the other templates currently available at http://www.jewishgen.org/databases/templates.

The DIDO Database Coordinator would receive the completed templates, do some minimal QA, and add them to the DIDO Admin Panel.  The Coordinator should be able to put the data into "test" mode, and to make the data "live", all without any intervention by the JewishGen staff or highly technical volunteers.
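
The "minimal QA" step could be partly automated.  The sketch below checks only two things: the 254-character limit on the "Other" column, which comes from the Data Table description above, and the presence of at least one name field, which is an assumption added for this example.  The function name and dictionary keys are hypothetical.

```python
MAX_OTHER_LENGTH = 254  # limit stated for the "Other" column


def qa_problems(rows):
    """Return (row_number, message) pairs for rows failing basic checks."""
    problems = []
    for number, row in enumerate(rows, start=1):
        # Assumed rule: every record should name a person.
        if not (row.get("surname") or row.get("given_name")):
            problems.append((number, "no name given"))
        if len(row.get("other") or "") > MAX_OTHER_LENGTH:
            problems.append((number, "Other exceeds 254 characters"))
    return problems
```

An empty result would mean the dataset passed these checks and could be placed into "test" mode for review.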

Edmond J. Safra Plaza | 36 Battery Place | New York, NY 10280
646.494.2972 | info@jewishgen.org | © 2025, JewishGen, Inc. All rights reserved.