JGFF to ShtetlMaster - Design Specification Version 1.06
Warren Blatt, May 2004. Last Update: Sep 2, 2005
JewishGen needs to migrate the JewishGen Family Finder (JGFF) database from its current format to one that is more flexible and integrated with a geographic-based system (ShtetlMaster).
Phases: The JGFF-to-ShtetlMaster migration is a very large project, so it will need to be accomplished in several discrete phases over a long period of time. Here is a proposed sequence of implementation.
Separate the JGFF's current DATA.DBF table into two separate tables.
Create a new JGFF Towns table, JGFFTOWNS.DBF, which will contain the following columns.
JGFFTOWNS.DBF:
Column | Description |
---|---|
JGFF Town ID# | An artificial, internal-only number. |
Town | Town name |
Country | Country containing the Town (3-4 letter abbreviation) |
USBGN Code | Future: USBGN Feature Code Number |
Num USBGN | Future: Number of USBGN entries with this name |
Notes | Future: Administrator Notes |
The "JGFF Town ID#" is an artificially created number. It is arbitrary, never displayed to users, and used only to link entries in the DATA.DBF and JGFFTOWNS.DBF files together. A candidate index should be placed on this field, to ensure uniqueness.
The "Town" and "Country" fields are taken directly as is from the current two columns in DATA.DBF. As a pair, the values should be unique (candidate index).
The "USBGN Code", "Num USBGN" and "Notes" fields can all remain blank for now. They are placeholders, to be used in later phases of this project. The "USBGN Code" field should be unique when non-null (candidate index?).
Change the DATA.DBF file — here are the current and proposed data structures for DATA.DBF:
Current DATA.DBF columns:
New DATA.DBF proposed columns:
Basically, the "Town", "Country", "Source" and "Gotcha" columns are being replaced by the new "TownID" column, which is a link into the new JGFFTOWNS.DBF file. The revised DATA.DBF file should be half of its current size.
Code Changes: There will be a major re-write of all internal JGFF programs, to use the two tables instead of one. This is likely the largest programming task of the entire project, especially the update program jgffupd2.
JGFF Town ID#: Note that we're using an artificial "JGFF Town ID#", rather than a USBGN Feature Code Number as originally planned, because so many items in the JGFF don't have a USBGN Feature Code Number:
Also note that the JGFFTOWNS.DBF is an "intermediate" table, which sits between the DATA.DBF file and the ShtetlSeeker/ShtetlMaster tables. We can't use the ShtetlSeeker table directly, because we need to limit the set of towns searched to be only those towns in the JGFF, and because many entries in the JGFF don't have a USBGN Code Number, as noted above.
The auto-replace town synonyms file, JGFFSYN.DBF, will also need to change its structure. Instead of the 4 columns "OldTown, OldCountry, NewTown, NewCountry", the new structure will be 3 columns: "OldTown, OldCountry, NewJGFFTownID#". The JGFF Town Synonyms Admin Panel web interface will need to change accordingly.
Current JGFFSYN.DBF columns:
New JGFFSYN.DBF columns:
Any other process which touches the current DATA.DBF file will be affected by this change. This includes the JGFF-to-Goldmine batch processes, the de-dupe and re-index utility (indexjgffdata), the move records utility (jgffmove2), the JGFF Statistics report (jgffstats), etc., etc.
We probably should rename the revised DATA.DBF file to be something like JGFFDATA.DBF, and archive and delete the old DATA.DBF file, so that any process which uses the old file will fail dramatically, rather than subtly.
Note that there are no visible changes in the JGFF's appearance or functionality in this phase; only the internal structures have changed.
STATUS: Michael developed an initial prototype of the search page with dual tables, Oct 27 2004:
http://www.jewishgen.org/jgff/new/jgffweb.htm.
Phases 1-2-3 made LIVE on Jun 27 2005.
We will need an Admin Panel interface to manage the new JGFFTOWNS.DBF file. This can be the usual Admin Panel interface which we use for other DBF files.
However, because there will be about 20,000 towns in this table, for performance reasons, the initial BROWSE view should be filtered — probably by country.
In order to keep the town data clean, we should make all countries "restricted" countries, i.e. users can't add new localities which aren't already in the JGFF database (See JGFF To Do List, Item #27). Users will not be able to add a town which isn't already in the JGFFTOWNS.DBF table; the JGFF Editor needs to add all new towns, via this Admin Panel.
The JGFFTOWNS.DBF Admin Panel's "SHOW" page should contain some tools to help the town experts assign the proper USBGN Code, similar to the tools in ye olde "JGFF Town Cleanup" utility (jg~jgsys~jgffclean2):
The JGFF Editor will use the ADD function of the "JGFF Towns Admin Panel" for inserting new town names, replacing the current "!YES!" suffix method of forcing new town entries.
When a new town is added (or when a town's "Town Name" or "USBGN Code" field is modified), the "Num USBGN" field should be calculated from the ShtetlSeeker USBGN data, and saved in the JGFFTOWNS.DBF record. Also, all of that town's unique synonyms in the USBGN data should be added to the JGFFSYN.DBF table.
We would also like to enlist the aid of some "country experts" for cleaning up and identifying JGFF entries, and for assigning the USBGN Codes. Either we could figure out a way to give them filtered access to the JGFFTOWNS.DBF Admin Panel, with access to their country only, or we could use some form of export/import via spreadsheet. For example, a spreadsheet exported from JGFFTOWNS.DBF would contain columns for "JGFF Town ID#", "Town", "Country", "USBGN Code", "Num USBGN" and "Notes" (the six standard columns of JGFFTOWNS.DBF), plus the calculated "Number of JGFF entries". The "country expert" would then fill in the missing entries in the "USBGN Code" column. Town merges would have to be processed if they entered a USBGN Code that matched one already in the table.
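The spreadsheet round trip could look roughly like the minimal sketch below, which assumes the JGFFTOWNS rows have been read into Python dictionaries (the key names `id`, `town`, `country`, `usbgn`, `num_usbgn` and `notes` are hypothetical) and that the per-town entry counts have already been computed from DATA.DBF. It exports one country as CSV, and on re-import flags rows whose newly entered USBGN Code already belongs to a different town, i.e. the candidates for a town merge.

```python
import csv
import io
from typing import Dict, List, Tuple

# The six standard JGFFTOWNS.DBF columns plus the calculated entry count.
COLUMNS = ["JGFF Town ID#", "Town", "Country", "USBGN Code", "Num USBGN", "Notes",
           "Number of JGFF entries"]

def export_country(towns: List[dict], entry_counts: Dict[int, int], country: str) -> str:
    """Return CSV text with one row per JGFFTOWNS record in the given country."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    for t in towns:
        if t["country"] != country:
            continue
        code = "" if t.get("usbgn") is None else t["usbgn"]
        writer.writerow([t["id"], t["town"], t["country"], code,
                         t.get("num_usbgn", ""), t.get("notes", ""),
                         entry_counts.get(t["id"], 0)])
    return out.getvalue()

def merge_candidates(csv_text: str, towns: List[dict]) -> List[Tuple[int, int, str]]:
    """(edited town ID, town ID already holding that code, code) for each conflict."""
    owner = {str(t["usbgn"]): t["id"] for t in towns if t.get("usbgn") is not None}
    conflicts = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        code = row["USBGN Code"].strip()
        town_id = int(row["JGFF Town ID#"])
        if code and code in owner and owner[code] != town_id:
            conflicts.append((town_id, owner[code], code))
    return conflicts
```

The sketch only detects merge candidates; the merge itself, and the JGFFSYN.DBF bookkeeping that goes with it, is left to the administrator.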
Phases 1 and 2 need to be released simultaneously. The sequence for the switchover procedure should be something like:
STATUS:
- All countries have been made "restricted"
as of Nov 1 2004.
- Prototype of Admin Panel created, April 2005:
https://data.jewishgen.org/wconnect/wc.dll?jg~jgsys~admin~&S=JGFFTOWNS&ADMIN=Y.
- Phases 1-2-3 made LIVE on Jun 27 2005.
We'll need a "JGFF Town Merge Utility", allowing the JGFF Editor (Alex Sharon) and other administrators to fix up the JGFF town data in the JGFFTOWNS.DBF table. This "Town Merge Utility" will allow us to correct all of the bad and erroneous JGFF Town data, which has been piling up for years. This tool will effectively be a "global placename converter", to use for the mass-conversion of all references to a town name in the database. For example, to change all instances of "Ungvar" in the database to "Uzhgorod"; or all instances of "Kishinev" to "Chisinau", etc. — and to then invalidate the removed name. (See former JGFF To Do List, Item #35).
Inputs: The required inputs are the source town and country, and the destination town and country.
The "Town Merge Utility" should probably be accessible via the SHOW screen of the JGFFTOWNS.DBF Admin Panel, on the record for the Old Town. A button for "Merge / Eliminate this Town record" would go to a page where the administrator would select the New Town via two pulldown select menus, one for Town and one for Country. These pulldowns would be dynamically built from the existing records in the JGFFTOWNS.DBF table, similar to the pulldowns on the Yizkor Book Admin Panel's EDIT screens. The Country pulldown would default to the Country of the Old Town and, if changed, would update the Town pulldown to contain the Towns within that Country.
Confirmation screen: This utility should take precautions so that no irreversible damage can be done easily. Therefore it should have an intermediate "Are you sure?" page, echoing the input command and listing the number of entries that would be affected by the requested change, before a "commit" button is pressed, e.g.:
This will change all xxx instances of
OK?
Action: The merge function will:
We need to avoid circular references, duplicates, and conflicts within the JGFF Synonyms Table. Therefore, when adding any new entry to the JGFFSYN.DBF table, we need to ensure that the target name (New JGFF Town ID#) isn't anywhere in the source "Old Town, Old Country" columns, and that the source name isn't already a target name. We must also ensure that all existing references to a target are updated. We need to think this through carefully.
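The checks in the previous paragraph can be sketched as a validation function. This is one possible reading of the rules, assuming both tables have been loaded into Python dictionaries keyed as shown (a hypothetical in-memory view, not the DBF layout). It reports problems rather than fixing them, and it compares names exactly as stored.

```python
from typing import Dict, List, Tuple

TownKey = Tuple[str, str]  # (town, country), compared exactly as stored

def synonym_problems(
    old: TownKey,
    new_town_id: int,
    towns: Dict[int, TownKey],     # JGFFTOWNS.DBF: town ID -> (town, country)
    synonyms: Dict[TownKey, int],  # JGFFSYN.DBF: (old town, old country) -> new town ID
) -> List[str]:
    """Reasons a proposed JGFFSYN row would be unsafe; an empty list means OK."""
    if new_town_id not in towns:
        return ["target town ID is not in JGFFTOWNS"]
    target = towns[new_town_id]
    # Names that existing synonym rows currently point to.
    current_targets = {towns[tid] for tid in synonyms.values() if tid in towns}
    problems = []
    if old == target:
        problems.append("source and target are the same name")
    if old in synonyms:
        problems.append("source name already has a synonym row (duplicate)")
    if target in synonyms:
        problems.append("target name is used as a source (circular reference)")
    if old in current_targets:
        problems.append("source name is already a target of other rows")
    return problems
```

A row flagged by the last check is exactly the case where existing references to that target would need updating before the new row could be added.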
Restrictions / Constraints / Candidate Indexes for the JGFFSYN.DBF file:
A town record should not be deleted from the JGFFTOWNS.DBF file if there are still entries in the JGFFSYN table referring to it.
STATUS: Most if not all of the "Town Merge Utility" can be replaced by revised utility programs (jgsys~syncyclic, jgsys~synupdate, jgsys~synupdate~test) for the JGFFSYN.DBF table, so this utility is probably not necessary.
Phases 1-2-3 made LIVE on Jun 27 2005.
Fill in the "USBGN Code" column in the JGFFTOWNS.DBF table.
First, Michael should programmatically fill in as many USBGN Codes as possible, using exact-spelling unique matching against the new ShtetlSeeker data. Only those places which are completely unambiguous should be filled in using this programmatic method. All others will be done manually.
This programmatic process should also put data into the "Num USBGN" field of the JGFFTOWNS.DBF table, regarding how many matching entries it found in the new ShtetlSeeker USBGN database. If there are no exact spelling matches in the USBGN data, it should write "0"; if there is exactly one match, it should write "1", etc. This information will be useful in later processes.
Second, use the SHTETLS.DBF table. This is our "ShtetlMaster" database, containing specific knowledge of Jewish-populated towns, which has been built up with time and experience, starting with our Yizkor Book bibliographic database. If there is more than one town with the same name in the USBGN (ShtetlSeeker) data, then assume that the one being referenced in the JGFF is the one in the SHTETLS.DBF table.
For example, there are six towns named "Ostrołęka, Poland" in the USBGN data. Based on our knowledge and experience, we will assume that the "Ostrołęka" identified in the SHTETLS.DBF table is the "correct" one (i.e. the largest, most heavily Jewish locality) being referred to by JGFF users. Fill in the "USBGN Code" column in the JGFFTOWNS.DBF table from the corresponding named entry in the SHTETLS.DBF table.
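The first two steps can be sketched together, assuming the JGFF and USBGN country fields have already been mapped to the same abbreviations and that names are compared by exact spelling. The row keys and the `usbgn_rows` and `shtetls` inputs are hypothetical stand-ins for JGFFTOWNS.DBF, the ShtetlSeeker USBGN data and SHTETLS.DBF. Ambiguity is judged by distinct feature codes, which is an assumption about how "matching entries" should be counted.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

TownKey = Tuple[str, str]  # (town, country)

def fill_usbgn_codes(
    jgff_towns: List[dict],                  # rows with "town", "country", "usbgn"
    usbgn_rows: List[Tuple[str, str, int]],  # (name, country, feature code)
    shtetls: Dict[TownKey, int],             # SHTETLS.DBF: (town, country) -> code
) -> None:
    """Step 1: fill unique exact matches and Num USBGN. Step 2: resolve ties via SHTETLS."""
    candidates: Dict[TownKey, Set[int]] = defaultdict(set)
    for name, country, code in usbgn_rows:
        candidates[(name, country)].add(code)

    for row in jgff_towns:
        key = (row["town"], row["country"])
        codes = candidates.get(key, set())
        row["num_usbgn"] = len(codes)      # 0 = no match, 1 = unique, 2+ = ambiguous
        if row.get("usbgn") is not None:
            continue                       # never overwrite a code already assigned
        if len(codes) == 1:
            row["usbgn"] = next(iter(codes))
        elif len(codes) > 1 and shtetls.get(key) in codes:
            # Several same-named places: assume the one recorded in SHTETLS.DBF.
            row["usbgn"] = shtetls[key]
```

Requiring the SHTETLS.DBF code to be one of the USBGN candidates is a small sanity check added here; anything it cannot resolve stays blank for the manual step.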
Third, the manual process — to be done using the "JGFF Towns Admin Panel" interface (see Phase 2).
The Admin Panel "BROWSE" screen for JGFFTOWNS.DBF contains a calculated display column showing the number of entries in DATA.DBF for each town, similar to the current SHTETLS.DBF Admin Panel's SHOW screen. We are able to sort the data numerically by this "# of entries" column, so that we can attack the most popular towns first.
Number of unique JGFF Towns and Entries:
Entries per town | # of unique towns | Total # of JGFF entries |
---|---|---|
1,000+ entries | 43 (0.2%) | 140,000 (39%) |
100-999 entries | 400 (2%) | 107,000 (30%) |
50-99 entries | 500 (2%) | 32,000 (9%) |
25-49 entries | 700 (3%) | 25,000 (7%) |
10-24 entries | 1,500 (7%) | 21,000 (6%) |
5-9 entries | 2,100 (10%) | 12,000 (3%) |
2-4 entries | 5,000 (25%) | 14,000 (4%) |
1 entry | 10,000 (50%) | 10,000 (3%) |
TOTAL | 20,000 (100%) | 360,000 (100%) |
I believe that we can fill in the USBGN codes for 50% of all JGFF entries within a day (the top 100 towns comprise about 50% of all the JGFF entries!), and that we can fill in 75% of all JGFF entries within a week, by dealing with the largest towns (most entries) first. The remaining 25% of entries will take much longer: these are the smaller localities, with only a handful of JGFF entries each (see the table above, based on data as of 1/2005).
Our goal should be to first fill in the USBGN codes for all places in Eastern Europe with 100 or more JGFF entries each (about 500 towns, 250,000 total entries); then to fill in the codes for all places with 50-99 JGFF entries each (about 500 towns, 32,000 entries); then all places with 25-49 JGFF entries each (about 700 towns, 25,000 entries), then all places with 10-24 JGFF entries each (about 1,500 towns, 21,000 entries), etc., working down the list this way, from the most popular to the least popular localities. Using this top-down approach, we should quickly be able to maximize coverage. Having 100% coverage is not a goal. We can probably achieve 90% (all towns with 10 or more JGFF entries each, about 3,000 unique towns) with a few months of effort.
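The arithmetic behind these targets can be checked directly from the table. The figures below are copied from it; because they are rounded, their sums (20,243 towns and 361,000 entries) differ slightly from the table's totals.

```python
from typing import Iterator, List, Tuple

# (entries per town, unique towns, total JGFF entries) from the 1/2005 table.
BUCKETS: List[Tuple[str, int, int]] = [
    ("1,000+", 43, 140_000),
    ("100-999", 400, 107_000),
    ("50-99", 500, 32_000),
    ("25-49", 700, 25_000),
    ("10-24", 1_500, 21_000),
    ("5-9", 2_100, 12_000),
    ("2-4", 5_000, 14_000),
    ("1", 10_000, 10_000),
]

def cumulative_coverage(buckets: List[Tuple[str, int, int]]) -> Iterator[Tuple[str, int, float]]:
    """Top-down running totals: (bucket, towns coded so far, fraction of entries covered)."""
    total_entries = sum(entries for _, _, entries in buckets)
    towns_done = entries_done = 0
    for label, towns, entries in buckets:
        towns_done += towns
        entries_done += entries
        yield label, towns_done, entries_done / total_entries

for label, towns, share in cumulative_coverage(BUCKETS):
    print(f"through {label:>7} entries/town: {towns:>6,} towns, {share:6.1%} of entries")
```

The `10-24` line reports 3,143 towns and 90.0% of entries, consistent with the 90% / 3,000-town estimate.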
(Note that the true numbers will actually be less than the above, since we will only fill in the USBGN codes for towns in the 31 "ShtetlSeeker countries", which comprise about two-thirds of all JGFF entries.)
Note that there are still no visible changes to the JGFF's appearance or functionality at this phase; only the internal structures have changed, and administrative utilities have been prepared.
STATUS: The first two steps were completed in late June 2005 (no visible user impact).
Once the majority of the USBGN Codes have been filled in the JGFFTOWNS.DBF table (in Phase 4), a one-time process will be written to substitute the modern native name from the new ShtetlSeeker into the "Town" column of JGFFTOWNS.DBF, using the "USBGN Code" column as the key.
At the exact same time, an oldname-to-newname correspondence table should be built, using the names that were substituted. These old-to-new name pairs are added to the JGFFSYN.DBF town synonyms table. (Some existing entries in the JGFFSYN.DBF file may need to be updated/deleted, to avoid circular references and duplicates — see notes above).
For example, the city of "Grodno, Belarus" has the USBGN Feature Code Number -1943562. The modern native Belarusian name for this city is "Hrodna", as per the new ShtetlSeeker. In the JGFFTOWNS.DBF entry for "Grodno", the "Town" field is changed from "Grodno" to "Hrodna", based on a lookup of the USBGN Code field in the new ShtetlSeeker. In the JGFFSYN.DBF table, a row is added for oldname-to-newname "Grodno, Bel --> Hrodna, Bel". (But see notes above, regarding avoiding circular references within the JGFFSYN.DBF table).
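The one-time substitution can be sketched as follows, assuming the new ShtetlSeeker native names are available as a dictionary keyed by USBGN Feature Code (a hypothetical structure). It renames each coded town and returns the old names in the new three-column JGFFSYN shape (Old Town, Old Country, New JGFF Town ID#); those pairs would still need the circular-reference checks before being inserted.

```python
from typing import Dict, List, Optional, Tuple

TownKey = Tuple[str, str]  # (town, country)

def substitute_native_names(
    jgff_towns: List[dict],          # rows with "id", "town", "country", "usbgn"
    native_by_code: Dict[int, str],  # USBGN code -> modern native name
) -> List[Tuple[TownKey, int]]:
    """Rename coded towns to the native name; return ((old town, old country), town ID) pairs."""
    new_synonyms: List[Tuple[TownKey, int]] = []
    for row in jgff_towns:
        code: Optional[int] = row.get("usbgn")
        native = native_by_code.get(code) if code is not None else None
        if native and native != row["town"]:
            new_synonyms.append(((row["town"], row["country"]), row["id"]))
            row["town"] = native
    return new_synonyms
```

Because rows already carrying the native name are skipped, running the process a second time produces no new synonym pairs.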
Issue: Accented Characters. Many of the names in the new ShtetlSeeker have accented characters, e.g. Kraków, Oświęcim. How do we deal with these? I would hope that we can display them, and use the same algorithms that the JRI-Poland databases do for matching input unaccented characters to displayed accented characters. We'll need to build a larger table of accented characters, encompassing all of the required languages (German, Czech, Slovak, Hungarian, Polish, Lithuanian, Latvian, Estonian, Romanian, Bulgarian, Serbo-Croatian, and Turkish). We can phase this in, perhaps one language group at a time, at a later date.
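One possible approach to matching unaccented input against accented names is to decompose with Unicode NFD, drop the combining marks, and compare case-insensitively. This is not the JRI-Poland algorithm, which is not described here. Some letters (Polish ł, Serbo-Croatian đ, Turkish dotless ı) have no Unicode decomposition, so they still need an explicit table like the one proposed above; the `EXTRA_FOLDS` entries are an illustrative starting set, not a complete list.

```python
import unicodedata

# Letters with no Unicode decomposition, which NFD alone cannot strip.
# Illustrative starting set; a real table would grow language by language.
EXTRA_FOLDS = str.maketrans({"ł": "l", "Ł": "L", "đ": "d", "Đ": "D", "ı": "i"})

def fold(name: str) -> str:
    """Search key: accents and other combining marks removed, case folded."""
    decomposed = unicodedata.normalize("NFD", name.translate(EXTRA_FOLDS))
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return bare.casefold()

def matches(user_input: str, stored_name: str) -> bool:
    """True when, for example, unaccented input matches an accented stored name."""
    return fold(user_input) == fold(stored_name)
```

With this, "Krakow" matches the displayed "Kraków", and "Lodz" matches "Łódź", while the stored name keeps its accents for display.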
This phase is the first visible change to end-users. A notice to all users should go out as we implement this phase, informing them of this action.
This phase should be coordinated with the public switchover to the new ShtetlSeeker, which should occur simultaneously.
Phase 6 (Use of town name synonyms in searches) should be implemented as quickly as possible following this phase, for the sake of continuity.
STATUS: Michael did a test run on June 22 2005.
449 town entries identified.
LIVE on July 1 2005.
Finally, roll out the new and improved ShtetlSeeker, which was originally coded in early 2000(!). See preliminary release notes at http://www.jewishgen.org/ShtetlSeeker/Admin/Announce.txt.
This should be done in conjunction with Phase 5, the switch-over of the town names in the JGFF from the circa-1990 USBGN / WOWW1 / Soviet-era Russian-language names to the newer 1996 USBGN / WOWW2 / post-Soviet Ukrainian, Belarusian and Moldovan native-language names. (See JGFF To Do List items #1, 1a, 1b).
A prototype of the new interface can be found at http://www.jewishgen.org/ShtetlSeeker/LocTown.asp.
Additional enhancements which could/should be made to the new ShtetlSeeker before its release:
Get updates/corrections which the USBGN has made to their gazetteers within the last five years. There are reportedly "20,000 changes per month" to their worldwide gazetteers. Most of the data files for Eastern European countries have been revised within the last two years. See http://earth-info.nga.mil/gns/html/cntry_files.html.
Try using the UTF-8 (universal) character set encoding, or HTML entities, instead of Windows-1250 (Central European) characters, in order to display the Turkish characters (e.g. the initial letter of "İstanbul", "İzmir", etc.), Baltic characters (e.g. the dotted-e in "Marijampolė", "Pumpėnai", etc.) and others that are not being displayed correctly under the current Windows-1250 encoding. The current GEOnet country files are in UTF-8 UNICODE format.
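The difference between the encodings can be checked quickly with Python's standard codecs. This sketch (the helper name `encodable` is illustrative) tests whether each name survives Windows-1250 and UTF-8, and shows the HTML-entity fallback produced by the `xmlcharrefreplace` error handler.

```python
def encodable(text: str, encoding: str) -> bool:
    """True if every character in text can be represented in the encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

for name in ["Kraków", "Oświęcim", "İstanbul", "Marijampolė"]:
    print(f"{name}: cp1250={encodable(name, 'cp1250')}, utf-8={encodable(name, 'utf-8')}")

# Fallback for a page that must stay in Windows-1250: numeric HTML entities.
print("Marijampolė".encode("cp1250", errors="xmlcharrefreplace"))  # b'Marijampol&#279;'
```

Kraków and Oświęcim fit in Windows-1250, while İstanbul and Marijampolė do not; all four encode in UTF-8.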
Tie to the "ShtetlMaster Locality Page".
If there is an entry in the SHTETLS.DBF table for this town (keyed by the "USBGN Code" column), then hyperlink the town name in the ShtetlSeeker results display to the town's "ShtetlMaster Locality Page" (see Phase 7b). This feature will assist people in finding the correct "Jewish" town among so many similarly-named towns (e.g. distinguishing the correct "Ostrołęka" among the several presented).
This could perhaps be indicated using a special "JewishGen" icon — a small purple "J" or tree 16x16 image to the left of the town name — a link indicating "JewishGen resources" for a locality.
Display the modern "administrative district" (i.e. province, oblast, etc.) for each locality.
See the USBGN website, at http://earth-info.nga.mil/gns/html/gns_faq/gns_faq.html. See the 10th question, where it asks about the values in the column "ADM1". Go to http://gnpswww.nga.mil/geonames/GNS/index.jsp, and click on the list for the "ADM1 Codes" Look-up Table. Using the information in this table, we could then display the name of the current administrative region for each locality, in a new column.
Unfortunately, it appears that the USBGN's data on ADM1 codes is far from complete for many Eastern European countries. So we will have to forget about this feature for now, and can perhaps re-visit it in the future, if and when the USBGN or other source can provide the necessary data.
Will need to change the JGFF instructions (E:\data\jgffphd\view.htm on DATA2) and JGFF-FAQ, to eliminate references to the ShtetlSeeker "(N)" Native name — in the new ShtetlSeeker, it's the Bold name that's the Native name.
Better integration between the two sections of ShtetlSeeker ("Town Name Search" and "Radius Search").
Link directly to the "Radius Search" from the "Town Name Search" results page, without having to manually type in the latitude/longitude. The current implementation is awkward, and makes a poor live demo.
Create a direct hyperlink from each town's row on the "Town Name Search" results page, to a "Radius Search" for that locality. Hyperlink using a tiny 16x16 "Radius Search" icon graphic at the end of each row — the graphic could perhaps depict concentric circles.
For the other "Radius Search" parameters, use reasonable defaults: use the same miles/km as selected for the town name search; and default the radius distance to something reasonable, such as 20 miles/km.
Need to change the "Radius Search" results to have the same display style and format as the "Town Name Search" results:
Update our internal "JGFF Synonym Database" jgffsyn2 program from WOWW1 (1991) to WOWW2 (2002) data, to use the new modern locality names. Gary Mokotoff had previously given us permission to use this data, for JewishGen internal purposes only. Some other improvements which could be made to the jgffsyn2 program are:
STATUS: Prototype created, June 2005:
http://www.jewishgen.org/ShtetlSeeker/LocTown.asp.
LIVE on July 1 2005.
See the Release Notes.
All items complete except (a), (c),
(d), and (h).
We begin to use the contents of the JGFFSYN.DBF table during JGFF Searches. (We currently use it only during JGFF data entry). This would most likely apply to exact-spelling-match town searches only.
Using this new approach, someone looking for "Grodno, Belarus" (the former Russian-language town name) would still find matching entries, for "Hrodna, Belarus" (the current Belarusian-language town name). We would report the name substitution at the top of the JGFF's search results page, similar to the way that Google and Verity Ultraseek do when you enter a misspelled term.
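A minimal sketch of an exact-spelling search that falls back to JGFFSYN.DBF and produces a notice line for the top of the results page. The dictionaries stand in for the two tables (hypothetical in-memory views), the notice wording is only illustrative, and matching is case-sensitive here for simplicity.

```python
from typing import Dict, List, Optional, Tuple

TownKey = Tuple[str, str]  # (town, country)

def search_with_synonyms(
    query: TownKey,
    towns: Dict[int, TownKey],     # JGFFTOWNS: town ID -> (town, country)
    synonyms: Dict[TownKey, int],  # JGFFSYN: (old town, old country) -> town ID
) -> Tuple[List[int], Optional[str]]:
    """Exact-spelling search; on no hits, retry through JGFFSYN and explain the swap."""
    hits = [tid for tid, key in towns.items() if key == query]
    if hits:
        return hits, None
    tid = synonyms.get(query)
    if tid is None or tid not in towns:
        return [], None
    town, country = towns[tid]
    notice = f'Showing results for "{town}, {country}" instead of "{query[0]}, {query[1]}"'
    return [tid], notice
```

A direct hit returns no notice, so users searching for the current name see no change in behavior.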
This synonym search feature will be very important, since most users are not familiar with the new Belarusian, Ukrainian and Moldovan native names; they know only the former Russian-language names, which are being replaced in Phase 5. Few users would ever think of searching for "Hrodna" for Grodno, or "Vawkavysk" for Volkovysk, etc., and some of these new native names don't have the same D-M Soundex code as the old name.
This feature will also be extremely beneficial for all users in general, since users frequently misspell town names, or use Yiddish or phonetic versions. Today, if someone searches the JGFF for "Kovno" or "Chenstahov", the JGFF returns "No matches found". By implementing the JGFFSYN.DBF lookup, the correct town will be located.
We could also allow this feature to be controllable via the user interface. There could be a checkbox on the JGFF Search Page, called "Use Town Name Synonyms", which would be checked by default, if the Search Type for Town was "Standard" (exact spelling); and unchecked for other Search Types (controllable via JavaScript).
Note that this would work best for "Exact Spelling" searches only — if applied to a D-M Soundex search, it would likely yield too many matches (probable false positives). We can experiment with this. As a future possible enhancement, when doing a D-M or Wildcard search, perhaps the user could be presented with a list of matching town entries from the JGFFSYN.DBF synonym database (with some basic info about each locality: e.g. the primary native name, its synonyms from SHTETLS.DBF, and the number of matching JGFF entries), and could then select from that hyperlinked list, to see the corresponding JGFF entries.
If the country field is "Any Country", I think the optimal solution would be to ignore the synonym table at first. But if the straight JGFFTOWNS search yields zero matches, we then try again with synonyms: go back to the synonym table, find any matches based on TownName only, and re-do the JGFFTOWNS search. I think that this "If at first you don't succeed, try, try again" approach will work for the fuzzy searches (soundex, wildcard, no country, etc.). The program would keep re-searching the JGFFTOWNS data, trying different usages of the synonyms database, until at least one hit is found.
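The "try, try again" idea can be expressed as a cascade of search strategies that stops at the first one returning hits. This sketch assumes JGFFTOWNS and JGFFSYN are loaded as Python dictionaries (hypothetical), implements only the two passes described for "Any Country" searches, and takes the name-matching rule (exact by default) as a parameter so that a Soundex or wildcard matcher could be plugged in.

```python
from typing import Callable, Dict, List, Sequence, Tuple

TownKey = Tuple[str, str]             # (town, country)
Matcher = Callable[[str, str], bool]  # (query, stored name) -> match?

def first_hits(strategies: Sequence[Callable[[], List[int]]]) -> List[int]:
    """Run strategies in order; return the first non-empty result (or [])."""
    for strategy in strategies:
        hits = strategy()
        if hits:
            return hits
    return []

def any_country_search(
    town: str,
    towns: Dict[int, TownKey],     # JGFFTOWNS: town ID -> (town, country)
    synonyms: Dict[TownKey, int],  # JGFFSYN: (old town, old country) -> town ID
    match: Matcher = lambda q, name: q == name,
) -> List[int]:
    """'Any Country' search: plain JGFFTOWNS pass first, synonyms only if it finds nothing."""
    def plain() -> List[int]:
        # Pass 1: ignore the synonym table entirely.
        return [tid for tid, (name, _) in towns.items() if match(town, name)]

    def via_synonyms() -> List[int]:
        # Pass 2: match the OldTown column only (any OldCountry), then re-run
        # the JGFFTOWNS search for the town names those synonyms point to.
        targets = {towns[tid][0] for (old, _), tid in synonyms.items()
                   if tid in towns and match(town, old)}
        return [tid for tid, (name, _) in towns.items() if name in targets]

    return first_hits([plain, via_synonyms])
```

Further strategies (for example, a looser matcher) could be appended to the list passed to `first_hits` without changing its logic.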
STATUS: Live on Jun 28 2005 (hidden variable).
Applies only to exact-spelling matches.
LIVE on July 1 2005, with user-controllable checkbox.
Still need to work on Soundex, and StartsWith cases.
In the JGFF Search Results display, create a hyperlink for all towns that are in any of the 31 "ShtetlSeeker countries", which leads to a web-page with more information about that locality.
When the user clicks on a town name in the JGFF Search Results, we invoke the "TownInfo Program".
The "TownInfo Program" uses the data in the JGFFTOWNS.DBF table to determine what to display.
The URL of the TownInfo Program would be of the form
The TownInfo Program will use the following algorithm:
The messages displayed by the TownInfo Program should all be stored in external text files, so that they are easily modifiable.
The TownInfo Program should be invoked from a number of different places — wherever a JGFF Town name appears:
For initial testing purposes, link only from the JGFFTOWNS Admin Panel. The others can come later, when initial development and testing are complete, and we go live.
Open Issues:
For those town names which have a USBGN code filled in
in the JGFFTOWNS.DBF table, the hyperlink goes to the
Other models for the ShtetlMaster Locality Page are the "Shtetls of Lithuania", "Shtetls of Belarus" sites created by Ed Rosenbaum — which are great prototypes, but have several drawbacks (see 2003 critique).
The URL of a ShtetlMaster Locality Page would be of the form
Each town's ShtetlMaster Locality Page will contain the basic information for that town — which can all be generated from the SHTETLS.DBF and associated tables: Modern town name; alternate and historical names for the town; latitude and longitude; the town's political jurisdictions during various time periods; names of the nearby Jewish communities; links to JewishGen resources for that town; links to JewishGen resources for the containing regions (SIGs, etc.). See details below.
Most of this data already exists in the SHTETLS.DBF table, or can be generated as is currently implemented in the SHTETLS.DBF Admin Panel SHOW page for a town.
All these are part of the overall ShtetlMaster plan, which can be implemented gradually. The initial "ShtetlMaster Locality Page" could even be a small popup dialog window, rather than a full-blown webpage, or come up in a separate window (like the FTJP's Family Display Page).
The names of the nearest Jewish Communities within the SHTETLS.DBF table. This concept is similar to the "Surrounding shtetls" feature in the Shtetls of Lithuania pages. The list of towns could either be generated on-the-fly, or if that is too compute-intensive, could be generated for all towns periodically via a batch job and cached/stored somewhere. The set of communities in SHTETLS.DBF will change infrequently.
The town list should be displayed as a bulleted list, in nearest-distance order, with each hyperlinked town name followed by its distance/direction from the central town (similar to the ShtetlSeeker Radius Search results display). The list should be limited to communities in the SHTETLS.DBF file within 20 miles of the central town, or perhaps to 10 towns, whichever gives the shorter list. A mock-up example for the town of Łódź is illustrated at the right.
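A minimal on-the-fly version of the nearby-communities list, assuming SHTETLS.DBF coordinates are available as decimal degrees. It uses the standard haversine great-circle formula with a mean Earth radius of 3,958.8 miles and reduces the initial bearing to an 8-point compass direction; the ShtetlSeeker's own distance code may differ. The 20-mile and 10-town limits are parameters.

```python
import math
from typing import Iterable, List, Tuple

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius
COMPASS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

Place = Tuple[str, float, float]  # (name, latitude, longitude) in decimal degrees

def distance_and_direction(lat1: float, lon1: float,
                           lat2: float, lon2: float) -> Tuple[float, str]:
    """Haversine distance in miles and 8-point compass direction from point 1 to point 2."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    miles = 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))
    # Initial great-circle bearing, 0-360 degrees clockwise from north.
    y = math.sin(dl) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
    bearing = (math.degrees(math.atan2(y, x)) + 360) % 360
    return miles, COMPASS[round(bearing / 45) % 8]

def nearby_communities(center: Place, shtetls: Iterable[Place],
                       max_miles: float = 20.0,
                       max_count: int = 10) -> List[Tuple[str, float, str]]:
    """Nearest communities first, within max_miles and at most max_count of them."""
    name0, lat0, lon0 = center
    found = []
    for name, lat, lon in shtetls:
        if name == name0:
            continue  # skip the central town itself
        miles, direction = distance_and_direction(lat0, lon0, lat, lon)
        if miles <= max_miles:
            found.append((miles, name, direction))
    found.sort()
    return [(name, round(miles, 1), direction) for miles, name, direction in found[:max_count]]
```

Each result is (town, miles, direction), which maps directly onto the proposed bulleted display.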
The last item in the list should be the hyperlinked word "more", which would be a link to a generated longer list of the nearby Jewish communities — a page similar in form to the "LDS Microfilm Master" results display.
There are some localities which have the same name, or the same town name synonym. For example, "Sokołów, Poland" can refer to either "Sokołów Podlaski" or "Sokołów Małopolski". These are two different towns, both commonly referred to as "Sokołów". In the current JGFF (Sept 2004), there are 104 hits for "Sokolow Podlaski" and 43 hits for "Sokolow Malopolski", but the largest number, 166 hits, is for "Sokolow", which is ambiguous.
Technically, we should not allow these ambiguous references, since they don't refer to a specific place with a USBGN code. However, I feel that we must allow for it in some way, since sometimes people truly do not know which of the two is 'their' town — i.e. they know the town name only from their oral history or a passenger manifest, etc., and have not yet researched any further to determine what the specific town is.
When someone enters a town name which is known to be ambiguous, we should guide people into selecting one of the specific non-ambiguous towns, or let them select the ambiguous version, which will be explicitly labelled as such. This is the strategy which the Shoah Foundation uses when indexing its video testimonies: If it is unknown which town a reference refers to, it is listed as "generic", i.e. "Sokolow, (generic)".
The hyperlink to the "ShtetlMaster Locality Page" (see Phase 7) for a generic entry should explain this, and list the possible specific matches, with their lat/long and other historical jurisdictional data from the ShtetlMaster database. We should develop an interface to allow users to select from the available options.
For towns with identical names within the same country, we will continue to use the same method that Where Once We Walked does: we add a parenthetical phrase, e.g. "(near Krakow)", or some other regional qualifier to the smaller localities, to differentiate them from the larger town of the same name. The larger "Jewish" town (i.e. the town with the largest pre-WWII Jewish population) will appear without a parenthetical qualifier. The use of parentheses allows the search engine to continue to work properly, since it ignores all text within parentheses. Behind the scenes, each will be assigned its appropriate USBGN Feature Code, so that when the locality name is clicked on (see Phase 7 above), more details about that locality are presented.
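If the search engine drops parenthetical qualifiers before comparing names, a qualified name and its unqualified twin produce the same search key, while each record keeps its own USBGN code behind the scenes. A minimal sketch with a regular expression, assuming qualifiers are never nested (the function name is illustrative):

```python
import re

# Optional whitespace followed by "( ... )", e.g. " (near Krakow)"; nesting is not handled.
_PARENTHETICAL = re.compile(r"\s*\([^)]*\)")

def search_key(display_name: str) -> str:
    """Town name as the search engine sees it: parenthetical text removed."""
    return _PARENTHETICAL.sub("", display_name).strip()
```

Both "Sokolow" and "Sokolow (near Krakow)" reduce to the key "Sokolow", so a search returns both, and the qualifier then tells the user which is which.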
We also need a way to handle "dead" towns — towns which do not appear in the USBGN data. These towns — towns that no longer exist, or which have been absorbed into larger nearby towns — have no USBGN Feature Code Number. We need to come up with some consistent way of handling these.
Implementation: I suggest that these "generic" towns be implemented via an "Artificial USBGN Code Number" in the USBGN-Code field of the JGFFTOWNS.DBF table. These "Artificial USBGN Code Numbers" would be assigned by JewishGen, and would start with a letter (e.g. "A"), to ensure that they don't overlap with any real USBGN Feature Code Number. In the SHTETLS.DBF table, the matching entry would have the same Artificial USBGN Code. Also, the SHTETLS.DBF's "Locality Type" field would have a new possible value: "Generic" (in addition to the current possible values of "Town" and "Region"). For a "Generic" entry, the ShtetlMaster Locality Page (SMLP) would explain the particular situation for this locality, and perhaps list the possible choices for the real town which this might represent.
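A minimal sketch of how artificial codes could be told apart from real ones and allocated. The proposal only specifies a leading letter such as "A"; the `A1`, `A2`, ... numbering scheme and both helper names are hypothetical.

```python
import re
from typing import Iterable, Union

Code = Union[int, str]  # real USBGN Feature Code Numbers are numeric, e.g. -1943562

def is_artificial(code: Code) -> bool:
    """JewishGen-assigned codes start with a letter; numeric USBGN codes do not."""
    return str(code)[:1].isalpha()

def next_artificial_code(existing: Iterable[Code], prefix: str = "A") -> str:
    """Next unused code in a hypothetical A1, A2, A3, ... sequence."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)")
    numbers = [int(m.group(1)) for m in (pattern.fullmatch(str(c)) for c in existing) if m]
    return f"{prefix}{max(numbers, default=0) + 1}"
```

Because the check looks only at the first character, it works whether codes are stored as numbers or as character fields.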
Currently, the "Search Type" (Standard / Wildcard / D-M Soundex / Partial Text) applies to both the Surname and Town Name parameters. A very useful enhancement would be to allow the search type for Surname and TownName to be specified separately — this would provide for more refined searches.
On the search form, there would be two "Search Type" pulldowns — one underneath the "Surname" input box, and one underneath the "Town" input box.
While this might be a little more confusing for some users, it has the potential to be very useful. For example, someone might want to do a soundex search for a surname within a specific locality — actually, that should be the default setting: "Standard" (exact spelling match) for the TownName, and "D-M Soundex" for the Surname.
Having the surnames and towns in separate tables (Phase 1) will probably make the implementation of this much easier... as well as resolve some of the existing odd search bugs (see JGFF To Do List, Item #2a).
STATUS: Prototype created, June 29 2005.
LIVE on July 1 2005.
The JewishGen ShtetlSeeker is a great tool, but it has several drawbacks as a shtetl locator, namely:
For example, finding a place listed on a document as "Ruzin, Kiev gubernia" is currently a several step process, involving the knowledge and use of several resources, both online and offline. This is a non-trivial task for the inexperienced. The ShtetlMaster search function should make this task much easier.
One should be able to input a name (and any optional filters) and get back a table of matching localities. The results will be in a table, similar to the revised ShtetlSeeker results, but it will include only Jewish communities, together with columns containing more useful qualifying information, such as Jewish population figures, gubernia/district names, etc., which will make it much easier to identify which one is the correct location. The name of each locality will be hyperlinked to the ShtetlMaster Locality Page for that locality.
The search interface should be similar to that of the JewishGen ShtetlSeeker, but with some additional filters.
Some possible filters include:
The usual search criteria should be available: Exact Spelling, Daitch-Mokotoff Soundex, Wildcard, Starts With, etc.
The search engine would search all town name synonyms in the SHTETLS.DBF file, similar to the "Global Text / DM Soundex Search" button at the top right of the Shtetls Admin Panel Browse screen and to the search in the Yizkor Book bibliographic database.
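Searching all synonyms amounts to indexing every name variant of a locality under the same ID, so that a query on any variant (e.g. "Ruzin") finds the same place. The following is a minimal Python sketch. It assumes a hypothetical input of (shtetl ID, list of name variants) pairs standing in for the SHTETLS.DBF synonym fields, and shows only exact and "starts with" matching. Soundex or wildcard matching would plug into `find_localities` in the same way.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical input shape: (shtetl_id, [all name variants for that locality]).
Shtetl = Tuple[int, List[str]]


def build_name_index(shtetls: Iterable[Shtetl]) -> Dict[str, Set[int]]:
    """Map every upper-cased name variant to the IDs of the localities using it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for shtetl_id, names in shtetls:
        for name in names:
            index[name.strip().upper()].add(shtetl_id)
    return dict(index)


def find_localities(index: Dict[str, Set[int]], query: str,
                    starts_with: bool = False) -> List[int]:
    """Return the sorted IDs of localities with any name matching the query."""
    q = query.strip().upper()
    if not starts_with:
        return sorted(index.get(q, set()))   # exact spelling of any variant
    hits: Set[int] = set()
    for name, ids in index.items():
        if name.startswith(q):
            hits |= ids
    return sorted(hits)
```

Because several localities can share a name, each index entry holds a set of IDs rather than a single ID.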
The localities should also be browsable — see Browse Interface, below.
The columns in the results display might be:
Today: | 2000 Town, 2000 Country name. Hyperlink to the town's ShtetlMaster Locality Page. |
After WWII: | 1950 Town, 1950 Country. |
Between the wars: | 1930 Town, 1930 District, 1930 Province, 1930 Country. |
Before WWI: | 1900 Town, 1900 District, 1900 Province, 1900 Country. |
Other names: | List of all alternate town names. |
Location: | Lat/Long, Distance/Direction from reference city, with hyperlink to MapQuest map (similar to ShtetlSeeker). |
# of JGFF entries: | Number of JGFF entries for this town. |
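The distance and direction in the "Location" column could be computed with the haversine formula and the initial great-circle bearing. The following is a minimal Python sketch. The mean Earth radius of 3958.8 miles and the rounding to a 16-point compass are assumptions, and ShtetlSeeker may not do it this way. The direction is that of the town as seen from the reference city.

```python
import math
from typing import Tuple

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius (assumed)

COMPASS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
           "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]


def distance_and_direction(ref: Tuple[float, float],
                           town: Tuple[float, float]) -> Tuple[float, str]:
    """Great-circle distance in miles and compass direction from ref to town.

    Both points are (latitude, longitude) in decimal degrees.
    """
    lat1, lon1 = map(math.radians, ref)
    lat2, lon2 = map(math.radians, town)
    dlat, dlon = lat2 - lat1, lon2 - lon1

    # Haversine formula for the distance.
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    miles = 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

    # Initial bearing: 0 = north, increasing clockwise, normalised to [0, 360).
    y = math.sin(dlon) * math.cos(lat2)
    x = (math.cos(lat1) * math.sin(lat2)
         - math.sin(lat1) * math.cos(lat2) * math.cos(dlon))
    bearing = (math.degrees(math.atan2(y, x)) + 360) % 360

    # Each compass point covers 22.5 degrees, centred on its nominal bearing.
    direction = COMPASS[int((bearing + 11.25) // 22.5) % 16]
    return miles, direction
```

The result could then be formatted for display as, say, "69 miles E of" the reference city, with the MapQuest link built from the town's latitude/longitude.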
Clicking on a District or Province name (1900 or 1930) will bring up a search results display for all of the towns in that District or Province.
Clicking on the "# of JGFF entries" will bring up the usual JGFF exact-match hits for that town. Displaying this number will also help the user gauge the relative size of the town.
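Both click-throughs reduce to simple queries. In this minimal Python sketch, `towns_in` filters localities by their district or province for one period, and `jgff_entry_counts` tallies the TownID column of the revised DATA.DBF. The `Locality` record is a hypothetical subset of the SHTETLS.DBF columns. Mapping JGFF TownIDs to ShtetlMaster localities (for example, via the future USBGN Code field) is not shown.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Locality:
    # Hypothetical subset of SHTETLS.DBF columns for one period (e.g. 1900).
    shtetl_id: int
    town: str
    district: str
    province: str


def towns_in(localities: Iterable[Locality], level: str, name: str) -> List[str]:
    """Sorted town names whose district or province (per `level`) equals `name`."""
    if level not in ("district", "province"):
        raise ValueError("level must be 'district' or 'province'")
    return sorted(loc.town for loc in localities if getattr(loc, level) == name)


def jgff_entry_counts(data_town_ids: Iterable[int]) -> Dict[int, int]:
    """Number of JGFF entries per TownID, given DATA.DBF's TownID column."""
    return dict(Counter(data_town_ids))
```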
The localities should also be browsable, as they are in the current "Shtetls of Lithuania" interface, but to a greater extent. Users should be able to view a geographically hierarchical "tree" arrangement of localities, e.g. a table of all the localities within a particular gubernia, arranged by uyezd, perhaps in a "folding menu tree" design similar to the JOWBR Cemetery Inventory.
(The "JOWBR Cemetery Inventory" is generated from the data in the JOWBR database, via the program https://data.jewishgen.org/wconnect/wc.dll?jg~jgsys~admin~&S=JOWBR&ADMIN=Y&Action=cemlist which is linked to from the words "Create Cemetery List Files" at the top of the JOWBR Admin Panel.)
The "ShtetlTree" will enable people to see where their shtetl was located at different periods in history, and what shtetls were nearby (i.e. in the same political district).
The tree view will also enable us to easily find errors, omissions and inconsistencies in our SHTETLS.DBF data.
Three separate "ShtetlTree"s should be generated, one for each of three historical time periods:
The SHTETLS.DBF database contains the names and political jurisdictions for all three time periods. (It also contains data for a fourth time period, circa 1950, right after WWII — but I don't think that data for this period is particularly useful for Jewish genealogists). Here are the hierarchies for the three trees:
Each of these three trees should be written to a separate file (similar to the two "JOWBR Cemetery Inventory" files: one for "live" data, one for "all" data).
The "folder level" lines (i.e. Country, Province, District) should display the number of shtetls contained beneath them in the hierarchy, similar to the way the "JOWBR Cemetery Inventory" displays the number of cemeteries.
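Generating a ShtetlTree is essentially a group-by over each shtetl's jurisdiction path, where each folder's count is the number of leaves beneath it. This minimal Python sketch assumes each record is a hypothetical (jurisdiction path, shtetl name) pair, such as `(("Russian Empire", "Vilna", "Oshmiany"), "Smorgon")`. It renders an indented text outline rather than the folding HTML menu, which would wrap the same structure in markup.

```python
from typing import List, Tuple

# Hypothetical record: (jurisdiction path from the top level down, shtetl name).
Record = Tuple[Tuple[str, ...], str]
LEAVES = "__leaves__"  # reserved key holding the shtetl names at a node


def build_tree(records: List[Record]) -> dict:
    """Nest records into {folder: {subfolder: {..., LEAVES: [names]}}}."""
    tree: dict = {}
    for path, shtetl in records:
        node = tree
        for folder in path:
            node = node.setdefault(folder, {})
        node.setdefault(LEAVES, []).append(shtetl)
    return tree


def count_leaves(node: dict) -> int:
    """Number of shtetls anywhere beneath this folder."""
    total = len(node.get(LEAVES, []))
    for key, child in node.items():
        if key != LEAVES:
            total += count_leaves(child)
    return total


def render(node: dict, depth: int = 0) -> List[str]:
    """Indented outline: folders show a shtetl count, leaves show the name."""
    lines: List[str] = []
    for key in sorted(k for k in node if k != LEAVES):
        child = node[key]
        lines.append("  " * depth + f"{key} ({count_leaves(child)})")
        lines.extend(render(child, depth + 1))
    for shtetl in sorted(node.get(LEAVES, [])):
        lines.append("  " * depth + shtetl)
    return lines
```

Running the same code once per period, with that period's jurisdiction columns as the path, would produce the three trees.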
The "leaf" level items (i.e. the shtetl level) should display the shtetl name.
Other items could be added to this line, such as the contemporary Town and Country name in parentheses, but only if the contemporary name differs from the historical name. For example:
Here's a mock-up of a portion of the ShtetlTree for 1900, using sample data for Vilna Gubernia. Click on any of Vilna's seven districts to see the contained shtetls.
Clicking on a leaf-level item (i.e. a shtetl name) should take you to the "ShtetlMaster Locality Page" for that shtetl. For now, just generate a hyperlink to "g.asp?ID=shtetlID", as the "JOWBR Cemetery Inventory" does. The local "g.asp" file can initially point to the standard SHOW page, and can be modified later as needed.
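A leaf line could be produced as follows. The historical name is linked to `g.asp?ID=shtetlID` as described above, and the contemporary town and country are appended in parentheses only when the town name differs. This is a minimal Python sketch: the function name and the case-insensitive comparison are assumptions.

```python
from html import escape


def leaf_html(shtetl_id: int, historical: str,
              modern_town: str, modern_country: str) -> str:
    """One ShtetlTree leaf: a linked historical name, plus the contemporary
    town and country in parentheses only if the town name differs."""
    line = f'<a href="g.asp?ID={shtetl_id}">{escape(historical)}</a>'
    if modern_town.strip().upper() != historical.strip().upper():
        line += f" ({escape(modern_town)}, {escape(modern_country)})"
    return line
```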
Admin: In the Shtetl Admin Panel's BROWSE screen, there should be a link in the top menu bar for "Generate ShtetlTree Files", similar to the JOWBR Admin Panel's "Create Cemetery List Files". Since the list of localities in SHTETLS.DBF should become stable over time, these files will need to be regenerated only infrequently.
Once the majority of towns in the JGFF database are tagged with their USBGN Feature Codes (which map to latitude/longitudes), we could then implement a feature that would do something like "Show me all LEVINs within 50 miles of Warsaw".
Other possibilities might be the capability to find all matches within a particular district, gubernia, or other geo-political jurisdiction — after that jurisdictional data is added to the SHTETLS.DBF table.
Restrictions:
We would have to limit the search radius for "town only" searches, both to avoid returning too much data and to prevent data-mining. For example, a request to "show me everyone researching any family within 999 miles of Warsaw" should not be allowed.
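This minimal Python sketch shows a proximity search with the radius restriction applied to "town only" searches, where no surname is given. It assumes each JGFF entry already carries a latitude/longitude, which in practice would be derived from the town's USBGN Feature Code. The 25-mile cap is a hypothetical placeholder, because the spec does not set a value.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

EARTH_RADIUS_MILES = 3958.8
MAX_TOWN_ONLY_RADIUS = 25.0  # hypothetical cap; the spec does not fix a value


@dataclass
class Entry:
    surname: str
    town: str
    lat: float   # assumed to come from the town's USBGN Feature Code
    lon: float


def miles_between(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Haversine great-circle distance in miles between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def proximity_search(entries: List[Entry], center: Tuple[float, float],
                     radius: float, surname: Optional[str] = None) -> List[Entry]:
    """Entries within `radius` miles of `center`, optionally for one surname.

    A town-only search (no surname) is refused above the cap, so that the
    database cannot be harvested in bulk.
    """
    town_only = not surname
    if town_only and radius > MAX_TOWN_ONLY_RADIUS:
        raise ValueError(
            f"town-only searches are limited to {MAX_TOWN_ONLY_RADIUS:g} miles")
    wanted = None if town_only else surname.upper()
    return [e for e in entries
            if (wanted is None or e.surname.upper() == wanted)
            and miles_between(center, (e.lat, e.lon)) <= radius]
```

For example, "all LEVINs within 50 miles of Warsaw" becomes `proximity_search(entries, (52.23, 21.01), 50, "Levin")`, using approximate coordinates for Warsaw.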
This feature could be made available only to JewishGen donors, as one of the "value-added services". This could be implemented using ASP, by converting the search page "/jgff/jgffweb.htm" to "/jgff/jgffweb.asp". The "proximity search" inputs would be greyed out for non-donors, as an incentive/teaser to encourage them to donate.