Here are the guidelines for the data to be submitted to a JewishGen searchable database, as per the standard Contributing Databases to JewishGen procedures.
Data sent to the JewishGen database managers should be in a database or spreadsheet format. dBase (DBF) format is preferred, but just about any standard database or spreadsheet format (rows of records and columns of fields) is acceptable: Microsoft Access, Microsoft Excel (any version), Lotus 1-2-3, Borland/Corel Paradox, or Quattro Pro spreadsheets, etc.
Word processor files are more difficult to work with than spreadsheet or database files, but may be acceptable if the data is in a regular format (one record per line, with fields separated by commas, tabs, or another delimiter).
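As an illustration of the "regular format" described above, here is a minimal Python sketch that reads delimited text with one record per line into labelled records. The function name `read_delimited` and the sample field labels are hypothetical, chosen only for this example; they are not part of any JewishGen tool.

```python
import csv
import io

def read_delimited(text, delimiter="\t"):
    """Parse delimited text whose first line holds the field labels."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]

# Hypothetical sample: one record per line, fields separated by tabs.
sample = "Surname\tForename\tAge\nLEWIN\tHaim\t40\nLEWIN\tRocha\t38\n"
records = read_delimited(sample)
```

If a file parses cleanly this way, with every line yielding the same set of labelled fields, it is probably regular enough to be converted to a spreadsheet.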
In all cases, please be sure that each field in your database is clearly labelled, and that a full database description is provided, using the guidelines.
The manager of each transcription project should create a data entry template to contain the transcribed data. The template design and data entry instructions should be reviewed by JewishGen before proceeding with data entry. The template may evolve over time, as you gain experience with transcribing the original records.
Templates for certain standard types of records may be found at http://www.jewishgen.org/databases/templates.
Surnames:
Town Names:
Dates:
Sparse columns: Columns which rarely contain data should be avoided, because they can take up considerable horizontal space when displaying search results. The more columns a spreadsheet has, the more difficult it is to display the data meaningfully. Try to have as few columns as is reasonably possible. Consider combining several sparse columns into a single more generic "Comments" or "Notes" column.
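One way to spot sparse columns before submitting is to measure how often each column actually holds data. The sketch below is only an illustration: the function names, the sample rows, and the threshold are assumptions, and it treats both blank cells and the "-" missing-data marker as empty.

```python
def fill_rates(rows):
    """Return, for each column, the fraction of rows that hold real data."""
    if not rows:
        return {}
    rates = {}
    for column in rows[0]:
        filled = sum(
            1 for row in rows
            if row.get(column, "").strip() not in ("", "-")
        )
        rates[column] = filled / len(rows)
    return rates

def sparse_columns(rows, threshold=0.1):
    """List columns filled in fewer than `threshold` of the rows (assumed cutoff)."""
    return [column for column, rate in fill_rates(rows).items() if rate < threshold]

# Hypothetical sample data: "Nickname" is filled in only one row out of four.
rows = [
    {"Surname": "LEWIN", "Age": "40", "Nickname": "-"},
    {"Surname": "LEWIN", "Age": "38", "Nickname": "-"},
    {"Surname": "GLEBERMAN", "Age": "21", "Nickname": "Pesha"},
    {"Surname": "DORFMANN", "Age": "28", "Nickname": "-"},
]
candidates = sparse_columns(rows, threshold=0.3)
```

Columns reported this way are candidates for merging into a "Comments" or "Notes" column; the decision itself remains a judgment call.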
Source Indicator: Every record (i.e. each row) should have some type of source information — column(s) containing an identifier by which a researcher using the database can independently find this record in the original source: A page number, a record number, a line number, etc., or any necessary combination thereof.
All data should be transcribed as faithfully as possible to the original source document, with as little interpretation as possible. Interpretation is the job of the researcher using the resulting database, not the job of the transcriber. The data transcriber should write only what is in the original source document.
If the transcriber or editor of the database wishes to add conjectures, interpretation, or editorial comments, these should all be placed within square brackets ("[]"), to indicate that they are not part of the original source. (See Section II.4, below).
Missing Data: If a data item is missing in the original source, indicate this with a dash or hyphen ("-") character, rather than leaving a blank field or using any other indicator.
Illegible Data: If a data item is illegible or questionable in the original record, transcribe as much as you can, and use the following indicators:
Ditto fields: Data which is the same as in the previous row must be filled in; you cannot leave any cell blank, because when the data is sorted by different criteria, the context is lost.
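The ditto rule can be applied mechanically when a transcription was first typed with blank "same as above" cells. The following Python sketch is a hypothetical helper, not a JewishGen tool; it assumes that a blank cell in the listed columns always means "same as the previous row", and that genuinely missing data has already been marked with "-".

```python
def fill_dittos(rows, columns):
    """Copy the previous row's value into blank cells of the given columns."""
    filled = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        if filled:
            previous = filled[-1]
            for column in columns:
                if new_row.get(column, "").strip() == "":
                    new_row[column] = previous.get(column, "")
        filled.append(new_row)
    return filled

# Hypothetical input in which repeated values were left blank.
rows = [
    {"Surname": "LEWIN", "Forename": "Haim", "Town": "Rezekne"},
    {"Surname": "", "Forename": "Rocha", "Town": ""},
    {"Surname": "GLEBERMAN", "Forename": "Pesia", "Town": ""},
]
completed = fill_dittos(rows, ["Surname", "Town"])
```

After filling, every row carries its own surname and town, so the rows can be sorted in any order without losing context.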
Conjectural Information: Should always be indicated within square brackets ("[ ]"). Conjectural information is information which is not in the original source document, but has been conjectured by a database transcriber or editor.
For example, a conjectural surname for a record which has no surname, but for which the surname has been deduced from other sources, should appear as "[EPSZTEIN]". Other uses of conjectural data are the expansion of abbreviations, and corrections to items misspelled in the original source (also see Section I.2.b above). Any other editorial comments and explanations should also appear within square brackets, to indicate that those items are not in the original record.
Prohibited Characters: Avoid the use of the double-quote character ("). The inclusion of double-quote characters causes problems with our internal data conversion routines (the procedures which convert data from Excel to dBase format). Use single quote characters (') instead.
Maximum Field Size: The maximum size of any field is 254 characters.
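The two rules above (no double-quote characters, and no field longer than 254 characters) are easy to check automatically before submission. Here is a minimal Python sketch; the function name `check_row` and the wording of the messages are assumptions made for this example.

```python
MAX_FIELD_SIZE = 254  # maximum field size stated in the guidelines

def check_row(row):
    """Return a list of problems found in one record (a dict of column -> text)."""
    problems = []
    for column, value in row.items():
        if '"' in value:
            problems.append(f"{column}: contains a double-quote character")
        if len(value) > MAX_FIELD_SIZE:
            problems.append(
                f"{column}: {len(value)} characters exceeds {MAX_FIELD_SIZE}"
            )
    return problems

# Hypothetical records used only to exercise the checks.
quoted = check_row({"Surname": "LEWIN", "Comments": 'known as "Moishe"'})
too_long = check_row({"Comments": "x" * 255})
clean = check_row({"Surname": "LEWIN", "Comments": "known as 'Moishe'"})
```

Running such a check over every row of the spreadsheet gives a list of cells to correct by hand.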
Some sources, such as Census Records, Czarist Revision Lists, etc., group people together into households or families. When transcribing data like this, each person in the data should still have their own record (their own row in the spreadsheet), but we can also group the family/household together in the database's results display, if a "Glue" field is used in the spreadsheet to group rows together.
For example, here's an input spreadsheet containing two family groups:
| Family # | Surname | Forename | Patronymic | Age | Relation | Birthplace | Gubernia | District | Town | Address | Fond # |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4118 | LEWIN | Haim | Mowscha | 40 | head | Jekapils | Vitebsk | Dvinsk | Rezekne | Soldatskaya 12-3 | 2706-1-156 |
| 4118 | LEWIN | Rocha | Shmuel | 38 | wife | Jekapils | Vitebsk | Dvinsk | Rezekne | Soldatskaya 12-3 | 2706-1-156 |
| 4118 | GLEBERMAN | Pesia | Haim | 21 | daughter | Ludza | Vitebsk | Dvinsk | Rezekne | Soldatskaya 12-3 | 2706-1-156 |
| 4119 | DORFMANN | Simon | Itzik | 28 | head | Rezekne | Vitebsk | Dvinsk | Rezekne | Ludzenskaya 45-6 | 2706-1-156 |
| 4119 | DORFMANN | Esther | Abram | 25 | wife | Rezekne | Vitebsk | Dvinsk | Rezekne | Ludzenskaya 45-6 | 2706-1-156 |
| 4119 | DORFMANN | Gita | Mowscha | 50 | mother | Rezekne | Vitebsk | Dvinsk | Rezekne | Ludzenskaya 45-6 | 2706-1-156 |
| 4119 | KAGANSKI | Hana | Mowscha | 60 | aunt | Rezekne | Vitebsk | Dvinsk | Rezekne | Ludzenskaya 45-6 | 2706-1-156 |
| 4119 | LEWIN | Malka Sura | Rachmiel | 30 | cousin | Ludza | Vitebsk | Dvinsk | Rezekne | Ludzenskaya 45-6 | 2706-1-156 |
Could be displayed as:
| Town / District / Gubernia | Surname, Forename | Patronymic | Age | Relation | Birthplace | Address / Fond # |
|---|---|---|---|---|---|---|
| Rezekne / Dvinsk / Vitebsk | LEWIN, Haim | Mowscha | 40 | head | Jekapils | Soldatskaya 12-3 / 2706-1-156 |
| | LEWIN, Rocha | Shmuel | 38 | wife | Jekapils | |
| | GLEBERMAN, Pesia | Haim | 21 | daughter | Ludza | |
| | | | | | | |
| Rezekne / Dvinsk / Vitebsk | DORFMANN, Simon | Itzik | 28 | head | Rezekne | Ludzenskaya 45-6 / 2706-1-156 |
| | DORFMANN, Esther | Abram | 25 | wife | Rezekne | |
| | DORFMANN, Gita | Mowscha | 50 | mother | Rezekne | |
| | KAGANSKI, Hana | Mowscha | 60 | aunt | Rezekne | |
| | LEWIN, Malka Sura | Rachmiel | 30 | cousin | Ludza | |
Here we are using the "Family #" column as the "glue" field, to glue all members of the household together, for a more attractive and meaningful display of the data.
Note how the common fields (data common to every member of the household/family) are "banded" together, in the row-spanning fields on the left and right. This redundant data is displayed only once per family/household, in a vertically "stacked" fashion, saving considerable display space.
The "glue" field is also needed to ensure that the entire family group (i.e. all rows with the same "glue" value) is displayed together, even when only one member of the family matches the search criteria.
For example, the above display would result from a search for the surname "LEWIN": even when only one member of a family has the surname "LEWIN", the entire family group is displayed, because the "glue" field keeps the entire family together.
The simplest use of a "glue" field is in a marriage record — to tie the bride and groom together. If the groom and bride are each entered in their own row in the spreadsheet, the use of a "glue" field will ensure that both rows are displayed when a user searches for either one of the parties' surnames.
Also note that the "glue" field is not necessarily a displayed field. (In the example above, the "Family #" is not displayed in the search results screen). The "glue" field can be a hidden column, which is not displayed in the search results — this column is used only for the internal purpose of creating the database indexes.
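The behaviour of a "glue" field can be sketched in a few lines of Python. This is only an illustration of the idea, not JewishGen's actual search engine: the function name `search_with_glue` is hypothetical, the rows are abbreviated from the example spreadsheet above, and an exact surname comparison stands in for the Soundex matching that the real search uses.

```python
from collections import defaultdict

def search_with_glue(rows, glue_column, surname):
    """Return every whole group in which at least one row has the given surname."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[glue_column]].append(row)
    # Keep a group in full as soon as any one of its members matches.
    return [
        members for members in groups.values()
        if any(member["Surname"] == surname for member in members)
    ]

rows = [
    {"Family #": "4118", "Surname": "LEWIN", "Forename": "Haim"},
    {"Family #": "4118", "Surname": "LEWIN", "Forename": "Rocha"},
    {"Family #": "4118", "Surname": "GLEBERMAN", "Forename": "Pesia"},
    {"Family #": "4119", "Surname": "DORFMANN", "Forename": "Simon"},
    {"Family #": "4119", "Surname": "DORFMANN", "Forename": "Esther"},
    {"Family #": "4119", "Surname": "DORFMANN", "Forename": "Gita"},
    {"Family #": "4119", "Surname": "KAGANSKI", "Forename": "Hana"},
    {"Family #": "4119", "Surname": "LEWIN", "Forename": "Malka Sura"},
]
lewin_groups = search_with_glue(rows, "Family #", "LEWIN")
kaganski_groups = search_with_glue(rows, "Family #", "KAGANSKI")
```

A search for "LEWIN" returns both households in full, while a search for "KAGANSKI" returns only household 4119, again with all five of its members.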
JewishGen has established no universal transliteration standards for data written in non-Latin alphabets (e.g. the Hebrew and Cyrillic alphabets), since each database is different, and there are so many languages, alphabets, dialects, and regional variants across the wide scope of Jewish genealogical data. Each database is free to use its own transliteration methods, as long as they are reasonable. The introductory remarks for each database should indicate or explain which transliteration method has been used for that database.
Here are some general ideas and guidelines:
Reflect the original: The transliteration should reflect the original document, to the degree possible. Names should not be 'standardized'; they should be entered exactly as written on the original document. For example: 'Movsha', 'Moishe', etc., should not become 'Moshe'; and should certainly never be 'translated' or 'transformed' to 'Moses'.
Pronunciation should reflect local use, e.g. distinctions between Litvak and Galitzianer pronunciations can be retained.
Soundex: Since Daitch-Mokotoff Soundex searching will find most evident name variations, we needn't worry excessively about standard transliteration of Cyrillic-to-English, Yiddish-to-English, or Hebrew-to-English.
Cyrillic: Transliteration from Cyrillic to Latin characters should reflect the local language, if that local language uses the Latin alphabet. For example, civil records in the Kingdom of Poland (Congress Poland) after 1868 were written in Cyrillic, and should be transliterated into Polish spelling, as JRI-Poland does, rather than English spelling. Where the local language does not use the Latin alphabet (e.g. Belarus, Ukraine), Cyrillic should be transliterated into English phonetics.
If your original source data is in Russian (Cyrillic alphabet), you may do your data entry directly in Cyrillic, if you are more comfortable in that language, and have the appropriate keyboard. We have Excel macros that can transliterate data in Cyrillic into the Latin alphabet.
Retain the Original: If possible, data in Latin characters should be transcribed in the original language rather than translated (e.g. leave occupations written in German in German), with a separate table of translations provided. It is always best to keep the transcript as close to the original as possible, without any interpretation, and let the end-users of the database do that interpretation.
As mentioned above in sections I.1.c and I.2.c, certain datasets might want to make use of the special hidden columns called "Other Surnames" and/or "Other Towns". These columns are needed when there are surnames or town names embedded within the text of other columns, and you wish those items to be fully searchable.
For example: if you have a column entitled "Survived by" which contains "his daughter Mollie SMITH, and his brother Robert BERNSTEIN", and you want the surnames SMITH and BERNSTEIN to be searchable as surnames, then you will need to copy those names into a separate column, called "Other Surnames". In this case, the "Other Surnames" column should contain the surnames SMITH and BERNSTEIN.
Another example: if you have a "Comments" column which contains miscellaneous information, such as "Father was born in Minsk, is currently residing in Pinsk, and working in Linsk", and you want the town names Minsk, Pinsk and Linsk to be searchable as town names, then you will need to copy those town names into a separate column, called "Other Towns". In this case, the "Other Towns" column should contain the town names Minsk, Pinsk and Linsk.
The sole purpose of the hidden "Other Surnames" and "Other Towns" columns is database indexing: they tell the database search engine that a particular word is a surname or a town name, so that it can locate the word when doing a Soundex search. These columns will not be displayed in the search results.
When a surname or town name is buried within a larger text field (such as a "Comments" field), the database search engine doesn't know that that particular word is a surname or town name. Copying these words into an "Other Surnames" or "Other Towns" column makes this association explicit. While a search for "BERNSTEIN" using a global text search would find a record with the word "BERNSTEIN" anywhere within any column, a Soundex search would not. So if a Soundex search for the surname "BURNSTINE" is done, it wouldn't find "BERNSTEIN" within the context of the "Comments" field. To enable its Soundex searchability, the word "BERNSTEIN" needs to be copied into an "Other Surnames" column.
The database creator/editor should copy all surnames and town names contained within the text of these other fields into a separate "Other Surnames" or "Other Towns" column. This action allows those words to be identifiable and fully searchable as surnames and/or town names, respectively.
[Note that if a particular town name is already in another indexed Town column, then you don't really need to copy it into the "Other Towns" column — although it doesn't hurt, it's simply redundant. For example, if you have a "Town of Birth" column which contains "Minsk", and you also have a "Comments" column that contains the words "Father is a resident of Minsk", then in this instance you really don't need to copy "Minsk" into the "Other Towns" column, because this record already contains "Minsk" in the searchable "Town of Birth" column — a search for "Minsk" would already find this record. However, it does no harm to have "Minsk" in the "Other Towns" column in this instance.]
There should never be anything in the "Other Surnames" or "Other Towns" columns which doesn't also appear somewhere else in the row. The "Other Surnames" and "Other Towns" columns are hidden columns, which are not displayed in the search results; they are used only for the purpose of creating database indexes.
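The rule that nothing may appear in "Other Surnames" or "Other Towns" unless it also appears elsewhere in the row can be checked with a short script. The sketch below is a hypothetical helper: it assumes that multiple entries in a hidden column are separated by commas (the guidelines above do not specify a separator) and uses a simple case-insensitive text match.

```python
def unmatched_other_entries(row, other_column, separator=","):
    """List entries of a hidden column that appear nowhere else in the row."""
    entries = [
        entry.strip()
        for entry in row.get(other_column, "").split(separator)
        if entry.strip()
    ]
    # Text of every column except the hidden column being checked.
    visible_text = " ".join(
        value for column, value in row.items() if column != other_column
    ).lower()
    return [entry for entry in entries if entry.lower() not in visible_text]

# Hypothetical record: LEVY does not occur anywhere else in the row.
row = {
    "Surname": "COHEN",
    "Survived by": "his daughter Mollie SMITH, and his brother Robert BERNSTEIN",
    "Other Surnames": "SMITH, BERNSTEIN, LEVY",
}
stray = unmatched_other_entries(row, "Other Surnames")
```

Any entry reported this way is either a typing error in the hidden column or a name that was never transcribed into the visible fields, and should be reviewed.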
Warren Blatt, Last Revised Jan 13, 2006.