Pseudo Table

Read(1103) Label: pseudo table,

 

We can use data in a composite table by not only accessing it directly but by defining a pseudo table. A pseudo table is not a physical table but one that retrieves data from an existing composite table according to the current definition. Now pseudo tables are supported in esProc Enterprise edition only.

11.7.1 Basic uses

For the convenience of illustration, we first generate three composite tables. The employee table consisting of fields of employee ID, department ID, gender, marital status and name, among which gender and marital status are stored together in one field named Bools. Both of the other two tables store orders in fields of salespeople ID, order code, datetime and amount. Below is the code for generating the composite tables:

 

A

B

1

=demo.query("select NAME, SURNAME, GENDER, STATE from employee")

 

2

=A1.select(GENDER=="M")

=A1\A2

3

=A2.(NAME).id()

=A3.len()

4

=B2.(NAME).id()

=A4.len()

5

=A1.(SURNAME).id()

=A5.len()

6

=A1.(STATE).id()

=A6.len()

7

[Sales,Technology,R&D,Financial,Admin]

[0,0.5,0.75,0.9,0.97,1]

8

=to(1000).new(#:EID, B7.pseg(rand()):DeptID, if(rand()<0.5,0,1):Gender, if(rand()>0.8,1,0):Married, if(Gender==0,A3(rand(B3)+1), A4(rand(B4)+1))/" "/A5(rand(B5)+1):Name, bits(Gender, Married):Bools )

 

9

=A8.select(DeptID==1)

=A9.len()

10

2020-01-01

2021-01-01

11

=periods@x(A10,B10)

=periods@x(B10,elapse@y(B10,1))

12

=A11.((a=string(~,"yyMMdd"), to(rand(100)+ 10).new(A9 if(rand()>0.9, (rand(B9)+1, rand(B9-20)+ 21)).EID:SID, a/string(#, "0000"):OID, datetime(A11.~, time(rand(8)+8, rand(60), 0)):OTime, rand(100)*10+200:Amount))).conj().sort(SID, OID)

 

13

=B11.((a=string(~,"yyMMdd"), to(rand(100)+ 10).new(A9 if(rand()>0.9, (rand(B9)+1, rand(B9-20)+ 21)).EID:SID, a/string(#, "0000"):OID, datetime(B11.~, time(rand(8)+8, rand(60), 0)):OTime, rand(100)*10+200:Amount))).conj().sort(SID, OID)

 

14

=file("D:/file/pseudo/emps.ctx")

=A14.create(#EID,DeptID,Gender, Married, Name, Bools)

15

 

>B14.append@i(A8.cursor())

16

=file("D:/file/pseudo/1.orders.ctx")

=file("D:/file/pseudo/2.orders.ctx")

17

=A16.create(SID, #OID, OTime, Amount)

>A17.append(A12.cursor())

18

=B16.create(SID, #OID, OTime, Amount)

>A18.append(A13.cursor())

19

>B14.close()

 

20

>A17.close()

>A18.close()

The given data is of small size. In Gender field, 0 represents male and 1 represents female; and in Married field, 0 means unmarried and 1 means married. A8 generates employee data as follows:

Below are orders data tables of years 2020 and 2021 A12 and A13 generate respectively:

The three tables are stored in three composite table files – emps.ctx, 1.orders.ctx and 2.orders.ctx – respectively in the pseudo path. The last two composite table files form a multi-zone composite table and we’ll take them as examples to look at how to define and use pseudo tables. Now we backup them in case data is modified later during the testing.

 

A

B

C

1

=create(file).record(["pseudo/emps.ctx"])

=pseudo(A1)

=B1.cursor().fetch@x(100)

2

=create(file, zone).record(["pseudo/orders.ctx", [1,2]])

=pseudo(A2)

=B2.import()

A pseudo table definition record is a table sequence’s record of specified structure. Its simplest form should include a file field to specify data source of the pseudo table. In the above code, A1 generates a pseudo table definition record as follows:

The source of data in a pseudo table should be a composite table whose extension is ctx.

Based on the above record, we use pseudo(pd) function to officially define a pseudo table. Parameter pd is the pseudo table definition record. B1 generates a pseudo table definition:

Like a composite table file, a pseudo table definition feeds data using T.cursor() function that generates a cursor or T.import() function that retrieves all data. C1 retreives the first 100 records using the cursor:

A pseudo table can also be generated from a multi-zone composite table file. A2 in the above code generates a pseudo table definition as follows:

C2 retrieves data from this pseudo table:

Data is retrieved from the multi-zone composite table file in the original order.

We can use delete/update/append functions to delete, update or add one or more record from/in/to a pseudo table. The action will directly entail a corresponding change in the source composite table file. For example:

 

A

B

1

=create(file).record(["pseudo/emps.ctx"])

=pseudo(A1)

2

=B1.select(DeptID==2 && EID<10)

=A2.import()

3

>B2.run(DeptID=0)

=B1.update(B2)

4

=B1.import()

=A2.import()

5

>B2.run(DeptID=2, EID=EID+1000)

=B1.update(B2)

6

=B1.import()

 

Like using a composite table, we can select certain records from a pseudo table through the T.select() function. In the above code, A2 selects employee records where DeptID is 2 and EID is less than 10, and B2 retrieves them using the T.import() function, as shown below:

A3 resets values of DeptID field of the selected employee records as 0, and B3 updates these modified records stored in record sequence P to the pseudo table using T.update(P) function. Keep in mind that, when you use update function to update a pseudo table, you must first define a dimension for the corresponding composite table, because all these operations are performed based on the composite table’s dimension. With the modification, A4 and B4 get the following results:

We can see that corresponding records are modified. Now as no records can meet the filtering conditions specified in A2, the record sequence returned by B4 is empty.

 

A5 changes DeptID values of B2’s two records to their original values and adds 1000 to each EID value. B5 uses T.update() function to update the change to the pseudo table. Below is A6’s result:

If primary key values in the records are updated, dimension of the composite table containing these records cannot remain ordered and the pseudo table cannot be updated as expected.

We can also generate a pseudo table from one or more bin files using the same way of generating one from composite table. For example:

 

A

B

1

=create(file).record(["D:/files/txt/PersonnelInfo.btx "])

=pseudo(A1)

2

=B1.cursor().fetch@x(100)

 

3

=B1.select(City=="Columbus" && ID<500)

=A3.import()

A2 uses the cursor to access B1’s pseudo table and returns the following result:

A3 selects employees coming from Columbus city and with IDs less than 500. B3 imports records from A3 and returns result as follows:

11.7.2 Pseudo table merge

Usually, data is stored and used by user in a multi-zone composite table file, such as the orders records stored by salespeople in the previous section., and will be grouped by the first field. For example:

 

A

1

=create(file,zone).record(["pseudo/orders.ctx", [1,2]])

2

=pseudo(A1)

3

=A2.import()

4

=A2.group(year(OTime)).import()

5

=A4.new(SID,year(OTime):Year,sum(Amount):Total)

6

=A2.groups(SID, year(OTime):Year;sum(Amount):Total)

A1 generates a pseudo table definition record:

And data is retrieved from the multi-zone composite table file by zone tables in order. A3 imports data from the pseudo table as follows, which has been displayed in the previous section:

 

A4 groups the pseudo table by year:

Data is grouped by the specified field or expression. Summing amounts in A5 based on the grouped pseudo table can make this easy to understand, and below is the result:

We can also use T.groups() function to perform the above grouping and sum operation. Note that this method groups data by two layers of fields – SID and order year. Here’s A6’s result:

In a multi-zone composite table, data is stored in zone tables generally by a time type field. Each zone table contains data of a certain time range. Such a time field is called zone field or zone column, which can be set using a date field. Zone tables on the whole – rather than data in each separate zone table – need to be ordered by the date field when the multi-zone composite table file is in use. The interval of date field values will be recorded for each zone table at creation of the pseudo table and used to automatically locate the corresponding zone table file for a query or filter operation. For example:

 

A

1

=create(file,zone,date).record(["pseudo/orders.ctx", [1,2],"OTime"])

2

=pseudo(A1)

3

=A2.import()

4

=A2.group(year(OTime)).import()

5

=A4.new(SID,year(OTime):Year,sum(Amount):Total)

6

=A2.select(OTime>date(2020,12,20) && OTime<date(2021,1,10))

7

=A6.import()

A1 generates a pseudo table definition record:

Once date field OTime is defined, the pseudo table will definitely identify that data is stored in zone tables by this zone field. Data retrieved from different zone tables of the pseudo table will thus be merged together automatically by the dimension field and returned. Below is A3’s result of importing data from the pseudo table:

In this result table, records of same salesperson in different years that originally stored in different zone tables are now arranged together.

The first field of data stored in a pseudo table, especially one generated from a multi-zone composite table, is usually seller, customer, account code and so on. When a pseudo table is generated based on data of a multi-zone composite table and the zone field is already defined, a merge by the first field will be performed for a query.

A4 groups data by order year and gets the following result:

As zone field is defined, the grouping operation on the pseudo table groups data first by the first field and then by the specified field/expression. A5 sums sales amounts according to the grouped pseudo table and gets the following result:

A6 queries records of orders whose dates belong to a time period between 2020-12-20 and 2021-01-10. A7 gets the desired records:

Data retrieved from zone tables will be merged by the first field if it is already filtered by zone field. A filtering by zone field can automatically find out the target zone tables from which data will be retrieved and merge data by the first field.

As can be seen from the previous example, data is retrieved from zone tables of the multi-zone composite table by the first field segment by segment and then merged to get result. But at times segmenting data by the first field leaves unevenly distributed data or generates groups containing relatively small data. In the previous example, the number of order records of the first 20 salespeople is small and that of the other salespeople is not large. In such cases, the merge operation is likely to frequently switch between zone tables for retrieving data because each segment contains relatively small data or there are too many segments, resulting in low efficiency of retrieving data from each zone table. In order to improve the situation, we can restructure data by setting “larger” groups on the first field in an effort to get higher efficiency. For example:

 

A

B

1

=demo.query("select distinct(NAME) from CITIES")

=A1.(NAME)

2

=create(file,zone).record(["pseudo/orders.ctx", [1]])

=pseudo(A2).import()

3

=create(file,zone).record(["pseudo/orders.ctx", [2]])

=pseudo(A3).import()

4

=B1.len()

=1000.(B1(rand(A4)+1))

5

=B2.new(B4(SID):City, SID, OID, OTime, Amount)

=B3.new(B4(SID):City, SID, OID, OTime, Amount)

6

=A5.sort(City)

=B5.sort(City)

7

=file("pseudo/1.orders2.ctx")

=file("pseudo/2.orders2.ctx")

8

=A7.create(City, SID, #OID, OTime, Amount)

>A8.append(A6.cursor())

9

=B7.create(City, SID, #OID, OTime, Amount)

>A9.append(B6.cursor())

The original order data does not contain city data (City), but the above testing data adds City information for the salespeople, sorts data by City, SID and OID in order and stores data in multi-zone composite table orders2.ctx. In real-world situations, a first field by which data will be first grouped can contain cities people come from, departments employees work in, transaction type codes, banks via which orders are paid, etc. Data should be arranged by the first field.

Now we perform a query using pseudo table orders2.ctx:

 

A

B

1

=create(file,zone, date).record(["pseudo/orders2.ctx", [1,2], "OTime"])

=pseudo(A1).import()

A1’s pseudo table gets data from the multi-zone composite table and merges them by the first field. Here’s B1’s result:

The retrieved data is merged by the multi-zone composite table’s first field City, during which records of two years of same city will be put together but won’t be sorted by seller ID.

User-defined fields

Apart from using fields of the original data tables, we can define fields through adding column field in the pseudo table definition record. For example:

 

A

1

[Sales,Technology,R&D,Financial,Admin]

2

=create(name,enum,list).record(["DeptID","Dept",A1])

3

=create(name,bits).record(["Bools",["IfMarried","IfLady"]])

4

=create(file,column).record(["pseudo/emps.ctx",A2|A3])

5

=pseudo(A4)

6

=A5.select(Dept=="Sales").import()

7

=A5.import(EID, Name, DeptID, Dept)

8

=A5.select(!IfLady && IfMarried).import()

9

=A5.import(EID, Name, Gender, IfLady,Married,IfMarried)

Two fields are defined in the above code. A2 defines a Dept field:

In the column definition record, name is a field name. When it is a field in the original data table, it is a real field. It also can be a pseudo field generated from a real field. If values of the real field DeptID need to be transformed into corresponding department names, we call such a pseudo field enumerated pseudo field. In the column definition record again, we define enumerated pseudo field name through the enum field. The transformation converts department IDs into corresponding values in a specified sequence, which is configured under list field, by matching IDs to values of the sequence. The matching begins from place 1.

A3 defines another field. In the original data table, Bools field stores data of two binary fields – Gender and Married. Below is A3’s column definition record:

The binary field that corresponds to each bit will be retrieved according to the definition. The pseudo fields under Bools field are called binary dimension pseudo fields. Their names are defined through bits field in the column definition record. Pseudo field names cannot be same as original fields, we use new names IfMarried and IfLady. The definition arranges fields under Bools from low order to high order. Values of each field are bool value true or false. A binary dimension pseudo field can store 32 binary fields at most.

A4 defines a pseudo table by setting data source (file field) and column definitions (column field):

A5 generates a pseudo table according to A4’s definition. The previously defined column field will play its part for getting data for the pseudo table. A6 selects employees in Sales department using the enumerated pseudo field Dept:

Here T.import() function retrieves records from the pseudo table and returns only the original fields by excluding the pseudo fields. Yet the selection according to the pseudo field is successful as the DeptID for all employees is 1.

In order to return a pseudo field, we need to specify the desired fields in T.import() function. A7, for instance, returns the following query result:

The result set shows clearly the correspondence between values of the enumerated pseudo field and those of the corresponding real field.

A8 performs the filtering operation to retrieve records of married male employees using the binary dimension pseudo field and return the following result:

A query on binary dimension pseudo field needs to specify the column names. Below is A9’s result:

We can also use update function to update a pseudo table using user-defined fields:

 

A

1

[Sales,Technology,R&D,Financial,Admin]

2

=create(name,enum,list).record(["DeptID","Dept",A1])

3

=create(name,bits).record(["Bools",["IfMarried","IfLady"]])

4

=create(file,column).record(["pseudo/emps.ctx",A2|A3])

5

=pseudo(A4)

6

=A5.select(EID<8)

7

=A6.select(Dept=="Sales").import(EID,Name,Dept,IfMarried,IfLady)

8

>A7.run(Dept="Technology", IfLady=!IfLady)

9

>A5.update(A7)

10

=A5.import(EID,Name,Dept,IfMarried,IfLady)

11

=A5.import()

A5’s pseudo table uses an enumerated pseudo field and a binary dimension pseudo field. A7 selects employee records where EID is less than 8 and gets the following result:

A8 modifies their departments and genders, as well as executes the update. A10 retrieves records from the updated pseudo table. Note that Dept field and IfLady field of the three records have been changed after update in the following A10’s result:

We can check in A11 the new data after modification in the composite table file the pseudo table is using:

The update automatically calculates the actual data according to the current values when trying to change values in a pseudo table’s pseudo field, and updates the data to the corresponding composite table. As the update action is based on Bools instead of Gender and Married fields used for look-up, their values become null after the update.

 

At times a field in the original data table associates with records of another table. The field is thus called foreign key column. Besides enumerated pseudo field, you can use a regular pseudo field in the original data table, for instance:

 

A

1

=create(name,exp).record(["OYear","year(OTime)"])

2

=create(file,zone,column).record(["pseudo/orders.ctx",[1],A1])

3

=pseudo(A2)

4

=A3.import(SID, OTime, OYear, Amount)

A4 imports data from A3’s pseudo table. In the following result, pseudo field OYear is displayed:

 

A1 adds a user-defined field:

The field name OYear does not exist in the multi-zone composite table file. It is an ordinary pseudo field. We can define its expression under exp field in the column definition record to calculate its values from a real field.

A2 generates a pseudo table definition record:

 

A pseudo table can be generated from in-memory table sequence, in-memory table and cluster in-memory table, too. With these source tables, definition of the pseudo table structure uses a var field to specify name of the variable stored in the in-memory table rather than using the file field to specify the source file name. For example:

 

A

1

>tab=demo.query("select * from CITIES")

2

=create(var).record(["tab"])

3

=pseudo(A2)

4

=A3.select(STATEID==3)

5

=A4.import()

A1 retrieves CITIES table from the database and stores the result table sequence in variable tab. A2 defines a pseudo table using an in-memory table. A4 gets information of cities whose STATEID is 3. A5 imports the selected data as follows:

A pseudo table defined based on an in-memory table has same uses as any other pseudo tables.