Basically, we need to determine the path a person took from point X to point Y, always beginning at X. Any time a person reaches a point, an event is logged recording that they reached it. Sample data could be simple JSON, as shown below, and the returned transaction would be a list of these events.
{ "location": "A", "id":35, "time": 1454532214 }
{ "location": "B", "id":35, "time": 1454532215 }
{ "location": "C", "id":35, "time": 1454532216 }
{ "location": "B", "id":35, "time": 1454532217 }
{ "location": "C", "id":35, "time": 1454532218 }
{ "location": "D", "id":35, "time": 1454532219 }
{ "location": "C", "id":35, "time": 1454532220 }
{ "location": "D", "id":35, "time": 1454532221 }
{ "location": "E", "id":35, "time": 1454532222 }
{ "location": "A", "id":35, "time": 1454532223 }
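To make the idea of a "transaction" concrete, here is a minimal Python sketch that groups events like the ones above by id, orders them by time, and starts a new transaction every time the start location is reached. The function name `split_transactions`, the `start` parameter, and the choice to drop events logged before the first start are my own assumptions for illustration, not part of any existing tool.

```python
import json
from collections import defaultdict

def split_transactions(lines, start="A"):
    """Split JSON event lines into transactions that each begin at `start`.

    Hypothetical helper: events are grouped by "id" and ordered by "time".
    Events seen before a person's first `start` event are dropped (assumption).
    """
    by_id = defaultdict(list)
    for line in lines:
        event = json.loads(line)
        by_id[event["id"]].append(event)

    transactions = []
    for events in by_id.values():
        events.sort(key=lambda e: e["time"])
        current = None
        for event in events:
            if event["location"] == start:
                if current:
                    transactions.append(current)  # close the previous transaction
                current = [event]
            elif current is not None:
                current.append(event)
        if current:
            transactions.append(current)
    return transactions

# The ten sample events from the post, rebuilt as JSON lines.
sample = [
    json.dumps({"location": loc, "id": 35, "time": 1454532214 + i})
    for i, loc in enumerate("ABCBCDCDEA")
]

transactions = split_transactions(sample)
# One string per transaction; assumes single-character location names.
paths = ["".join(e["location"] for e in t) for t in transactions]
```

With the sample above, the second "A" event closes the first transaction and opens a new one that contains only that event.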
I am thinking that once I have the transactions, each delimited by a visit to, say, "A", I could use a regex to grab the transactions that fit my criteria, say "A*B*C*D", where each "asterisk" represents possible loops through any location OTHER than the one following the "asterisk" (so NOT "B" for A*B).
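One way to read that criteria as an actual regular expression, sketched in Python: each "asterisk" becomes a negated character class, so "A*B*C*D" turns into `A[^B]*B[^C]*C[^D]*D`. This assumes every location name is a single character and that a transaction is represented as a string such as "ABCBCDCDE"; the helper names `funnel_pattern` and `reaches` are made up for this example.

```python
import re

def funnel_pattern(stops):
    """Build a regex for the given ordered stops, e.g. "ABCD".

    Hypothetical helper. Between two stops, any location except the
    next stop may repeat, which is the "asterisk" in A*B*C*D.
    """
    parts = [re.escape(stops[0])]
    for stop in stops[1:]:
        escaped = re.escape(stop)
        parts.append(f"[^{escaped}]*{escaped}")
    return re.compile("".join(parts))

def reaches(path, stops):
    """True if the path starts at stops[0] and then reaches each later stop in order."""
    # re.match anchors at the start only, so anything after the last stop is allowed.
    return funnel_pattern(stops).match(path) is not None
```

Because the match is anchored only at the start, a path like "ABCBCDCDE" still counts as A*B*C*D even though the person kept moving after first reaching "D".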
The final outcome would be to find:
1) The count of transactions that match the criteria, i.e. A*B*C*D.
2) Compare that transaction count to the previous supersets to determine how many people never reached the next location.
a) A* - count 500
b) A*B - count 490 (10 persons never reached B)
c) A*B*C - count 300 (190 persons never reached C)
d) A*B*C*D - count 130 (170 persons never reached D)
3) List the most likely transactions that match the criteria and group them, e.g. A*B*C*D returned the following possible transactions. There will likely be more than 6 transaction types, but for brevity...
a) ABCD - count 10
b) AXBCXD - count 16
c) AYXBXYCXYXYD - count 49 - this is the one I would like to identify as the most popular path.
d) ABYXCYXYXYXYXYD - count 3
e) AXYXYBCYXYXD - count 22
f) ABCXYXYXYXYXD - count 30 - this is the one I would like to identify as the 2nd most popular path.
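The three outcomes above could be computed roughly as follows. This is only a sketch under the same assumptions as the regex idea: single-character location names, one string per transaction, and the hypothetical helpers `funnel_pattern`, `funnel_counts` and `favorite_paths`. Grouping by the part of the path up to the first visit to the last stop is also my assumption, chosen because the example paths all end at "D".

```python
import re
from collections import Counter

def funnel_pattern(stops):
    """Regex for ordered stops, e.g. "ABCD" -> A[^B]*B[^C]*C[^D]*D (hypothetical helper)."""
    parts = [re.escape(stops[0])]
    for stop in stops[1:]:
        escaped = re.escape(stop)
        parts.append(f"[^{escaped}]*{escaped}")
    return re.compile("".join(parts))

def funnel_counts(paths, stops):
    """Return (label, count, dropped) for each prefix of `stops`.

    `dropped` is how many transactions matched the previous prefix
    but never reached the newly added stop.
    """
    rows = []
    previous = None
    for n in range(1, len(stops) + 1):
        pattern = funnel_pattern(stops[:n])
        count = sum(1 for path in paths if pattern.match(path))
        dropped = 0 if previous is None else previous - count
        rows.append(("*".join(stops[:n]), count, dropped))
        previous = count
    return rows

def favorite_paths(paths, stops, top=2):
    """Group matching transactions by the route taken up to the last stop."""
    pattern = funnel_pattern(stops)
    matches = (pattern.match(path) for path in paths)
    routes = Counter(m.group(0) for m in matches if m)
    return routes.most_common(top)

# Made-up transactions, only to exercise the functions.
example_paths = ["ABCD"] * 2 + ["AXBCXD"] * 3 + ["AXB", "AY", "ABC"]
rows = funnel_counts(example_paths, "ABCD")
favorites = favorite_paths(example_paths, "ABCD")
```

For real volumes the same counting would probably be pushed into whatever system stores the events, but the pattern construction and the per-prefix comparison stay the same.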